Whole Kernel Analysis

Context

  • The Linux kernel
  • Building the kernel
  • Intermediate Representation
  • [Whole Program Bitcode]
  • Analysis Tools
  • Static Analysis
  • Program Representation
  • Alias Analysis
    • Data-flow Analysis
    • Type-based Analysis

Various analysis tools use the LLVM bitcode representation of the source. Compiling a single source file into bitcode is straightforward.

  1. clang -S -emit-llvm <file.c> to get the readable bitcode.
  2. clang -c -emit-llvm <file.c> to get the binary bitcode.

However, compiling large projects like the Linux kernel into bitcode is not as straightforward.

At a high level, each *.c source file is compiled into an object file. Object files .o in a sub-system are linked together to into an intermediate built-in.a file. Finally, all the built-in.a files are linked together to get the final kernel image vmlinux.

To get the bitcode representation of the whole kernel, each source is compiled to bitcode and then linked in the same order the kernel build system does. One way to do it us a build system instrumentation tool like rizsotto/Bear to capture the build commands and execute the modified command or use a tool like my personal favorite trailofbits/blight which instruments the build system with pre and post build hooks to capture the build commands and modify them on the fly.

There are tools that already do something similar like travitch/whole-program-llvm and SRI-CSL/gllvm and work for all large projects not just the kernel. But these tools did not work for the kernel for reasons I'll have to investigate later.

For now, I use a simple python script not that generates the whole program bitcode for the linux kernel. The script is available at akshithg/whole-kernel-bitcode


andersens pointer analysis

StatementInclusion Constraint
a = bPointsToSet(b) ⊆ PointsToSet(a)
a = &bb ∈ PointsToSet(a)
a = *b∀c ∈ PointsToSet(b): PointsToSet(c) ⊆ PointsToSet(a)
*a = b∀c ∈ PointsToSet(a): PointsToSet(b) ⊆ PointsToSet(c)
InputsDescription
Aa set of n pointers
Sa set of m statements

m ≤ 4*n^2 unique statements

OutputDescription
ExhaustiveReturn PointsToSet(a) for all pointers a ∈ A
On-demandReturn if b ∈ PointsToSet(a) for a specific pair a, b ∈ A

  1. [POPL 2021] The Fine-Grained and Parallel Complexity of Andersen's Pointer Analysis
Last updated: February 12, 2023