Various analysis tools use the LLVM bitcode representation of the source. Compiling a single source file into bitcode is straightforward.
clang -S -emit-llvm <file.c> to get the human-readable form (a .ll file).
clang -c -emit-llvm <file.c> to get the binary bitcode (a .bc file).
However, compiling large projects like the Linux kernel into bitcode is not as straightforward.
At a high level, each *.c source file is compiled into an object file. The object files (.o) in a subsystem are linked together into an intermediate built-in.a file. Finally, all the built-in.a files are linked together to get the final kernel image.
To get the bitcode representation of the whole kernel, each source file is compiled to bitcode and the resulting files are linked in the same order the kernel build system uses. One way to do this is to use a build system instrumentation tool like rizsotto/Bear to capture the build commands and then execute modified versions of them. Another is to use a tool like my personal favorite, trailofbits/blight, which instruments the build system with pre- and post-build hooks to capture the build commands and modify them on the fly.
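To make the "capture and modify" idea concrete, here is a minimal Python sketch of the rewriting step. It is an illustration under stated assumptions, not the script mentioned in this post: it assumes the build commands were captured as a JSON compilation database (the compile_commands.json format that Bear writes), that the kernel was built with clang so the captured flags are accepted as-is, and that every compile step names its output with a separate -o argument. The helper names entry_argv, to_bitcode_command and link_command are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def entry_argv(entry):
    """Return the argv list for one compilation database entry.

    Entries carry either an "arguments" list or a single "command" string.
    """
    if "arguments" in entry:
        return list(entry["arguments"])
    return shlex.split(entry["command"])


def to_bitcode_command(argv):
    """Rewrite a compile-only command so it emits bitcode instead of an object.

    Returns None for commands without -c (for example link steps).
    Assumption: the output is given as "-o <path>" in two separate arguments.
    """
    if "-c" not in argv:
        return None
    out = []
    i = 0
    while i < len(argv):
        if argv[i] == "-o" and i + 1 < len(argv):
            # foo.o -> foo.bc, keeping the directory part unchanged
            out += ["-o", str(PurePosixPath(argv[i + 1]).with_suffix(".bc"))]
            i += 2
            continue
        out.append(argv[i])
        i += 1
    # Place -emit-llvm right after the compiler name
    out.insert(1, "-emit-llvm")
    return out


def link_command(bitcode_files, output):
    """Build an llvm-link invocation that merges bitcode files in the given order."""
    return ["llvm-link", "-o", output, *bitcode_files]
```

For example, to_bitcode_command(["clang", "-O2", "-c", "-o", "init/main.o", "init/main.c"]) yields the same command with -emit-llvm added and init/main.bc as the output. The resulting .bc files for one subsystem can then be passed to link_command in the order the corresponding .o files appear in that subsystem's built-in.a, and the per-subsystem results linked again for the whole kernel.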
There are tools that already do something similar, like travitch/whole-program-llvm and SRI-CSL/gllvm, and they are meant to work for any large project, not just the kernel. However, these tools did not work for the kernel, for reasons I'll have to investigate later.
For now, I use a simple Python script that generates the whole-program bitcode for the Linux kernel. The script is available at akshithg/whole-kernel-bitcode.