Various analysis tools use the LLVM bitcode representation of the source. Compiling a single source file into bitcode is straightforward.
clang -S -emit-llvm <file.c> produces the human-readable LLVM IR (a .ll file).
clang -c -emit-llvm <file.c> produces the binary bitcode (a .bc file).
However, compiling large projects like the Linux kernel into bitcode is not as straightforward.
At a high level, each *.c source file is compiled into an object file. The .o object files in a subsystem are linked together into an intermediate built-in.a file. Finally, all the built-in.a files are linked together to get the final kernel image.
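To make the staged structure concrete, here is a rough sketch of what a bitcode analogue of those stages could look like: one .bc file per source file, one merged file per subsystem, and one merged file for the whole image. This is an illustration under assumptions, not the kernel's actual build procedure. The `SUBSYSTEMS` list and the `built-in.bc` and `image.bc` names are hypothetical, `llvm-link` is the LLVM tool for merging bitcode files, and a real kernel build also needs the many include paths and flags that kbuild passes to the compiler. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: mirrors .c -> .o -> built-in.a -> image, but with bitcode files.
# SUBSYSTEMS, built-in.bc and image.bc are hypothetical names for illustration.

DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands instead of running them
SUBSYSTEMS="${SUBSYSTEMS:-fs mm}"  # hypothetical subsystem directories

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_bitcode() {
    all=""
    for dir in $SUBSYSTEMS; do
        objs=""
        # Stage 1: one bitcode file per source file (instead of one .o).
        for src in "$dir"/*.c; do
            [ -e "$src" ] || continue   # skip when the glob matched nothing
            bc="${src%.c}.bc"
            run clang -c -emit-llvm "$src" -o "$bc"
            objs="$objs $bc"
        done
        [ -n "$objs" ] || continue
        # Stage 2: merge a subsystem's bitcode (instead of built-in.a).
        # $objs is left unquoted on purpose so it splits into separate arguments.
        run llvm-link $objs -o "$dir/built-in.bc"
        all="$all $dir/built-in.bc"
    done
    # Stage 3: merge all subsystems into one whole-program bitcode file.
    if [ -n "$all" ]; then
        run llvm-link $all -o image.bc
    fi
}
```

Calling `build_bitcode` from a directory that contains the listed subsystem directories prints the planned commands; setting `DRY_RUN=0` runs them instead. Because the file lists are passed around as plain strings, this sketch does not handle paths containing spaces.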
I worked as a Research Intern at the Computer Science Laboratory at SRI International in the summer of 2022.
The main research objective we started off with was:
How do we protect the integrity of open-source software projects from malicious actors and influence operations within the community?
The motivation for this research comes from the fact that open-source software has become a critical part of our infrastructure, and we have seen multiple attacks on open-source projects that led to supply chain attacks and other security incidents downstream. With this larger goal in mind, we first tried to tackle a smaller problem.