Various analysis tools use the LLVM bitcode representation of the source. Compiling a single source file into bitcode is straightforward.

  1. clang -S -emit-llvm <file.c> to get the human-readable LLVM IR (a .ll file).
  2. clang -c -emit-llvm <file.c> to get the binary bitcode (a .bc file).
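
As a minimal sketch of the two commands above, the snippet below compiles a throwaway source file both ways. The file name hello.c, the scratch directory, and the explicit -o output names are assumptions made for illustration; the script skips compilation if clang is not on the PATH.

```shell
# Hypothetical scratch file; any C source would do here.
workdir="$(mktemp -d)"
cat > "$workdir/hello.c" <<'EOF'
int add(int a, int b) { return a + b; }
EOF

if command -v clang >/dev/null 2>&1; then
  # -S with -emit-llvm writes textual LLVM IR.
  clang -S -emit-llvm "$workdir/hello.c" -o "$workdir/hello.ll"
  # -c with -emit-llvm writes binary bitcode.
  clang -c -emit-llvm "$workdir/hello.c" -o "$workdir/hello.bc"
  ls "$workdir"
else
  echo "clang not found; skipping compilation" >&2
fi
```

The .ll file can be opened in any text editor, while the .bc file is the compact form that most LLVM tools consume directly.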

However, compiling large projects like the Linux kernel into bitcode is not as straightforward.

At a high level, each *.c source file is compiled into an object file. The .o object files in a subsystem are linked together into an intermediate built-in.a file. Finally, all the built-in.a files are linked together to produce the final kernel image, vmlinux.
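
To make the three stages concrete, here is a toy imitation of that flow using two made-up subsystems. This is only a sketch and not how Kbuild actually drives the build: the directory names fs and net, the source files, and the output name vmlinux-toy are all hypothetical, and the real kernel adds linker scripts and many more steps. It assumes a C compiler (cc, or whatever $CC names) and ar are installed.

```shell
toy="$(mktemp -d)"
mkdir -p "$toy/fs" "$toy/net"

# Hypothetical sources standing in for two subsystems and an entry point.
printf 'int fs_init(void) { return 1; }\n' > "$toy/fs/fs.c"
printf 'int net_init(void) { return 2; }\n' > "$toy/net/net.c"
cat > "$toy/main.c" <<'EOF'
int fs_init(void);
int net_init(void);
int main(void) { return (fs_init() + net_init() == 3) ? 0 : 1; }
EOF

CC="${CC:-cc}"
if command -v "$CC" >/dev/null 2>&1 && command -v ar >/dev/null 2>&1; then
  # Stage 1: every .c file becomes a .o object file.
  "$CC" -c "$toy/fs/fs.c" -o "$toy/fs/fs.o"
  "$CC" -c "$toy/net/net.c" -o "$toy/net/net.o"
  "$CC" -c "$toy/main.c" -o "$toy/main.o"

  # Stage 2: the objects of each subsystem are collected into built-in.a.
  ar rcs "$toy/fs/built-in.a" "$toy/fs/fs.o"
  ar rcs "$toy/net/built-in.a" "$toy/net/net.o"

  # Stage 3: all built-in.a archives are linked into the final image.
  "$CC" "$toy/main.o" "$toy/fs/built-in.a" "$toy/net/built-in.a" -o "$toy/vmlinux-toy"
  "$toy/vmlinux-toy" && echo "linked ok"
else
  echo "C compiler or ar not found; skipping" >&2
fi
```

The point of the sketch is that no single compiler invocation sees the whole program, which is why the single-file clang commands do not carry over directly to a project of this shape.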

Continue Reading →

I worked as a Research Intern at the Computer Science Laboratory at SRI International in the summer of 2022.

The main research objective we started off with was:

How do we protect the integrity of open-source software projects from malicious actors and influence operations within the community?

The motivation for this research is that open-source software has become a critical part of our infrastructure, and we have seen multiple attacks on open-source projects lead to supply chain compromises and other security incidents downstream. With this larger goal in mind, we first tried to tackle a smaller problem:

Continue Reading →