CONSTRUCT: Reconstructing Control Algorithms from Embedded System Binaries
In the summer of 2021, I interned at the Intelligent Systems Laboratory at PARC, where I worked on a critical security challenge in cyber-physical systems:
Research Objective
How can we reconstruct control algorithms embedded in cyber-physical systems (CPS) to detect and remove hidden backdoors?
CPS rely on firmware — often distributed as opaque binaries — that control safety-critical operations. If compromised, these systems pose real risks. Our project, CONSTRUCT, uses program synthesis to automatically generate transparent, mathematical models of these control algorithms for both safety validation and forensic investigation oai_citation:0‡arXiv.
Approach
Our pipeline spans four core steps:
Binary Decompilation. We decompiled firmware contained in a Functional Mock-up Unit (FMU) using Ghidra, extracting symbolic variable names and their attributes from accompanying description files.
Isolating Mathematical Primitives. Through static analysis and a rule-based engine, we identified core mathematical operations (e.g., integrators, gains) amid decompiled code structures.
AST Generation & Model Translation. We converted code-level ASTs of control primitives into Modelica syntax ASTs — structurally mapping firmware logic into algebraic form.
Model Synthesis via Genetic Algorithm (GA). To map symbolic variables to the correct I/O names, we developed a correct-by-construction (CbC) GA. Unlike baseline stochastic methods (correct-by-test, CbT), our approach ensures each candidate model remains syntactically and semantically valid in Modelica — significantly enhancing both efficiency and accuracy.
Evaluation
We tested CONSTRUCT on three types of control algorithms—PI, PID, and LimPID—embedded in two real-world CPS: a Turtlebot Waffle Pi robot and a PX4 Quadcopter oai_citation:5‡arXiv.
Key results:
- The CbC approach consistently outperformed baseline CbT, achieving much lower mean squared error (MSE) compared to the true firmware outputs.
- In complex cases (PID, LimPID), CbT failed to produce any valid Modelica models, while CbC generated accurate, valid models with low error.
My Contributions
I focused on:
- Building decompilation workflows using Ghidra and FMU metadata.
- Developing static analysis to detect mathematical primitives.
- Designing AST transformations to translate code into Modelica.
- Exploring and implementing the GA-based model synthesis pipeline.
Takeaways
- Automation matters: CONSTRUCT streamlines a process that traditionally requires expert reverse-engineering, saving time and reducing error.
- Accuracy and scalability: Our correct-by-construction GA delivers valid models even for complex control logic.
- Practical impact: The synthesized models enable expert validation, security auditing, and forensic analysis of embedded controllers.
For the full methodology and results, check out the arXiv paper:
CONSTRUCT: A Program Synthesis Approach for Reconstructing Control Algorithms from Embedded System Binaries in Cyber-Physical Systems.