Automatic Discovery of Deviations in Binary Implementations
[Overview] [Publications] [Back to BitBlaze]


Many network protocols and services have several different implementations. Due to coding errors and protocol specification ambiguities, these implementations often contain deviations, i.e., differences in how they check and process some of their inputs. Automatically identifying deviations can enable:  1) the automatic detection of potential implementation errors and 2) the automatic generation of fingerprints that allow to distinguish among implementations of the same network service. The main challenge in this project is to automatically find deviations without requiring access to source code.

In this project, we have designed and implemented an approach that automatically identifies deviations using the binaries of two implementations of the same protocol specification, such as two HTTP severs or two NTP servers. Our approach is based on comparing how two implementations of the same protocol process the same input. We achieve this by building and solving symbolic formulas that characterize how different implementations process an input. The process consists of three major steps:

  1. Extracting Symbolic Formulas: First we send the same input to each implementation and monitor the processing of that input by each implementation. A symbolic formula is obtained for each implementation. Each symbolic formula summarizes all the instructions executed that affect the received input.  The symbolic formulas are obtained by using the weakest precondition technique and describe how an implementation processes the input.
  2. Solving the Symbolic Formulas to generate Candidate Inputs: Given the symbolic formulas for both implementations, F and G, we use a solver (e.g. a theorem prover) to obtain inputs that inputs that satisfy one of the formulas and do not satisfy the other. These candidate inputs, can potentially drive two implementations into different states.
  3. Validating the Candidate Inputs: Finally, the candidate inputs are sent to implementations to test whether they do trigger a deviation.

Our approach is precisely faithful to an implementation by automatically building symbolic formulas from it. It works on binaries directly without access to source code, and thus can be applied widely. Finally, our approach significantly reduces the number of inputs needed to find deviations.


Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation
David Brumley, Juan Caballero, Zhenkai Liang, James Newsome, and Dawn Song. In Proceedings of 16th USENIX Security Symposium, Aug. 2007.

Back to BitBlaze