Many network protocols and services have several different implementations. Due to coding errors and protocol specification ambiguities, these implementations often contain deviations, i.e., differences in how they check and process some of their inputs. Automatically identifying deviations can enable: 1) the automatic detection of potential implementation errors and 2) the automatic generation of fingerprints that allow to distinguish among implementations of the same network service. The main challenge in this project is to automatically find deviations without requiring access to source code.
In this project, we have designed and implemented an approach that automatically identifies deviations using the binaries of two implementations of the same protocol specification, such as two HTTP severs or two NTP servers. Our approach is based on comparing how two implementations of the same protocol process the same input. We achieve this by building and solving symbolic formulas that characterize how different implementations process an input. The process consists of three major steps:
Our approach is precisely faithful to an implementation by automatically building symbolic formulas from it. It works on binaries directly without access to source code, and thus can be applied widely. Finally, our approach significantly reduces the number of inputs needed to find deviations.