Protocol Reverse Engineering and Application Dialogue Replay

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation without access to the protocol specification, has become increasingly important for network security. Knowledge of the application-level protocol format is essential for many network security applications, such as vulnerability discovery, intrusion detection systems, protocol analyzers for network monitoring and signature-based filtering, fingerprint generation, application dialogue replay, detecting services running on non-standard ports, and mapping traffic to applications. Many protocols in use, especially on enterprise networks, are closed protocols (i.e., they have no publicly available specification). Even for protocols with a publicly available specification, certain implementations may not follow the specification exactly.

Protocol reverse engineering is therefore an invaluable tool for the network security applications listed above. Currently, it is mostly a painstaking manual task. Attempts to reverse engineer closed protocols such as the MSN Messenger and SMB protocols from Microsoft (the latter being the target of the Samba project), the Yahoo Messenger protocol, or the OSCAR and ICQ protocols from AOL have all been long-term efforts lasting many years. In addition, protocol reverse engineering is not a once-and-done effort, since existing protocols are often extended to support new functionality. Thus, to reverse engineer a protocol in a timely manner and sustain the effort over time, we need automatic methods.

With Polyglot, we propose the first binary analysis approach for automatic protocol reverse engineering. Previous work on automatic protocol reverse engineering extracts protocol information purely from network traces. Instead, Polyglot leverages the availability of a program binary implementing the protocol. Polyglot builds on a key intuition: the way an implementation of the protocol processes the received application data reveals a wealth of information about the format of a received message. Compared to network traces, which only contain syntactic information, program binaries also contain semantic information about how the program processes and operates on the protocol data. In addition, program binaries are the main source of information about the implementation of a protocol.
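
The intuition above can be illustrated with a toy sketch. It is not Polyglot's implementation: the parser, the comparison log, and the names `parse_request`, `infer_separators`, and `field_ranges` are all hypothetical, and a real system would observe a binary's execution rather than instrument Python code. The sketch only shows that recording which input bytes a program compares against a constant is enough to recover where the fields of a message begin and end.

```python
# Toy illustration only: a hypothetical parser that logs which input
# offsets it compares against which constant byte.

def parse_request(data: bytes, trace: list) -> list:
    """Split data on spaces, logging every byte comparison in trace."""
    fields, start = [], 0
    for offset, byte in enumerate(data):
        trace.append((offset, b" "))  # the program compares this byte to " "
        if byte == ord(" "):
            fields.append(data[start:offset])
            start = offset + 1
    fields.append(data[start:])
    return fields


def infer_separators(data: bytes, trace: list) -> list:
    """Offsets where a logged comparison against a constant succeeded."""
    return [off for off, const in trace if data[off:off + 1] == const]


def field_ranges(length: int, separators: list) -> list:
    """Turn separator offsets into (start, end) field ranges, end exclusive."""
    ranges, start = [], 0
    for sep in separators:
        ranges.append((start, sep))
        start = sep + 1
    ranges.append((start, length))
    return ranges


message = b"GET /index.html HTTP/1.1"
trace = []
parse_request(message, trace)
separators = infer_separators(message, trace)
ranges = field_ranges(len(message), separators)
print(separators)  # [3, 15]
print(ranges)      # [(0, 3), (4, 15), (16, 24)]
```

Note that the inference step never looks at the parser's return value. It uses only the log of how the input was processed, which is the kind of information a network trace alone does not provide.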

Dispatcher is our latest automatic protocol reverse engineering tool and supersedes Polyglot. It incorporates the techniques from Polyglot and adds three important contributions. First, it implements buffer deconstruction, a novel technique to extract the format of messages sent by the application implementing the protocol, whereas Polyglot only extracted the format of messages received by the application. Second, Dispatcher extracts semantic information on the fields that comprise the received and sent messages. For example, it can identify whether a field is a timestamp, a filename, or an IP address. Finally, Dispatcher is able to reverse engineer encrypted protocols by identifying the buffers holding the unencrypted received message after it has been decrypted by the application, and the buffers holding the unencrypted message about to be sent before it is encrypted by the application. It then applies the automatic protocol reverse engineering techniques to those buffers.

Replayer

The ability to accurately replay application protocol dialogs is useful in many security-oriented applications, such as replaying an exploit for forensic analysis or demonstrating an exploit to a third party.
A central challenge in application dialog replay is that the dialog intended for the original host will likely not be accepted by another host without modification. For example, the dialog may include or rely on state specific to the original host, such as its hostname or a known cookie. In such cases, a straightforward byte-by-byte replay to a host whose state (e.g., hostname) differs from that of the original dialog participant will likely fail. For replay to succeed, these state-dependent protocol fields must be updated to reflect the state of the new host.
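
The following sketch illustrates the problem described above on an invented message format; it does not reflect any real protocol or Replayer's actual mechanism. The message layout (1-byte opcode, 2-byte length, hostname) and the functions `build_hello`, `server_accepts`, and `rewrite_for` are assumptions made purely for illustration.

```python
import struct


def build_hello(hostname: str) -> bytes:
    """Hypothetical message: 1-byte opcode, 2-byte big-endian length, hostname."""
    name = hostname.encode("ascii")
    return b"\x01" + struct.pack("!H", len(name)) + name


def server_accepts(message: bytes, own_hostname: str) -> bool:
    """Hypothetical server check: length must match and hostname must be its own."""
    if len(message) < 3 or message[0] != 1:
        return False
    (length,) = struct.unpack("!H", message[1:3])
    name = message[3:]
    return length == len(name) and name == own_hostname.encode("ascii")


def rewrite_for(message: bytes, new_hostname: str) -> bytes:
    """Replace the hostname field and recompute the dependent length field."""
    name = new_hostname.encode("ascii")
    return message[:1] + struct.pack("!H", len(name)) + name


recorded = build_hello("alpha")                       # dialog captured from host "alpha"
original_ok = server_accepts(recorded, "alpha")       # the original host accepts it
naive_ok = server_accepts(recorded, "beta-host")      # byte-by-byte replay fails
fixed_ok = server_accepts(rewrite_for(recorded, "beta-host"), "beta-host")
print(original_ok, naive_ok, fixed_ok)  # True False True
```

In this toy format, updating the hostname also forces an update to the length field, which shows why state-dependent fields cannot always be patched in isolation.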

With Replayer, we aim to develop an automatic tool for application protocol dialog replay. As a first step, we formally define the replay problem and present the first sound solution to it: whenever our approach yields an answer, replay succeeds.
To achieve this goal, Replayer is based on binary analysis, making novel use of program verification techniques such as theorem proving and weakest preconditions.
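
As a rough illustration of the weakest-precondition idea, the sketch below applies the standard assignment rule (the weakest precondition of `x := e` with respect to a postcondition `Q` is `Q` with `e` substituted for `x`) to a tiny straight-line program. It is not Replayer's implementation: the program, the variable names `field` and `session_id`, the target checksum, and the brute-force search that stands in for a theorem prover are all assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]
Predicate = Callable[[State], bool]
Assignment = Tuple[str, Callable[[State], int]]  # (variable, expression)


def wp_assign(assignment: Assignment, post: Predicate) -> Predicate:
    """wp(x := e, Q) = Q[e/x]: evaluate Q in the state updated by the assignment."""
    var, expr = assignment
    return lambda s: post({**s, var: expr(s)})


def wp_program(program: List[Assignment], post: Predicate) -> Predicate:
    """Push the postcondition backwards through a straight-line program."""
    pred = post
    for assignment in reversed(program):
        pred = wp_assign(assignment, pred)
    return pred


# Hypothetical receiver logic: accept when (field + session_id) % 256 == 42.
program: List[Assignment] = [
    ("tmp", lambda s: s["field"] + s["session_id"]),
    ("checksum", lambda s: s["tmp"] % 256),
]
accepted: Predicate = lambda s: s["checksum"] == 42
precondition = wp_program(program, accepted)


def solve_field(session_id: int) -> Optional[int]:
    """Brute-force stand-in for a theorem prover over a one-byte field."""
    for candidate in range(256):
        if precondition({"field": candidate, "session_id": session_id}):
            return candidate
    return None


print(solve_field(10))   # 32
print(solve_field(200))  # 98
```

Any value returned by `solve_field` satisfies the precondition, so the toy program is guaranteed to reach the accepting state for that input. This mirrors, in miniature, the soundness property stated above.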

Publications

Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering. Juan Caballero, Pongsin Poosankam, Christian Kreibich, and Dawn Song. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS), November 2009.

Automatic Extraction of Protocol Message Format using Dynamic Binary Analysis. Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS), October 2007.

Replayer: Automatic Protocol Replay by Binary Analysis. James Newsome, David Brumley, Jason Franklin, and Dawn Song. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS), October 2006.
