Protocol reverse engineering, the process of extracting the application-level
protocol used by an implementation without access to the protocol specification,
has become increasingly important for network security. Knowledge of the
application-level protocol format is essential for many network security
applications, such as vulnerability discovery, intrusion detection systems,
protocol analyzers for network monitoring and signature-based filtering,
fingerprint generation, application dialogue replay, detecting services running
on non-standard ports, and mapping traffic to applications. Many protocols in
use, especially in enterprise networks, are closed protocols (i.e., they have no
publicly available specification). Even for protocols with a publicly
available specification, certain implementations may not exactly follow the
specification.
Protocol reverse engineering is therefore an invaluable tool for the network security applications above. Currently, however, it is mostly a painstaking manual task. Attempts to reverse engineer closed protocols such as the MSN Messenger and SMB protocols from Microsoft (the latter reverse engineered by the Samba project), the Yahoo Messenger protocol, or the OSCAR and ICQ protocols from AOL have all been long-term efforts lasting many years. In addition, protocol reverse engineering is not a once-and-done effort, since existing protocols are often extended to support new functionality. Thus, to reverse engineer a protocol in a timely manner and keep up with its evolution over time, we need automatic methods.
With Polyglot, we propose the first binary analysis approach for automatic protocol reverse engineering. Previous approaches to automatic protocol reverse engineering extract protocol information purely from network traces. Instead, Polyglot leverages the availability of a program binary implementing the protocol. Polyglot builds on a unique intuition: the way an implementation of the protocol processes received application data reveals a wealth of information about the format of the received message. Compared to network traces, which contain only syntactic information, program binaries also contain semantic information about how the program processes and operates on the protocol data. In addition, they are the main source of information about the implementation of a protocol.
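To make the Polyglot intuition concrete, the following Python sketch infers message format from how a parser treats each received byte rather than from the bytes themselves. It is a simplified illustration under stated assumptions: Polyglot analyzes x86 program binaries with dynamic taint analysis, whereas here the "binary" is a Python function that reports its own byte comparisons, and the names `TracedMessage`, `toy_parser`, and `infer_format` are hypothetical. The heuristic shown, treating the constant that the parser scans the message for as a field delimiter, is one example of the kind of evidence an execution trace exposes.

```python
from dataclasses import dataclass, field


@dataclass
class TracedMessage:
    """Received message wrapper that logs every byte-vs-constant comparison.

    This stands in for taint tracking: a real tool observes these comparisons
    at the instruction level instead of asking the parser to report them.
    """
    data: bytes
    comparisons: list = field(default_factory=list)  # (offset, constant, matched)

    def byte_equals(self, offset: int, const: int) -> bool:
        matched = self.data[offset] == const
        self.comparisons.append((offset, const, matched))
        return matched


def toy_parser(msg: TracedMessage) -> list:
    """Toy server-side parser that splits a request line on spaces."""
    fields, start = [], 0
    for i in range(len(msg.data)):
        if msg.byte_equals(i, ord(" ")):
            fields.append(msg.data[start:i])
            start = i + 1
    fields.append(msg.data[start:])
    return fields


def infer_format(msg: TracedMessage) -> dict:
    """Infer a delimiter and field boundaries from the comparison log alone."""
    offsets, hits = {}, {}
    for offset, const, matched in msg.comparisons:
        offsets.setdefault(const, set()).add(offset)
        if matched:
            hits.setdefault(const, []).append(offset)
    # Heuristic: the constant compared against the most distinct offsets, i.e.
    # the one the parser scans the message for, is the delimiter candidate.
    delim = max(offsets, key=lambda c: len(offsets[c]))
    fields, start = [], 0
    for hit in sorted(hits.get(delim, [])):
        fields.append((start, hit))
        start = hit + 1
    fields.append((start, len(msg.data)))
    return {"delimiter": bytes([delim]), "fields": fields}


if __name__ == "__main__":
    msg = TracedMessage(b"GET /index.html HTTP/1.0")
    toy_parser(msg)
    print(infer_format(msg))
```

Running it prints `{'delimiter': b' ', 'fields': [(0, 3), (4, 15), (16, 24)]}`: three space-separated fields recovered without any built-in knowledge of the request syntax, whereas a trace-only approach would have to infer the same structure from patterns across many captured messages.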
Dispatcher is our latest automatic protocol reverse engineering tool and supersedes Polyglot. It incorporates the techniques from Polyglot and adds three important contributions. First, it implements buffer deconstruction, a novel technique to extract the format of messages sent by the application implementing the protocol, whereas Polyglot only extracted the format of messages received by the application. Second, Dispatcher extracts semantic information about the fields that make up the received and sent messages. For example, it can identify whether a field is a timestamp, a filename, or an IP address. Finally, Dispatcher can reverse engineer encrypted protocols by identifying the buffers that hold the unencrypted received message after the application has decrypted it, and the buffers that hold the unencrypted outgoing message before the application encrypts it. It then applies the automatic protocol reverse engineering techniques to those buffers.
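The following Python sketch illustrates the idea behind buffer deconstruction and field semantics on a toy outgoing message. It assumes the hard part is already done: the provenance of every byte in the output buffer (which buffer or function call it was copied from) is given as a tree of hypothetical `Buffer` and `Source` objects, whereas Dispatcher has to recover it from an execution trace of the binary. The hypothetical `deconstruct` function walks that tree to produce a hierarchical field layout, and the hypothetical `SEMANTICS` table labels leaves produced by functions whose output has a known meaning, such as `time` (timestamp) or `gethostname` (hostname).

```python
from dataclasses import dataclass, field

# Hypothetical table of "semantics-revealing" functions: data returned by
# these calls is assumed to carry the listed meaning.
SEMANTICS = {
    "time": "timestamp",
    "gethostname": "hostname",
    "inet_ntoa": "IP address",
}


@dataclass
class Source:
    """Bytes produced directly by a function call or a program constant."""
    origin: str  # function name, or "constant" for literal bytes
    data: bytes


@dataclass
class Buffer:
    """A buffer assembled by copying other buffers or sources into it, in order."""
    name: str
    parts: list = field(default_factory=list)

    @property
    def data(self) -> bytes:
        return b"".join(part.data for part in self.parts)


def deconstruct(node, offset=0, depth=0):
    """Return (depth, offset, length, label) rows for a node and its parts."""
    if isinstance(node, Source):
        label = f"{node.origin} [{SEMANTICS.get(node.origin, 'no known semantics')}]"
        return [(depth, offset, len(node.data), label)]
    rows = [(depth, offset, len(node.data), node.name)]
    for part in node.parts:
        rows.extend(deconstruct(part, offset, depth + 1))
        offset += len(part.data)
    return rows


if __name__ == "__main__":
    # Toy outgoing message "HELLO host1 1700000000\n". For simplicity the
    # timestamp Source already holds its decimal text, as if formatted.
    host_field = Buffer("host_field", [Source("gethostname", b"host1")])
    msg = Buffer("out_msg", [
        Source("constant", b"HELLO "),
        host_field,
        Source("constant", b" "),
        Source("time", b"1700000000"),
        Source("constant", b"\n"),
    ])
    for depth, off, length, label in deconstruct(msg):
        print("  " * depth + f"[{off}:{off + length}] {label}")
```

Each row is `(depth, offset, length, label)`, so the layout nests the hostname bytes inside the buffer they were copied into. Labeling a field by the function that produced it is a simplification; a field can also be labeled by where it is consumed (for example, bytes passed to a file-open call are likely a filename).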
The ability to accurately replay application protocol
dialogs is useful in many security-oriented applications,
such as replaying an exploit for forensic analysis or
demonstrating an exploit to a third party.
A central challenge in application dialog replay is that a dialog intended for the original host will likely not be accepted by another host without modification. For example, the dialog may include or rely on state specific to the original host such as its hostname, a known cookie,
With Replayer, we aim to develop an automatic tool for application protocol dialog replay. As a first step, we formally define the replay problem and create the first sound solution to it: replay succeeds whenever our approach yields an answer.
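A toy Python example can make the central replay challenge concrete. It is not Replayer's approach: the protocol, the server logic, and the function names are hypothetical, and `naive_rewrite` is a deliberately simple string-substitution baseline. A dialog recorded against one host is rejected verbatim by another because it carries the first host's hostname and cookie, and blindly substituting those values is not enough either, because a length field that depends on the hostname is left unchanged.

```python
# Hypothetical toy protocol: a client sends "HELLO <len>:<hostname>" followed
# by "AUTH <cookie>", and the server checks both against its own state.


def parse_length_prefixed(payload: bytes) -> bytes:
    """Parse b"<len>:<value>", raising ValueError if <len> does not match."""
    length, _, value = payload.partition(b":")
    if int(length) != len(value):
        raise ValueError("length field does not match value")
    return value


def server_accepts(dialog: list, hostname: bytes, cookie: bytes) -> bool:
    """Toy target host: accepts only a dialog naming it and carrying its cookie."""
    try:
        hello, auth = dialog
        if not hello.startswith(b"HELLO "):
            return False
        name = parse_length_prefixed(hello[len(b"HELLO "):])
        return name == hostname and auth == b"AUTH " + cookie
    except ValueError:  # malformed number, wrong length, or wrong message count
        return False


def naive_rewrite(dialog: list, substitutions: dict) -> list:
    """Baseline: replace each original host-specific value by the new one."""
    rewritten = []
    for msg in dialog:
        for old, new in substitutions.items():
            msg = msg.replace(old, new)
        rewritten.append(msg)
    return rewritten


if __name__ == "__main__":
    # Dialog recorded against host "alice", whose cookie is "c0ffee".
    recorded = [b"HELLO 5:alice", b"AUTH c0ffee"]
    print(server_accepts(recorded, b"alice", b"c0ffee"))  # original host
    print(server_accepts(recorded, b"bob", b"beef01"))    # verbatim replay
    rewritten = naive_rewrite(recorded, {b"alice": b"bob", b"c0ffee": b"beef01"})
    print(rewritten, server_accepts(rewritten, b"bob", b"beef01"))
```

The original host accepts the dialog, the new host rejects the verbatim replay, and the naive rewrite is rejected as well because the length prefix `5` no longer matches `bob`. Dependencies like this between fields are one reason a replay approach needs to reason about message structure rather than patch bytes blindly.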