Protocol reverse engineering, the process of extracting the application-level
protocol used by an implementation without access to the protocol specification,
has become increasingly important for network security. Knowledge of
application-level protocol format is essential for many network security
applications, such as vulnerability discovery, intrusion detection systems,
protocol analyzers for network monitoring and signature-based filtering,
fingerprint generation, application dialog replay, detecting services running
on non-standard ports, and mapping traffic to applications. Many protocols in
use, especially on enterprise networks, are closed protocols (i.e., they have no
publicly available protocol specification). Even for protocols with a publicly
available specification, certain implementations may not exactly follow the
specification.
Because protocol reverse engineering works directly from an implementation and
needs no protocol specification, it is an invaluable tool
for the above network security applications. Currently, protocol reverse
engineering is mostly a painstaking manual task. Attempts to reverse engineer
closed protocols such as the MSN Messenger and SMB protocols from Microsoft
(the latter reimplemented by the Samba project), the Yahoo Messenger protocol,
or the OSCAR and ICQ protocols from AOL have all
been long-term efforts lasting many years. In addition, protocol reverse
engineering is not a once-and-done effort, since existing protocols are often
extended to support new functionality. Thus, to successfully reverse engineer a
protocol in a timely manner and sustain the effort over time, we need
automatic methods.
With Polyglot, we propose the first binary
analysis approach for automatic protocol reverse engineering.
Previous work on automatic protocol reverse engineering extracts protocol
information purely from network traces.
Instead, Polyglot leverages the availability of a program binary
implementing the protocol.
Polyglot rests on a key intuition: the way that an implementation of
the protocol processes the received application data reveals a wealth of
information about the format of a received message.
Compared to network traces, which only contain syntactic information, program
binaries also contain semantic information about how the
program processes and operates on the protocol data. In addition,
they are the main source of information about the implementation of a protocol.
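
The intuition above can be pictured with a toy sketch. It is not Polyglot's algorithm, which analyzes program binaries; it only assumes, for illustration, that we can log every comparison a parser makes against an input byte. The names `TrackedByte`, `parse_request_line`, and `infer_field_ranges`, and the sample message, are hypothetical.

```python
class TrackedByte:
    """An input byte that logs every comparison made against it."""

    def __init__(self, value, offset, log):
        self.value = value
        self.offset = offset
        self.log = log

    def __eq__(self, other):
        # Record (input offset, constant it was compared with).
        self.log.append((self.offset, other))
        return self.value == other


def parse_request_line(data, log):
    """Hypothetical implementation under analysis: splits on spaces."""
    fields, start = [], 0
    for offset, value in enumerate(data):
        if TrackedByte(value, offset, log) == ord(" "):
            fields.append(data[start:offset])
            start = offset + 1
    fields.append(data[start:])
    return fields


def infer_field_ranges(data, log):
    """Treat offsets where a logged comparison succeeded as separators."""
    separators = sorted({off for off, const in log if data[off] == const})
    ranges, start = [], 0
    for sep in separators:
        ranges.append((start, sep))
        start = sep + 1
    ranges.append((start, len(data)))
    return separators, ranges


message = b"GET /index.html HTTP/1.1"
log = []
fields = parse_request_line(message, log)
separators, ranges = infer_field_ranges(message, log)
print(separators)  # [3, 15]
print(ranges)      # [(0, 3), (4, 15), (16, 24)]
```

The field boundaries are recovered purely from how the parser touched the input, without looking at the parser's output or at a specification. A network trace alone would not show which bytes the program treats as separators.
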
Dispatcher is our latest automatic protocol reverse engineering
tool and supersedes Polyglot.
It incorporates the techniques from Polyglot and also adds three important
contributions.
First, it implements buffer deconstruction, a novel technique to extract the
format of messages being sent by the application implementing the protocol,
whereas Polyglot extracted only the format of messages received by the application.
Second, Dispatcher also extracts semantic information on the fields that
comprise the received and sent messages.
For example, it can identify whether a field is a timestamp, a filename, or
an IP address.
Finally, Dispatcher is able to reverse engineer encrypted protocols by
identifying the buffers holding the unencrypted received message after
it has been decrypted by the application, and the buffers holding the
unencrypted message about to be sent before it is encrypted by the application.
Then, it applies the automatic protocol reverse engineering techniques to
those buffers.
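
The text above does not describe how buffer deconstruction works, so the following is only one way to picture the idea of recovering a sent message's format from how the sender assembled it. It assumes, hypothetically, that each write into the output buffer can be observed together with a label for where the bytes came from. `TracedBuffer`, `build_reply`, and the message layout are invented for this sketch.

```python
import struct


class TracedBuffer:
    """Output buffer that records where each appended chunk came from."""

    def __init__(self):
        self.data = bytearray()
        self.pieces = []  # (start offset, end offset, origin label)

    def append(self, chunk, origin):
        start = len(self.data)
        self.data += chunk
        self.pieces.append((start, len(self.data), origin))


def build_reply(buf, payload):
    """Hypothetical sender: 2-byte type, 2-byte length, then the payload."""
    buf.append(struct.pack("!H", 1), "constant")
    buf.append(struct.pack("!H", len(payload)), "length of payload")
    buf.append(payload, "payload")


buf = TracedBuffer()
build_reply(buf, b"hello")
for start, end, origin in buf.pieces:
    print(start, end, origin)
```

Each recorded piece corresponds to one candidate field of the sent message, and the origin label hints at what the field means (here, that the second field is a length tied to the third).
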
The ability to accurately replay application protocol
dialogs is useful in many security-oriented applications,
such as replaying an exploit for forensic analysis or
demonstrating an exploit to a third party.
A central challenge in application dialog replay is that the dialog
intended for the original host will likely not be accepted
by another without modification. For example, the dialog may
include or rely on state specific to the original host such as
its hostname, a known cookie,
With Replayer, we aim to develop an automatic tool for application protocol dialog replay. As a first step, we formally define the replay problem and create the first
sound solution to the replay problem: replay succeeds whenever our approach yields an answer.
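
To make the challenge concrete, here is a deliberately naive sketch of rewriting host-specific state (a hostname and a cookie) in a recorded dialog before replaying it against a different host. It is not Replayer's method and gives no soundness guarantee. The function name, the recorded message, and the substitution values are all hypothetical.

```python
def rewrite_for_replay(recorded, substitutions):
    """Replace each host-specific byte string with the new host's value."""
    out = recorded
    for old, new in substitutions.items():
        out = out.replace(old, new)
    return out


# Hypothetical message captured against the original host.
recorded = (
    b"GET /inbox HTTP/1.1\r\n"
    b"Host: alpha.example\r\n"
    b"Cookie: sid=1111\r\n\r\n"
)

# Hypothetical values that the new host would expect instead.
substitutions = {
    b"alpha.example": b"beta.example",
    b"sid=1111": b"sid=2222",
}

replayed = rewrite_for_replay(recorded, substitutions)
print(replayed.decode("ascii"))
```

Plain substitution like this breaks as soon as other fields depend on the rewritten ones, for example a length or checksum computed over them, which is one reason a principled solution to the replay problem is needed.
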