Many security analysis tasks require understanding the high-level structure of a binary program in terms of both its control-flow and the data it operates on. To facilitate the automatic reverse engineering of such structure, we have introduced a new program representation, a hybrid information- and control-flow graph (HI-CFG). Our research explores algorithms to infer a HI-CFG from an instruction-level trace, without requiring source-level information or static analysis.
As an example application of the HI-CFG, we have considered the task of generalizing an attack against a program whose inputs undergo complex transformations before being used in vulnerable functionality. We exploit the information-flow structure shown in the HI-CFG, and its mapping to the binary code, to find the parts of the program that implement each transformation. Then we use this transformation structure to efficiently generate new attack inputs under a user-specified combination of transformations. This use of transformation structure allows a symbolic execution approach to scale to applications that are beyond the reach of monolithic symbolic execution.
By automatically generating attacks under a variety of transformations, we achieve a strong degree of attack polymorphism that demonstrates the insufficiency of any filter that does not support all the same transformations as the vulnerable application. In case studies, we show this attack capability against two large open-source applications, a PDF viewer and a word processor.