BitBlaze: Binary Analysis for Computer Security
Binary analysis is imperative for protecting COTS (commercial
off-the-shelf) programs and for analyzing and defending against the
many kinds of malicious code for which source code is unavailable and
the binary may even be obfuscated. Binary analysis also provides the
ground truth about program behavior, since computers execute binaries
(executables), not source code. However, binary analysis is
challenging due to the lack of higher-level semantics.
Many higher-level
techniques are inadequate for analyzing even benign binaries,
let alone potentially malicious ones.
Thus, we need tools and techniques that work at the
binary level and can be used to analyze both COTS software and malicious binaries.
The BitBlaze project aims to design and develop a powerful
binary analysis platform and employ the platform in order to (1) analyze and develop novel
COTS protection and diagnostic mechanisms and (2) analyze,
understand, and develop defenses against malicious code. The
BitBlaze project also strives to open new application areas of
binary analysis, providing sound and effective solutions to
applications beyond software security and malicious code defense,
such as protocol reverse engineering and fingerprint generation.
The BitBlaze project consists of two central research directions: (1)
the design and development of the underlying BitBlaze Binary Analysis
Platform, and (2) applying the BitBlaze Binary Analysis Platform to
real security problems. The two research foci drive each other: as
new security problems arise, we develop new analysis
techniques. Similarly, we develop new analysis techniques in order to
better or more efficiently solve known problems. Below, we give an
overview of the two research directions.
Here is an overview paper of the BitBlaze project.
Some of our tools are also available under an open-source license.
The BitBlaze Binary Analysis Platform
The underlying BitBlaze Binary Analysis
Platform features a novel fusion of static and dynamic analysis
techniques, dynamic symbolic execution, and whole-system
emulation and binary instrumentation. The BitBlaze platform has
different components for each task: Vine, TEMU, and
Rudder. The three components in tandem provide the power for
effective analysis of real-world binary programs for various
applications.
- Vine, the static analysis
component.
Open source release available now.
Vine provides an intermediate language for
assembly (ILA) and an infrastructure for analyzing programs
written in this language. ILA is a full language in
which programs can be written, type-checked, and then compiled
down to assembly. We also provide analyses on the
ILA, such as abstract interpretation, dependency analysis, and
logical analysis via interfaces with theorem provers.
- TEMU, the dynamic analysis
component.
Open source release available now.
TEMU provides a dynamic analysis environment
through whole-system emulation and dynamic binary
instrumentation. TEMU is OS-aware (i.e., it understands
OS-level semantics) and serves as a foundation for
fine-grained dynamic analyses, such as dynamic taint
analysis and fine-grained behavioral analysis.
- Rudder, the component for online
dynamic symbolic execution. Rudder is an engine for
online dynamic symbolic execution on binaries. At a
high level, given a specified set of input sources of
interest, Rudder can automatically explore the different
execution paths in a program that depend on those input
sources. It automatically builds logical formulas
representing the constraints that the chosen input must satisfy
to take each followed path.
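The path exploration described for Rudder can be illustrated with a toy sketch. This is not Rudder's code or API: it runs on a small Python function rather than a binary, every name in it (Sym, run, negate, solve, explore, target) is hypothetical, and a brute-force search over a small input range stands in for a real theorem prover. The idea it shows is the general one: run on a concrete input, record the branch conditions as formulas over the input, then negate one condition at a time to obtain inputs that follow new paths.

```python
import operator


class Sym:
    """A concrete integer paired with a symbolic expression over the input x.

    Assumption for this sketch: the other operand is always a plain int.
    """

    def __init__(self, value, expr, log):
        self.value = value   # concrete value used to actually run the program
        self.expr = expr     # textual formula describing how the value was computed
        self.log = log       # shared list of branch conditions seen on this run

    def _binop(self, other, symbol, fn):
        return Sym(fn(self.value, other), f"({self.expr} {symbol} {other})", self.log)

    def __add__(self, other):
        return self._binop(other, "+", operator.add)

    def __mul__(self, other):
        return self._binop(other, "*", operator.mul)

    def _branch(self, other, symbol, fn):
        # Record which way the comparison went, then return the concrete outcome.
        taken = fn(self.value, other)
        cond = f"({self.expr} {symbol} {other})"
        self.log.append(cond if taken else f"(not {cond})")
        return taken

    def __gt__(self, other):
        return self._branch(other, ">", operator.gt)

    def __eq__(self, other):
        return self._branch(other, "==", operator.eq)


def run(program, x):
    """Execute program on concrete input x; return (result, path condition)."""
    log = []
    return program(Sym(x, "x", log)), log


def negate(cond):
    """Flip a recorded branch condition."""
    return cond[5:-1] if cond.startswith("(not ") else f"(not {cond})"


def solve(conds, domain):
    """Brute-force stand-in for a theorem prover: find x satisfying all conds."""
    for x in domain:
        if all(eval(c, {"x": x}) for c in conds):
            return x
    return None


def explore(program, seed, domain):
    """Return {path condition: (input, result)} for every path reached."""
    found, worklist = {}, [seed]
    while worklist:
        x = worklist.pop()
        result, path = run(program, x)
        if tuple(path) in found:
            continue
        found[tuple(path)] = (x, result)
        for i in range(len(path)):
            # Keep the first i branches, flip branch i, and ask for a new input.
            candidate = solve(path[:i] + [negate(path[i])], domain)
            if candidate is not None:
                worklist.append(candidate)
    return found


def target(x):
    """Hypothetical program under test with a deeply nested 'bug' branch."""
    if x > 10:
        if x * 2 == 84:
            return "bug"
        return "large"
    return "small"


paths = explore(target, seed=0, domain=range(100))
```

Starting from the seed input 0, the sketch reaches all three paths of `target`, including the one guarded by `x * 2 == 84`, which random testing over the same range would hit only once in a hundred tries.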
Release Information:
We are now making some key parts of the BitBlaze Binary
Analysis Platform available under open-source licenses.
See a separate page for more
information.
In conjunction with our BlackHat 2010 presentation, we have also
made a demonstration binary
release of some tools for
trace-based crash analysis.
BitBlaze in Action: Security Applications
Using the BitBlaze Binary Analysis Platform, we have enabled new approaches and solutions to a suite of different security problems. These results demonstrate the utility and effectiveness of the BitBlaze approach and vision: binary analysis enables fundamentally new approaches to a broad spectrum of security problems, often solving them at their root cause, and the underlying BitBlaze Binary Analysis Platform is extensible and powerful enough to support that spectrum of applications.
In particular, we show below three classes of security applications: (1) vulnerability detection, diagnosis, and defense; (2) automatic in-depth malware analysis and defense; (3) automatic model extraction and analysis.
-
-
Hybrid Information- and Control-Flow Graph (HI-CFG)
Many security analysis tasks require understanding the high-level
structure of a binary program in terms of both its control-flow and
the data it operates on. To facilitate the automatic reverse
engineering of such structure, we have introduced a new program
representation, a hybrid information- and
control-flow graph (HI-CFG). Our research explores algorithms to
infer a HI-CFG from an instruction-level trace, without requiring
source-level information or static analysis.
-
Identifying Causal Execution Differences for Security Applications
A security analyst often needs to understand
two runs of the same program that exhibit a difference in
program state or output. This is important, for example, for
vulnerability analysis, as well as for analyzing a malware
program that features different behaviors when run in different
environments.
Differential Slicing is
an automatic slicing technique for the analysis of such
execution differences. The causal difference graph
it outputs captures the input differences that triggered
the observed difference and the causal path of differences
that led from those input differences to the observed difference.
-
Automatic Defense System against Zero-day Exploits and Worms
Worms such as CodeRed and SQL Slammer can compromise
millions of hosts within hours or even minutes and have
caused billions of dollars in estimated damage. How can
we design and develop effective defense mechanisms
against such fast, large-scale worm attacks?
Sting is an automatic
worm defense system which proposes a suite of novel
techniques to automatically detect new exploits, perform
in-depth diagnosis, and generate effective antibodies
(vulnerability signatures and hardened binaries) to
protect vulnerable hosts and networks from further
attacks.
-
Automatic Patch-based Exploit Generation
Security patches are supposed to fix vulnerabilities in
programs. But what are the security implications of a
security patch?
In this work, we propose new
techniques and demonstrate that one can automatically
generate exploits given the patched binary and the original
vulnerable program binary, sometimes within minutes.
-
Loop-extended Symbolic Execution: Buffer Overflow Diagnosis and Discovery
Loop-extended symbolic
execution (or LESE) is a new technique that
generalizes previous dynamic symbolic
execution techniques by broadening their results to include the
effects of loops. LESE is a key enabler for powerful
automated discovery of security vulnerabilities, especially
buffer overflows, whose discovery is highly inefficient with pure
symbolic/concrete execution. It also enables deeper
diagnosis of known vulnerabilities, allowing automated
signature generation tools to reason about variable-length
input or repeated elements in the input.
-
Measuring Quantitative Influence
Dynamic taint analysis is a fundamental tool for detecting
overwrite attacks, but it is limited to an all-or-nothing
distinction as to whether values are under the control of an
attacker, and suffers from both false-positive and
false-negative errors.
We propose quantitative
influence to more precisely characterize the degree of
control an attacker has over a value. Quantitative influence is a
specialization of the concept of channel capacity from information
theory, and we show that it can be computed precisely
using a decision procedure. Quantitative influence
accurately distinguishes real attacks from false positives
among warnings generated by a dynamic taint analysis tool on
vulnerable binary servers.
-
Statically-Directed Dynamic Automated Test Generation
Static analysis, dynamic analysis, and symbolic execution
have complementary strengths for exploring the space of
program executions, but on its own each has significant
limitations. How can we combine them to leverage the best
features of all three?
Our work on statically-directed
dynamic automated test generation explores a three-stage
process. It first performs dynamic analysis to build a
control-flow model, then performs static analysis to search
for potential vulnerabilities, and finally uses dynamic
symbolic execution to prove that warnings are true positives
by finding concrete test cases for them. In an evaluation on
a suite of buffer-overflow benchmarks extracted from real
applications, the results of the first two phases allowed
symbolic execution to trigger vulnerabilities it otherwise
could not, including all but one of the benchmarks.
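To make the quantitative influence item above more concrete: for a deterministic computation, channel capacity reduces to the base-2 logarithm of the number of distinct output values the input can produce. The sketch below computes that quantity by exhaustive enumeration over one attacker-controlled byte. It is an illustration under stated assumptions, not the project's tool: the function name `influence_bits` and the example computations are hypothetical, and enumeration stands in for the decision procedure mentioned in the text.

```python
import math


def influence_bits(compute, attacker_inputs):
    """Log2 of the number of distinct values the attacker can force the output to take."""
    reachable = {compute(x) for x in attacker_inputs}
    return math.log2(len(reachable))


# Assumption: the attacker controls exactly one input byte.
all_bytes = range(256)

copied = influence_bits(lambda x: x, all_bytes)               # output is the input itself
masked = influence_bits(lambda x: x & 0x0F, all_bytes)        # only the low nibble survives
compared = influence_bits(lambda x: int(x == 42), all_bytes)  # output is a single yes/no
constant = influence_bits(lambda x: 7, all_bytes)             # output ignores the input
```

All four outputs would be marked "tainted" by an all-or-nothing taint analysis if the constant case propagated taint conservatively, but the measured influence ranges from 8 bits (`copied`) through 4 bits (`masked`) and 1 bit (`compared`) down to 0 bits (`constant`). A low value suggests the warning is more likely a false positive.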
-
-
Detection and Analysis of Privacy-Breaching Malware
Many kinds of malware, such as keyloggers, Browser Helper
Object (BHO)-based spyware, rootkits, and backdoors, access
and leak users' sensitive information and breach
users' privacy. Can we have a unified approach to
identify such privacy-breaching malware despite its
widely varied appearance?
Panorama proposes a
unified approach to detect privacy-breaching malware
using whole-system dynamic taint analysis.
-
Hidden Code Extraction from Packed Executables
Code packing is one technique commonly used to hinder malware
code analysis through reverse engineering. Even though this problem
has been previously researched, the existing solutions are
either unable to handle novel samples, or vulnerable to various
evasion techniques.
Renovo
proposes a fully dynamic approach for hidden code extraction,
capturing the intrinsic nature of hidden code execution.
-
Detection and Analysis of Malware Hooking Behaviors
One important malware attack vector is hooking. Malicious programs implant hooks for many different purposes. Spyware may implant hooks to get notified of the arrival of new sensitive data. Rootkits may implant
hooks to intercept and tamper with critical system information to
conceal their presence in the system. A stealth backdoor may also place hooks on
the network stack to establish a stealthy communication channel with remote attackers.
HookFinder
proposes fine-grained impact analysis to automatically detect and analyze malware's hooking behaviors. Since this technique captures the intrinsic nature of hooking behaviors, it is well suited for identifying new hooking mechanisms.
-
Automatic Malware Dissection and Trigger-based Behavior Analysis
Malware often has embedded behavior which is only
exhibited when certain conditions are met. Such
trigger-based behavior includes time bombs, logic bombs,
and botnet programs which react to commands. Static
analysis of malware often provides little utility due to
code packing and obfuscation. Vanilla dynamic analysis
provides only a limited view since the trigger
conditions are usually not met. How can we design
automatic analysis methods to uncover the trigger
conditions and trigger-based behavior hidden in malware?
BitScope enables
automatic exploration of program execution paths in
malware to uncover trigger conditions (such as the time
used in time bombs and commands in botnet programs) and
trigger-based behavior, using dynamic symbolic
execution. BitScope also provides in-depth analysis of
the input/output behavior of the malware.
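Several of the malware analyses above rest on dynamic taint analysis: sensitive input is labeled at its source, the labels follow the data through every operation, and a check at an output channel reports any labeled data that reaches it. The sketch below shows that idea at the level of Python strings. It is only a conceptual illustration and does not reflect how Panorama or TEMU are implemented, which track taint across a whole emulated system. The names `Tainted`, `read_keystrokes`, and `Network` are hypothetical.

```python
class Tainted:
    """A string together with the set of taint labels it carries."""

    def __init__(self, data, labels=()):
        self.data = data
        self.labels = frozenset(labels)

    def __add__(self, other):
        # Concatenation propagates the union of both operands' labels.
        if isinstance(other, Tainted):
            return Tainted(self.data + other.data, self.labels | other.labels)
        return Tainted(self.data + other, self.labels)

    def __radd__(self, other):
        # Handles plain_string + Tainted, keeping the labels.
        return Tainted(other + self.data, self.labels)

    def upper(self):
        # A transformation of tainted data is still tainted.
        return Tainted(self.data.upper(), self.labels)


def read_keystrokes():
    """Taint source (hypothetical): keyboard input is labeled as sensitive."""
    return Tainted("hunter2", {"keyboard"})


class Network:
    """Taint sink (hypothetical): records an alert when labeled data is sent."""

    def __init__(self):
        self.alerts = []

    def send(self, payload):
        if isinstance(payload, Tainted) and payload.labels:
            self.alerts.append(sorted(payload.labels))


net = Network()
net.send("GET /index.html")                   # untainted data: no alert
net.send("user=" + read_keystrokes().upper()) # keystrokes leak: one alert
```

After the two calls, `net.alerts` holds a single entry naming the `keyboard` source, even though the leaked string was transformed and concatenated before being sent.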
-
- Extracting security-related models from browsers for analysis and vulnerability discovery
In this work, we show how to use string-enhanced white-box exploration techniques to automatically extract security-related models from browsers and to automatically discover cross-site scripting (XSS) vulnerabilities by comparing the extracted models with websites' filters.
-
Deviation Detection in Binaries
Many network protocols and services have several
different implementations. Automatically identifying
deviations in different implementations of the same
protocol/service can enable the detection of potential
implementation errors without protocol specification, and
can enable automatic generation of fingerprints to
identify an implementation remotely. How can we
automatically identify such deviations in binaries
implementing the same specification?
Deviation Detection
automatically identifies deviations in different
binaries to detect implementation errors and generate
fingerprints. It is achieved by building symbolic formulas
that characterize how each binary processes an input.
-
Protocol Reverse Engineering and Application Dialogue Replay
Many network protocols are proprietary or have no well
documented specification. However, many security
applications require protocol reverse engineering and
application dialogue (network trace) replay.
Dispatcher, Polyglot, and
Replayer automatically extract information about
network protocols and enable application dialogue replay
using binary analysis.
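The deviation detection item above can be pictured with a small sketch: model how each implementation decides whether to accept an input, then look for an input on which the two models disagree. In the work described, those models are symbolic formulas extracted from binaries and a solver finds the disagreement. In this sketch, two hypothetical Python predicates (`impl_a`, `impl_b`) play the role of the formulas and a search over a small input range (`find_deviations`) plays the role of the solver, so it should be read as an analogy rather than the actual method.

```python
def impl_a(length):
    """Hypothetical implementation A: accepts header lengths from 1 to 64 inclusive."""
    return 0 < length <= 64


def impl_b(length):
    """Hypothetical implementation B: same check, but with an off-by-one upper bound."""
    return 0 < length < 64


def find_deviations(f, g, inputs):
    """Return every input on which the two implementations disagree."""
    return [x for x in inputs if f(x) != g(x)]


deviations = find_deviations(impl_a, impl_b, range(256))
```

Here the only deviating input is a length of 64. An input like this either points to an implementation error in one of the two programs or can serve as a fingerprint probe that tells them apart remotely.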
The FPGate project received a Special Recognition Award in the Microsoft BlueHat Prize contest in 2012.
FPGate stops attacks targeting function pointers by limiting indirect transfers to only those targets that are legal in the original program. When deployed together with other existing lightweight protections, FPGate can provide a level of protection comparable to CFI (Control Flow Integrity), stopping almost all control flow hijacking attacks, including ROP. FPGate has two main advantages over previous solutions. First, it interoperates well with existing non-hardened libraries, so it can be deployed progressively. Second, we developed a method to recognize all sources and targets automatically in modern security-sensitive binary executables, so FPGate can be applied directly to these binary files. The performance overhead of FPGate is only 0.36% on average, measured using SPECint 2006. FPGate is joint work of Lenx with Chao Zhang, Zhaofeng Chen, and Lei Duan from Peking University, and Laszlo Szekeres, Stephen McCamant, and Dawn Song from UC Berkeley.
Vulnerabilities Discovered
- CVE-2011-0904 Out-of-bounds Memory Access in Gnome VNC Vino Server
- CVE-2011-0905 Out-of-bounds Memory Access in Gnome VNC Vino Server
- OSVDB-66501 Stack-based and heap-based buffer overflow in Zbot trojan
News Coverage
The BitBlaze project is looking for developers to help extend and
enhance our state-of-the-art framework for binary analysis in
security applications.
In particular, we're looking for developers/researchers with skills
and experience including computer security, languages and compilers,
assembly language, low-level operating system work, and decision
procedures.
We have openings for interns (for the summer or another similar
period), staff scientists/staff programmers, postdocs, and
open-source contributors. If interested, send a CV/resume and interest
description to bitblaze.jobs at gmail.com.
-
Faculty:
Dawn Song
-
Visiting Faculty:
Lenx Tao Wei
- Postdocs and research staff:
- Students:
- Former Members:
- Postdocs and research staff:
Domagoj Babic,
Stephen McCamant,
Zhenkai Liang,
Lorenzo Martignoni,
Daniel Reynaud
- Graduated Ph.D. students:
Steve Hanna,
Pongsin Poosankam,
Prateek Saxena,
Joel Weinberger,
James Newsome,
David Brumley,
Juan Caballero,
Min Gyung Kang,
Heng Yin
- Graduated M.S. students: Xeno Kovah, Cody Hartwig
- Staff: Ivan Jager
- Graduated Bachelor's students: Eric Li
For general questions regarding the BitBlaze project, please send email to bitblaze at gmail.com.
To receive announcements about code releases and other BitBlaze-related updates, please subscribe to the BitBlaze Announcement List.