Dynamic taint analysis is a practical and popular technique for detecting violations of data integrity in computer systems, including commercial software that is only available in binary form. However, dynamic taint is an all-or-nothing attribute, and its imprecision leads to both false positive and false negative errors. Dynamic taint analysis suffers from false positives when programs properly sanitize initially untrusted data, and from false negatives when it fails to track implicit flows such as load-address dependencies and effects mediated by control flow.
We propose the construct of influence to capture in more detail the control that input variables have over an output variable. A special case of the information-theoretic concept of channel capacity, influence measures how many values an attacker might cause a vulnerable variable to take. For instance, if untrusted inputs can only choose between two possible values for a variable, we say they have 1 bit of influence; but if a 32-bit machine word could be arbitrarily overwritten, the attacker has 32 bits of influence.
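As a rough illustration of the definition, influence can be read as the base-2 logarithm of the number of distinct values the output can take as the untrusted input varies. The sketch below computes this by brute-force enumeration over a single input byte. The function names (`influence_bits`, `two_way`, `masked`, `copy_byte`) are hypothetical and chosen only for this example, and exhaustive enumeration is an assumption made for clarity; it is not the measurement approach presented in this work and would not scale to real binaries.

```python
import math

def influence_bits(program, inputs):
    """Return log2 of the number of distinct outputs `program` produces over `inputs`.

    Brute-force illustration only: it enumerates every input value.
    """
    outputs = {program(x) for x in inputs}
    return math.log2(len(outputs))

def two_way(x):
    # The input only selects between two possible output values.
    return 10 if x < 128 else 20

def masked(x):
    # A simple sanitization: only the low 4 bits of the input survive.
    return x & 0x0F

def copy_byte(x):
    # The input is copied unchanged, so it fully controls the output.
    return x

all_bytes = range(256)  # every value of one untrusted input byte

print(influence_bits(two_way, all_bytes))    # 1.0 bit
print(influence_bits(masked, all_bytes))     # 4.0 bits
print(influence_bits(copy_byte, all_bytes))  # 8.0 bits
```

In all three cases a taint analysis would mark the output as tainted, whereas the bit counts distinguish a two-way choice from full control of the byte.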
To evaluate this measure, we present and implement a practical approach for measuring influence. Our approach is a family of strategies that complement each other, giving results that are either exact or have narrow error bounds. Our tool uses these strategies to measure influence in commodity binary software. (By comparison, previous scalable approaches provide soundness without any guarantee of precision, and previous precise approaches do not scale to off-the-shelf software.) We then apply this tool to the post-analysis of alerts raised by dynamic taint analysis. Our tool's results confirm some alerts as true positives (including the attacks exploited by the Slammer and Blaster worms), and show that others are false positives due to common data sanitization techniques.