Merging fun

Yesterday, commit number 379 and 380 introduced a renewed implementation of the constant-propagation, and new functionality for finding vulnerabilities. You guessed it, the merging of constant-propagation and vulnerability analysis has taken off! The cool things are that 1) the old tests all pass, 2) some new tests pass and 3) I get to write a lot more tests!

The main reason for starting the integration of both analysis was the fact that I saw a lot of code duplication popping up. This duplication was caused by the fact that the bookkeeping of internal structures is the same for all strategies, the code only differs for the properties of values.

With this last piece of information I started to merging. My first approach consisted of trying to pass a list of get-set strategies to the main-strategy and calling these strategies dynamically. This was obviously not a very good or nice start because it always seem to result in numerous segmentation faults.

Thinking about the problem made the second attempt somewhat more pragmatic. Instead of generalizing I just wanted to make it work for constant propagation in combination with a second analysis. So I rewrote the strategy to receive two strategies, one for getting the properties of literals and one for getting the properties of operators. The choice for these language constructs is based on the reasoning that these constructs are the only ones that make or manipulate the actual properties of values. The other language constructs manipulate the flow of the values instead of the actual values.

When this all seemed to work the more difficult challenge had to be solved, manipulation of variables and arrays. It turned out to be more simplistic then I thought because of the indirection between the variables and their values. For the purpose of dealing with aliasing, a variable does not point to a value but rather to a value-identifier. This identifier points to the actual value. This makes the creation of a reference easier, we just create a mapping from a variable to the value-identifier of the referenced variable. Because of this indirection we can simply make the value-identifier point to more then one property, implemented by a dynamic-rule for each property. This makes the merging of sets of dynamic rules a bit harder, but not impossible.

It might sound simple (or incomprehensible), but getting everything right was still a bit tricky. For example, when an assignment is made the property of the RHS must be known before the LHS can be assigned this property. So what happens when the constant propagation cannot compute a value, should we simply fail to assign a property?
The answer is no, the second analysis might still be successful. These kind of little problems made the implementation a little less straight-forward, and the code a little less beautiful.

However, the result of it all is that the following example is now flagged correctly:
$foo = $_GET['asdf'];
$bar = 1;
$bar =& $foo;
echo $bar;
In this case, echo $bar will be flagged by the latest php-sat.

I experienced one problem with the implementation that is related to the semantics of Stratego. My first attempt in adding an annotation to a term was something like this:
t{a*} -> t{annos}
where b* := a*
; annos := [PHPSimpleValue(val) | b*]
This works perfectly, the annotations are matched as a list by the *-syntax, and a list is added as an annotation to the term again. The only problem with this is that the second time this rule is applied it matches the annotations as a list of a list of annotations, which was not the behavior I desired. This problem is easily solved by also adding a * to build the term:
t{a*} -> t{annos*}
where b* := a*
; annos* := [PHPSimpleValue(val) | b*]
Now the list of annotations is not wrapped in an actual list anymore. I know it is documented somewhere, but this little explanation might save some others from an headache or a long debug-session.

The next step in the analysis for vulnerabilities is a rather important one: testing. Even though the basic parts of variables and assignments are already tested, there exists a large number of scenarios that need to be tested on this new integration-strategy.
But hey, testing is fun!

1 comment:

Martin said...

hm, yes, the relevance of the * in a variable name is indeed somewhat obscure.

The idea of list variables might be more clear if they would be forbidden outside of a list (so [a*] is ok, a plain a* is not).