Bouwers > Eric > Blog: May 2007

Some new features

Last Friday, I had a conversation with nEUrOO in #stratego about the results he was having with php-sat (see his blog for more details). After the conversation, and after looking at the test-file he mentioned in his blog, I got started and added 2 new features to php-sat.

The first new feature is aimed at the usability of the tool itself, and is visible as a new command-line option: -ra CODE. This option accepts one of MCV, COR, EI, STY or OPT as input, and makes sure that the only analysis that is run is the one that belongs to the given code. In other words, you can now run, for example, only the correctness checks by calling php-sat with: --ra COR. This gives you somewhat coarse-grained control over the behavior of the tool. A plan for more fine-grained control (on the level of patterns) was also mentioned some time ago, but the implementation of this level of control requires some more thought. In the meantime, please enjoy running a single kind of bug-pattern :).

A second new feature is added to the analysis of safety-levels. Consider the following example:

<?php
  echo addslashes(htmlentities($_GET['name']));
?>

The default configuration for echo requires a parameter to have both the level EscapedHTML as well as EscapedSlashes. Furthermore, the default configuration defines the return-type of the functions as:

function: addslashes       level: escaped-slashes
function: htmlentities     level: escaped-html

So this piece of code should not be flagged by php-sat. Unfortunately, previous revisions did flag this piece of code!

The problem here is that the analysis uses the safety-level of a function that is mentioned in the configuration file without considering the parameter of the function. This behavior works well for most functions, but when functions only add a property to their parameter it becomes incorrect. Because of this behavior, the echo-statement is flagged because the call to addslashes is only annotated with EscapedSlashes.

Fortunately, the solution is not that complicated. Since we know that there are several functions that add a certain property to their parameter, we add the possibility to specify this in the configuration file. The new syntax for functions that add a safety-level is a '+' after the level of the function. This '+' forces a function to inspect the level of its (only) parameter and combine this with the level that is specified as its safety-level. The combination of the levels is the result of the function call. (Small note: this behavior is (currently) only supported for functions with a single parameter.)

So from now on, when the following configuration is used:

function: addslashes       level: escaped-slashes +
function: htmlentities     level: escaped-html +

the example above is not flagged anymore because the call to addslashes is annotated with its own safety-level (EscapedSlashes), as well as the safety-level of its parameter (EscapedHTML). A pretty useful feature I would say.
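
The effect of the '+' can be modelled in a few lines of Python. This is only a sketch of the idea, not the actual implementation (php-sat is written in Stratego): the names OLD_CONFIG, NEW_CONFIG, REQUIRED_BY_ECHO, call_level and is_flagged are made up for this illustration, and representing a safety-level as a set of property names is an assumption.

```python
# Hypothetical model of the '+' marker; php-sat itself is written in Stratego.
# A safety-level is modelled as a set of property names (an assumption).

# function name -> (level added by the function, whether '+' is set)
OLD_CONFIG = {
    "addslashes": ({"escaped-slashes"}, False),
    "htmlentities": ({"escaped-html"}, False),
}
NEW_CONFIG = {
    "addslashes": ({"escaped-slashes"}, True),
    "htmlentities": ({"escaped-html"}, True),
}

# The requirement of echo as described for the default configuration.
REQUIRED_BY_ECHO = {"escaped-html", "escaped-slashes"}


def call_level(config, functions, argument_level=frozenset()):
    """Level of nested single-parameter calls, listed innermost first."""
    level = set(argument_level)
    for name in functions:
        own_level, additive = config[name]
        # With '+', the parameter's level is kept and combined with the
        # function's own level; without it, only the own level remains.
        level = (level | own_level) if additive else set(own_level)
    return level


def is_flagged(config, functions):
    """True when echo would receive a value lacking a required property."""
    return not REQUIRED_BY_ECHO <= call_level(config, functions)


# echo addslashes(htmlentities($_GET['name']));
nested = ["htmlentities", "addslashes"]
print(is_flagged(OLD_CONFIG, nested))  # True
print(is_flagged(NEW_CONFIG, nested))  # False
```

Without the '+', the outer call overwrites whatever the inner call established, which is exactly why the old revisions flagged the example.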

A scouts week

Those of you who (try to) read the posts of this blog every week might have noticed the two-week gap between this post and the previous one. This gap originates from the hobby I have practiced since I was a (very) little boy: Scouting. Within the past week I had the chance to go to two camps, with a three-day rest-period in between. This results in a large amount of dirty clothes, little sleep, a voice like a grinder and lots and lots of fun!

The first camp I attended was the 'clusterweekend' of the Dutch Contingent. My role in this camp was that of quartermaster of the Bontbekplevieren, one of the 25 troops visiting the World Jamboree this summer. Because it is almost impossible to get all of the Dutch participants together on one terrain, the 25 troops were divided into three clusters. There were eight troops in our cluster, each troop consisting of 40 persons. Adding a few people for camping staff, working staff and general staff, we had about 350 people attending this camp.

Even though the weather forecast was a bit disappointing at first, the rain all fell down on us during the first night. This immediately showed that our new tents are water-proof, which is a good thing considering the normal weather conditions in England. During the rest of the weekend we played some games on the beach, sorted out a massive amount of badges, played in a casino and had a big party. Luckily, the sun was shining on Sunday so we could pack everything in dry conditions. The weekend was a great success, and of course there are some general photos, as well as photos of our troop.

When I came home on Sunday I quickly unpacked everything and washed some of the clothes. I worked a bit on Monday and Tuesday, but I also had to take care of some things for Ontmoeting, the second camp of this week. Ontmoeting is organized once a year for all the scouts of our region aged 11-15. They are mixed into sub-camps to compete for the first place. Each year the competition is wrapped into a certain theme; this year's theme was 'Bond maakt het Bond'. The mission was to become the replacement of James Bond. The 153 participants were divided into 6 camps, each representing a 'superhero' of some kind. I was part of the Bassie en Adriaan-camp, a rather famous duo in the Netherlands.

This camp started on Wednesday with the packing of 'Diana', the trailer of our group. After loading all sorts of stuff we went to the campsite, which was relatively small, but this made the whole thing kind of cosy. On Wednesday evening we had a BBQ and a campfire with the staff; the children arrived on Thursday morning. The next three days were filled with all sorts of larger games, smaller games and some pretty cool ticket-activities. Naturally, you can check these out on some of the photos, made by the same people that also made the camp-news-paper. It is always hard to tell about the whole camp in just a few words, but it all boils down to having a lot of fun. This year the games were really fun, thank you 'spelstaf', and our sub-camp has won the cup! To quote a famous clown: 'Alles is voor Bassie!'

These kinds of weeks are a great way to relax: no worries about making deadlines or missing important stuff. It is simply not possible to read a real news-paper, so you just ignore the rest of the world for a couple of days. Also, it gives me a lot of energy to start working again, so let's get to it!

After catching some more sleep of course...

Merging fun

Yesterday, commits 379 and 380 introduced a renewed implementation of the constant-propagation, and new functionality for finding vulnerabilities. You guessed it, the merging of constant-propagation and vulnerability analysis has taken off! The cool things are that 1) the old tests all pass, 2) some new tests pass and 3) I get to write a lot more tests!

The main reason for starting the integration of both analyses was the fact that I saw a lot of code duplication popping up. This duplication was caused by the fact that the bookkeeping of internal structures is the same for all strategies; the code only differs for the properties of values.

With this last piece of information I started merging. My first approach consisted of trying to pass a list of get-set strategies to the main-strategy and calling these strategies dynamically. This was obviously not a very good or nice start because it always seemed to result in numerous segmentation faults.

Thinking about the problem made the second attempt somewhat more pragmatic. Instead of generalizing I just wanted to make it work for constant propagation in combination with a second analysis. So I rewrote the strategy to receive two strategies, one for getting the properties of literals and one for getting the properties of operators. The choice for these language constructs is based on the reasoning that these constructs are the only ones that make or manipulate the actual properties of values. The other language constructs manipulate the flow of the values instead of the actual values.
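
To make the split a bit more concrete, here is a small Python sketch of a traversal that is parameterised by exactly these two hooks. It is not the Stratego strategy itself: the tuple encoding of expressions and all names (analyse, const_literal, safety_operator and so on) are assumptions for this illustration, and the two analyses are run separately here instead of together in one pass.

```python
# Hypothetical sketch: one traversal, parameterised by two hooks per analysis.
# Expressions are tuples: ("lit", value) or ("op", symbol, left, right).

def analyse(expr, on_literal, on_operator):
    """Compute a property for expr using the two analysis-specific hooks."""
    if expr[0] == "lit":
        return on_literal(expr[1])
    _, symbol, left, right = expr
    return on_operator(
        symbol,
        analyse(left, on_literal, on_operator),
        analyse(right, on_literal, on_operator),
    )


# Analysis 1: constant propagation (the property is the value itself).
def const_literal(value):
    return value


def const_operator(symbol, left, right):
    if symbol == "+":
        return left + right
    if symbol == ".":  # PHP string concatenation
        return str(left) + str(right)
    raise ValueError("unsupported operator: " + symbol)


# Analysis 2: a toy safety analysis (literals in the source are safe).
def safety_literal(value):
    return "safe"


def safety_operator(symbol, left, right):
    return "safe" if left == right == "safe" else "unsafe"


expr = ("op", "+", ("lit", 1), ("op", "+", ("lit", 2), ("lit", 3)))
print(analyse(expr, const_literal, const_operator))    # 6
print(analyse(expr, safety_literal, safety_operator))  # safe
```

The traversal itself never looks at what a property is, which is where the shared bookkeeping ends and the analysis-specific code begins.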

When this all seemed to work, the more difficult challenge had to be solved: manipulation of variables and arrays. It turned out to be simpler than I thought because of the indirection between the variables and their values. For the purpose of dealing with aliasing, a variable does not point to a value but rather to a value-identifier. This identifier points to the actual value. This makes the creation of a reference easier: we just create a mapping from a variable to the value-identifier of the referenced variable. Because of this indirection we can simply make the value-identifier point to more than one property, implemented by a dynamic rule for each property. This makes the merging of sets of dynamic rules a bit harder, but not impossible.
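
The indirection can be pictured with a small Python model. This is a sketch under assumptions, not the real code: the class Environment and its methods are invented for this illustration, and the dictionaries stand in for the dynamic rules (one table per property).

```python
# Hypothetical model of the variable -> value-identifier -> properties
# indirection; the dictionaries stand in for Stratego dynamic rules.
import itertools


class Environment:
    def __init__(self):
        self._fresh = itertools.count()
        self.var_to_id = {}   # variable name -> value-identifier
        self.properties = {}  # property name -> {value-identifier: property}

    def assign(self, var, **props):
        """Store the given properties under var's value-identifier."""
        if var not in self.var_to_id:
            self.var_to_id[var] = next(self._fresh)
        ident = self.var_to_id[var]
        for table in self.properties.values():
            table.pop(ident, None)  # forget properties of the old value
        for name, value in props.items():
            self.properties.setdefault(name, {})[ident] = value

    def reference(self, var, target):
        """Model `$var =& $target`: both names share one value-identifier."""
        self.var_to_id[var] = self.var_to_id[target]

    def lookup(self, var, name):
        """Property `name` of var, or None when it is unknown."""
        return self.properties.get(name, {}).get(self.var_to_id[var])


env = Environment()
env.assign("foo", safety="unsafe")            # $foo = $_GET['asdf'];
env.assign("bar", constant=1, safety="safe")  # $bar = 1;
env.reference("bar", "foo")                   # $bar =& $foo;
print(env.lookup("bar", "safety"))    # unsafe
print(env.lookup("bar", "constant"))  # None
```

Creating the reference touches only the first mapping, so every property table sees the alias without being updated itself.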

It might sound simple (or incomprehensible), but getting everything right was still a bit tricky. For example, when an assignment is made the property of the RHS must be known before the LHS can be assigned this property. So what happens when the constant propagation cannot compute a value, should we simply fail to assign a property? The answer is no, the second analysis might still be successful. These kinds of little problems made the implementation a little less straight-forward, and the code a little less beautiful.
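
A minimal Python sketch of this point, assuming that an analysis signals failure by returning None. The function names and the state layout are made up for this illustration, and the value-identifier indirection is left out to keep it short.

```python
# Hypothetical sketch: each analysis may fail (return None) independently,
# and a failure of one must not prevent the other from assigning a property.

def constant_of(expr, constants):
    """Constant propagation: the value of a literal or a known variable."""
    kind, payload = expr
    if kind == "lit":
        return payload
    return constants.get(payload)  # None when the value is unknown


def safety_of(expr, user_input):
    """Toy safety analysis: variables holding user input are unsafe."""
    kind, payload = expr
    if kind == "var" and payload in user_input:
        return "unsafe"
    return "safe"


def assign(lhs, rhs, state):
    """Record every property of rhs that could be computed for lhs."""
    results = {
        "constant": constant_of(rhs, state["constant"]),
        "safety": safety_of(rhs, {"_GET", "_POST"}),
    }
    for name, value in results.items():
        if value is None:
            state[name].pop(lhs, None)  # unknown: drop it, do not abort
        else:
            state[name][lhs] = value


state = {"constant": {}, "safety": {}}
assign("foo", ("var", "_GET"), state)  # $foo = $_GET[...];
assign("n", ("lit", 42), state)        # $n = 42;
print(state["constant"])  # {'n': 42}
print(state["safety"])    # {'foo': 'unsafe', 'n': 'safe'}
```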

However, the result of it all is that the following example is now flagged correctly:

<?php
 $foo = $_GET['asdf'];
 $bar = 1;
 $bar =& $foo;
 echo $bar;
?>

In this case, echo $bar will be flagged by the latest php-sat.

I experienced one problem with the implementation that is related to the semantics of Stratego. My first attempt in adding an annotation to a term was something like this:

 add-php-simple-value(|val):
    t{a*} -> t{annos}
      where  b*    :=  a*
           ; annos := [PHPSimpleValue(val) | b*]

This works perfectly, the annotations are matched as a list by the *-syntax, and a list is added as an annotation to the term again. The only problem with this is that the second time this rule is applied it matches the annotations as a list of a list of annotations, which was not the behavior I desired. This problem is easily solved by also adding a * to build the term:

 add-php-simple-value(|val):
    t{a*} -> t{annos*}
      where  b*     :=  a*
           ; annos* := [PHPSimpleValue(val) | b*]

Now the list of annotations is not wrapped in an actual list anymore. I know it is documented somewhere, but this little explanation might save some others from a headache or a long debug-session.
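
For those who do not read Stratego, the difference between the two rules can be mimicked in Python. This is only a model of the behavior described above, with a term represented as a (name, annotations) pair; the function names are invented for this illustration.

```python
# Hypothetical Python model of the two rules; a term is (name, annotations).

def add_value_nested(term, val):
    """Model of `t{a*} -> t{annos}`: the new list becomes ONE annotation."""
    name, annos = term
    new_list = [("PHPSimpleValue", val)] + annos
    return (name, [new_list])


def add_value_spliced(term, val):
    """Model of `t{a*} -> t{annos*}`: the list's elements ARE the annotations."""
    name, annos = term
    return (name, [("PHPSimpleValue", val)] + annos)


term = ("Var", [])
print(add_value_nested(add_value_nested(term, 1), 2))
# ('Var', [[('PHPSimpleValue', 2), [('PHPSimpleValue', 1)]]])
print(add_value_spliced(add_value_spliced(term, 1), 2))
# ('Var', [('PHPSimpleValue', 2), ('PHPSimpleValue', 1)])
```

In the nested variant every application adds another level of list wrapping, while the spliced variant keeps the annotations flat.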

The next step in the analysis for vulnerabilities is a rather important one: testing. Even though the basic parts of variables and assignments are already tested, there exists a large number of scenarios that need to be tested on this new integration-strategy.
But hey, testing is fun!

Passing time

After I stopped working on the GUI for the RFG I have been working in Haskell again. It has been quite a while since I have worked with this language, so it takes me some time to get used to it again. Since it is a strongly typed language, as opposed to both Stratego and PHP, some things take longer to implement, but some mistakes are found by the type-checker. Unfortunately, this means that I have not done anything terribly interesting for my thesis.

So let's look at the other interesting project, PHP-Sat. I have been working on the integration of a second analysis within the constant-propagation. This is coming along nicely, but it requires some heavy thinking and careful consideration. I have already worked my way up to the expression-level, so I hope to finish the rest of the constructs that were already supported by the end of this weekend.

In order to tell at least one interesting thing, and to not waste your time completely, I wanted to point out some interesting video-presentations. The first one is also the first one I ever saw through the internet: Drupal, Joomla! & GSoC. The title explains why I wanted to see it, and it is interesting for anyone that wants to know more about the GSoC from a project's point of view. The second also comes from the Google Tech Talks and is called How Open Source Projects Survive Poisonous People (And You Can Too). It is given by the people behind Subversion and I especially liked the story about the bikeshed. The third presentation comes from Bram Moolenaar, the creator of VIM (yes, that is the editor I mostly use). He talks about 7 Habits For Effective Text Editing; be sure to check it out even if you do not use VIM.

The presentations mentioned above are (some of) the presentations I have already seen; the following are on my todo-list. Please let me know if any one of them is super-great, or a total waste of time.

The paradox of choice
Education in the Digital Age
How to get paid to do open source (does this mention the GSoC?)