Pimping my environment(s)

You probably already noticed that the blog has been pimped. I have adopted the quote that was mentioned at the SUD, updated the links, added the logo and an overview of all the labels. The blog now represents more of what it already was, a place to write about the projects I am involved in. And yes, me is also a project I am involved in :)

Apart from the blog I also updated the PHP-Sat website. There is now some documentation for PHP-Sat and the bug-patterns. The documentation of PHP-Front will also be updated soon. The last thing I have to do is write the friendly and catchy welcome page, always a difficult task.

We are traveling further away from the project when I tell you that I also updated my computer configuration. You might recall that I used to work on a virtual machine, which tends to be a bit slow. So my current configuration is a dual-boot system with Windows 2000 and Fedora Core 6 (default).
The documentation for setting up the working environment was more or less a guide for myself to get everything working again. It was definitely worth the work of backing up all my configurations. A complete compilation of PHP-Front and PHP-Sat now takes 10m47s instead of the old 23m37s.

The last environment that I have pimped has nothing to do with any of the projects, except the me-project. I have cleaned my room and moved some stuff around. It is surprising to see how many useless things I had and how much space you gain when you throw them out. Although I am one of those people who believe that 'everything you throw away will be useful the next day', my trust in this claim is fading. It has been two days now and I still do not need the four 10-centimeter-long lightsabers collected from breakfast cereal boxes.

String representations

With revision 299 we (again) have a list representation for DoubleQuoted- and HereDoc-strings. So the string "hello \t world" is represented by:

ConstantEncapsedString(
   DoubleQuoted([Literal("hello ")
                ,Escape(116)
                ,Literal(" world")]
))


This is not completely new because the first implementation already had this, but that representation had some problems. There was no way to model a hexadecimal escape without making things ambiguous. This problem can be solved by making the order of the internal parts explicit, but we then had a terrible representation of the string:

ConstantEncapsedString(
   DoubleQuoted(
     DQContent(Some("hello ")
              ,Escape(116)
              ,Some(" world"))
))

Note that this string is represented with one of those DQContent-thingies, and the biggest one has three children. So every string with more than three parts has nested DQContents. Let me give you an example, the string "Hello \\\0123" looks like:

ConstantEncapsedString(
   DoubleQuoted(
     DQContent(Some("Hello ")
              ,DQContent(Escape(92)
                        ,None
                        ,OctaChar(48,49,50))
              ,Some("3"))
))

Terrible, right?

So this had to be solved by a post-processing step. We have to walk over the tree bottom-up to rewrite these nasty DQContents to a nice list. It is not nice to have such a post-processing step, but one was already required because of HereDoc-strings.
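
To make the idea of that post-processing step concrete, here is a minimal sketch in Python rather than Stratego. It is not the actual PHP-Front code: the tuple encoding of terms, the function name flatten_dqcontent and the choice to turn Some(...) into Literal(...) are all assumptions made for illustration.

```python
# Terms are modelled as plain tuples: (constructor, child, ...).
# This encoding is an assumption for illustration, not the ATerm format itself.

def flatten_dqcontent(term):
    """Flatten a (possibly nested) DQContent term into a flat list of parts."""
    parts = []
    for child in term[1:]:
        constructor = child[0]
        if constructor == "DQContent":
            # Nested DQContent: splice its parts into the current list.
            parts.extend(flatten_dqcontent(child))
        elif constructor == "Some":
            # Optional text that is present becomes a Literal part.
            parts.append(("Literal", child[1]))
        elif constructor == "None":
            # Optional text that is absent contributes nothing.
            continue
        else:
            # Escape, OctaChar and friends are kept unchanged.
            parts.append(child)
    return parts


# The nested term for the string "Hello \\\0123" from the example above.
nested = ("DQContent",
          ("Some", "Hello "),
          ("DQContent", ("Escape", 92), ("None",), ("OctaChar", 48, 49, 50)),
          ("Some", "3"))

flat = flatten_dqcontent(nested)
```

The result is a list with four parts in their original order, which is the shape the list representation needs.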

The problem with the HereDoc is analogous to the dangling-else problem. If you have multiple HereDoc-strings with the same label you will have to choose where the first HereDoc ends. PHP always takes the shortest HereDoc so this piece of code has two variables:

<?php
   $foo = <<<BAR
     foo...
  
BAR;
  
   $bar = <<<BAR
     bar
  
BAR;
?>

As long as HereDoc is ambiguous this is easily solved by choosing the right amb-node.

But after the rewrite to the new internal implementation, HereDoc became unambiguous! This is a bit frustrating because it takes the longest HereDoc, which is wrong. I spent some time trying to get it right, but I could not make it work (yet). So a new puzzle has entered the project, happy Christmas! :)
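
The difference between the shortest and the longest HereDoc can be illustrated with a small Python sketch. The regular expressions below are a deliberately simplified stand-in for the real HereDoc syntax (an assumption for illustration, not PHP's lexer and not the SDF grammar): the lazy pattern stops at the first closing label, the greedy one runs on to the last.

```python
import re

# The two HereDoc-strings from the example, sharing the label BAR.
SOURCE = """$foo = <<<BAR
  foo...
BAR;

$bar = <<<BAR
  bar
BAR;
"""

# Simplified pattern: <<<LABEL, newline, body, newline, the same LABEL and ';'.
# SHORTEST uses a lazy body (.*?), LONGEST a greedy one (.*).
SHORTEST = re.compile(r"<<<(\w+)\n(.*?)\n\1;", re.DOTALL)
LONGEST = re.compile(r"<<<(\w+)\n(.*)\n\1;", re.DOTALL)

shortest_bodies = [m.group(2) for m in SHORTEST.finditer(SOURCE)]
longest_bodies = [m.group(2) for m in LONGEST.finditer(SOURCE)]
```

With the lazy pattern there are two HereDoc bodies, one per variable. With the greedy pattern there is a single body that swallows the assignment to $bar, which is the wrong behavior described above.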

P.S. As said in the last blog, I can hope, but maybe I should just read.

What's the deal with ... ?

What's the deal with the lack of updating of PHP-Sat?
The last few weeks were filled with all sorts of other interesting activities, at least for me. My days are mostly filled with doing research for my master's thesis and writing my proposal. In the past two weeks my free time was filled with the SUD, the visit to Google and the Grammar Engineering Tools. So plenty of projects, but little time to work on PHP-Sat.

What's the deal with this thesis then?
The thesis is the final project/course I will have to finish before I graduate. I will do some research to validate a certain algorithm that will improve feedback in educational programs. A more detailed explanation of the subject/algorithm/approach will be put into a blog soon. It's a totally different subject, but still interesting for people in computer science as well as in education.

What's the deal with this Grammar Engineering Tools project?
As mentioned before, the paper that resulted from the project has already been submitted. Some improvements have been made and the tools are (going to be) updated. My work on this project was still in the interest of PHP-Sat though. We have gained great insight into which combinations of operators are valid and which are not. This information will be put on the web as soon as we have a reasonable format for it, so stay tuned!

What's the deal with those labels/links underneath the posts?
The new version of blogger allows you to have labels under a blog. I thought it would be nice to order the posts according to certain topics. People that only want to read about my thesis can look here, those that only want to read about PHP-Sat here. I hope that the people behind blogger will add an RSS feed per label in the future, but I can only hope :)

Operator precedence

So what was the curious remark in the last blog? What is the interesting functionality that I have made? The title gives the area of the functionality away, 'Operator precedence'.

The problem with PHP, and most grammars in general, is that the operator precedence is usually ill-documented. There is some documentation in a table, but the real reference is the implementation itself. Martin has a really nice idea about how you can check whether two grammars have the same operator precedence, simple yet elegant.
If you take two definitions of a grammar, for example one defined in YACC and one defined in SDF, you can extract what is allowed and what is not. The process of extracting the precedence rules that encode the behavior from these formalisms is written down in the paper that was produced, but I am not sure whether it will be put on the web before we know if it is accepted to LDTA 2007. I do not even know if I can explain this very clearly in one post. So I cannot provide a link to the paper, but this might be an interesting subject to blog about for Martin. (Yes, this is a hint!)
After extracting the precedence-rules you have to rename constructs and filter extensions from the rules. Finding the exact rewrites that are necessary is easy if you first extract the production-rules that are possible. This reduces the set that you are looking at from 3000-5000 to 30-50 rules, which is much easier to examine. After the rewriting you can compare the two sets of precedence-rules by a simple diff, a built-in strategy.
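
The rename, filter and diff steps can be sketched roughly as follows in Python. This is not the actual tooling, which is written in Stratego. The encoding of a precedence rule as an (outer, argument position, inner) triple, the function name normalize and all rule and operator names are hypothetical and only serve the illustration.

```python
def normalize(rules, renames, extensions):
    """Rename constructs and drop rules that mention extension-only constructs."""
    result = set()
    for outer, position, inner in rules:
        outer = renames.get(outer, outer)
        inner = renames.get(inner, inner)
        if outer in extensions or inner in extensions:
            continue  # filter out rules about constructs the other grammar lacks
        result.add((outer, position, inner))
    return result


# Hypothetical rules: (outer, i, inner) means that 'inner' is not allowed
# as the i-th argument of 'outer' without parentheses.
yacc_rules = {("mul", 0, "add"), ("mul", 1, "add"), ("add", 1, "add")}
sdf_rules = {("Mul", 0, "Plus"), ("Mul", 1, "Plus"), ("Mul", 1, "Pow")}

sdf_to_yacc = {"Mul": "mul", "Plus": "add"}  # assumed renaming table
sdf_only = {"Pow"}                           # assumed extension in the SDF grammar

left = normalize(yacc_rules, {}, set())
right = normalize(sdf_rules, sdf_to_yacc, sdf_only)

# The "simple diff": rules present on one side but not on the other.
only_in_yacc = left - right
only_in_sdf = right - left
```

In this made-up example the diff reports one rule that only the YACC side has, which would be the kind of difference worth inspecting by hand.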

So what did I do exactly? Well, I worked on the actual tools in order to get some result, and we definitely did! We found several (precedence) problems in C-Transformers and the SDF-library, which both target C. This demonstrates the great power of the tools, because I am not that familiar with C myself!

We also found some precedence problems in PHC, which were fixed very shortly after the report was sent in. I was aware of the fact that there are some problems in PHP-Front regarding operator precedence. In fact, it was one of the reasons for making these tools. The number of about 400 precedence problems was still a bit overwhelming at first. But most of these problems are due to the same operators and just produce about 49 warnings because all of the other operators report an error on it. We still have a lot of work to do, but we now have tool-support for checking the precedence rules!

Back from Dublin

The trip to Dublin can only be described as really great. My dad and I had a great time and I can really recommend that everyone visit this diverse city.

We took an Aer Lingus flight from Amsterdam to Dublin and were only 30 minutes late, but this is something you get used to when you use public transportation in Holland. The cab that we took drove straight through Dublin, the city is really huge! Traffic all over the place (on the wrong side of the road), almost no bikers and crowded streets. After we checked in we spent the afternoon walking around the main streets of Dublin and I went to Trinity to meet Edsko.

There seemed to be some confusion about the time of the meeting, but eventually Edsko and John showed up so we could get something to eat at the Mona Lisa. The food and the conversation were really nice. We talked about parsing PHP, the differences between the internal representations and living in Dublin. We concluded that the projects are not really compatible because the internal representations are really different. These internal representations of the PHP-sources can be made compatible from PHP-Sat to PHC, but not the other way around. The goals of the projects are also quite different, but we could probably learn something from each other. Although this conclusion is a bit of a disappointment I had a really nice time. Edsko also showed me some of the inside of Trinity College, which is definitely worth a visit. Thank you Edsko and John for a fun evening, I hope we meet again sometime.

The visit to Google was on Tuesday at 12 o'clock, so we still had to fill the morning with something useful. We visited the National Gallery, which had an interesting exhibition about Irish culture in the last 200 years. They also have a (very large) collection of other paintings which were less interesting to me, but my dad seemed to like them.

And then it was finally time to go to the Google Office. Leslie could not make it because she was sick, but Rob Holland was kind enough to take over the coordination. We started the visit in the game-room which is filled with video-games, a snooker-table and a massage chair. We left our bags there and Rob showed us around the floor and the different teams. Everybody seemed to be busy, but they also took the time to say 'hi'.
After the tour we went downstairs to the restaurant. There was plenty of food (all free), drinks (all free) and ice-cream (again all free). The conversation with the engineers during lunch was very interesting. They have done some fun (and dangerous) experiments with various (expensive) toys, but they also work very hard.
We finished the visit with the lightning presentations of our projects. The topics of the projects were pretty far apart, but it is good to broaden your horizon.
The visit confirmed my expectations that working at Google is pretty cool, but you will still have to work hard. This is not so bad because the people seem to be very nice and intelligent and the atmosphere is great. Thank you Google for making this visit possible!

We flew back on Wednesday after visiting the Guinness brewery. You just have to visit this brewery when you are in Dublin.

The overall conclusion is that the trip was fun, exciting and really interesting. It is hard to describe everything in words, but it was definitely cool!

But onto the next challenge. After spending about one hour figuring out all the dependencies for some yacc-converting tools I am going to hack some interesting functionality together. Stay tuned for more information about this incomprehensible remark.

SUD 2007 roundup

Warning: rather long story with a lot of my own opinions up ahead!

I attended my first Stratego User Days this week without really knowing what to expect. I had seen the titles of the talks, but some of them still made me wonder about what was going to be presented. So I took the train to Delft with sleepy eyes and a blank mind.

If you think that the SUD is like a conference then you are wrong, it is more of an informal gathering of people that use Stratego. They explain to each other what they do, how they handle problems and what they would like to see in the next release of Stratego. At first this gave me the idea that the room would be filled with _all_ of the users of Stratego, which was probably true for some of the earliest SUDs, but the first presentation of the day already proved that this idea was wrong. Martin Bravenboer started the SUD with a presentation about the current status of Stratego. He did not only explain how hard they worked on the 0.17 release, but also showed a list of papers and projects, including one complete slide about PHP-SAT, that use Stratego. Some of the people on this list use Stratego without any help from 'the core people', Martin and Eelco, so this probably shows that Stratego is catching on. Martin also mentioned that there are several (PhD) positions to fill, so if anybody is interested they should contact him.

Eelco Visser gave a presentation about the new compilation scheme of Stratego. The presentation was a bit too technical for me, but it showed some nice goals and resulted in a (short) discussion. The discussion ended at the moment that Eelco gave the right example by getting himself some coffee.

Martin continued after the break with a presentation about the new library structure of Stratego. He explained why almost all of the functionality is moved to the library in order to target the portability problem. The funny thing is that some of the presentations that were about to come would complain about this problem, which shows that the needs of the community are actually being fulfilled. I liked the fact that Martin gave lots of examples that used php-front to illustrate the new features, always nice to see your own stuff used.

We had lunch a little late because we were already hopelessly behind schedule, but it was still very nice. I have to say that I like the cafeteria in Delft, the money/food ratio is pretty good. The presentation of PHP-SAT started after the lunch and went very well. The people looked interested even after the moment that the laptop shut itself down because the battery was empty. Karl Trygve Kalleberg mentioned that IBM also had a project about static analysis of source code called Wala, but he did not remember whether it supported PHP. So I tried to find the PHP-part, but I could not find it so it is probably not supported.

Benoit Sigoure talked about his project in which he extended PRISM to deal with real life problems. I always enjoy it when somebody talks about projects that are really useful to them. He also mentioned some problems that he encountered and gave a 'wish list' of items he wanted to see in Stratego. I have experienced most of the problems he has and I totally agree with the fact that he wants some more static checking. One of the other wishes was a debugger and I would like to put this on top of the list. Being able to step through my Stratego program is something that would help me a lot, and others as well.

During the coffee break that followed the presentation of Benoit one of the girls in the room next-door asked me what kind of meeting we had. I explained the concepts of the SUD to her and she replied with the phrase: ..I already thought that it was something with programming, you are all wearing those nerdy-code-t-shirts.., thank you very much indeed.

After I had probably been insulted by the girl, Mikal Ziane and Nicolas Pierron talked about Lutin. They use java-front and dryad to do some kind of code refactoring, but it wasn't completely clear to me. I think I might have understood things better if we weren't interrupted by the fire-alarm. It gave us a chance to see some of the campus and the other people in the building, but it didn't help us to keep up with the schedule.

The presentation of Wouter Caarls about embedding Stratego in C showed another thing that can be done with Stratego. The techniques he used were not very complicated, most of them were also covered in the program transformation course in Utrecht, but the combination was interesting.

Valentin David's presentation ended the day by giving an overview of the current C++-front-ends. This presentation was not very interesting to me because I (currently) do not use C++, but I can at least find it again when I need to. But during the talk he mentioned Semantic Designs, which offers support for analysis. They also have a front-end for PHP, so I probably should take the time to take a look at this.

The second day of the SUD started with a presentation about a system that is comparable with Stratego, but written in Java. The system is called TOM and it has some very nice properties. They borrowed some features of Stratego and I hope we also will borrow some of their features. The small features, like matching on a sort without specifying the number of children or the not-match, are most likely not difficult to implement but useful additions to Stratego. They also showed an Eclipse plugin for their project and a graphical debugger, great things to have and very useful. Another project that is added to the list of things to check out.

Another connection between Stratego and Java was presented by Karl Trygve Kalleberg. He showed the Spoofax project, which also holds a plugin for Eclipse. I haven't really thought about looking at this for the syntax highlighter for Context, but it might be a good idea to check how he did this.

Bernd Fischer gave a presentation about what he wanted to get from the Stratego community. Some of the ideas could also be useful for other Stratego developers, but most of his wishes could probably be solved by implementing a separate library for ACI1-terms. Some of the problems that he mentioned are actually handled by MathPert, so he might want to take a look at it.

The talk of Alexandre Borghi about vectorization was a bit too technical for me. I think I understand why you want it, but could not figure out completely how everything worked. I do not have a problem with this, I do not intend to use it in the near future, but other people will certainly find it interesting.

A presentation that was really interesting for myself was the presentation of Bogdan Dumitriu. He did his master's thesis on improving support for data-flow transformations for Object-Oriented programs. Many ideas from his thesis and his talk are very useful for me and the PHP-Sat project, so I am definitely going to read the complete thesis. His support for break- and continue-constructs can easily be added to PHP-Sat and his ideas about customized transformations are definitely cool. As soon as I get a copy of the thesis, and some spare time, I will write a blog about these cool subjects.

My own version of the SUD ended with the presentation of Karl Trygve Kalleberg and Valentin David in which they presented some ideas for extending Stratego. Some of the extensions, like an attribute grammar system, are already implemented in Transformers and look interesting. Other proposals are still in the 'this-would-be-a-nice-idea'-stage, so we will have to wait and see what the future brings us.

To conclude this roundup I wanted to say that the whole experience was very cool. It was nice to see what other people are doing with Stratego and it was a good opportunity to do some feature-requests. I am already looking forward to the SUD of next year.

The SUD also gave rise to a great quote from Pierre-Etienne Moreau which I will probably use more often. The quote displays a pragmatic attitude which really appeals to me. During the presentation about TOM Pierre-Etienne explained some side-effect that could occur during the application of a strategy. After someone asked him whether it was pure he replied:

..it is not pure, but it is practical.