[Trivia] Pentaho Mondrian Slow MDX Query

Finally, another trivia-post. I could not find the answer easily on the web, but there might be people that are struggling with the same problem we had.

While working on my current project we ran into a problem with a MDX-query. In short, it retrieved about 14,000 rows from a database and displayed a subset of these rows on the screen in a variable ordering. The problem we where facing was that it took about 4-5 minutes before the data showed up.

We knew that the query was not completely trivial, but 14,000 rows is not that much data. Furthermore, one would expect that an OLAP-database could handle this kind of volume very easily. In the beginning we thought that it took so long because all of the data was first retrieved from the database and then sorted within the Java-code. On the other hand, sorting a list of 14,000 items should not take that long. An other theory we had was that the bottleneck was the number of queries that is generated (about 4000) to retrieve the 14,000 rows. This was not the case, every query only took 1-2 ms to complete.

So we fired up YourKit, a Java memory/CPU-profiler. I can highly recommend this tool to anyone who wants to profile there code. It provides detailed reports about all sorts of interesting things and comes with a nice interface.

While going to the profiling output we noticed that the code was spending about 50 percent of the time in a 'Sort'-method, and 40 percent of the time in a 'Debug'-method. Since we didn't see that much debug-output we did what any programmer would do when has not written the tool (s)he works with: we assumed that the profiler was wrong and concentrated on the 'Sort'-method.

Examining the profiler output some more we discovered that 40 percent of the time in the 'Sort'-method was actually spent by calling the 'Debug'-method above! So we looked up the code and found out that the people of Mondrian where actually calling this 'Debug'-method every time they compared two objects.

Question: When sorting 14,000 objects, how many lines of debug-output would you get?
Answer: A lot.

The funny thing was that we did see some debug-output, but not as much as you would expect from the code. Furthermore, the 'Debug'-method was only called when the log4J-level was set to DEBUG. We did use a log4j-configuration, the default that was shipped with the application, but this file did not mention Mondrian anywhere. It did mention a file called 'server.log' which we eventually found. The size of the log-file was about 1 gigabyte, which might explain where all the debug-lines went to.

Our current solution is to simply remove the log4j-configuration from our project. This still gives us enough feedback to track the usual things, but it also prevents Mondrian to go into debug-mode. It reduced the running time from 4-5 minutes to 4-5 seconds, a considerable speed-up don't you think?

The new release plan

Last week I made a list of things that I wanted to do after reading Production OSS. Today I took the time to implement two of the items on the list, which reduces the list to only thirteen items. Luckily, the tasks on the list are not difficult to implement, but they do take some time to think about the best way to implement them.

The first thing on the list was bringing the website up to date. It still had some pointers to the build-farm in Utrecht and some tips to work around missing functionality that actually exists nowadays. I also tried to make it more clear what the intention of the project is, although I think it can still be made more explicit. Fortunately, this is also an item on the list :)

A second item on the list was the cause for a large number of mails from Jira send to the psat-commit-list. I made up a new release plan of PHP-Sat / PHP-front and also removed some issues/features that where no longer wanted. Let us take a look at the new milestones for the two main projects:

The 0.1 release must be capable of parsing, pretty-printing and reflection on PHP4 and PHP5 code. Furthermore, constant propagation need to be supported for procedural code without function calls. Within the 0.2 release this support is to be extended towards function calls and object creation. Support for more information about the environment of PHP, for example the types of internal functions, is planned for version 0.3.

The release structure is closely related to that of PHP-Front. Within the 0.1 release it must support procedural code without function calls and object creation. These language constructs must be supported within release 0.2. The purpose of the 0.3 release is to add many new bug patterns.

In both projects we can see that the number of issues that need to be fixed before the first release is quit low. So hopefully we can make a first release around the end of the year!