Detecting Cross-language Dependencies Generically

Most systems I see are not written in just a single technology. Instead, dedicated technologies are used for the front-end (think JavaScript, HTML, JSP), the server-side business logic (think Java, C#, C, insert-your-favorite-programming-language-here) and the data storage (think any flavor of SQL or NoSQL).

To get an initial understanding of how all of these technologies work together in a system, you need to determine what the dependencies between them are. Where does the JavaScript connect to the server? And how is the stuff we see on the front-end connected to the stuff stored in the database?

For the most common cases there is tool support available to detect these dependencies (semi-)automatically. However, extending these tools with support for a new language is not trivial, because the techniques used require a deep understanding of the full language of the new technology.

Until now...

Last year I had the pleasure of co-supervising Theodoros Polychniatis for his internship at the Software Improvement Group on the subject of detecting dependencies across programming languages. After diving into the available literature and soliciting requirements from consultants, he managed to create a prototype tool which implements a relatively simple algorithm to detect dependencies across source-code modules written in different technologies.

The first evaluations show that the algorithm is capable of producing a relatively good recall (i.e. finding the dependencies that you want) and a reasonable precision (i.e. finding only actual dependencies). As always, the behavior of the algorithm greatly depends on the parameters used, but the initial results are positive enough to continue exploring this idea.
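For readers who do not work with these two measures every day, here is a minimal Python sketch of how recall and precision can be computed when a manually extracted set of dependencies is available. This is not the prototype tool or its algorithm. The function name `precision_recall` and the file names are made up purely for illustration.

```python
def precision_recall(detected, actual):
    """Return (precision, recall) for detected vs. manually extracted dependencies."""
    detected, actual = set(detected), set(actual)
    true_positives = detected & actual  # dependencies that were found and are real
    # Precision: which fraction of the detected dependencies are actual dependencies?
    precision = len(true_positives) / len(detected) if detected else 0.0
    # Recall: which fraction of the actual dependencies were detected?
    recall = len(true_positives) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical (source module, target module) pairs, invented for this example.
actual = {
    ("login.js", "LoginServlet.java"),
    ("LoginServlet.java", "users.sql"),
    ("cart.jsp", "CartService.java"),
}
detected = {
    ("login.js", "LoginServlet.java"),
    ("cart.jsp", "CartService.java"),
    ("cart.jsp", "users.sql"),          # false positive
    ("login.js", "CartService.java"),   # false positive
}

precision, recall = precision_recall(detected, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example two of the four detected dependencies are real, and two of the three real dependencies are found. Tuning the parameters of a detection algorithm typically trades one of these measures against the other.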

Apart from producing the thesis Theodoros was also the driving force behind the paper:

This paper will be presented (and published) at the 17th European Conference on Software Maintenance and Reengineering (CSMR 2013). If you are considering attending CSMR, make sure you drop by the Software Quality and Maintainability (SQM) workshop for a chat and a very interesting keynote!

As a closer, here is the abstract of the paper:
In order to evaluate large, heterogeneous information systems (i.e., comprising modules developed in diverse programming languages) a method to detect dependencies among these modules is needed. Although there is a variety of methods that can detect dependencies within a single programming language, the available cross-language detection methods use extensive language specific information to parse and analyze modules written in different languages.
In this paper, a new method for detecting cross-language dependencies is proposed. This method is generic, yet accurate and can support new languages with minimal effort. To evaluate the method, a tool was created and a series of experiments was conducted on a small case study for which dependencies had been extracted manually. The evaluation shows that the method is effective, extensible and easily explainable.