TL;DR: We studied which static architecture metrics are correlated
with a high ratio of local changes (i.e. changes made to only a single
component). An analysis of 10 open-source systems shows a positive relationship between the percentage of code only used within a component and the ratio of local changes. We conclude that having small, clearly defined interfaces for your components leads to more local changes, which are easier to implement and test.
This week I had the pleasure of presenting our paper Quantifying the Encapsulation of Implemented Software Architectures at the 30th International Conference on Software Maintenance and Evolution. What follows is the high-level story I presented (using these slides); if you want all the details, you can find the complete paper here.
Inspecting the title of our paper we see that it is about quantifying the encapsulation of implemented architectures. To understand what this paper talks about let's start by examining these concepts more closely.
Implemented software architectures
As a whole, software architecture is defined as:
the organisational structure of a software system including components, connections, constraints, and rationale.
If we focus on the implementation within the code, we can only observe the components and the connections; constraints and rationale are normally defined in the documentation.
As an example, consider the figure on the right, which depicts a hypothetical system. We can clearly see the high-level components, each of which (hopefully) implements a distinct piece of functionality, and the connections that exist between these components. To get such a high-level overview of the system you can normally open up the source-code repository or look for it in the documentation. Should that fail, you can always fall back to a whiteboard, a marker, and a software engineer working on the system; I have yet to meet a software engineer who cannot draw such a picture of their system.
Quantifying encapsulation
Encapsulation revolves around localizing the design decisions which are likely to change (a process also known as information hiding). If done correctly, we would see that changes to the system are made to source-code modules which are located near each other, preferably in the same component. This makes a change easier to implement (since we do not have to jump between components) and easier to test (since there are fewer components to test).
Given a system, a definition of its components, and all changes made in the past years we can easily determine whether the process of encapsulation has been successful by using the concepts of local change and non-local change as introduced by Yu et al.
As a first step we classify each change-set in the history of a system (e.g. all commits or pull-requests) as either local or non-local. When a change-set contains source-code files from only a single component it is considered to be local; if more than one component is touched, the change-set is considered to be non-local. The figure on the left shows an example of each type of change-set, blue for local and brown for non-local.
After this classification we can quantify the success of the encapsulation by simply dividing the number of local change-sets by the total number of change-sets. For example, the figure on the left shows a series containing ten change-sets of which seven are local, leading to a quantification of 0.7 for encapsulation.
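For the programmatically inclined, here is a minimal sketch of this classification, assuming change-sets are simply lists of touched file paths and components are identified by their root directories (the component mapping and paths below are hypothetical; in practice they come from the architecture definition of the system):

```python
from typing import Iterable, Mapping, Optional

def component_of(path: str, component_roots: Mapping[str, str]) -> Optional[str]:
    """Map a source file to its component by matching path prefixes."""
    for name, root in component_roots.items():
        if path.startswith(root):
            return name
    return None

def is_local(change_set: Iterable[str], component_roots: Mapping[str, str]) -> bool:
    """A change-set is local when all touched files fall within one component."""
    touched = {component_of(path, component_roots) for path in change_set}
    touched.discard(None)  # ignore files outside any known component
    return len(touched) == 1

def encapsulation_ratio(change_sets, component_roots) -> float:
    """Divide the number of local change-sets by the total number of change-sets."""
    local = sum(is_local(cs, component_roots) for cs in change_sets)
    return local / len(change_sets)

# Hypothetical component definition and two change-sets: the first is local,
# the second touches two components and is therefore non-local.
components = {"ui": "src/ui/", "core": "src/core/"}
print(is_local(["src/ui/window.py", "src/ui/dialog.py"], components))   # True
print(is_local(["src/ui/window.py", "src/core/model.py"], components))  # False
```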
As explained above we would like to see as many local change-sets as possible, so we want this number to be as high as possible. However, since we also expect to see some non-local changes for cross-cutting concerns such as logging we would not expect to see a ratio of 1 that often. To get a feel for which numbers are good we can calculate this metric for many systems, thus creating a benchmark which can tell us whether this 0.7 is relatively good or bad compared to other systems.
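As a tiny illustration of the benchmark idea (the ratios below are made up, not taken from the paper), ranking a system against a set of previously measured systems boils down to a percentile calculation:

```python
def percentile_rank(ratio: float, benchmark: list[float]) -> float:
    """Percentage of benchmarked systems with a strictly lower encapsulation ratio."""
    return 100.0 * sum(b < ratio for b in benchmark) / len(benchmark)

# Hypothetical benchmark of ten systems; a ratio of 0.7 would sit at the 70th percentile.
benchmark = [0.45, 0.52, 0.58, 0.61, 0.63, 0.66, 0.69, 0.72, 0.78, 0.85]
print(percentile_rank(0.7, benchmark))  # 70.0
```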
Up until now we have only seen concepts introduced by others, which would make for a rather sub-standard research paper. So what is the problem here?
The timing problem(s)
The main issue with calculating the encapsulation of an implemented architecture using the concepts above is that it can only be done after a project has been finished. Although nice to know at that point in time, it would be nicer if we could calculate a metric on the project which provides some sort of indication of the encapsulation of the system now. Given that the current literature lists over 40 software architecture level metrics (an overview can be found here), we should be able to find something, right?
So we designed an experiment to see which software architecture metrics, calculated on a single snapshot of the code (i.e. snapshot-based metrics), are correlated with the encapsulation calculated over time (i.e. the historical encapsulation).
The first set-up was straightforward: select some systems, calculate the snapshot-based metrics, calculate the historical encapsulation, run the statistics, and Bob's your uncle. The figure on the right shows a sketch of the outline of this set-up, using the number of components as an example of a snapshot-based metric. At a glance this set-up seems correct, but after a while we figured out that it contains a (rather serious) flaw. Any thoughts?
Notice that we calculated the number of components (our snapshot-based metric) based on the situation after the last change-set. But this change-set makes a change to the system, and can also change the number of components!
More graphically, consider the chart on the left, which shows the change-sets on the x-axis and the number of components on the y-axis. We see that there is a period where we have 2 components, then a period where there are 5 components, only to drop to 4 components in the last change-set. Trying to correlate the historical encapsulation of 0.7 with 4 components is clearly incorrect, since most of the time the number of components was either 2 or 5.
To remedy this problem we adjusted the design of our experiment. Instead of calculating the historical encapsulation based on all of the change-sets in the history, we calculate the historical encapsulation per period in which the snapshot-based metric is stable.
In the example above this gives us two pairs of numbers, (2, 0.6) and (5, 0.75), to indicate the number of components and the historical encapsulation for that period. Note that we do not calculate a pair for when there are four components, since we do not consider a single change-set a 'period'.
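A small sketch of this adjusted computation, assuming we already have, per change-set, the value of the snapshot-based metric and the local/non-local classification (the input format and helper below are hypothetical; the numbers mirror the example above):

```python
from itertools import groupby

def stable_period_encapsulation(history, min_length=2):
    """Split the change-set history into periods where the snapshot-based metric
    is stable and compute the local-change ratio per period.

    `history` is an ordered list of (metric_value, is_local) tuples, one per
    change-set; periods shorter than `min_length` are dropped."""
    pairs = []
    for value, group in groupby(history, key=lambda entry: entry[0]):
        change_sets = list(group)
        if len(change_sets) < min_length:  # a single change-set is not a 'period'
            continue
        local = sum(1 for _, is_local in change_sets if is_local)
        pairs.append((value, local / len(change_sets)))
    return pairs

# Ten change-sets: five with 2 components (3 local), four with 5 components
# (3 local), and a lone final change-set with 4 components (dropped).
history = [(2, True), (2, False), (2, True), (2, False), (2, True),
           (5, True), (5, True), (5, False), (5, True),
           (4, True)]
print(stable_period_encapsulation(history))  # [(2, 0.6), (5, 0.75)]
```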
The results
Now that we know how to do the experiment we can execute it. First, we selected 10 open-source software systems to investigate, giving us over 60 years of historical data. Second, we filtered the available snapshot-based software architecture metrics down to a list of twelve. This list includes simple metrics such as the number of cyclic dependencies or the number of binary dependencies, but also more involved metrics such as those which form the basis for our dependency profiles.
These last metrics are (unfortunately) not yet widely known, so let me explain them quickly. In a dependency profile we divide the source-code modules within a component into four distinct categories, based on the dependencies from and to other components. We calculate the profile by determining the percentage of the system's code in each category. For example, a profile of (50, 20, 25, 5) indicates that 50% of the code is internal to components, 20% is depended upon from other components, 25% depends on code from other components, and the remaining 5% is code which both depends on and is depended upon by code from other components.
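A minimal sketch of how such a profile could be computed, assuming each source-code module already carries its size in lines of code and flags for cross-component dependencies (the input format is hypothetical; in practice this information comes from a static dependency graph):

```python
from collections import defaultdict

def dependency_profile(modules):
    """Percentage of the system's code in each dependency-profile category,
    relative to the component boundaries.

    `modules` is a list of dicts with keys 'loc', 'has_incoming' (another
    component depends on this module) and 'has_outgoing' (this module depends
    on another component)."""
    totals = defaultdict(int)
    for m in modules:
        if m['has_incoming'] and m['has_outgoing']:
            category = 'transit'    # both used by and using other components
        elif m['has_incoming']:
            category = 'inbound'    # depended upon from other components
        elif m['has_outgoing']:
            category = 'outbound'   # depends on other components
        else:
            category = 'internal'   # only used within its own component
        totals[category] += m['loc']
    total_loc = sum(totals.values())
    return tuple(round(100.0 * totals[c] / total_loc, 1)
                 for c in ('internal', 'inbound', 'outbound', 'transit'))

# Example reproducing the (50, 20, 25, 5) profile from the text.
modules = [
    {'loc': 500, 'has_incoming': False, 'has_outgoing': False},  # internal
    {'loc': 200, 'has_incoming': True,  'has_outgoing': False},  # inbound
    {'loc': 250, 'has_incoming': False, 'has_outgoing': True},   # outbound
    {'loc': 50,  'has_incoming': True,  'has_outgoing': True},   # transit
]
print(dependency_profile(modules))  # (50.0, 20.0, 25.0, 5.0)
```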
After crunching all the numbers the result is that there is a positive correlation between the historical encapsulation and the percentage of internal code. In other words, we observed that systems which contain a higher percentage of internal code also exhibit periods with a higher ratio of local changes.
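For completeness, the statistical step boils down to a rank correlation between the snapshot-based metric and the per-period encapsulation. A minimal sketch, with made-up numbers and Spearman chosen merely as an illustration (see the paper for the actual statistical set-up):

```python
from scipy.stats import spearmanr

# Hypothetical (percentage of internal code, local-change ratio) pairs,
# one per stable period; not the values from the paper.
internal_code_pct = [35.0, 48.5, 52.0, 61.2, 68.4]
local_change_ratio = [0.55, 0.65, 0.71, 0.74, 0.80]

rho, p_value = spearmanr(internal_code_pct, local_change_ratio)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```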
So what can I do with this result?
Given that more internal code is related to a higher ratio of local changes, I would argue that you should strive towards an implementation with as much internal code as possible.
One way to achieve this is to define clear, small, and specific interfaces for your components. While this is often done correctly for the incoming interface of a component, the outgoing interface is often overlooked, leading to a large outgoing interface with a higher risk of needing to change when other components are touched.
More details ...
Interested in reading more about the design of the experiment? Or do you want to know how well the other metrics correlate? (spoiler: they don't) Want to know more about our ideas on how the inspected software architecture metrics can be improved? Download the full paper here!