One step towards a software metrics catalog

A little over a year ago my proposal was accepted in the Tiny Transactions on Computer Science. If you have not read it, the body of the publication is:

Unfortunately, such a catalog has not materialized instantly :(

However, last week we did take a small step forward during the Workshop on Emerging Trends in Software Metrics. During this workshop, Arie van Deursen presented our proposal for a Software Metrics Catalog Format. This format is specifically designed to provide a concise, yet meaningful overview of a software metric, while also showing the relationships a software metric has with other metrics.

You can read the complete description of the Software Metrics Catalog Format in our publication, but it is probably more appealing (and fun!) to visit our demo implementation using a semantic wiki hosted at referata.com.

Naturally, all comments, questions, remarks and contributions to the catalog are more than welcome!

Defending the propositions

As explained before, a thesis in Holland is accompanied by a set of propositions. These propositions are considered to be a part of the thesis, which means that the members of the doctoral committee are allowed to challenge them if they desire.

This is why the regulations state that the propositions:

'...shall be academically sound, shall lend themselves to opposition and be defendable by the PhD candidate, and shall be approved by the promotor.'

Furthermore, at least six of the propositions should not concern the topic(s) of the thesis, and at most two propositions can be playful in nature. 

So let's see whether I succeeded; here is the list of my propositions (each one linking to a blog post which contains an explanation of the proposition):

So what do you think, did I succeed? And do you agree with them all?

The propositions explained

This post explains all of the propositions listed here.

To enable the effective application of software metrics, a pattern catalog based on real-world usage scenarios must be developed.


This proposition is actually a complete publication; it was published in the Tiny Transactions on Computer Science, Volume 2. As explained in that paper, there has been a vast amount of research in the area of software metrics. Many software metrics have been designed and validated over the past decades, but only a few are used by project teams to identify and solve problems in a timely manner.

One reason for this lack of adoption is that it is currently hard to decide which software metric should be used in which situation. Documenting the benefits and limitations of metrics makes this decision easier, which ultimately leads to more successful software projects.

 

 

The software architect should take the responsibility for the implementation of the system.


According to the global IT Architect Association:

"The software architect has mastered the value, use, development and delivery of software intensive systems. They have developed skills in software development lifecycles, software engineering and software design. They are also responsible for communicating software concepts to all levels of management and for ensuring that expected quality attribute levels are achieved. "

On different occasions I have encountered a software architect who only concerns him/herself with the design of the system, not with the actual implementation. In other words, the architect does not communicate with the development team. This way of working is based on the assumption that the design is such that all of the quality attributes are achieved. Thus, when the implementation follows the design, the implemented system will also achieve the desired quality attributes.

Unfortunately, the implementation only follows the design in very rare cases. During the implementation developers will run into nasty problems with the technologies used, unexpected events and edge cases, or simply with errors in the design. It is crucial that the development team can rely on the software architect to reflect upon these situations and make the decisions that are necessary. In other words, the software architect should be an integral part of the development team, and the one person who makes all final decisions.

 

If software engineering PhD students spend 20% of their time 'in the field', their research will be based on more realistic assumptions.


The field of software engineering research should reflect upon the way in which professionals design, construct, test and maintain (i.e. 'engineer') software systems. In my opinion, the best way to do this is to observe professionals to identify some of the problems that they are facing. The researcher then develops solutions for these problems and verifies whether these solutions indeed solve the identified problem.

For this last step it is crucial that the research has been based on realistic assumptions about the data available, the effort people want to invest, or the processes that can be changed. If any of these assumptions are incorrect, it is going to be hard to a) get professionals to apply the solution for the verification, and b) get wider acceptance for the solution after the initial validation.

By spending time together with professionals in industry, a PhD student gets an idea of what constraints are put upon these professionals in terms of time, resources and data. This knowledge can immediately be used to test the assumptions for potential solutions, avoiding the development of unnecessary or unrealistic ones. 

Making the names of reviewers public will make reviewers more inclined to write better reviews, which increases the quality of the overall review process.


In the majority of cases the review process for conferences and journals is a closed process. The input is a submitted paper and the output is a decision and (possibly) a set of reviews. Most of the time the paper has been read by two or three reviewers who wrote a review, and in some cases these reviews are discussed by the program committee (either online or in person).

Within the current process the names of the reviewers are known to the rest of the committee and the chairs, but the authors normally do not know who wrote the reviews. In theory this ensures that reviewers can be honest in their reviews without being afraid that their feedback comes back to them in undesired ways. Unfortunately, this also enables reviewers to reject papers based on loose claims or false beliefs. In addition, this cloak of anonymity gives reviewers an opportunity to be less civil than they otherwise would be. Lastly, being anonymous decreases the reward for writing a very good and detailed review, since only a few people witness and appreciate it if you do.

This last problem in particular can be solved by making the names of reviewers public, since the authors then know whom to thank for reviewing their paper. In addition, just as a paper should not contain claims without evidence, a reviewer should be less inclined to make a loose claim if his name is known. Of course, an author still needs to accept a 'reject' decision, but this should be possible if the feedback in the review is honest, civil and supported by facts.

 

Hiring a skilled typist is an often overlooked option during the design of an automated process.


When I had just learned to program during my university training I wanted to automate everything: all repetitive behavior was to be captured in scripts, macros or programming tools. Using this strategy you quickly run into situations in which the effort to automate some steps is far bigger than the eventual cost savings.

This XKCD comic summarizes pretty clearly how much time you can spend on the automation of a certain task before you spend more time on the automation than on the task itself. For some of the systems I have seen, this table would have saved quite some time, money and irritation for the people involved.
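The arithmetic behind that comic's table is simple enough to sketch in a few lines. The numbers below are illustrative assumptions of my own, not values from the comic:

```python
# Break-even check for automating a routine task, in the spirit of the
# xkcd table: over a five-year horizon, how much total time does a small
# saving add up to? (All numbers below are illustrative assumptions.)

def time_saved_over_horizon(seconds_shaved: float,
                            times_per_day: float,
                            horizon_days: int = 5 * 365) -> float:
    """Total seconds saved over the horizon by shaving time off a task."""
    return seconds_shaved * times_per_day * horizon_days

def worth_automating(automation_effort_hours: float,
                     seconds_shaved: float,
                     times_per_day: float) -> bool:
    """True if the automation effort is smaller than the time it saves."""
    saved_hours = time_saved_over_horizon(seconds_shaved, times_per_day) / 3600
    return automation_effort_hours < saved_hours

# Shaving 5 seconds off a task done 5 times a day saves roughly 12-13
# hours over five years; spending a week (40 hours) automating it is a loss.
print(worth_automating(40, seconds_shaved=5, times_per_day=5))  # False
```

The point of the sketch is that the break-even calculation is trivial to do up front, yet rarely done before the scripting starts.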

 

Understanding your goal makes it easier to deal with unpleasant chores.


In every job, study or other day-filling activity there are chores that are not fun to do. For example, in this PhD project my initial reaction to a complete restructuring of a paper is always annoyance. Yes, it might be a good idea. Yes, it will make the paper easier to read. Yes, it does make the story better. But most of all, it requires me to work another three evenings and to throw away two days of work! Did I mention that the deadline is just three days away?

At these points in time I always tried to take a step back and look at the overall goal, which is to write a nice PhD thesis. The restructuring is gonna cost me now, but the realization that the paper becomes better (okay), which gives a higher chance of acceptance (good), which means that I can finish the overall project on time (awesome!) usually makes me want to do the restructuring anyway.

So, whenever an unpleasant chore comes along I try to ask myself: 'what goal am I getting closer to by completing this task?' In most cases, this helps me to do the task anyway. And in those cases that I cannot figure out why the task helps in reaching a goal, this helps me to not feel bad about not doing the task at all.

 

The replication of experiments becomes easier when all PhD students must replicate an existing study during their research.


Replicating an experiment is performing the same experiment with slight variations in terms of data or set-up. This is in contrast with reproducing an experiment, which is geared towards reproducing the exact same results. Within academia, replication of experiments is needed to confirm earlier results and to broaden the common body of knowledge. Replication of an experiment can be viewed as performing a double-check of the work, making it less likely that an error has been made earlier.

By replicating an experiment, a PhD student learns about the choices that need to be made during an experiment, and the possible effect(s) of these choices on the outcome. In addition, the PhD student probably finds out that an experiment cannot be (easily) replicated because of missing data, too few details in the description of a procedure, or the absence of running source code. Because of this first-hand experience with (and frustration about) missing details, the documentation the student produces about his own experiments will be of higher quality, thus making those experiments easier to replicate.

 

Using only metrics as acceptance criteria leads to undesired optimization.

 

This proposition is based on chapter 5 of the thesis, which in turn is based on the article called 'Getting What You Measure'. This article describes four pitfalls I have seen over and over again when metrics are being used in a project management setting. The most widespread of these is probably 'treating the metric', i.e. making changes to a system just to improve the value of a metric.

In some cases this is not problematic, because improving the value of the metric also helps in reaching a goal. For example, if you want people to write more code you could require them to check in 2000 lines of code each day. However, you probably want them to write useful code, something which is hard to capture in a metric. And even if you could, there would be a long list of other desirable characteristics that you didn't think of specifying at the start of the project.
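To make 'treating the metric' concrete, here is a minimal sketch (the quota, the snippets and the metric itself are invented for illustration): a naive lines-of-code acceptance check is satisfied just as well by valueless padding as by useful code.

```python
# An acceptance check based only on a metric (lines of code) can be
# 'treated': padded, valueless code passes it just as well as useful code.
# (The quota and code snippets below are invented for illustration.)

def loc(source: str) -> int:
    """Count non-blank lines of code -- a crude size metric."""
    return sum(1 for line in source.splitlines() if line.strip())

def meets_quota(source: str, quota: int = 5) -> bool:
    """Acceptance criterion based solely on the metric."""
    return loc(source) >= quota

useful = "def add(a, b):\n    return a + b\n"

# The same function, 'improved' only in the eyes of the metric.
padded = (
    "def add(a, b):\n"
    "    result = 0\n"
    "    result = result + a\n"
    "    result = result + b\n"
    "    temp = result\n"
    "    return temp\n"
)

print(meets_quota(useful))  # False: the honest version fails the quota
print(meets_quota(padded))  # True: the padded version passes it
```

Both snippets compute exactly the same thing, yet the metric prefers the worse one, which is the undesired optimization the proposition warns about.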

Therefore, I think that using metrics as formal acceptance criteria is fine, but it should always be clear why a specific value of the metric is desirable. In other words, always communicate the overall goal along with the formal metrics, and focus the acceptance on the goal itself instead of on the metrics.

 

The most important objective in [Boy Scout] training is to educate, not instruct. (cf. Lord Baden-Powell)


I spent a large part of my life being a boy scout; only in the last few years have I been a bit busy with a different project. By joining the scouting movement I have had many great times, traveled to many different places and met a wide range of interesting people. Because of all this, I figured that a quote from the founding father of the scouting movement should be in my list of propositions.

My interpretation of this quote is that you should not strive to tell somebody what to do next, but that you should help the person understand the current situation such that he can derive useful actions himself. I think the best way to summarize the benefits of this approach is to refer to the old saying: 'give a man a fish and he can eat for a day; teach a man to fish and he can eat for a lifetime'.

 

The fact that the McChicken tastes the same everywhere proves that it is possible to have distinct teams produce the same results.

 

I have personally sampled the McChicken in many places in the Netherlands, and this PhD project allowed me to sample them in Spain, Portugal, Canada, the USA, Germany, Belgium, Italy and Switzerland. I am always amazed that this piece of fast food not only looks similar, but also has the same taste (or, according to some, the same lack of taste) in all those locations.

Unfortunately, I do not have any insight into how this distribution/production process works. And although I probably do not want to know all of the details, I would really like to understand which conditions have to be met in order to replicate this achievement in other fields.    

Time to defend the dissertation

Since October 2008 I have been introducing myself as 'a technical consultant and a PhD student'. On the 28th of this month, around 16:15 hours I hope to drop the second part of this sentence!

Because on June 28th (at 15:00 hours) I will start defending my dissertation against the eight members of my doctoral committee. The ceremony lasts a little over an hour and is carried out according to a strict set of rules, which prescribe everything from the way in which everybody needs to be addressed up to the clothes that will be worn by the committee, my paranymphs and me.
 
The subject of the defense is my dissertation, titled 'Metric-Based Evaluation of Implemented Software Architectures'. If you have followed this blog you know most of the content by now since it is basically a compilation of my previous publications. If not, the easiest thing to do is to read through the summary enclosed in the PDF version of the dissertation. 

At first I was a bit skeptical about the usefulness of bundling all of the papers together since they are already published. However, writing down the overall story felt pretty good, and I must say that it was quite a joy to unpack the printed copies of the resulting book! (BTW, there are still a few copies left, so you can probably still get one of them at the defense).

Apart from asking questions about the dissertation itself, the members of the doctoral committee are also allowed to ask questions about one of the ten propositions that accompany the dissertation. I am not sure whether I can pull off a 'Project #tweetprop' (i.e. a short blog post per proposition), but I'll definitely discuss the propositions in a later entry. So stay tuned!

When do you consider a software metric useful?

Do you use software metrics in your project? Which ones? Why do you use those software metrics?

The answer to question one is probably 'Yes'. The answer to question two may vary, but hopefully the answer to question three is: "because I find them useful".

For me, the usefulness of a software metric is determined by two properties. On the one hand the software metric should be a correct quantification of what I want to measure, while on the other hand the value of the metric should provide enough information to make a decision.

To verify whether a metric measures what you want it to measure, you can examine the value of the metric for a small number of cases, or you can conduct a more quantitative experiment to understand the statistical behavior of the metric on a large group of systems/components/units. The nice thing about such an experiment is that you can conduct it in a relatively safe lab environment using open-source systems.

Because this type of evaluation is relatively easy, it has been done extensively over the past years. Virtually every scientific paper on software metrics includes at least one or two case studies, and often researchers also examine the statistical relationship between the value of the (newly proposed) metric and other desirable attributes. For example, we did this for our Component Balance and Dependency Profiles metrics.
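As a rough sketch of what the statistical half of such an evaluation looks like, one can compute the correlation between a metric's value and some attribute you care about. The paired values below are invented stand-ins; in a real study they would come from measuring actual systems:

```python
# Sketch of a statistical metric evaluation: does the metric correlate
# with an attribute we care about? The paired values below are invented
# stand-ins for (metric value, defect count) per component.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-component measurements: a structure metric vs. defects.
metric_values = [1.2, 2.3, 2.9, 4.1, 5.0, 6.2]
defect_counts = [1, 2, 2, 5, 6, 8]

r = pearson(metric_values, defect_counts)
print(f"correlation: {r:.2f}")  # strongly positive for this invented data
```

A strong correlation on a large sample is evidence that the metric quantifies something related to the attribute, which is exactly the kind of validation most metrics papers stop at.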

Understanding whether a metric can be effectively used in a decision-making process is more complicated. First, you need to ensure that the metric is available for a large number of projects for an extended period of time. Secondly, you need to observe the people involved in the projects and record discussions/decisions involving the metric. Lastly, the gathered data needs to be analyzed to extract usage patterns and identify areas for improvement.

This second type of evaluation requires quite some time, patience, and access to a wide range of software projects in various stages of development, and you need to be able to communicate with the people involved in these projects. Basically, you need to find a company which allows you to conduct this type of research, which might be the reason why I did not find any study which evaluates software metrics in this way.

You can probably guess which company allowed me to conduct this research. Indeed, within the environment of the Software Improvement Group my co-authors and I were allowed to study the usefulness of our architectural metrics. The full details of the evaluation design and the results are available in our ICSE 2013 SEIP paper:
which has been presented at the ICSE conference in San Francisco! The slides of this presentation can be found by clicking this link.

Naturally, I am very proud of this paper. In particular because it takes the evaluation of the software metrics one step beyond the usual statistical validation. What do you think, should all metrics be validated like this or should we look at other aspects as well?