Metrics converted to goals lose their focus

One of the recurring discussions I have had with my colleagues over the last few years concerns the creation of metrics to express the productivity and quality of an individual developer. At first glance, the idea of expressing a programmer’s ability with a small set of easy-to-digest figures seems both attractive and feasible. After all, this approach appears to work nicely in areas like the stock market, weather forecasting or customer satisfaction, so it seems logical to assume that we can accomplish something similar when it comes to software developers.

Yet I am reluctant to believe that it is possible to create an effective metric for the quality and effectiveness of a computer programmer. Although something similar might be feasible for other kinds of activities, mostly those involving tangible production based on some kind of repetitive routine, writing software presents a completely different paradigm, one that is extremely difficult to quantify.

Think, for example, of what kind of metric would even be capable of comparing a poetic masterpiece consisting of just a few words to a New York City telephone book! Software development resembles a poem more than a pedestrian document such as a telephone book, which simply gathers similar information and presents it in a purely mechanical format.

As a developer, I tend to view code as the means to express a model of a tangible problem to be solved. Given this definition, the ultimate “metric” for a solution has to be related to how well it approximates the real world, its ability to function for long periods of time without interruption and, finally, its adaptability to future needs, which allows it to evolve as time goes by.

Another serious problem I see with programmers’ performance metrics is that once they are introduced to a team, each individual tends to adapt to them, essentially shifting their focus from developing quality software to maximizing their own numbers, since those numbers are the most comprehensible way for management to evaluate their performance.

In other words, metrics will eventually be converted to goals, causing the team to lose its focus, which of course should be the delivery of a working solution and not the maximization of some arbitrary metric.

I think that the best measure of a developer’s abilities is how well their application covers the needs of its users, given the cost and time constraints that had to be met while creating the solution. Successful developers build systems that outlive expectations; such systems should adapt to future needs with minimal effort and at the same time allow for easy maintenance and bug fixing during the entire lifespan of the project.


  1. Metrics can be used for good or for evil, but it is very hard to use them for good, and very easy to use them for evil. For example, we are using PHPStan to enforce static typing rules. If a programmer finds the simplest way to make the PHPStan errors “go away”, that is using it for evil. If they make the code better in a way that has the side effect of making a PHPStan error go away, that is using it for good. The code will actually get worse unless we think about why PHPStan enforces a given rule, and then use that understanding to make the code better.

    The same thing can happen with code coverage. One of the points of code coverage is to show you parts of the code that you did not expect to be missing tests. But this only helps if you are focused on writing good tests, and only use code coverage as a way to show you what you missed. If your focus is just to increase code coverage, then your tests act more like an expensive compiler than like something that really tests your code’s behavior.

    But assuming that people are using the metrics for good, they can do more than point out little flaws while you are working: the overall results can give you an idea of the quality of the code or the skill of the programmers. So metrics work best on very small teams where everyone understands this concept and talks about it, and where code reviews are used to point out “cheating” the metrics.

    We occasionally change our metrics to shine spotlights on the most pressing problems. Nobody gets rewarded for high metrics. We started asking ourselves, what is the purpose of a programming team? Ultimately they write, change, and maintain features. Writing a new feature is not somehow more important than maintaining the correct working behavior of an old one. So we came up with a way to measure our team’s current “capacity”, which we hope to increase over time. Every passing acceptance assertion gives us a positive point. Every error that occurs in production subtracts a point. We are careful not to write extra assertions just to raise this number. We are careful to actually fix underlying problems in the code instead of just changing the code so that an error does not get logged. Currently we are on level 1 of the game. We established what “score” we need to achieve in a single iteration in order to “beat” level 1. Then we will move on to level 2, which will incorporate more error tracking.

    I heard that Wells Fargo recently discovered it had lots of fake bank accounts. One of its metrics was how many new accounts an employee could get people to open. So apparently employees started opening fake accounts just to raise this metric. So it seems metrics only work if people don’t actually care about them…

  2. The Wells Fargo example illustrates the case I am trying to make pretty well. At the end of the day, what really counts is the effectiveness of the solution, its durability and its ability to evolve without a rewrite. Although some metrics can be useful in some cases, their abuse not only leads to erroneous conclusions but also points those who are measured by them in the wrong direction.
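
The “capacity” score described in the first comment can be sketched in a few lines. This is only an illustration of the arithmetic as stated there (one point per passing acceptance assertion, minus one point per production error); the names `IterationResult`, `capacity_score` and `beats_level`, as well as the sample counts and the target, are hypothetical and do not come from the commenter’s team.

```python
from dataclasses import dataclass


@dataclass
class IterationResult:
    """Counts collected for one iteration (hypothetical structure)."""
    passing_assertions: int  # acceptance assertions that passed
    production_errors: int   # errors that occurred in production


def capacity_score(result: IterationResult) -> int:
    """Add one point per passing assertion, subtract one per production error."""
    return result.passing_assertions - result.production_errors


def beats_level(result: IterationResult, target: int) -> bool:
    """Return True when the iteration's score reaches the level's target score."""
    return capacity_score(result) >= target


# Made-up numbers, purely for illustration.
iteration = IterationResult(passing_assertions=120, production_errors=15)
print(capacity_score(iteration))           # 120 - 15
print(beats_level(iteration, target=100))
```

Note that nothing in this calculation can tell a meaningful assertion from one written only to raise the number, or a real fix from a silenced log entry, which is why the comment stresses code reviews and care over the score itself.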
