The Danger of Development Metrics

Categories: Architecture

I recently saw a reference to a McKinsey article on measuring software developer productivity. While the article is actually pretty good (if rather generic), I wanted to point out the danger of using metrics to reward or punish work.

The problem is that many metrics are simply too easy to “game”. If you give people incentives to optimise a particular metric, there are many ways to do so, and only some of them actually benefit the business.

Goodhart’s Law summarizes this nicely:

When a measure becomes a target, it ceases to be a good measure.

A prime example is “code test coverage”: it is quite easy to add tests which increase coverage without bringing any benefit at all. In the end, test coverage is not a business goal in itself; it is just something that, under some circumstances, can produce more reliable software. It also carries costs, so not all projects should have the same target value. Other metrics such as “number of commits” or “lines of code written” have similar issues.
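As a hypothetical illustration (the `apply_discount` function and both tests are invented for this sketch, not taken from any real project), the two tests below execute exactly the same lines, so a line-coverage tool would score them identically. Only the second one would catch a wrong result.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business function: reduce a price by a percentage."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


def test_for_coverage_only() -> None:
    # Runs every line of apply_discount, so line coverage reports 100%
    # for it, but no result is checked: a wrong formula would still pass.
    apply_discount(100.0, 50.0)
    try:
        apply_discount(100.0, 150.0)
    except ValueError:
        pass


def test_behaviour() -> None:
    # Covers the same lines, but actually verifies the results.
    assert apply_discount(100.0, 50.0) == 50.0
    try:
        apply_discount(100.0, 150.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")


if __name__ == "__main__":
    test_for_coverage_only()
    test_behaviour()
    print("both tests pass")
```

If `apply_discount` were changed to return `price` unchanged, `test_for_coverage_only` would still pass and the coverage figure would not move, which is exactly why the number alone says little about value.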

Even the DORA metrics are not safe, despite initially appearing somewhat business-centric. Deployment frequency can be increased by deploying a feature in multiple steps while it is still disabled (something that also reduces the change failure rate).
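A rough sketch of the arithmetic, assuming change failure rate is computed as failed deployments divided by total deployments, and using made-up numbers and hypothetical names: one feature shipped in a single deployment, versus the same feature shipped as four deployments with the feature disabled plus a final one that enables it. In both cases the same defect surfaces exactly once.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    description: str
    failed: bool  # did this deployment cause a failure in production?


def deployment_frequency(deployments: list[Deployment], days: int) -> float:
    """Deployments per day over the measured period."""
    return len(deployments) / days


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


# Hypothetical scenario A: the feature ships in one deployment,
# which turns out to contain a defect.
single = [Deployment("ship feature", failed=True)]

# Hypothetical scenario B: four deployments with the feature disabled,
# then one that enables it; the same defect only surfaces at that step.
split = [Deployment(f"ship disabled part {i}", failed=False) for i in range(1, 5)]
split.append(Deployment("enable feature", failed=True))

for name, deps in [("single", single), ("split", split)]:
    print(
        f"{name}: {deployment_frequency(deps, days=10):.1f} deploys/day, "
        f"change failure rate {change_failure_rate(deps):.0%}"
    )
```

With these invented numbers, scenario B reports five times the deployment frequency (0.5 versus 0.1 per day) and a change failure rate of 20% instead of 100%, although the business received exactly the same feature with exactly the same defect.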

The only software metrics that are safe to use in this way (for reward or penalty) are ones that directly measure business benefit, e.g. profit or customer satisfaction, and even then it is important to be sure that the benefit is a long-term one. Optimise for customer satisfaction scores and many customer-facing features may be pushed out in a short time, but implemented in a way that is not maintainable over the long term, or with security and scalability issues which don’t become apparent until later. These issues are not new; they are the same as the problems with setting goals for top-level executives[1]. The book The Tyranny of Metrics addresses the problems of metrics in general; its introduction presents many more such cases (the preface and introduction can be read online for free).

This doesn’t mean that metrics which measure non-business goals are useless. However, they (and particularly the DORA metrics) must be seen as side-effects of good practices rather than as goals in themselves. This means that while a team whose metrics “don’t match expectations” can be asked to review their existing practices, that request should carry no positive or negative consequences. There should be no bonuses paid to teams which achieve the desired metrics. Perhaps more importantly, there should be no “list of shame” for teams whose metrics look different; any team which can present good arguments for their practices should be praised, and an effort should be made to develop alternative metrics which apply in their circumstances. This approach encourages dealing with the root cause of issues rather than manipulating the metrics directly via non-productive actions.

The McKinsey article initially mentions metrics such as:

  • number of customer-reported defects
  • employee experience scores
  • customer satisfaction ratings

These are all good business-centric values, as long as long-term effects are taken into account, i.e. behaviour which optimises these scores over a short time period only is not encouraged.

McKinsey then talk about DORA metrics (which I have mentioned above); they do say these are “a signal to investigate”, which is an appropriate view. The reference to SPACE metrics (centered around employee satisfaction) is interesting and seems like a good idea. Many of the other ideas described in the article can be traced back to the simple concept of “asking the employees how they feel”, which is indeed usually a good idea.

The McKinsey article also addresses the issue of “simplistic metrics” - exactly the points described above.

The following quotes can be found in the introduction to The Tyranny of Metrics:

The problem is not measurement, but excessive measurement and inappropriate measurement - not metrics but metric fixation.

and

There are things that can be measured. There are things that are worth measuring. But what can be measured is not always what is worth measuring; what gets measured may have no relation to what we really want to know. The costs of measuring may be greater than the benefits. The things that get measured may draw effort away from the things we really care about. And measurement may provide us with distorted knowledge - knowledge that seems solid but is actually deceptive.

The excellent Dave Farley has a review of the McKinsey report on his Continuous Delivery YouTube channel. He is very critical of it, and seems to see it as an attempt to measure individual performance, which I agree would be a bad idea. However, I didn’t read the McKinsey document that way at all; it seems to me instead to be attempting to measure overall or group performance, often by asking developers how they feel. That’s a fair approach in my opinion, even if McKinsey’s insights here aren’t (IMO) very original or deep.


Footnotes

  1. Stephen Elop’s contract as CEO of Nokia included a bonus if the company was purchased during his term in office. His subsequent actions led to a massive drop in the company’s share price, after which it was bought. The result was a payout under this clause of his contract but a poor return for shareholders. While there is no proof of intent, including this clause was perhaps not wise.