Goodhart’s Law and Measuring Success without Gaming Yourself

In the late 19th century, Paul Doumer, the French Governor-General of Indochina, had grand plans to modernize the ancient Vietnamese city of Hanoi.

Hanoi was to be an exemplar of the positive influence of French colonial intervention.

Part of Doumer’s modernization scheme included a network of underground sewer pipes, which quickly became fertile breeding ground for rats.

The sewers not only offered the rats a protected breeding area, but also served as a subterranean transit system. Soon, the rats boomed in numbers and expanded their reach throughout the city.

Before long, cases of bubonic plague began to rise in Hanoi.

In 1902, the war on rats began, and platoons of rat catchers patrolled the sewers. In the first week of the mighty hunt, 8,000 rats were turned in. The tally soon rose to 4,000 rats per day, and on June 12, the rat assassins logged a single-day kill of just over 20,000 rats.[1]

Despite the impressive carcass totals, however, the battle raged on for a couple more years, and the rats were still winning.

So, the colonial administration decided to incent vigilante ratters from the general populace by offering a bounty on rat tails. Presenting entire corpses, it was decided, would prove too much of a burden on already overworked municipal health authorities.

The bounty program immediately brought in many thousands of tails.

Eventually, though, officials began to notice an alarming number of tailless rats scurrying about the streets of Hanoi. Many people were simply cutting off the rats’ tails to collect the bounty, and then setting the rats free to breed and produce more rats with valuable tails.

Worse, health inspectors in Hanoi’s suburbs found that people were actively breeding rats to collect the bounty.

The fraud was so widespread that colonial authorities eventually abandoned the bounty program altogether.

The above is a short summary of a longer account by historian Michael Vann, who compiled his narrative from a dossier he found titled “Destruction of Hazardous Animals” while doing graduate research on the socioeconomic impact of French colonialism on Hanoi.

Vann’s account of the Hanoi rat massacre is an example of the effect of an adage formulated by economist Charles Goodhart.

Goodhart’s law

As originally formulated, Goodhart’s law states:

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

This may be all well and good for people who live in the world of economic policy.

But anthropologist Marilyn Strathern later offered a variation of Goodhart’s original maxim, and this is how it is more commonly understood:

When a measure becomes a target, it ceases to be a good measure.

And there is one more useful adaptation of Goodhart’s law, from Jerry Z. Muller, author of The Tyranny of Metrics:

Anything that can be measured and rewarded will be gamed.

Certainly, Goodhart’s law and its common variations apply in the case of the great Hanoi rat massacre.

When the French colonial authorities made rat tails the measure of success, and offered a bounty, they incented people to give them rat tails. The problem is, a rat’s tail is not nearly as convincing evidence of a dead rat as, well, a dead rat.

The authorities also failed to recognize one of the unintended consequences of the bounty program: that it would entice people to game the system in favor of short-term gain over the long-term goal.

Letting metrics become targets is dangerous: once they do, people will inevitably try to game the system.

Here are some other commonly cited examples of the effects of Goodhart’s law:

  • When call centers use average handle time as a target, representatives may be more concerned with getting the customer off the phone than with resolving the customer’s issue.
  • When lines of code written per day is used as a software development target, programmers may write bloated code to meet the metric instead of taking the time to produce elegant, efficient functions and routines.
  • When warehouse managers use pick rate as a target, order pickers may cut corners in other areas to make rate, which leads to damaged products and equipment, safety violations, and workplace injuries.
  • When standardized test scores are the target, children devote too much time to practicing for tests and improving their test-taking ability, and teachers “teach to the test” instead of devoting time to improving students’ critical thinking and problem-solving skills and their overall breadth of knowledge.
  • When hospitals use successful patient outcomes as the target, some doctors will simply turn away patients with a poor prognosis.
  • When the mayor wants the city to look safer and makes crime rate the target, the police department may take steps to fudge the real crime statistics, such as misclassifying some felony offenses as misdemeanors, leaving some offenses off police reports, failing to file reports, or reporting a series of crimes as a single event.
  • When the number of citations for published academic articles is the target, some scholars may form “citation circles,” informally agreeing to cite each other’s work whenever possible.

Why metrics can lead us astray

People – and systems – will optimize, whether it’s toward the intended outcome or not. When metrics become targets, people change their behavior to optimize for that target, often with undesired results.

As Steve Jobs explained, “Incentive structures work. So you have to be very careful of what you incent people to do, because various incentive structures create all sorts of consequences that you can’t anticipate.”

People will often change their behavior to hit the target because they believe it serves their interest, whether or not it really does.

There are several ways that metrics can lead us astray and become bad targets.

Inappropriate – or inappropriately applied – metrics can distort information. Here are three ways metrics can distort, according to Jerry Muller:[2]

  • When we measure the most easily measurable. We may have a natural tendency to try and simplify everything, but what is most easily measured is not usually what is most important.
  • When we measure the simple, even though the desired outcome is complex. Most of us have multiple responsibilities and goals. When we focus measurement on only one goal, we will frequently get deceptive results.
  • When we degrade information quality through standardization. In our need to quantify and compare everything, we may oversimplify. Trying to make things comparable often means stripping them of their context, history, and meaning.

Muller also discusses a few common ways that people may game metrics:[3]

  • We can cherry-pick targets or seek less challenging situations that make it easier to reach the metric.
  • We can simply lower the standard.
  • We can omit inconvenient results to avoid measurement.
  • We can outright cheat.

Almost any type of metric – when used as an incentive target – will have some perverse effect.

Muller explains some of the common perils of inappropriate metrics:[4]

  • Inappropriate metrics can cause people to focus on satisfying the metric at the expense of more important goals.
  • Inappropriate metrics can cause people to advance short-term goals over long-term considerations.
  • Inappropriate metrics can cause people to get bogged down in compiling, processing, and reporting metrics – even when nothing significant is really happening.
  • Inappropriate metrics can reward or penalize people for outcomes that are independent of their efforts.
  • Inappropriate metrics can have a chilling effect on initiative and make people unnecessarily risk averse.
  • Inappropriate metrics can discourage or impede innovation, which inherently involves experimentation, risk, and frequent failure.

We often think of Goodhart’s law – and the widespread misuse of metrics as performance indicators – as applying mostly to larger organizations and systems.

But as individuals, we also use – and misuse – metrics to measure our own progress, performance, and success.

Millions of people around the world commonly use a metric that isn’t really grounded in science and may or may not fit their long-term objectives. They also have the technology to measure their own performance, and it changes their behavior, for better or for worse.

Maybe because of the illusory truth effect (they’ve just heard it so many times they think it’s true), these people believe that there is some valid scientific reason for them to take 10,000 steps each day.

They also have devices like Fitbit to track this metric, and they make significant behavioral adjustments to meet this arbitrary 10,000-step target.

Now, is taking 10,000 steps a day a bad thing? Heck no. Movement is good.

But 10,000 is not some magic number of steps. It’s entirely arbitrary, and it isn’t necessarily the best target for everyone’s fitness goals or lifestyle.

The number came from a 1960s Japanese marketing campaign designed to promote – you guessed it – the world’s first wearable step-counter. The name of the device was the manpo-kei, which translates to – you guessed it again – “10,000-step meter.”[5]

Fitness targets should meet the needs of the individual. People have different fitness goals and needs, and a lot of people have specific health conditions that should be the major factor in determining their exercise requirements.

Staying motivated matters, too. Any fitness regime works best if it’s maintained over time. Some people may find it difficult to fit 10,000 steps into their daily routine, but 7,000 may be workable. Research indicates that anything over 5,000 is beneficial for most people.

More may be better for some people, but a beneficial amount is always better than an insufficient amount.

And what about intensity (or cadence)? Research suggests that to meet the minimum threshold for moderate physical activity, we should take a minimum of 100 steps per minute.[6]

So, might not 7,000 steps at a cadence of 120 steps per minute be healthier than 10,000 steps at 90 steps per minute?
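To make the comparison concrete, here is a minimal Python sketch of the arithmetic. It assumes only the 100-steps-per-minute threshold cited above. The function name walk_summary and the constant MODERATE_CADENCE are hypothetical names chosen for illustration, and the sketch says nothing about which walk is actually healthier for a given person.

```python
# Threshold for moderate-intensity walking, as cited in the text (steps per minute).
MODERATE_CADENCE = 100


def walk_summary(steps, cadence):
    """Return (minutes spent walking, whether the cadence meets the threshold)."""
    minutes = steps / cadence
    return minutes, cadence >= MODERATE_CADENCE


# The two walks compared in the text: (total steps, steps per minute).
for steps, cadence in [(7000, 120), (10000, 90)]:
    minutes, moderate = walk_summary(steps, cadence)
    print(f"{steps} steps at {cadence}/min: "
          f"{minutes:.0f} min, moderate intensity: {moderate}")
```

On these numbers, the 7,000-step walk takes about 58 minutes and clears the moderate-intensity threshold, while the 10,000-step walk takes about 111 minutes and does not. The raw step count hides both facts.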

And there is always the possibility of becoming so obsessed with the step count metric that you neglect other aspects of your overall fitness program.

People have been known to continuously bump up their step count goal – particularly if they are sharing this information with other people through an app – and neglect things like weight training or swimming or cycling just so they can crank up their step count.

Don’t game yourself

Measurement obviously isn’t all bad. Metrics can provide useful performance feedback.

It isn’t the use of metrics that causes problems. It’s their misuse and overuse that gums up the works.

If we can reliably measure what we actually intend to measure, and if we can apply sound judgment and reason to the results, then metrics can provide us with a tremendous amount of helpful information.

Remember that just because something can be measured doesn’t mean that it should be. And not everything that is important can be measured.

Remember, too, that there is likely no single or simple best measure of performance or productivity. When we are assessing our own (or anyone else’s) progress, we need to use a combination of different metrics to get a complete picture.[7]

Also, our activities and goals evolve over time, so the measurements we use to track and analyze our progress need to evolve as well. Measurements need to be tailored to the context, the situation, and the specific goal.[8]

As Jerry Muller explains, “Measurement is not an alternative to judgment. Measurement demands judgment.”

Specifically, Muller continues, we need to use judgment to decide:[9]

  • Whether to measure.
  • What to measure.
  • How to evaluate what is measured.
  • Whether rewards or penalties will be attached to the results.
  • Who will have access to the measurements.

We should never substitute metrics for judgment. We should use metrics to inform judgment. This includes:[10]

  • Knowing how much weight to give metrics.
  • Recognizing the characteristic distortions of metrics.
  • Appreciating what can’t be measured.

We need to keep our goals aligned with our purpose and vision. We don’t want to game our own system.

If we allow ourselves to obsess over a target like daily step count, we may neglect other important fitness activities like swimming or weight training. Our larger goal is to be healthy, not just to walk 10,000 steps a day.

Simply being aware of Goodhart’s law can help us keep perspective whenever we feel the need to start keeping some sort of score of our personal progress.

And we can largely avoid gaming ourselves by:

  • Choosing useful metrics, not just easy ones.
  • Using more than one relevant metric.
  • Using metrics to inform our judgment, not as substitutes for judgment.
  • Not sacrificing long-term purpose for short-term gain.

The use of metrics can give us the power to improve, but only if they advance the goals of our own process and purpose.

Metrics are intended to give us feedback and guidance, not to hijack our vision.

Notes

1. Michael G. Vann, “Of Rats, Rice, and Race: The Great Hanoi Rat Massacre, an Episode in French Colonial History,” French Colonial History 4 (2003): 196. http://doi.org/10.1353/fch.2003.0027

2. Jerry Z. Muller, The Tyranny of Metrics (Princeton, NJ: Princeton University Press, 2018), 23.

3. Ibid., 24.

4. Ibid., 169-171.

5. David Cox, “Watch Your Step: Why the 10,000 Daily Goal is Built on Bad Science,” The Guardian, September 3, 2018, https://www.theguardian.com/lifeandstyle/2018/sep/03/watch-your-step-why-the-10000-daily-goal-is-built-on-bad-science

6. Catrine Tudor-Locke et al., “How Fast is Fast Enough? Walking Cadence (steps/min) as a Practical Estimate of Intensity in Adults: A Narrative Review,” British Journal of Sports Medicine 52, no. 12 (2018): 787. https://doi.org/10.1136/bjsports-2017-097628

7. Thomas Fritz, “Measuring Individual Productivity,” in Perspectives on Data Science for Software Engineering, ed. Tim Menzies, Laurie Williams, and Thomas Zimmermann (Burlington, MA: Morgan Kaufmann, 2016), 68.

8. Ibid., 69-70.

9. Muller, Tyranny of Metrics, 176.

10. Ibid., 182.
