On measurements, goals, and how systems are shaped by the metrics you chase

Noah here. If you’ve spent any time in the corporate world, you’ve surely encountered Goodhart’s Law. Named after British economist Charles Goodhart, it is typically summarized as "When a measure becomes a target, it ceases to be a good measure." In other words, as soon as you turn a metric into a goal, its usefulness as a measure declines. This is not an argument against measurement, but a warning about how the act of measuring can change perceptions, incentives, and, most importantly, actions.

The most common illustration of Goodhart’s Law is the story of a possibly fictional Soviet-era nail factory:

Once upon a time, there was a factory in the Soviet Union that made nails. Unfortunately, Moscow set quotas on its nail production, and the factory began working to meet the quotas as written rather than to make anything useful. When Moscow set quotas by quantity, the factory churned out hundreds of thousands of tiny, useless nails. When Moscow realized this was not useful and set a quota by weight instead, the factory started making big, heavy, railroad-spike-style nails that weighed a pound each.

David Manheim, who has written extensively about Goodhart’s Law, called this kind of fudging the numbers “munchkin-ing.” “You have to put a lot of things in place to make sure that you can trust the numbers that come out of a system where you're paying people or motivating them,” he explained on an episode of the podcast Rationally Speaking last year. “Or in the Soviet case, threatening to throw them in the Gulag if they don't manage to do what you want them to. It's really hard to get them not to play games then.”

When we talk about systems that involve motivating people at some level (which almost all do), we can’t ignore the effects of Goodhart’s Law on their output. If you’ve worked in a product organization that focuses on features instead of impact, or ever taken a class where your paper was graded on page count instead of depth of understanding, you can attest to the ways people will game the system: they’ll build more features and widen the margins to make sure they hit the measured goal.

Why is this interesting?

As our national focus shifted from COVID to the police, I was reminded of Reply All’s excellent two-parter on CompStat, the system developed in the mid-90s by the NYPD for measuring and managing crime. The system involved tracking all the crime in the city and then regularly reporting it back to the bosses of the city’s police department. Here’s how the meetings were described on Reply All:

Police chiefs, like 50-year-old men, would vomit in the bathroom before CompStat meetings. They would try to find friends in the department who could tip them off to see if they were up next. These were guys who lived in neighborhoods where they ran little armies of 300 men who had to obey every single one of their orders, who could never question them about anything. And now, they had to go to this other room, where they stood in front of a guy in a bowtie, surrounded by everybody they'd ever wanted to impress, 200 of their scariest peers, and they just got their lives nitpicked apart. They got asked the kind of follow-up questions you ask somebody on their first day of the job when you're convinced they know nothing. And if they couldn't answer those questions right, they were humiliated. And then they were fired. One chief told a reporter, "If they're going to keep having these meetings, they should really have us check our guns at the door." 

But the system worked, or at least it appeared to, and crime in New York City plunged. Murders were down almost 20 percent in a year. From the outside, it seemed that sunlight really was the best disinfectant. But, as always happens when you tweak complex systems, second- and third-order effects started to kick in. What may have started with good intentions turned into a system that focused on the specific crimes and communities that could produce stats, and that misreported terrible things that might embarrass a commander in front of the city’s top brass. Here’s a bit from Reply All again:

But some of these chiefs started to figure out, wait a minute, the person who's in charge of actually keeping track of the crime in my neighborhood is me. And so if they couldn’t make crime go down, they just would stop reporting crime. And they found all these different ways to do it. You could refuse to take crime reports from victims, you could write down different things than what had actually happened. You could literally just throw paperwork away. And so that guy would survive that CompStat meeting, he’d get his promotion, and then when the next guy showed up, the number that he had to beat was the number that a cheater had set. And so he had to cheat a little bit more. 

“The chiefs felt like they were keeping the crime rate down for the commissioner,” the episode continues. “The commissioner felt like he was keeping the crime rate down for the mayor. And the mayor, the mayor had to keep the crime rate down because otherwise real estate prices would crash, tourists would go away. It was like the crime rate itself became the boss.” (Emphasis mine.) Fans of The Wire will recognize this pattern throughout the show’s five seasons.


That is how Goodhart’s Law works. It’s not that measuring things or setting goals is bad, but in an environment where promotions and pay are attached to those measures and goals, you can be sure people will bend the numbers and the system to make themselves look good. As I said in the Systemic Edition, this isn’t to say that we shouldn’t hold individuals inside the police department responsible for their actions, but interventions need to reach much deeper into the structure if there is to be any hope of creating lasting change. (NRB)

