Is all the truth we need in the data?
Is all the truth we need found in the numbers? Can the stats always chart a better course? Although it seems savvy to defer to “the data,” the devil is in the details.
1. “Eliminating all police bias,” calculates Sendhil Mullainathan (The New York Times), wouldn’t materially reduce police killings of African-Americans. Nationally, African-Americans are 28.9 percent of arrestees vs. 31.8 percent of police-shooting victims. If racism “were a big factor,” Mullainathan would expect “a larger gap.”
2. That reasoning shows a common stats weakness. The national data hide huge variations: 70 police forces arrest African-Americans at rates 10 times higher than other groups. In such places, the national stats are irrelevant. Stats help when they’re representative of the particulars. Otherwise the satanically slippery stats can mislead even experts (e.g., Mullainathan, and responses to him also miss the main relevance issue).
3. Here’s a medical example: Stephen Jay Gould’s “The Median Isn’t the Message.” He knew his cancer’s median mortality of eight months didn’t necessarily mean he’d “probably be dead in eight months.” His particulars weren’t well represented by the stats. He lived another 20 years.
4. More funnily… mixed types can mangle data — humans on average have one testicle and one ovary.
5. Surely doctors know better? Although heart disease is the leading cause of death in ovary-carriers, they’re underrepresented in coronary research (30 percent of participants; see also the FDA’s gender efforts).
6. “Evidence-based” medicine’s Randomized Clinical Trials (RCTs) can’t “even in principle” always deliver. RCTs “return average effects.” That’s great for sufficiently homogeneous populations, but riskier with subpopulations of differing types/responses. Larger samples with inhomogeneous types can weaken relevance (as in the Mullainathan example above).
7. Human behavior varies more than human physiology, suggesting RCT issues in social sciences (e.g., economies have inhomogeneous behavioral mechanisms/processes).
8. Statistics, crucial to science, are also perhaps its “tragic … flaw.” Relying on the “statistical significance” recipe doesn’t ensure real-world importance (and not everything is bell-curved). Bad stats and other data biases contribute to ills in many fields (e.g., neuroscience, psychology, economics). Heaven help journalists (or these earthier volunteers).
9. Even plain, un-statistical numbers can lose context and real-world sense, causing “spreadsheet madness”: Larry Summers knows experts who’d argue electricity is “4 percent of the economy,” so losing much of it couldn’t hurt. Dollars especially can seem too easily comparable, risking “the spice error”: small factors ≠ unimportant.
10. Tools must match the domain. Stats excel in physics, where behaviors are stable (nothing in physics chooses). But people aren’t biological billiard balls. Our games are complicated by our choosing and changing how we choose.
11. Sports are simpler than economics and life. And we know stats in sports aren’t a sure bet. Sports and life are too polycausal (see oli- vs. poly-causal sciences). High “causal density” can subvert the utility of stats.
12. Turning the world into numbers is tricky. Never forget what the numbers really refer to.
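To make points 4 and 6 concrete, here is a minimal Python sketch of how an average over mixed types can describe nobody. The two groups (`type_a`, `type_b`) and every number in them are invented for illustration and do not come from any real trial; the function names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical treatment effects (treated minus control) for two made-up
# subpopulations. These values are illustrative only, not real trial data.
effects_by_type = {
    "type_a": [2.0, 2.5, 1.5, 2.0],      # treatment helps this type
    "type_b": [-2.0, -1.5, -2.5, -2.0],  # treatment harms this type
}


def average_effect(groups):
    """Pooled average effect across all participants, ignoring type."""
    pooled_values = [x for values in groups.values() for x in values]
    return mean(pooled_values)


def effect_by_group(groups):
    """Average effect computed separately for each type."""
    return {name: mean(values) for name, values in groups.items()}


pooled = average_effect(effects_by_type)
per_group = effect_by_group(effects_by_type)

print(f"pooled average effect: {pooled:+.2f}")
for name, value in per_group.items():
    print(f"{name} average effect: {value:+.2f}")
```

In this toy setup the pooled average is zero, which suggests the treatment does nothing, while each type sees a sizable effect in opposite directions. The headline number is accurate and still irrelevant to every individual it summarizes.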
Numbers have no monopoly on precision or truth. We’ll always need non-numerical logic (the quality of quantitative reasoning rests on good qualitative distinctions).
Many of life’s patterns remain beyond “the numbers.”
Illustration by Julia Suits, The New Yorker cartoonist & author of The Extraordinary Catalog of Peculiar Inventions