Why We Need to Look Critically at Eye-Catching Studies


5 lessons • 17 mins
1. Let Data Drive Your Decision Making (04:46)
2. Why We Need to Look Critically at Eye-Catching Studies (03:53)
3. How Vulnerable Are You to Fake News and Urban Legends? (03:46)
4. Heighten Your Sensitivity to Rhetorical Tricks (02:39)
5. Follow Rapoport’s Rules (02:50)

Is This Research Preliminary?: Why We Need to Look Critically at Eye-Catching Studies, with Gary Marcus, Professor of Psychology, NYU, and Author, Rebooting AI

Oftentimes people report a study when you should probably call it preliminary research: there’s one study here, and you probably need to do hundreds of studies to really know. There’s a huge crisis in science right now, and people call it the replicability crisis. The replicability crisis is when you do a study, someone else tries it, and they get a different result. If you do a study once and it’s interesting, it gets reported in the media as fact, but just because it gets reported as fact doesn’t mean that it really is fact. There are many, many studies that don’t replicate. There are statistics that suggest that something like half the studies in major publications don’t replicate. Now it’s complicated: you could have a study that’s real, and someone else tries it and it doesn’t come through for all kinds of different reasons. So you actually need to do multiple studies.

One technique is called meta-analysis, where you put together multiple studies. And nowadays the trendy, and I think sensible, way to do this is to use Bayesian statistics to combine the studies in a particular statistical fashion. But the idea is that you need many studies in order to know, and this is why, for example, we have many phases of clinical trials when we give people serious medications. First you just make sure it does no harm, and then you want to know, does it really help people? You need replication all the way down. And if you don’t have replication, you need to be worried.
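To make the idea concrete, here is a minimal Python sketch of the two approaches mentioned: an inverse-variance-weighted (fixed-effect) pooling of several studies’ effect sizes, and a simple Bayesian update that combines the same studies sequentially under a normal model. The numbers are invented for illustration; a real meta-analysis also has to deal with heterogeneity, publication bias, and study quality.

```python
# A minimal sketch of pooling several studies' effect estimates.
# The effect sizes and standard errors below are made up for illustration.
import numpy as np

# Hypothetical results from five small studies: (effect size, standard error)
studies = [(0.42, 0.20), (0.10, 0.15), (0.35, 0.25), (-0.05, 0.18), (0.22, 0.12)]
effects = np.array([e for e, _ in studies])
ses = np.array([s for _, s in studies])

# Fixed-effect meta-analysis: weight each study by the inverse of its variance,
# so more precise studies count for more.
weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"Pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")

# A simple Bayesian version: start with a skeptical prior centered on zero
# and update it with each study in turn (normal prior, normal likelihood).
prior_mean, prior_var = 0.0, 1.0
for effect, se in studies:
    post_var = 1.0 / (1.0 / prior_var + 1.0 / se**2)
    post_mean = post_var * (prior_mean / prior_var + effect / se**2)
    prior_mean, prior_var = post_mean, post_var
print(f"Posterior mean: {prior_mean:.3f}, posterior sd: {np.sqrt(prior_var):.3f}")
```

The point of either calculation is the same as in the transcript: no single study settles the question, and the combined estimate is usually less exciting than the most eye-catching individual result.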

AI is on the verge of having a similar crisis. A number of people have pointed out in the last few years that what typically happens in AI is that somebody reports a result, but they don’t tell you all the conditions that were required to make it work. So it worked at least once, but, for example, people use things called random seeds so that the random numbers generated in their network will start in a particular place. It turns out that sometimes things work with the random seed that was reported but not with other random seeds, or they work if you have exactly 10,000, quote, neurons, but not if you have 9,999, and nobody knows exactly why. Or people will try 10 different runs and report the one that worked the best. That is begging for a replicability crisis. This is how psychology and medicine got into trouble: the sexiest results get reported in the literature, and the sexiest results are often not representative. They’re not quite what normally happens.
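As an illustration of the seed problem, here is a minimal Python sketch that runs the same toy experiment under ten different random seeds and reports the whole distribution of results instead of the single best run. The `train_and_evaluate` function is a hypothetical stand-in for a real training routine, not anything from the transcript.

```python
# A minimal sketch of seed-sensitivity reporting. The training routine is a
# stand-in (a noisy toy "accuracy"), not a real model.
import random
import statistics

def train_and_evaluate(seed: int) -> float:
    """Placeholder for a real training run; returns a toy accuracy that
    depends on the random seed, mimicking run-to-run variance."""
    rng = random.Random(seed)
    return 0.75 + rng.gauss(0.0, 0.05)

seeds = range(10)
scores = [train_and_evaluate(s) for s in seeds]

# Reporting only max(scores) is exactly the practice that invites a
# replicability crisis; report the whole distribution instead.
print(f"Best run (tempting to report): {max(scores):.3f}")
print(f"Mean over {len(scores)} seeds: {statistics.mean(scores):.3f} "
      f"+/- {statistics.stdev(scores):.3f}")
```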

So you read about the study, you go home and try to do the same thing, and it doesn’t work. Some people in Montreal and elsewhere have really started pointing out this problem in AI. It’s a serious problem right now that I don’t think anybody has a solution for yet, except that we need to be more careful and not in so much of a hurry. People need to define, in advance, what they’re going to be testing statistically, and they need to report all the results, not just the sexy one that’s on the cover of Nature.