How Data Broke Science

Conor Digan
4 min read · Jan 7, 2020

I want you all to imagine a coin. It looks like any normal coin, except that this coin is special. This is the magical truth coin. The way it works is that you come up with a theory about something, you whisper this theory to the coin, and if your theory is correct, the coin will land on heads. Otherwise, it lands on tails. Now, there is one flaw with this coin: it will randomly show up heads 1 out of every 20 flips, irrespective of whether or not your theory is true. But even still, it’s a pretty useful coin if it’s going to tell you the truth 95% of the time, isn’t it?

And ever since humans invented this coin, we’ve been (unsurprisingly) using it for research. Scientific journals have been filled with theories that made the magical truth coin turn up heads, and we publish these as new findings. Fields like social science, economics, psychology and countless others have gained a huge number of insights from this coin. Now, we still have that pesky problem of 1 in 20 flips showing up heads whether or not the theory is true. But still, 1 in 20 publications being wrong isn’t all that bad given all the information we gain from the other 19. Right?

Well, that would probably be acceptable if that were the case, with just 1 in 20 publications being false. However, in recent years, people have started noticing that it’s not 1 in 20 publications that are false but approximately 60%. WTF happened?

Well, upon investigation, it turns out that there are two major causes. The first is that researchers have started asking the coin the same question in slightly different ways. So, if you had a theory that increasing the minimum wage also increased unemployment, you could ask it: “Does increasing the minimum wage cause unemployment to increase?”, “Does increasing the minimum wage cause more people to be unemployed?” and so on. If you ask the same question enough times, the coin will eventually toss up that random 1 in 20 heads and your theory will be published. The second major cause is that there are many orders of magnitude more coins in the world than there were a few decades ago, with lots of different people flipping them. So, if you have lots of researchers with the theory that increasing the minimum wage actually decreases unemployment and they all ask different coins that question, one will randomly show up heads and that lucky researcher will get published.
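To put a rough number on “enough times”, here is a minimal sketch of the arithmetic. It assumes every flip is independent and that the theory is actually false, so each flip has a 1 in 20 chance of a random heads. The function name `chance_of_false_heads` is made up for this illustration.

```python
def chance_of_false_heads(flips, false_heads_rate=0.05):
    """Chance of at least one random heads in `flips` independent flips
    when the theory being tested is actually false."""
    # The chance that every flip correctly lands tails is 0.95 ** flips,
    # so the chance of at least one false heads is one minus that.
    return 1 - (1 - false_heads_rate) ** flips


for flips in (1, 5, 14, 20, 100):
    chance = chance_of_false_heads(flips)
    print(f"{flips:>3} flips -> {chance:.0%} chance of at least one false heads")
```

Under those assumptions, a false theory is more likely than not to get its heads by the 14th flip. The same sum applies whether one researcher asks 14 versions of the question or 14 researchers each ask once.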

Okay Conor. That’s great. But we don’t have these magical truth coins and that’s not how we do research! Well, it actually is. Each flip of this coin is the same as one of those wonderful statistical significance tests that so much of modern scientific research is based on. You gather your data, propose a theory about the data and statistics can tell you if that theory is correct. Except that, at the usual 5% significance threshold, roughly 1 in 20 times it will tell you that the theory is correct when it’s not. Same as the magical truth coin.
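If you’d like to watch the coin misfire, here is a small simulation sketch. It is only one possible setup, chosen for simplicity: two groups are drawn from the same normal distribution (so there is genuinely nothing to find), and they are compared with a two-sided z-test that assumes the standard deviation is known to be 1. The function names, group size and seed are all assumptions for this illustration.

```python
import random
from statistics import NormalDist, mean


def p_value_no_real_difference(rng, n=30):
    """Two-sided z-test on two groups drawn from the SAME distribution."""
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    # Standard error of the difference in means when both groups have sd = 1.
    standard_error = (2 / n) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(experiments=5000, alpha=0.05, seed=0):
    """Share of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(p_value_no_real_difference(rng) < alpha for _ in range(experiments))
    return hits / experiments


print(f"Share of false theories declared significant: {false_positive_rate():.3f}")
```

The printed share should come out close to 0.05, which is the coin’s random 1 in 20 heads.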

And historically, doing the data analysis required to test one of these theories took quite a lot of work. Thus, researchers generally only got to test one or two theories on any dataset, so the number of random 1 in 20 false positives was quite low. Also, these datasets used to be very hard and expensive to generate, so there were only a limited number of chances to flip coins.

However, what’s happened nowadays is that computing power has gotten far better, so it’s now possible for researchers to try tonnes of slightly different theories on any given dataset and see which ones come out to be “statistically significant”. This is akin to asking the magical truth coin the same question in lots of slightly different ways until you get a heads. And now that we have many orders of magnitude more datasets floating around, there are also tonnes more people proposing theories, with some randomly getting lucky with the 1 in 20 heads and getting published for it.
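Here is a rough sketch of what trying tonnes of theories on one dataset can look like. Everything in it is made up for illustration: each dataset is pure noise, each “theory” is a noise column checked for correlation with the outcome, and significance is judged with the Fisher z-transform, which is only an approximation to the usual correlation test.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def looks_significant(xs, ys):
    """Approximate two-sided test at the 5% level (Fisher z-transform)."""
    z = math.atanh(pearson_r(xs, ys)) * math.sqrt(len(xs) - 3)
    return abs(z) > 1.96


def share_of_datasets_with_a_finding(datasets=300, theories=20, rows=100, seed=0):
    """Share of pure-noise datasets where at least one theory 'works'."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(datasets):
        outcome = [rng.gauss(0, 1) for _ in range(rows)]
        # Every "theory" is a column of pure noise, so none of them is true.
        if any(
            looks_significant([rng.gauss(0, 1) for _ in range(rows)], outcome)
            for _ in range(theories)
        ):
            lucky += 1
    return lucky / datasets


print(f"One theory per dataset: {share_of_datasets_with_a_finding(theories=1):.0%}")
print(f"Twenty theories per dataset: {share_of_datasets_with_a_finding(theories=20):.0%}")
```

With a single theory per dataset, the share of “findings” should sit near 5%. With twenty, it should land somewhere near the 64% that 1 - 0.95^20 predicts, even though there is nothing real to find in any of the data.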

From the perspective of the scientific journals, which are tasked with choosing who should be published and who shouldn’t, there’s no easy way to tell which heads are genuine because a theory is true and which are a random 1 in 20 heads. This is what’s known as the replication crisis. And, if you want to see this for yourselves, you can try to figure out what the effects of increasing the minimum wage would be.

Conor Digan

Data Scientist intrigued by our new world of big data and the strange effects it can have on our society