Interrogating the Data Until It Confesses

For a (possibly NSFW, depending on your tolerance for swear words and off-color references) layperson’s introduction to questionable research practices, I highly recommend John Oliver’s treatment of the topic on the HBO infotainment / news show Last Week Tonight. He covers the non-publication of null findings, researcher incentives, and statistical errors at a surprising level of sophistication.

As researchers, we want both to advance our own scientific production and careers and to move science forward. Sometimes intentional or accidental research mistakes can lead us to make inferences that serve one of these motivators (scientific production) to the detriment of the other (true generalizability and increased scientific knowledge).

There’s a reason why research protocols, grant applications, and manuscript pre-registration require researchers to state their scientific aims and statistical approaches ahead of time. Knowing what we’re looking for and how we intend to find it keeps us from embarking on broad fishing expeditions that produce meaningless incidental findings. Still, statistical misbehavior can creep in even under a well-defined protocol. Let’s look at a few possibilities.

  • p-hacking. P-hacking is the practice of manipulating data (for example, removing recalcitrant data points under the guise of eliminating outliers that shouldn’t be studied) in order to push p-values below an acceptable threshold. John Oliver puts it nicely: “P-hacking … basically means collecting lots of variables and then playing with your data until you find something that counts as statistically significant but is probably meaningless.” A simulation sketch of this “lots of variables” problem appears just after this list.
  • HARKing. HARKing is “hypothesizing after results are known”. This is the classic “fishing expedition”, where data are collected first and only after the fact does the researcher say, “oh, yes, I predicted this all along”.
  • Fabrication. Research fabrication usually makes a splash in the media when a researcher is found to have simply made up data to support a manuscript. However, subtler forms of fabrication can creep in when, say, a researcher has to decide how to account for missing data. When several methods are available for interpolating missing values, it can be tempting to choose the one that gives the most favorable result for the research hypotheses; the missing-data sketch after this list shows how much that choice can matter.
  • Inadequate statistical methodology. Was your randomization sufficient? Did you account for confounders? Did you overfit your model to the data in hand, giving artificially high predictive power? Did you use parametric tests where their assumptions did not hold? This can be the trickiest problem to detect, especially if you’re not a professional statistician. The overfitting sketch after this list shows one version of the problem.
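
To make the p-hacking item concrete, here is a minimal simulation sketch (mine, not from the post or the video; the sample sizes and variable names are illustrative) of the “collecting lots of variables” problem: test twenty pure-noise predictors against an unrelated outcome and, on average, about one of them will come out “statistically significant” at p < 0.05 by chance alone.

```python
# Hedged sketch: 20 noise predictors vs. an unrelated outcome.
# Roughly 1 in 20 tests will look "significant" at p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects, n_variables = 50, 20

outcome = rng.normal(size=n_subjects)                     # outcome with no real signal
predictors = rng.normal(size=(n_subjects, n_variables))   # 20 pure-noise "predictors"

# Correlate every predictor with the outcome and collect the p-values
p_values = [stats.pearsonr(predictors[:, j], outcome)[1] for j in range(n_variables)]
false_hits = [j for j, p in enumerate(p_values) if p < 0.05]

print(f"'Significant' noise variables at p < 0.05: {false_hits}")
# Reporting only these hits, without mentioning the other tests, is p-hacking in miniature.
```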
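
The missing-data temptation in the fabrication item is just as easy to demonstrate. This hypothetical sketch (simulated data, nothing from the post) compares two defensible ways to handle missing control-group values, listwise deletion and mean imputation, and shows they can return noticeably different p-values; a pre-registered missing-data plan is what removes the option to report only the prettier one.

```python
# Hedged illustration: two missing-data strategies, two p-values, one temptation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(0.4, 1.0, size=30)   # simulated treatment-group scores
control = rng.normal(0.0, 1.0, size=30)     # simulated control-group scores
control[:6] = np.nan                        # six control measurements went missing

# Option A: drop the missing control values (listwise deletion)
p_dropped = stats.ttest_ind(treatment, control[~np.isnan(control)])[1]

# Option B: fill the missing values with the observed control mean
filled = np.where(np.isnan(control), np.nanmean(control), control)
p_imputed = stats.ttest_ind(treatment, filled)[1]

print(f"p with listwise deletion: {p_dropped:.3f}")
print(f"p with mean imputation:   {p_imputed:.3f}")
```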
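
And for the overfitting question in the methodology item, a short sketch (again illustrative, using an ordinary least-squares model on made-up data) of why evaluating a model on the same data it was fit to gives artificially high predictive power: with nearly as many predictors as subjects, the in-sample R² looks impressive while the cross-validated estimate collapses.

```python
# Hedged sketch: in-sample fit vs. held-out performance on pure noise.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features = 40, 30                  # many predictors, few subjects

X = rng.normal(size=(n_samples, n_features))    # noise predictors
y = rng.normal(size=n_samples)                  # outcome unrelated to X

model = LinearRegression().fit(X, y)
in_sample_r2 = model.score(X, y)                # evaluated on the data in hand
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5).mean()  # held-out estimate

print(f"In-sample R^2:       {in_sample_r2:.2f}")   # deceptively high
print(f"Cross-validated R^2: {cv_r2:.2f}")          # typically negative: no real signal
```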

It’s important to allow room for serendipity and accidental discoveries of interesting relationships. Findings that emerge from fishing expeditions aren’t terrible, as long as they are disclosed as such and not made to look like confirmation of pre-established hypotheses. Munafò et al. might say it best, remarking, “a major challenge for scientists is to be open to new and important insights while simultaneously avoiding being misled by our tendency to see structure in randomness. The combination of apophenia (the tendency to see patterns in random data), confirmation bias (the tendency to focus on evidence that is in line with our expectations or favoured explanation) and hindsight bias (the tendency to see an event as having been predictable only after it has occurred) can easily lead us to false conclusions…”

Why does this matter? Because studies that are irreproducible due to statistical and methodological flaws can not only stall science but also move public support (and funding) for science backwards. To close with another Oliver quote: “In science, you don’t just get to cherry-pick the parts that justify what you were going to do anyway! … And look, this is dangerous… that is what leads people to think that manmade climate change isn’t real or that vaccines cause autism….”