
pollster to generalize from the smaller sample to that larger population. But the subjects in social science experiments are almost never randomly selected. They are often laughably unrepresentative of the general population, and their number is usually very small, for reasons of time and money.

In her brilliant monograph, The Cult of Statistical Significance, co-written with Stephen Ziliak, the economist Deirdre McCloskey showed why the difference between the two selection methods is important. If a group of subjects isn’t randomly selected, then you can’t accurately generalize and “scale up” your findings. So social scientists have found a workaround. As a measure of whether an experimental finding is “true,” they have substituted the standard of “statistical significance” for the standard of common sense. That is, instead of gathering a large, random sample, researchers take the data generated by their small, nonrandom samples and subject them to various kinds of statistical manipulation. Then, when the data show, or seem to show, some kind of “significant” pattern, the researchers claim their finding is valid. Like Ioannidis, McCloskey showed that such methods always run the risk of mistaking faulty data (“statistical noise”) for meaningful data. Indeed, she and Ziliak said, an obsession with making the numbers appear statistically significant can obscure a vast array of methodological flaws.

McCloskey’s book all by itself should have caused an about-face in social-science research. Instead, it was published, blandly praised, and never rebutted. For convenience’s sake, it was tossed down the memory hole, where it was later joined by the Reproducibility Project.
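To make the mechanism concrete, here is a minimal sketch (in Python, using the numpy and scipy libraries) of how pure statistical noise can be coaxed into “significance.” The sample size of 20, the 100 repeated comparisons, and the p < 0.05 threshold are illustrative assumptions, not figures drawn from any study discussed here: the simulation simply generates random numbers for two small groups, runs a standard t-test on fresh noise again and again, and counts how often a “significant” difference appears even though no real effect exists.

# Hypothetical simulation: "statistical significance" emerging from pure noise.
# Not the method of any study mentioned in the article; parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_subjects = 20        # a small, nonrandom sample of "subjects"
n_comparisons = 100    # many arbitrary ways of slicing or re-collecting the data
false_positives = 0

for _ in range(n_comparisons):
    # Two groups measured on something that, by construction, differs only by chance.
    treatment = rng.normal(loc=0.0, scale=1.0, size=n_subjects)
    control = rng.normal(loc=0.0, scale=1.0, size=n_subjects)
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < 0.05:  # the conventional threshold for a "significant" finding
        false_positives += 1

print(f"'Significant' findings from pure noise: {false_positives} of {n_comparisons}")

Roughly one comparison in twenty clears the conventional bar by luck alone; test enough slices of a small sample and a publishable “pattern” is all but guaranteed to turn up.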

The statistical weakness has even become the subject of satire. In 2011, researchers from the University of Pennsylvania and UC Berkeley assembled a group of 20 undergraduates and played them Beatles records, gauging their reactions with a series of questions before and after. Pushing the data to a point of “statistical significance,” the researchers were able to “prove,” ridiculously, that listening to “When I’m Sixty-Four” actually reduced the students’ calendar age by an average of 18 months.

The process of social science – by which bad methods lead to bad experiments, which lead to bad findings, which lead to bad papers published in overrated journals, which lead to bogus stories on NPR and in the Washington Post – has been called “Natural Selection for Bad Science.” It’s as if an invisible hand were guiding researchers into faulty practices at each stage. The headwaters of this process are known as “publication bias.” For the young social scientist in a tenure-track job at a university, “publish or perish” is a pitiless mandate. Editors of academic journals want to publish papers that bring favorable attention from journalists who crave

