Tag Archives: P-hacking
Roger Peng and Jeffrey Leek of Johns Hopkins University claim that “ridding science of shoddy statistics will require scrutiny of every step, not merely the last one.”
This blog post originally appeared in Nature on April 28, 2015 (see here).
There is no statistic more maligned than the P value. Hundreds of papers and blogposts have been written about what some statisticians deride as ‘null hypothesis significance testing’ (NHST; see, for example, go.nature.com/pfvgqe). NHST deems whether the results of a data analysis are important on the basis of whether a summary statistic (such as a P value) has crossed a threshold. Given the discourse, it is no surprise that some hailed as a victory the banning of NHST methods (and all of statistical inference) in the journal Basic and Applied Social Psychology in February.
Such a ban will in fact have scant effect on the quality of published science. There are many stages to the design and analysis of a successful study. The last of these steps is the calculation of an inferential statistic such as a P value, and the application of a ‘decision rule’ to it (for example, P < 0.05). In practice, decisions that are made earlier in data analysis have a much greater impact on results, from experimental design to batch effects, lack of adjustment for confounding factors, or simple measurement error. Arbitrary levels of statistical significance can be achieved by changing the ways in which data are cleaned, summarized or modelled.
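To make the last point concrete, here is a minimal simulation sketch, not taken from the original article. It assumes a deliberately simple setup: two groups drawn from the same normal distribution (so there is no true effect), a z-test with known variance, and an analyst who measures several independent outcomes and reports only the smallest P value. The function names and parameters (`z_test_p_value`, `false_positive_rate`, `n_outcomes`) are hypothetical, chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05  # conventional significance threshold


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided P value for a difference in means, assuming known sigma."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1.0 / n_a + 1.0 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_sims=2000, n_per_group=20, seed=0):
    """Share of null simulations where the smallest of n_outcomes P values is < ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: the null is true.
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            p_values.append(z_test_p_value(a, b))
        # The "flexible" analyst reports only the best-looking outcome.
        if min(p_values) < ALPHA:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(f"{k} outcome(s) tried: false-positive rate = {false_positive_rate(k):.3f}")
```

With a single pre-specified outcome the rate should sit near the nominal 0.05. With k independent outcomes it should approach 1 - 0.95^k, roughly 0.23 for k = 5, even though no individual test was computed incorrectly.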
To combat the practice of p-hacking, the editors of Basic and Applied Social Psychology (BASP) will no longer publish p-values included in articles submitted to the journal. The unprecedented move by the journal’s editorial board signals that publishing norms may be changing faster than previously believed, but it also raises certain issues. In a recent article published by Routledge, BASP editors David Trafimow and Michael Marks address three key questions about the ban on the null hypothesis significance testing procedure (NHSTP).
Question 1: Will manuscripts with p-values be desk rejected automatically?
Answer 1: No […] But prior to publication, authors will have to remove all vestiges of the NHSTP (p-values, t-values, F-values, statements about “significant” differences or lack thereof, and so on).
Question 2: What about other types of inferential statistics such as confidence intervals or Bayesian methods?
Answer 2: Analogous to how the NHSTP fails to provide the probability of the null hypothesis, […] confidence intervals do not provide a strong case for concluding that the population parameter of interest is likely to be within the stated interval. Therefore, confidence intervals also are banned from BASP.
In a recent post on Data Colada, University of Pennsylvania Professor Uri Simonsohn discusses what to do if you, as a researcher, are accused of having altered your data to increase statistical significance.
It has become more common to publicly speculate, upon noticing a paper with unusual analyses, that a reported finding was obtained via p-hacking.
For example, “a Slate.com post by Andrew Gelman suspected p-hacking in a paper that collected data on 10 colors of clothing, but analyzed red & pink as a single color” (see the authors’ response to the accusation) or “a statistics blog suspected p-hacking after noticing a paper studying number of hurricane deaths relied on the somewhat unusual Negative-Binomial Regression”.
Instinctively, Simonsohn says, a researcher may react to accusations of p-hacking by attempting to justify the specifics of his or her research design. But if that justification is made ex post, the explanation will not be good enough. In fact:
P-hacked findings are by definition justifiable. Unjustifiable research practices involve incompetence or fraud, not p-hacking.
Guest Post by Anja Tolonen (University of Gothenburg, Sweden)
Seventeen excited graduate students in Economics met at the University of Gothenburg on a Monday in September to initiate an ongoing discussion about transparency practices in Economics. The students came from all over the world: from Kenya, Romania, Hong Kong, Australia and, of course, Sweden. The initiative itself also came from across an ocean: Berkeley, California. The students had different interests within Economics: many of us focus on Environmental or Development Economics, but there were also Financial Economists and Macroeconomists present.
The teaching material, which was mostly based on material from the first Summer Institute, organized by BITSS in June 2014, quickly prompted many questions. “Is it feasible to pre-register analysis on survey data?”, “Are graduate students more at risk of P-hacking than their senior peers?”, “Are some problems intrinsic to the publishing industry?” and “Does this really relate to my field?” several students asked. Some students think yes:
BITSS is currently holding its first summer institute in transparency practices for empirical research. The meeting is taking place in Berkeley, CA with 30+ graduate students and junior faculty in attendance.
Ted Miguel (Economics, UC Berkeley), one of the founding members of BITSS, started with an overview of conceptual issues in current research practices. Across the social sciences, academic incentives reward striking, newsworthy, and statistically significant results at the expense of scientific integrity. This creates several issues, including publication bias and an incomplete body of evidence. Fortunately, new norms and practices are emerging, driven mostly by bottom-up efforts among social science researchers.
Scott Desposato (Political Science, UCSD) followed with a fascinating talk on the issue of ethics in field experiments. “Social scientists are venturing overseas to conduct an increasing number of experiments, which are increasingly larger in scope.” Yet many of these experiments are illegal under local legislation, involve unconsenting subjects, and generate risks for bystanders. “One day, a researcher is going to push too hard and things will get out of hand. This is likely to create a backlash, which would limit our future ability to access a lot of important information […] You can’t outsource ethical judgment to the IRB – you need to think carefully about what you are doing and what the consequences will be.” Ignoring those issues has potentially serious consequences for subjects, enumerators, investigators, and entire scientific disciplines.
From Jerry Adler in the Pacific Standard—on the credibility crisis in social science research, publication bias, data manipulation, and non-replicability. Featuring BITSS aficionados Brian Nosek, Joe Simmons, Uri Simonsohn and Leif Nelson.
Something unprecedented has occurred in the last couple of decades in the social sciences. Overlaid on the usual academic incentives of tenure, advancement, grants, and prizes are the glittering rewards of celebrity, best-selling books, magazine profiles, TED talks, and TV appearances. A whole industry has grown up around marketing the surprising-yet-oddly-intuitive findings of social psychology, behavioral economics, and related fields. The success of authors who popularize academic work—Malcolm Gladwell, the Freakonomics guys, and the now-disgraced Jonah Lehrer—has stoked an enormous appetite for usable wisdom from the social sciences. And the whole ecosystem feeds on new, dramatic findings from the lab. “We are living in an age that glorifies the single study,” says Nina Strohminger, a Duke post-doc in social psychology. “It’s a folly perpetuated not just by scientists, but by academic journals, the media, granting agencies—we’re all complicit in this hunger for fast, definitive answers.”
Read the full article here.