Overconfidence in, and fixation on, statistical significance when reporting scientific research lead to problems with reproducing results, writes Steven N. Goodman, MD, MHS, PhD, associate dean of clinical and translational research and professor of medicine and health research & policy at Stanford University in Palo Alto, CA, in Science.1

Confusion over the true meaning of the P-value “creates the illusion that the P-value alone measures the credibility of a conclusion, which opens the door to the mistaken notion that the dividing line between scientifically justified and unjustified claims is set by whether the P-value has crossed the ‘bright line’ of significance, to the exclusion of external considerations like prior evidence, understanding of mechanism, or experimental design and conduct.”

Even randomized clinical trials, the gold standard for assessing drug efficacy, are not immune to the pitfalls of giving too much credence and promotion to statistically significant findings. While it’s true that randomization of treatment limits selection bias, scientists and readers should still be watchful for other forms of bias and error that may not be readily apparent in well-designed studies.


Readers should look for “anything that adds uncertainty to the analysis, including uncertainty in the measurements, differences in dropout rates among arms, missing data, and how they were handled in the analysis,” recommends Dr Goodman in an interview with Cancer Therapy Advisor.

“I also look really hard at the quality and capture of adverse events, and the total sample size compared to the number of participating centers, because quality control is difficult to maintain across sites.”

These factors were relevant for determining the efficacy of pembrolizumab, an immunotherapy that targets PD-1, as second-line therapy for metastatic non-small cell lung cancer (mNSCLC) in patients with lower PD-L1 expression, as defined by a tumor proportion score (TPS) of 1-49%.2

Pembrolizumab 2 mg/kg administered intravenously every 3 weeks was granted accelerated approval by the United States Food and Drug Administration (FDA) in October 2015 for patients with mNSCLC who progressed after platinum-based chemotherapy and whose tumors expressed PD-L1 in at least 50% of sampled cells. The KEYNOTE-010 registration study evaluated overall survival (OS) benefit, compared with docetaxel, to fulfill the FDA requirements.

Patients with a TPS of at least 1% were randomized evenly among 3 treatment arms: pembrolizumab 2 mg/kg, pembrolizumab 10 mg/kg, or docetaxel 75 mg/m². Each patient was followed for survival and radiographic progression-free survival (rPFS). The randomization was stratified by ECOG performance status, world region, and PD-L1 expression (TPS ≥ 50% versus TPS 1-49%).

Because the PD-L1 TPS stratification cutpoint of 50% was not determined until after the trial started accruing, 591 of the 1033 patients (57%) in the final analysis had a TPS of 1-49%.

The study predefined the overall type I error rate (alpha) for all primary outcomes as .025, which was further divided to account for multiple analyses of OS and rPFS. For final analyses, a 1-sided P-value threshold of .001 was used to demonstrate rPFS superiority of pembrolizumab over docetaxel, and a threshold of .00825 for OS. One-sided tests will miss any effect in the opposite direction, for instance if docetaxel were more efficacious than pembrolizumab, and a 1-sided type I error rate of .025 is equivalent to a 2-sided type I error rate of .05.
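
The relationship between 1-sided and 2-sided tests described above can be illustrated with a minimal Python sketch. It assumes a standard normal test statistic z, with positive values favoring pembrolizumab; the function names and the sign convention are hypothetical choices made for illustration, not part of the KEYNOTE-010 analysis plan.

```python
import math


def one_sided_p(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal Z.

    Assumption for this sketch: z > 0 means the result favors pembrolizumab.
    """
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_sided_p(z: float) -> float:
    """Probability of a result at least as extreme as z in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))


# The critical value for a 1-sided alpha of .025 is about 1.96,
# which is the same critical value as for a 2-sided alpha of .05.
z_crit = 1.959964
print(round(one_sided_p(z_crit), 4))   # 1-sided P at the critical value
print(round(two_sided_p(z_crit), 4))   # 2-sided P at the same value

# An equally large effect in the opposite direction (favoring docetaxel)
# gives a large 1-sided P-value, so the 1-sided test does not flag it,
# while the 2-sided P-value is unchanged.
print(round(one_sided_p(-z_crit), 4))
print(round(two_sided_p(-z_crit), 4))
```

In this sketch, the same z of about 1.96 sits exactly at the 1-sided .025 boundary and the 2-sided .05 boundary, while a result of the same size in the other direction is flagged only by the 2-sided test.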