Tuesday, November 29, 2016

Implausible, unreliable results found in leading journals (caveat emptor)

A systematic review and statistical analysis of 33 randomized controlled trials coauthored by a Japanese neurologist and published in such leading journals as the Archives of Internal Medicine, the Journal of the American Medical Association, and Neurology has found evidence of implausible, unreliable data in some or all of them.

Twelve of the papers, including the three published in Neurology, have already been retracted. The review and analysis of the 33 papers was published online November 9 in Neurology. Spurred by questions raised in journal correspondence about some of the papers, a group led by a New Zealand internist used advanced statistical methods to show that the data in the papers failed by a wide margin to display the degree of random variation expected in baseline and control-group variables.
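
(A note on the mechanics: under genuine randomization, p values from between-group comparisons of baseline variables should be roughly uniformly distributed between 0 and 1. The Python sketch below — an illustration under simplifying assumptions, not the authors' actual code — simulates that expectation for 33 trials and tests a collection of p values against it.)

    # Illustrative only: simulate baseline p values under proper randomization,
    # then test an observed collection of p values for departure from uniformity.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def simulated_baseline_pvalues(n_trials=33, vars_per_trial=10, n_per_group=50):
        """p values for baseline comparisons in properly randomized trials."""
        pvals = []
        for _ in range(n_trials * vars_per_trial):
            a = rng.normal(0.0, 1.0, n_per_group)  # treatment group
            b = rng.normal(0.0, 1.0, n_per_group)  # control group, same population
            pvals.append(stats.ttest_ind(a, b).pvalue)
        return np.array(pvals)

    observed = simulated_baseline_pvalues()  # in practice: p values extracted from the papers
    ks = stats.kstest(observed, "uniform")   # compare against Uniform(0, 1)
    print(f"KS statistic {ks.statistic:.3f}, p = {ks.pvalue:.3g}")

In the abstract reproduced in the comment below, Bolland and colleagues report exactly this kind of departure, on an enormous scale.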

Furthermore, the systematic analysis found, “There were multiple examples of inconsistencies between and within trials, errors in reported data, misleading text, duplicated data and text, and uncertainties about ethical oversight.”

The papers date from 1997 to 2012; some have been cited more than 300 times. They examine the effects of supplements and drugs (including alendronate and risedronate) on bone strength and function in elderly patients diagnosed with stroke, Parkinson's disease, Alzheimer's disease, and amyotrophic lateral sclerosis.

The first author or coauthor of all 33 papers is Yoshihiro Sato, MD, PhD, a neurologist at Mitate Hospital in Japan. He has taken responsibility for the 12 articles retracted so far, conceding that they are “fraudulent.” In July, Neurology reported that in response to the journal's request for an explanation on one of the papers, Dr. Sato had said that he “accepts full responsibility for this fraudulent paper and maintains that none of the coauthors participated in any misconduct and appeared as authors on an honorary basis only.”

In response to an emailed request for comment from Neurology Today, Dr. Sato stated that 12 of the 21 papers not previously retracted are “valid.” Dr. Sato has published seven additional papers since 2013, beyond the span covered by the review, and many other reports with other study designs before 2013.

The questionable papers by Dr. Sato were identified in the course of conducting systematic reviews in osteoporosis. Dr. Sato's group stood out for having published a large number of randomized controlled trials that “collectively have substantially influenced relevant systematic reviews,” the paper stated.

Initial concerns had been raised in journal correspondence regarding some of the papers. For instance, a 2003 paper in the Journal of Neurology, Neurosurgery and Psychiatry about the efficacy of methylprednisolone pulse therapy on neuroleptic malignant syndrome in Parkinson's disease prompted a letter from a British neurologist the following year.

“I was astonished to find that Dr. Sato and colleagues were able to identify 40 cases of neuroleptic malignant syndrome in patients with Parkinson's disease from a single institution over three years,” wrote Carl E. Clarke, MD, professor of clinical neurology at the University of Birmingham in the United Kingdom.

“At a recent neurosciences grand round in Birmingham, which has an interest in Parkinson's disease research, we could only recall two such cases in living memory,” Dr. Clarke wrote, noting that Dr. Sato and colleagues defended the high numbers as being drawn from a tertiary care hospital.

The Neurology paper's first author, Mark J. Bolland, MBChB, PhD, associate professor of medicine at the University of Auckland, New Zealand, told Neurology Today that his group concluded that the data in Dr. Sato's papers were unreliable for two reasons.
                     
“First, we used different statistical approaches that showed that the baseline data from the treatment groups presented in the papers were much more similar than expected if the groups had been formed by chance,” Dr. Bolland said in an email. “In other words, the likelihood that randomization resulted in the very similar treatment groups was very low. Second, we identified a very large number of concerns about the trials, including fairly incredible productivity and recruitment rates, implausibly positive outcome data, concerns about ethical oversight, plagiarism, and many logical and other errors in the papers.”…        
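
(Dr. Bolland's first point can be illustrated with hypothetical numbers — a back-of-the-envelope sketch, not the published analysis. Under randomization, the standardized difference between two groups' baseline means behaves approximately like a standard normal variable, so values piling up near zero across many variables and trials are a warning sign.)

    # Hypothetical reported summaries: (mean, SD, n) for treatment and control.
    import numpy as np
    from scipy import stats

    reported = [
        (72.1, 6.0, 40, 72.2, 6.1, 40),  # e.g. age, years
        (24.3, 3.1, 40, 24.3, 3.0, 40),  # e.g. body mass index
        (8.9, 1.2, 40, 8.9, 1.2, 40),    # e.g. a laboratory value
    ]

    z = np.array([(ma - mb) / np.sqrt(sa**2 / na + sb**2 / nb)
                  for ma, sa, na, mb, sb, nb in reported])

    # Under randomization |z| follows a half-normal distribution; a significant
    # shift toward zero means the groups are more alike than chance allows.
    print(stats.kstest(np.abs(z), stats.halfnorm.cdf))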
                                      
Although most of the techniques used by Dr. Bolland's group required analysis of many papers at once, some involved nothing more than close examination of the text for consistency, and could be applied to single papers. For instance, in one study, “participants were eligible for inclusion if they had sustained a stroke at least 3 months before the study began, but the mean duration of illness at baseline in both randomized groups was 90 days, or slightly less than 3 months, which appears implausible.”…
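
(That check is simple arithmetic: if every participant must be at least three months — roughly 91 days — post-stroke at entry, the group mean cannot fall below that minimum, so a reported mean of 90 days is not just implausible but impossible if the eligibility criterion was enforced. As a trivial sanity check, with the values from the example:)

    MIN_ELIGIBLE_DAYS = 91   # eligibility: stroke at least ~3 months before entry
    reported_mean_days = 90  # mean duration of illness reported at baseline

    # The mean of a sample can never be smaller than its smallest value.
    if reported_mean_days < MIN_ELIGIBLE_DAYS:
        print("Inconsistent: reported mean is below the eligibility minimum.")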

“Peer review, being a human process, is not perfect,” said Robert A. Gross, MD, PhD, the editor-in-chief of Neurology. “I wanted to alert our readership that we are on guard as much as possible, and to correct the literature.”

Asked to respond to Dr. Sato's assertion that 12 of the 33 papers are “valid,” Dr. Gross said: “I have no way of knowing if he's correct or not. The problem of course is when someone admits to fraudulent behavior, it's difficult to take at face value that some other papers are okay. I know what he said about the three papers he retracted in Neurology, and he wasn't willing to go much farther than that with us.”

Dr. Bolland's paper is careful not to call the 33 papers “fraudulent.”

“The statistical techniques we used cannot prove data are made-up,” he said. “Instead they assess how consistent the distribution of data is with the expected distribution. But made-up data is one reason why the distributions of data might be inconsistent with expectations.”…

Clifford Saper, MD, PhD, FAAN, the editor-in-chief of Annals of Neurology, said he found it especially concerning that the 33 papers had been published over such a long time span.

“There could be any number of cases out there that haven't yet been detected,” he said. “The problem is we don't know the percentage of people who get caught.”

Even so, he said, he does occasionally see submitted papers in which the data look too good to be true. “When the error bars are too small and the statistics are too good, we get concerned,” Dr. Saper said. “It's really important that reviewers work as hard as they can on the statistics. If they have any questions at all, the editor's responsibility is to have a statistical consultant available.”

http://journals.lww.com/neurotodayonline/Fulltext/2016/11170/Evidence_of_Implausible,__Unreliable__Results.1.aspx



1 comment:

  1. Mark J. Bolland, Alison Avenell, Greg D. Gamble, Andrew Grey. Systematic review and statistical analysis of the integrity of 33 randomized controlled trials. Neurology. In press. http://neurology.org/lookup/doi/10.1212/WNL.0000000000003387.

    ABSTRACT

    Background: Statistical techniques can investigate data integrity in randomized controlled trials (RCTs). We systematically reviewed and analyzed all human RCTs undertaken by a group of researchers, about which concerns have been raised.

    Methods: We compared observed distributions of p values for between-groups differences in baseline variables, for standardized sample means for continuous baseline variables, and for differences in treatment group participant numbers with the expected distributions. We assessed productivity, recruitment rates, outcome data, textual consistency, and ethical oversight.

    Results: The researchers were remarkably productive, publishing 33 RCTs over 15 years involving large numbers of older patients with substantial comorbidity, recruited over very short periods. Treatment groups were improbably similar. The distribution of p values for differences in baseline characteristics differed markedly from the expected uniform distribution (p = 5.2 × 10⁻⁸²). The distribution of standardized sample means for baseline continuous variables and the differences between participant numbers in randomized groups also differed markedly from the expected distributions (p = 4.3 × 10⁻⁴, p = 1.5 × 10⁻⁵, respectively). Outcomes were remarkably positive, with very low mortality and study withdrawals despite substantial comorbidity. There were very large reductions in hip fracture incidence, regardless of intervention (relative risk 0.22, 95% confidence interval 0.15–0.31, p < 0.0001, range of relative risk 0.10–0.33), that greatly exceed those reported in meta-analyses of other trials. There were multiple examples of inconsistencies between and within trials, errors in reported data, misleading text, duplicated data and text, and uncertainties about ethical oversight.

    Conclusions: A systematic approach using statistical techniques to assess randomization outcomes can evaluate data integrity, in this case suggesting these RCT results may be unreliable.
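
    (The third check described in the Methods — comparing differences in treatment-group participant numbers with what chance allocation produces — can be illustrated with a short simulation. This sketch assumes simple, unrestricted randomization; blocked randomization would change the expected distribution.)

        import numpy as np

        rng = np.random.default_rng(0)
        N = 100  # hypothetical number of participants in one trial

        # Under simple randomization each group size is Binomial(N, 0.5), so the
        # absolute difference between group sizes has a known distribution.
        n_treat = rng.binomial(N, 0.5, size=100_000)
        diff = np.abs(2 * n_treat - N)
        print("P(|difference| <= 2) =", np.mean(diff <= 2))  # about 0.24 for N = 100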
