It’s easy enough to verify that something is going wrong with medical studies by simply looking up published findings on virtually any question in the field and noting how often the findings contradict one another, sometimes sharply. To cite a few examples out of thousands: studies have found that hormone-replacement therapy is safe and effective, and also that it is dangerous and ineffective; that virtually every vitamin supplement lowers the risk of various diseases, and also that these supplements do nothing for those diseases; that low-carb, high-fat diets are the most effective way to lose weight, and that high-carb, low-fat diets are the most effective way to lose weight; that surgery relieves back pain in most patients, and that back surgery is essentially a sham treatment; that cardiac patients fare better when someone secretly prays for them, and that secret prayer has no effect on cardiac patients. (Yes, these last studies were undertaken by respected researchers and published in respected journals.)

Biostatisticians have studied the question of just how frequently published studies come up with wrong answers. A highly regarded researcher in this subfield of medical wrongness is John Ioannidis, who heads the Stanford Prevention Research Center, among other appointments. Using several different techniques, Ioannidis has estimated that the overall wrongness rate in medicine’s top journals is about two-thirds, and that estimate has been widely accepted in the medical field.
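To see how a field can reach a wrongness rate that high without anyone committing outright fraud, it helps to work through the kind of arithmetic Ioannidis popularized. Here is a minimal sketch in Python; the prior, power, and significance threshold are illustrative assumptions, not figures taken from his papers:

```python
# Positive predictive value (PPV) of a "statistically significant" finding.
# All parameter values here are illustrative assumptions.

def ppv(prior, power=0.8, alpha=0.05):
    """Probability that a significant result reflects a true effect.

    prior: fraction of tested hypotheses that are actually true
    power: probability a study detects a true effect (1 - beta)
    alpha: false-positive rate of the significance test
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# If only 1 in 10 hypotheses a field tests is actually true:
print(f"PPV with prior=0.10: {ppv(0.10):.2f}")  # ~0.64, i.e. ~36% of positives are wrong
# Drop power to 0.4 (not unusual in practice) and the wrongness
# rate among published positives climbs past one half:
print(f"PPV with prior=0.10, power=0.4: {ppv(0.10, power=0.4):.2f}")  # ~0.47
```

The point of the sketch is not the particular numbers but the structure: when most of the hypotheses a field tests are false, even honestly run studies will produce a steady stream of wrong “positive” findings.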

A frequent defense of this startling error rate is that the scientific process is supposed to wend its way through many wrong ideas before finally approaching truth. But that’s a complete mischaracterization of what’s going on here. Scientists might indeed be expected to come up with many mistaken explanations when investigating a disease or anything else. But these “mistakes” are supposed to come in the form of incorrect theories—that a certain drug is safe and effective for most people, or that a certain type of diet is better than another for weight loss. The point of scientific studies is to determine whether a theory is right or wrong. A study that accurately finds a theory to be incorrect has arrived at a correct finding. A study that mistakenly concludes an incorrect theory is correct, or vice versa, has arrived at a wrong finding. If scientists can’t reliably test the correctness of their theories, then science is in trouble—bad testing isn’t supposed to be part of the scientific process. Yet medical journals, as we’ve seen, are full of such unreliable findings.

Another frequent claim, especially within science journalism, is that the wrongness problems go away when reporters stick with randomized controlled trials (RCTs). These are the so-called gold standard of medical studies, and typically involve randomly assigning subjects to a treatment group or a non-treatment group, so that the two groups can be compared. But it isn’t true that journalistic problems stem from basing articles on studies that aren’t RCTs. Ioannidis and others have found that RCTs, too (even large ones), are plagued with inaccurate findings, if to a lesser extent. Remember that virtually every drug that gets pulled off the market when dangerous side effects emerge was proven “safe” in a large RCT. Even those studies of the effectiveness of third-party prayer were fairly large RCTs. Meanwhile, some of the best studies have not been RCTs, including those that convincingly demonstrated the danger of cigarettes and the effectiveness of seat belts.
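A short simulation shows why size alone doesn’t rescue an RCT. In this sketch the sample sizes and the number of simulated trials are arbitrary choices, and the treatment is constructed to have no effect at all; even so, the conventional significance test flags it about one time in twenty:

```python
# Monte Carlo sketch: a large, perfectly randomized trial of a treatment
# with NO real effect still comes up "significant" about 5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_per_arm = 1000, 2000
false_positives = 0

for _ in range(n_trials):
    treatment = rng.normal(0.0, 1.0, n_per_arm)  # outcomes under a useless drug
    control = rng.normal(0.0, 1.0, n_per_arm)    # outcomes under placebo
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < 0.05:
        false_positives += 1

print(f"'Significant' findings with zero true effect: {false_positives / n_trials:.1%}")
# Roughly 5% -- and if journals mostly publish the significant results,
# those are the trials the public hears about.
```

And that is the best case: it assumes perfect randomization, no measurement error, and a single pre-specified analysis.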

Why do studies end up with wrong findings? In fact, there are so many distorting forces baked into the process of testing a medical theory that the harder question is how researchers manage to produce valid findings at all, aside from sheer luck. To cite just a few of these problems:

Mismeasurement: To test the safety and efficacy of a drug, for example, what researchers really want to know is how thousands of people will fare long-term when taking the drug. But it would be unethical (and illegal) to give unproven drugs to thousands of people, and no one wants to wait 20 years for results. So scientists must rely on animal studies, which tend to translate poorly to humans, and on various shortcuts and indirect measurements in human studies that they hope give a good indication of what a new drug is doing. The difficulty of setting up good human studies, and of making relevant, accurate measurements on people, plagues virtually all medical research.
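A sketch of one such shortcut, the surrogate endpoint, shows how the distortion creeps in. Here a hypothetical drug strongly improves a short-term lab marker, but the marker only weakly drives the long-term outcome patients care about; every effect size and the marker-outcome link below are assumed for illustration:

```python
# Surrogate-endpoint sketch: a drug that clearly moves a short-term lab
# marker may barely move the outcome that actually matters.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
drug = rng.integers(0, 2, n)  # randomized assignment: 0 = placebo, 1 = drug

# The drug improves the marker by 1.0 standard deviations (assumed):
marker = 1.0 * drug + rng.normal(0.0, 1.0, n)
# But the marker explains only a sliver of the real outcome (weight 0.2, assumed):
outcome = 0.2 * marker + rng.normal(0.0, 1.0, n)

effect_on_marker = marker[drug == 1].mean() - marker[drug == 0].mean()
effect_on_outcome = outcome[drug == 1].mean() - outcome[drug == 0].mean()
print(f"Measured effect on marker:  {effect_on_marker:.2f}")   # ~1.0, looks impressive
print(f"Measured effect on outcome: {effect_on_outcome:.2f}")  # ~0.2, far less so
```

A trial that reports only the marker result looks like a triumph while saying little about whether patients actually live longer or feel better.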
