Percentage error

    22 November 2014

    Medicine is a science, but not an exact one. Despite its inexactitude, however, its progress has been startling, and no sane person would want to go back to the days before anaesthetics or antiseptic surgery.

    One of the greatest advances in medicine was the discovery that most of what doctors did had no scientific justification at all. Most doctors worked on the hypothesis that if an ill person recovered after being given a medicine, he recovered because he had been given that medicine. It took a long time to realise that a thousand seeming confirmations of false reasoning did not make it true.

    Nowadays, our practice is supposed to be evidence-based, that is to say based on sound evidence, not just ex cathedra statements based on the supposed experience of a lifetime. But, having the leisure since my retirement to read the medical journals more closely, I now realise how difficult it is for doctors to decide what the evidence is. Mostly it is equivocal.

    In the first place, most scientific papers comparing new treatments with old now use such complex statistics that only professional statisticians could understand them. In other words, not more than one in a hundred doctors, probably far fewer, is qualified to understand the scientific evidence put before him. The rest must take the statistical reasoning on trust, and many surveys of medical literature have shown that most papers contain statistical errors.

    Even without this difficulty, it is possible to see that the evidence in favour of something is more equivocal than commonly supposed. All trials comparing medicines are flawed, sometimes grossly so. But recommendations potentially involving millions of people are made on their basis.

    There are many pitfalls on the path to therapeutic rationality. I can describe only a few of them, their number being so great that it is surprising progress survives them.

    The first is the concept of statistical significance. Differences in results are statistically significant when they are unlikely to have arisen by chance, but unfortunately the term ‘significance’ in everyday language has other connotations. There is no reason why a statistically significant result should be significant, that is to say non-trivial, in any other way, but that is often forgotten.
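    The point can be made with a little arithmetic. The figures below are invented for illustration: two treatments whose recovery rates differ by half a percentage point, tested on a million patients per arm. A standard two-proportion z-test declares the difference overwhelmingly significant, yet no patient would notice it.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z statistic with a pooled standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical trial: recovery rates of 50.5% vs 50.0% -- a trivial
# difference -- but with a million patients in each arm.
z = two_proportion_z(505_000, 1_000_000, 500_000, 1_000_000)
print(round(z, 1))       # far beyond the conventional 1.96 threshold
print(0.505 - 0.500)     # the actual difference: half a percentage point
```

    A z statistic this large corresponds to a vanishingly small p-value; statistically, the result could hardly be firmer. Clinically, it is next to nothing.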

    Publication bias is another problem. Many trials are funded by drug companies, which then publish only the results that are favourable to their products. Doctors are thus misled into overestimating the therapeutic value of those products. This explains why 10 per cent of the population of many western countries are now taking the nearly useless antidepressants that have been so forcefully promoted by drug companies and so universally prescribed by gullible doctors. Publication bias is a declining problem since, after an outcry, the results of practically all trials will now be published, but there are many other things in the journals to look out for.

    One is the size of the trial. At first sight, a huge trial involving many tens of thousands of patients is impressive, at the very least as a feat of organisation. Size, moreover, is a guarantee of the statistical robustness of the findings, whatever they may be. But if it takes huge numbers (and often complex statistical manipulations) to demonstrate the superiority of one treatment over another, it is prima facie unlikely that the superiority is very great. In other words, a mountain has given birth to a mouse. The value of penicillin, by contrast, was established on a handful of patients.

    The devil is in the detail, which few doctors have the time to examine. In one trial whose report I read recently, a small number of patients died in both the experimental and the control groups. One might have expected the authors to evince some interest as to why: for if the people in the experimental group died of an illness caused by the treatment and those in the control group of an incidental cause, this would have been very significant (in the non-statistical sense). But those who died were simply excluded from the analysis.

    An opposite problem is that of all-cause death in trials of screening procedures. A screening procedure is shown to be effective in reducing the death rate from a certain disease: great rejoicing! Only later is it found that it does nothing to reduce the total or overall death rate, other diseases taking up the slack, as it were.
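    The trick is easy to miss and easy to show. With invented figures for a hypothetical screening trial of 100,000 people per arm, deaths from the screened disease fall by a quarter while total mortality does not budge:

```python
# Hypothetical screening trial, 100,000 people per arm (numbers invented).
# Deaths over the trial period, by cause:
screened   = {"target_disease": 300, "other_causes": 1_700}
unscreened = {"target_disease": 400, "other_causes": 1_600}

disease_reduction = (unscreened["target_disease"]
                     - screened["target_disease"]) / unscreened["target_disease"]
total_screened = sum(screened.values())
total_unscreened = sum(unscreened.values())

print(f"deaths from the screened disease fall by {disease_reduction:.0%}")
print(f"all-cause deaths: {total_screened} vs {total_unscreened}")  # identical
```

    The headline figure, a 25 per cent reduction in disease-specific mortality, is perfectly true; it simply tells you nothing about whether anyone lived longer.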

    Papers which emphasise relative risk to the exclusion of absolute risk are common and usually suspect. This is because the halving of a trivial risk is itself trivial, however statistically significant it may be. When you have to work out the absolute risk for yourself, you know that the authors have something to hide, namely the inconsequentiality of their work.
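    The arithmetic the reader is left to do for himself is short. With invented figures, suppose a drug halves a rare adverse outcome, from 2 in 1,000 patients to 1 in 1,000:

```python
# Invented figures: a drug halves a rare adverse outcome.
risk_control = 0.002    # 2 in 1,000 untreated patients
risk_treated = 0.001    # 1 in 1,000 treated patients

relative_risk_reduction = (risk_control - risk_treated) / risk_control
absolute_risk_reduction = risk_control - risk_treated
nnt = 1 / absolute_risk_reduction   # number needed to treat

print(f"relative risk reduction: {relative_risk_reduction:.0%}")  # sounds dramatic
print(f"absolute risk reduction: {absolute_risk_reduction:.1%}")  # a tenth of a point
print(f"patients treated to prevent one event: {nnt:.0f}")
```

    A 50 per cent relative reduction makes for an impressive abstract; an absolute reduction of 0.1 percentage points, or a thousand patients treated to prevent a single event, does not.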

    All of this escapes the busy working doctor. He reads the summary of the paper rather than the substance of it. A closer reading often shows that the conclusions, let alone the recommendations, do not follow from the data. It is a miracle that any real progress is made. Eppur si muove.