When the Chief Medical Officer, Sally Davies, lowered the drinking guidelines for men last year, she cited a report from the Sheffield Alcohol Research Group (SARG) as supporting evidence. Public Health England had commissioned SARG in October 2014 to help define a ‘safe’ level of alcohol consumption, the team having used their computer model to predict the impact of minimum pricing on several occasions in the past.
The SARG report was published on 8 January 2016, the same day as the Chief Medical Officer announced the new ‘limits’. Its authors stressed that it was not their job to recommend specific limits, but nevertheless concluded that ‘the implied weekly guidelines in this report vary between 7 and 13 units per week for males and 13 and 15 units per week for females’. These figures were significantly lower than the ‘safe level’ implied by epidemiological evidence, but they were consistent with the new advice which lowered the male guidelines from 21 units a week to 14 units a week (the female guidelines remained at 14 units).
But there is another version of the SARG report tucked away on the Department of Health website that few people have ever seen. Along with a series of e-mails released under the Freedom of Information Act, it shines a light on the process that led to the Chief Medical Officer telling the nation that there is no safe level of drinking.
A year before the guidelines were changed, a draft of the SARG report was sent to Public Health England that was very different to the final publication. For example, it contained a graph (see below) based on SARG’s model showing the relationship between the amount of alcohol consumed and the risk of alcohol-related death.
Reflecting the epidemiological evidence, mortality risk is lower for light drinkers than for teetotallers but it then rises. According to this graph, drinkers’ mortality risk rises to that of a teetotaller at 17.6 units per week for women and 21.2 units per week for men. On this analysis, a guideline of 21 units for men was appropriate and a guideline of 14 units for women was slightly over-cautious.
But that graph was never published. When the SARG report was released in January 2016, the findings had been altered and the graph now looked like this…
All of a sudden, the implied safe limit for men was barely half of that shown in the original and was now lower than the implied limit for women. The health benefits of moderate drinking, which were downplayed in the original, had almost disappeared for men and only applied at very low levels for women.
Without this volte-face from the Sheffield team, the Chief Medical Officer would have found it difficult to justify changing the guidelines. As reported in the Sunday Times yesterday, e-mails sent between SARG and government agencies strongly suggest that this change was forced on the Sheffield team by Public Health England. Now the full story can be told.
On 22 December 2014, the Sheffield team sent Public Health England the first draft of their report. Several revisions were suggested and a second draft was submitted on 14 January 2015. In an e-mail that accompanied the second draft, the Sheffield team made it clear that they did not expect to make any significant changes to their findings. Explaining that some of the team had been off sick, the author of the e-mail said that they would ‘like to go over the text again before committing to a final public version’ but that ‘[w]e do not expect to make any further changes to the numbers’. Although the team had made substantial changes to the text of the report since the first draft was reviewed, the basic conclusion had remained the same: a safe level of alcohol consumption was ‘between 12 and 21 units per week for males and 15 and 18 units per week for females’.
At this stage, it seems that SARG expected the Chief Medical Officer to keep the guidelines at 14 units for women and 21 units for men. These ‘limits’ had been criticised in the past, with Richard Smith, the former editor of the British Medical Journal who had sat on the original guidelines panel, famously claiming that they were ‘plucked out of the air’. The SARG report would give them some scientific credibility, even if only from a theoretical model. As the Sheffield team stated on page 6 of the document: ‘These implied guideline thresholds are generally similar to those in the current UK lower drinking guidelines’.
Public Health England passed the draft report on to the Guidelines Development Group (GDG), who were ultimately responsible for formulating the government’s advice. On 21 January, the GDG held a meeting at which SARG’s John Holmes and Colin Angus presented their findings. The minutes of this meeting contain the first mention of an idea that would have a profound impact on the whole project. It was suggested that SARG researchers should ‘estimate risk curves without threshold effects for wholly alcohol-attributable chronic conditions’.
To grasp the significance of this, it must be understood that researchers distinguish between diseases that are wholly caused by alcohol and those for which alcohol is only one risk factor. Alcoholic liver cirrhosis, for example, is a wholly alcohol-attributable chronic condition. You cannot get it unless you are a heavy drinker and every case of it is caused by drinking. Breast cancer, by contrast, is a partially attributable chronic condition. Although drinking increases the risk of breast cancer, a woman does not have to drink alcohol to get breast cancer and there are many other risk factors. There are relatively few chronic conditions that are 100 per cent caused by alcohol (ten are listed by SARG), but they are responsible for a large proportion of alcohol-related deaths.
It is generally accepted that there is a threshold above which a person needs to drink to put themselves at risk of these diseases. If you only have one drink a day, for example, you are at no more risk of alcohol-induced pancreatitis than a teetotaller. You have to drink above a certain threshold. This is not just common sense; it has been shown empirically.
Removing these thresholds from the Sheffield model, as the GDG suggested, was bound to make moderate drinking look more dangerous than it is. It would make it appear that there was no safe level of alcohol consumption for several of the most serious alcohol-related diseases. It would force the computer to assume that any amount of drinking caused these diseases and, therefore, that some moderate drinkers were dying of them.
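To make the point concrete, here is a minimal sketch of what removing a threshold does to a risk curve. It is not the Sheffield model: the piecewise-linear shape, the function name `relative_risk` and the values of `SLOPE` and `THRESHOLD` are all assumptions invented for illustration.

```python
def relative_risk(units_per_week, threshold, slope):
    """Hypothetical piecewise-linear risk curve (not SAPM's actual function).

    Risk equals the abstainer's risk (1.0) at or below the threshold and
    rises linearly with consumption above it.
    """
    excess = max(0.0, units_per_week - threshold)
    return 1.0 + slope * excess


SLOPE = 0.05       # assumed excess risk per unit; illustrative only
THRESHOLD = 20.0   # assumed threshold in units per week; illustrative only

for units in (7, 14, 21, 35):
    with_threshold = relative_risk(units, THRESHOLD, SLOPE)
    # "Removing the threshold" here means setting it to zero, so that
    # every unit consumed adds some risk.
    without_threshold = relative_risk(units, 0.0, SLOPE)
    print(f"{units:>2} units/week: with threshold {with_threshold:.2f}, "
          f"without threshold {without_threshold:.2f}")
```

With the threshold in place, the moderate drinkers in this toy example carry no excess risk of the disease. With the threshold set to zero, every one of them is assigned some.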
Although there was no scientific justification for such a change, Public Health England followed up the idea in an e-mail to SARG on 9 February, asking the team if they were ‘able to deliver additional work to support the Guidelines Development Group’. The deadline was 11 March when the Chief Medical Officers were due to review the evidence. PHE had six amendments to the report in mind. Point 4 was ‘Threshold effects – a sensitivity analyses [sic]’. Point 5 was ‘Threshold effects – a new base case’.
A sensitivity analysis is basically an alternative scenario. When your model is heavily dependent on certain assumptions – as the Sheffield alcohol model is – it can be useful to see what happens to the results when new assumptions are fed in. Sensitivity analyses show scientists how sensitive their findings are to different scenarios. Generally speaking, if the results do not change a great deal, the model is considered robust.
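The idea can be sketched in a few lines of code. What follows is a toy model, not the Sheffield one: the function `implied_guideline`, the parameter names and every number in `BASE` and `ALTERNATIVES` are made up for illustration. It assumes a fixed protective effect for drinkers and harm that rises linearly above a threshold, and it reports how far each alternative assumption moves the result away from the base case.

```python
def implied_guideline(protective_effect, harm_slope, threshold):
    """Weekly units at which a drinker's risk returns to the abstainer's level.

    Toy break-even: protective_effect == harm_slope * (units - threshold).
    """
    return threshold + protective_effect / harm_slope


# Base case assumptions (all values hypothetical).
BASE = {"protective_effect": 0.10, "harm_slope": 0.01, "threshold": 8.0}

# Each alternative scenario overrides one base case assumption.
ALTERNATIVES = {
    "no threshold": {"threshold": 0.0},
    "weaker protective effect": {"protective_effect": 0.08},
    "steeper harm": {"harm_slope": 0.0125},
}


def sensitivity(base, alternatives):
    """Return each scenario's change in implied guideline relative to the base case."""
    base_result = implied_guideline(**base)
    return {
        name: implied_guideline(**{**base, **override}) - base_result
        for name, override in alternatives.items()
    }


print(f"base case: {implied_guideline(**BASE):.1f} units/week")
for name, delta in sensitivity(BASE, ALTERNATIVES).items():
    print(f"{name}: {delta:+.1f} units/week versus base case")
```

Small deltas would suggest a robust result; a large one identifies an assumption on which the headline figure depends. In this toy setup the threshold is, by construction, the assumption that matters most.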
Public Health England wanted SARG to do a sensitivity analysis with threshold effects removed. Although there was no obvious reason to model such an unrealistic scenario, it could, arguably, be excusable if done as an academic exercise tucked away in the appendix of the report. But point 5 was more serious. PHE were suggesting that SARG remove the assumption of a threshold from their core model (the base case) so that this unrealistic scenario dictated the study’s main findings.
This idea evidently concerned the Sheffield team. Writing back the following day, they said that ‘the first four items on the list are not a problem and can be done before 11th March for £7,800 including VAT’. But with regard to Point 5 they were ‘unclear exactly what was being requested here and why it was requested in addition to item 4 (a sensitivity analysis on threshold effects).’
Could PHE really be asking them to rip up their model and begin again from a patently false premise? Initially, SARG stood their ground, saying: ‘Our view remains that it does not seem right to assign people drinking at very low levels a risk of acquiring alcoholic liver disease and similar conditions. Unless there are strong opposing views, we think it better to keep the threshold in the base case.’
In an attempt to meet PHE half way, they instead proposed to do the following work:
‘Base case: Threshold effect for wholly attributable chronic [diseases] only
Sensitivity analysis 1: Threshold effect for wholly attributable chronic [diseases] and for all acute conditions.
Sensitivity analysis 2: No threshold effects for any condition’
Although the Sheffield team were clearly unhappy with PHE’s proposal, the agency was SARG’s sole funder for this research and the team let PHE know that they would capitulate if it insisted, writing: ‘If you remain keen for us to change the base case, please let us know and I can quickly update the costs and timing. As noted previously, this carries some extra costs as changing the base case means updating the whole report.’
This was all the encouragement PHE needed. At 10.40pm that evening, they e-mailed back to say: ‘Thank you for your swift response. Could you provide costs/timing for changing the base case please?’
The following day, SARG replied with some quotes for the new work but were clearly still keen to dissuade Public Health England from changing the base case. Their e-mail to PHE gave the agency one last chance to change its mind. It reads, in full: ‘Please see attached a revised costing. As creating a new base case removes the need for a further sensitivity analysis on threshold effects, I have presented two options – a new base case OR a new sensitivity analysis.’
But it was obvious that PHE were not interested in a mere sensitivity analysis. They wanted to change the headline findings and one of their employees wrote back on 13 February to announce that ‘I have now secured PHE funding to proceed with option 2’. By 19 February, SARG had received a letter of intent from PHE and had started work on the new model.
Over the next few weeks, the Sheffield team complained about problems that emerged from their attempts to adjust the model to fit the new assumptions. They were doing something that they had never done before. It is unlikely that they had even contemplated doing it before. Moderate drinkers do not develop diseases such as alcoholic liver cirrhosis and they knew it. As a result of dealing with a ‘problem in adapting the pre-existing model to undertake the drinking guidelines analysis’, the 11 March deadline was missed and it was not until 25 March that SARG could provide PHE with an update. Assuring their funder that they were ‘now satisfied that we have identified and fixed the problems with the model’ SARG wrote: ‘The headline message from the new base case analysis is removing all of the threshold effects and remedying the problems with the model leads to implied guideline thresholds which are around 30% – 50% lower than those in the previous base case.’
For male drinkers, some of the thresholds had nearly halved: ‘For example, under the Canadian approach [to defining a ‘safe level’], the implied daily guideline for males in the new base case vary between 1.2 and 2.4 units per day depending on number of drinking days per week. In the old base case the equivalent figures were 2.3 to 4.5 units.’
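The arithmetic behind ‘nearly halved’ can be checked directly from the figures quoted in the e-mail. The short sketch below assumes that the lower end of the old range corresponds to the lower end of the new range, and likewise for the upper ends; the e-mail does not spell out that pairing, and the helper name `percent_reduction` is ours.

```python
def percent_reduction(old, new):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100.0


# Daily units for males under the Canadian approach, as quoted by SARG:
# (old base case, new base case). The pairing of range ends is an assumption.
quoted = {
    "lower end of range": (2.3, 1.2),
    "upper end of range": (4.5, 2.4),
}

for label, (old, new) in quoted.items():
    print(f"{label}: {old} -> {new} units/day, "
          f"{percent_reduction(old, new):.1f}% lower")
```

On that pairing, both ends of the range fall by a little under half, which sits at the top of the ‘30% – 50% lower’ band that SARG reported.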
One suspects that this was music to the ears of PHE and the GDG. The new figures not only seemed to justify the government’s existing recommendations for women but could be used as a reason to lower the guidelines for men. Nobody in the guidelines committee was likely to object to such a change. As I have previously described, the committee was packed to the gunwales with temperance campaigners.
In so far as there was dissent, it came from the Sheffield team, but they were careful to voice their misgivings in a low key. In a meeting held on 8 April, SARG’s John Holmes presented the revised model to the GDG. According to the minutes, he pointed out that the new, linear risk curves were ‘not precisely consistent with the literature’, thereby producing the peculiar result that ‘lower risk guideline levels for women were now higher than for men’. The minutes note that: ‘John Holmes felt that the overall message from the different analyses was that the new base case should not be taken as definitive.’ He also argued that ‘it would be possible for [the new] guidelines to be little different from the current ones’.
A briefing note from SARG explaining the revised findings gently attempted to nudge the GDG away from the new base case. After stating that the effect of removing threshold effects had been to ‘markedly lower the implied guideline thresholds’, SARG invited the committee to take a middle path between the original model and the one that PHE had forced upon them. Moreover, they actively discouraged the GDG from basing the guidelines on their new research. ‘The true risk function is likely to lie somewhere between these two scenarios’, they wrote, ‘and highlights the residual need for expert judgement’. In case they had not got their message across, they added: ‘There are not strong reasons for preferring the base case over these alternative analyses and this challenges the rationale for deriving guidelines directly from the results of the base case.’
These hints fell on deaf ears at Public Health England and among the guidelines committee. It appears that they had got the result they wanted. When the SARG report was published in January 2016, most of the text was identical to that of the original draft with only the numbers changing – the opposite of what SARG had expected to see happen when they submitted the draft a year earlier.
Reading the two documents side by side gives a glimpse of what might have been if SARG had stuck to their guns. For example, on page 6 of the original report, the Sheffield researchers wrote:
These implied guideline thresholds are generally similar to those in the current UK lower drinking guidelines (assuming at least three drinking days per week) and are also similar to those selected in Canada and Australia.
In the same section of the final report (page 7) this has become…
These implied guideline thresholds for males are generally lower than those in the current UK lower risk drinking guidelines (assuming at least three drinking days per week) whereas for females they are similar to the current guidelines. The implied guidelines thresholds are also lower than those selected in Canada and Australia.
In the original they say:
Assuming drinkers consume alcohol at least three times per week, implied weekly guidelines in this report vary between 12 and 21 units per week for males and 15 and 18 units per week for females.
But thanks to the dropped thresholds, this is changed in the final version to:
Assuming drinkers consume alcohol between three and five times a week, the implied weekly guidelines in this report vary between 7 and 13 units per week for males and 13 and 15 units per week for females.
When discussing the sensitivity analyses, the original report stressed how robust the model was, with different estimates being within five units of each other, except in the highly unrealistic scenario of there being no health benefits from moderate drinking. They could no longer make such a claim in the final report because their main alternative scenario (i.e. their original model) produced results for men that were twice as large as those produced by the new base case.
This is the original text:
In most cases, the sensitivity analyses suggest the results are moderately sensitive to alternative assumptions. Under different analyses, implied guideline thresholds for mean weekly consumption vary by up to five units per week. A key exception is the results for the sensitivity analysis where all evidence of protective effects was removed.
And this is the published version…
For most sensitivity analyses (e.g. modelling a ten year time period, assuming lower CVD mortality rates, varying the threshold within the Australian approach) the size of variation in implied guideline thresholds from the base case is of the order of three units per week. However, for other sensitivity analyses (e.g. reintroducing threshold effects used in previous versions of SAPM, assuming no cardioprotective effects from moderate alcohol consumption) the variation in results from the base case are larger and of the order of ten units per week.
At this point in the text, SARG could not resist making the point that they had made repeatedly to Public Health England and the GDG in private:
These results suggest the base case should not be accepted uncritically as the implied guideline thresholds are sensitive to alternative assumptions and baseline data and there are not strong arguments for preferring the base case specifications over those used in the sensitivity analyses.
This was a brave statement to make in such an influential public document, although its significance was not noticed by the media at the time. The original draft had included a similar statement about there being no strong arguments for preferring the base case over the sensitivity analyses but that was when the model was only ‘moderately sensitive to alternative assumptions’ and the implied guidelines varied ‘by up to five units a week’. Now there was a huge difference between the base case and the alternative case presented in the sensitivity analysis. If the real figure was twice as high as the base case suggested – and there were ‘no strong arguments’ to think it wasn’t – the Chief Medical Officer might as well pick numbers at random.
At various stages in the report, the reader gets the impression that the authors are trying to distance themselves from their work. On more than one occasion, they stress that they would not normally program the Sheffield Alcohol Policy Model (SAPM) in the way they had. On page 28, they explain that ‘threshold effects normally included within SAPM were also removed for wholly-attributable acute and chronic conditions’, and on page 32 they add a new section, saying:
‘…previous analyses using SAPM have included threshold effects within risk functions for acute conditions and wholly-attributable chronic conditions such that risk only begins to increase above a pre-specified consumption level. At the request of the commissioners (Public Health England), this threshold effect was removed for the base case analysis’.
In another new section on page 55, they almost seem to be winking at the reader:
Although the implied guidelines thresholds presented here are lower than those within previous studies, they remain of the same order of magnitude and different assumptions examined within the sensitivity analyses, particularly the reinstatement of threshold effects used in previous versions of SAPM, bring the implied guideline thresholds close to those found elsewhere. The methodological differences described should not be overlooked as these may, in part, be responsible for both the lower estimates presented here and the general similarity of findings in terms of order of magnitude.
When reading the published report and considering the new guidelines, it should be remembered that they are built on an assumption that the Sheffield team told PHE ‘does not seem right’. When PHE commissioned SARG to produce the report (from a shortlist of one; nobody else applied) they were buying access to the computer model. Whatever the merits and flaws of that model, it had been used in alcohol research since 2008. But when it failed to produce the results needed to justify a change in the guidelines, PHE told SARG to program it in a way that it had never been programmed before, using an assumption that had no scientific basis and about which the Sheffield team had obvious, well-founded reservations.
These facts, which have only come to light as a result of Freedom of Information requests, give the lie to the idea that research funded by government is necessarily more neutral or ‘independent’ than research funded by other means. SARG were clearly not allowed to use their own judgement. Instead, Public Health England and the guidelines group leant on the Sheffield team to get a report that was more to their liking than the document they were originally presented with.
The more that we learn about the process that generated the new guidelines, the more questions are raised about Public Health England. Far from being an honest broker in this story, the agency seems to have acted more like an activist group working towards a particular conclusion. Its relationship with the anti-drink lobby, which extends to holding its Alcohol Leadership Board meetings at the offices of a temperance group, is worryingly cosy for a state agency. Its decision to appoint leading anti-alcohol campaigners such as Ian Gilmore and Katherine Brown (both of the Alcohol Health Alliance) to the guidelines committee shows that it has become politicised.
This bias was on display again at the start of this year when Public Health England published an error-strewn policy document, which it released to the media with a headline claim that was so incorrect that it had to be retracted. That report was put together by the same familiar faces who dominated the guidelines review process. The revision of those guidelines may seem a relatively minor achievement for the anti-drink lobby. You can ignore them, after all. But, as the minutes of one GDG meeting say, it is ‘important to bear in mind that, while guidelines might have limited influence on behaviour, they could be influential as a basis for Government policies’. That is why the guidelines are important and, I would suggest, it is why Public Health England went to such lengths to change them.
Duncan Selbie, the Chief Executive of Public Health England, disputes claims made in this article. He requested that we publish the following letter:
Christopher Snowdon’s piece, ‘The new drinking guidelines are based on massaged evidence’, is grossly incorrect and misrepresents Public Health England’s (PHE) role in the guidelines’ development.
PHE emphatically refutes any suggestion that we intervened in some way to influence the evidence made available by Sheffield University to an independent expert group, the Guideline Development Group (GDG), which was set up by the UK Chief Medical Officers to help develop the alcohol guidelines.
As part of the secretariat to the group, we commissioned the analysis, as requested by the GDG, from Sheffield University. Any emails from PHE to Sheffield commissioning additional modelling and evidence were based on the GDG’s decisions and at their request, as is clearly shown by the publicly available minutes of their meetings.
This has been confirmed by Sheffield University’s Alcohol Research Group, which has said:
“Minutes from the subsequent GDG meeting on 21 January 2015 state that, after hearing Sheffield’s presentation of their work, the GDG concluded: ‘A holistic, expert judgement on guideline levels would be needed, taking account of uncertainties and issues not fully modelled’. This demonstrates that the group recognised there was considerable scientific uncertainty present and that no single piece of evidence or modelling decision used in isolation would determine the final guideline.
“As noted in the Royal Statistical Society’s consultation response: “This is a contested area of science with considerable uncertainties” (paragraph 1.1). The change to the base case analyses related to a point of scientific uncertainty. The Sheffield Alcohol Research Group were happy with the decision taken whereby the base case analysis was revised but the original modelling assumptions were retained as one of a series of sensitivity analyses.
“Those analyses explored major areas of uncertainty within the underlying evidence and their implications for the Guideline Development Group’s work. The group considered those sensitivity analyses in detail and took them into account in their decision-making.”
Mr Snowden also refers to The Public Health Burden of Alcohol and the Effectiveness and Cost-Effectiveness of Alcohol Control Policies: An evidence review, which PHE published in 2016. The facts speak for themselves; this unprecedented and comprehensive evaluation of the evidence had an extensive three-stage peer-review involving UK and international academics. The abridged version was also subject to further peer-review processes before its publication by The Lancet.