Sunday 15 February 2015

"Bitter row" over plain packaging studies

The Observer reports a "bitter row" in which anti-smoking campaigners are demanding the University of Zurich withdraw two studies which looked at smoking prevalence before and after plain packaging in Australia. Both studies, which were funded by Philip Morris (PMI), failed to find any effect on smoking rates from the policy.

I mentioned this research, by Ashok Kaul and Michael Wolf, last year. Although The Observer says the working papers were "widely disseminated in the media", I don't recall them getting much coverage at all and they were conspicuously ignored by Cyril Chantler when he conducted his review.

Pascal Diethelm, an anti-smoking campaigner who unveiled a not-very-scientific graph when I saw him speak in November, claims that the research contains seven errors. “Taken individually," he says, "most of them are sufficient to invalidate the findings of the papers. Collectively, they are damning.”

Kaul and Wolf have hit back strongly at this allegation with two rebuttals. In the second and most detailed of these, they conclude:

The authors of the annex have set out to discredit our research by providing a list of seven (so-called) errors and a list of seven issues. But they have clearly failed in their mission. Although there are some (minor) points of debate, there is not a single “extremely serious” error in our two working papers, as we have explained in detail in this reply.

Instead, the authors of the annex (i) have shown a surprising lack of basic statistical knowledge and (ii) have made false statements about the content of our working papers. What they have achieved to discredit, therefore, is only themselves. Perhaps this serves to explain the anonymous nature of the annex.

We welcome a constructive debate based on substance and comporting to scientific standards and decorum. We regret that the authors of the annex have repeatedly overstepped the limits of scientific debate (i) by misrepresenting our approaches, methods, and findings; (ii) by engaging in personal attacks; and (iii) by hiding behind the cloak of anonymity. Although we firmly believe that constructive criticism will further scientific discovery and support good policy-making, we cannot accept defamatory statements and unsubstantiated attacks both on our academic institutions and on us as individuals.

You can read the whole back-and-forth here if you have the knowledge and inclination. Most of the issues are of a technical nature and go over my head, but the first of the seven supposed 'errors' is straightforward and it seems clear to me that it is not an error at all, let alone one that "invalidates the findings of the papers".

Kaul and Wolf's anonymous critics argue that absence of evidence is not conclusive evidence of absence. They complain that PMI issued a press release which said that 'The plain packaging experiment in Australia has not deterred young smokers, professors from the Department of Economics at Zurich University and the University of Saarland found in a report released today'. They also complain that BAT said the studies show that 'there has been no change in the pre-existing trend in youth or adult smoking since the introduction of plain packaging'.

They argue that Kaul and Wolf did not prove conclusively that there was no effect on smoking rates, only that their studies failed to find an effect. This might seem a pedantic distinction (in casual conversation, this is exactly how we might refer to such evidence) but it is a fair point. However, Kaul and Wolf have always been careful to make this distinction and have never oversold the findings. Their studies are very cautiously worded and they never ruled out the possibility that plain packaging had an effect that they were unable to measure.

The complaint of Kaul and Wolf's critics is essentially that other people drew a firmer conclusion from their research, but that is an absurd justification for retraction. I don't recall any demands for retraction when this study was reported with headlines such as 'Australia’s plain cigarette packaging has not given a boost to the illicit tobacco trade' and 'Cigarette plain packaging fear campaign unfounded, Victoria study finds'.

To take a more recent example, the BMJ published a study on Tuesday that was widely reported as showing that 'Alcohol has no health benefits after all' (The Times). This is not what the authors of the study said, nor is it what the study showed, but nobody would suggest it should be retracted because of inaccurate reporting by third parties. If that were the criterion, the public health literature would be very thin indeed.

I'd be interested to read comments from any readers who have the statistical qualifications to make a judgement on the other points, but if the rest of the critique is of the same standard as this I doubt Kaul and Wolf have anything to worry about. The intention seems to be to throw a little mud so that readers of The Observer—which has never mentioned the studies until now—get the impression that they have been debunked.


Christopher Snowdon said...

I don't want to spend the next 10 hours with the text, but I did read the critique. It seems to be irrelevant. Without the statistical babble, it sums up to "well, you did not find anything, but you may be wrong with that". Which is true. And irrelevant, because it's most probably right (with, obviously, a 95% chance).

All the "power" and "significance" talk is used to discredit the findings (of no effect), but if you do not find an overall effect, there probably is none, and if you re-check on monthly data and only find one effect within only a 10% error level and have even lesser effects at higher error levels in the other months, it is reasonable to assume that the effect you found was just a statistical coincidence.

What solid science would do is that the authors tested the hypothesis "there is an effect of pp on smoking" and came up with no solid evidence that this is true, which should be enough to stop claiming it were. What the authors did was, I think, quite different: they tried to test the hypothesis "there is NO effect of pp on smoking", which is a lot harder to test. They came to the result that the hypothesis is probably "true", and they did not find any evidence to the contrary.

To sum up:

- There is no conclusive evidence that PP has the desired effects
- There is no conclusive evidence against the claim that PP has no effects.
- It is therefore lunatic to claim PP has any effects. We cannot be sure about that, but we can statistically show it is less lunatic to say that PP has NO effects.
- The "critique" is scientifically irrelevant. You cannot statistically prove something true, which is what the critics demand. I assume they do that because they know that all evidence speaks to the contrary of their position.

It's like trying to prove that humans cannot survive underwater. You can, of course, drown 100 people and conclude that your hypothesis is correct, and as no one survives you could state that with a very low error level (the authors did that). Now, if we interpolate that only .01% of the population dies by drowning naturally, maybe also only .01% can survive underwater. This would make our sample very selective and raise our error level (critique #2 and #3). Still, it would be reasonable due to the evidence we have. It would also be reasonable to start a retest by drowning another 100 people, preferably those critical of our findings. This would give a rise to our credibility as the results match. But, as we did two tests, we get into other problems (the disjunctive grouping in critique #4). We could drown these critics, too, but that would not help with the problem; it would rise. Critique #5 does not make any sense to me, and #6 (when looking at the numbers) is like complaining about only drowning people in saltwater. Yes, it is selective. Though we cannot be sure if people drown less in freshwater without checking, right? Yes, so, let's drown these critics, too. Especially for the graphics they made; Figure 4 is dishonest as shit.

As to #7 - if you work with a 95% confidence interval, that means that out of 20 datasets, one will probably be out of your expected values. This is not really a critique. This is just not understanding statistics. But we could drown all mankind, just to be sure that our hypothesis is right...
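The arithmetic behind that "one in 20" remark can be sketched as below. This is only an illustration, assuming 20 independent tests at a 5% error level with no real effect present; the function names are made up for the example and nothing here is taken from the working papers or the critique.

```python
# Illustration only: how often "significant" results turn up by chance
# when many independent tests are run and there is no real effect.
# Assumptions (not from the papers): independent tests, same error level.

def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Average number of tests that come out 'significant' by chance alone."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests: int, alpha: float) -> float:
    """Chance that at least one of the tests is 'significant' by chance alone."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    n, alpha = 20, 0.05
    print(expected_false_positives(n, alpha))          # about 1 test on average
    print(prob_at_least_one_false_positive(n, alpha))  # roughly 0.64
```

So under these assumptions a single outlying result among 20 is what chance alone would be expected to produce, which is the point being made about #7.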

Christopher Snowdon said...

Just out of curiosity I searched Google news archive for this 'widely disseminated' study. If they can't even get that right....

The only stuff that is widely disseminated these days is the miserabilist BS that they spout.

Christopher Snowdon said...

Thanks to Daniel for his comments.

Dan says that he read the critique. I did so too, but I also read the authors' reply:


I found the critique somewhat baffling, which is not surprising since, according to the authors of the study, the authors of the critique don't know what they are talking about mathematically. [NB the authors of the critique decided to be anonymous, and so their qualifications were not known] However, the above reply is pretty clear, despite the tricky mathematical systems involved.

It would be far too messy to try to make sense of all the blather, so I'll pick out just one:

3.4 Issue # 4: Non standard, ad-hoc method.

Authors' response: "With one exception, all the techniques that we use are standard and implemented in any decent statistical software. The various algorithms call for a number of individual techniques used sequentially, so it does not come as a surprise that any of such algorithms is not found in a textbook in its entirety."

As an example of the meaning of the above, the authors use a footnote which reads:

"To use an analogy, consider a new recipe for cooking a food dish that only requires standard techniques (such as pan-frying and baking). Since the recipe is new, it cannot be found in any previous cookbook in its entirety, but that does not mean that it cannot be replicated by a cook without any special, professional skills or machinery. This is opposed to a recipe that requires professional skills (such as many recipes of the famous restaurant El Buli) or professional equipment (such as an expensive sous-vide oven)"


I have on many occasions called for the intervention of 'proper' statisticians to check epidemiological papers. In particular, studies of SHS and cot death are woeful from a statistical point of view. As we have said many times before (and been totally ignored), one hundred times Relative Risk means nothing if the Risk is, for all intents, zero.
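To put numbers on the relative-versus-absolute point, here is a rough sketch. The baseline risk is entirely hypothetical, picked only to show the arithmetic, and is not drawn from any SHS or cot death study; the function name is likewise invented for the example.

```python
# Illustration only: a large relative risk applied to a tiny baseline risk.
# The baseline figure below is hypothetical, not taken from any study.

def absolute_risk_increase(baseline_risk: float, relative_risk: float) -> float:
    """Extra risk among the exposed: baseline * RR minus baseline."""
    return baseline_risk * relative_risk - baseline_risk


if __name__ == "__main__":
    baseline = 1e-7   # hypothetical: 1 in 10 million
    rr = 100.0        # "one hundred times" relative risk
    extra = absolute_risk_increase(baseline, rr)
    print(f"risk among exposed: {baseline * rr:.7f}")
    print(f"absolute increase:  {extra:.7f}")
```

With these made-up figures, a hundredfold relative risk still leaves the exposed group at about 1 in 100,000, which is why the relative figure on its own says little.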