INTRODUCTION

The Tobacco Products Directive (TPD) 2014/40/EU defines enhanced reporting obligations applying to 15 priority additives added to cigarettes and roll-your-own (RYO) tobacco: carob bean, cocoa, diacetyl, fenugreek, fig, geraniol, glycerol, guaiacol, guar gum, licorice, maltol, menthol, propylene glycol, sorbitol, and titanium dioxide (TiO2)1,2. For these priority additives, the tobacco industry had to carry out comprehensive studies to examine whether the additives contribute to or increase the toxicity and addictiveness of the products, result in a characterizing flavor, facilitate inhalation or nicotine uptake, or lead to the formation of substances with carcinogenic, mutagenic or reprotoxic (CMR) properties.

To meet these reporting obligations, 14 reports were submitted to the European Union Member State (EU MS) regulators under the umbrella of a Priority Additives Tobacco Consortium of 12 international tobacco companies3. No report was provided for diacetyl. The submitted reports contain the results of literature searches, smoke chemistry assessments, toxicity studies, a human clinical study, and a sensory assessment. A synthesis of these reports has also been published in three journal articles4-6.

The TPD states that the Commission and the MS may require these reports to be peer-reviewed by an independent scientific body, particularly regarding their comprehensiveness, methodology and conclusions. In line with this, an independent review panel consisting of 10 scientific experts in various relevant fields was established under the Joint Action on Tobacco Control (JATC). This panel worked together with JATC partner institutes under work package WP9 (‘Additives Subject to Enhanced Reporting Obligations’) to review the industry reports. The evaluation performed by the panel was based on a scientific perspective without legal expertise. To facilitate the review process, the review panel developed an assessment framework for the chemical and toxicological evaluation of the industry reports (Supplementary file).

The complete work of the review panel is published as a report7. In addition, the key findings are now presented in two peer-reviewed publications. The current article describes the identified methodological shortcomings of the industry reports, while the other (part A, Havermans et al.8) summarizes the general outcomes and conclusions of our review and the specific recommendations for the 15 additives.

METHODOLOGICAL APPROACH

Ambiguity of the TPD

During the review of the industry reports, it became clear that the presented interpretation of the TPD differs from the interpretation of the review panel. This appears to be due to the ambiguity of the wording in Article 6.2.a and the conflicting content of Articles 6.2.a and 7.9.

Article 6.2.a states that the industry reports need to assess whether an additive: 1) contributes to the toxicity or addictiveness of the products concerned, and 2) whether this has the effect of increasing the toxicity or addictiveness of any of the products concerned to a significant or measurable degree at the stage of consumption. We interpret these as two independent sentences that require evidence for both: 1) the additive contributing to the endpoint studied, and 2) the additive increasing the effect size of the endpoint studied. In contrast, the industry interpreted the second point as a specification and only presented evidence for the additives increasing the effect size of the endpoint studied. Based on discussions within the panel and with legal experts, it was not possible to conclude on the correct interpretation. Thus, a revision of the phrasing of Article 6.2.a seems to be required to remove any ambiguities.

Article 7.9 requires regulatory actions by the EU MS if the additives ‘increase the toxic or addictive effect, or the CMR properties of a tobacco product at the stage of consumption to a significant or measurable degree’. Thus, any regulatory actions of the EU MS regarding toxicity or addictiveness can only be based on data provided for the second part of Article 6.2.a. The only available experimental approach for assessing an increase in toxicity is comparative testing. However, comparative testing lacks discriminative power in the case of products with high toxicity, such as cigarettes, as pointed out in the SCHEER opinion9,10. Moreover, although it is possible to derive information regarding an additive’s contribution to addictiveness, e.g. based on assessments of the mode of action (MOA), no tests are available for assessing whether tobacco additives increase the addictiveness to a significant or measurable degree. This is also acknowledged by SCHEER9,10. Thus, there are currently no established methods available to provide the evidence required for regulatory action regarding toxicity and addictiveness based on Article 7.9. Therefore, a revision of the phrasing of Article 7.9 of the TPD is required to reflect the limitations of the current scientific methodology.

Article 6 describes the required content of the industry reports and the role of an independent review panel, and states that the information received from the review panel ‘shall assist the Commission and Member States in taking the decisions pursuant to Article 7’. Our interpretation of this statement is that the work of both the industry and the review panel should be based on Article 6, and not on Article 7. However, the industry chose to use Article 7 as an argument for their comparative testing approach and only addressed the second part of Article 6.2.a. That is, they only assessed whether the priority additives increased the effect size of the endpoint in question to a significant degree, instead of also assessing whether it contributes to the effect.

Overview of the content of the industry reports

The industry reports are based on a literature review and a set of laboratory studies and generally include the following sections:

  • A literature review covering toxicity of the additive itself, addictiveness, facilitation of inhalation, pyrolysis, mainstream smoke (MSS) chemistry (including transfer rates in the case of volatile additives), and toxicity of the additive when used as a tobacco additive (testing of MSS).

  • Smoke chemistry studies analyzing the World Health Organization (WHO) list of 39 priority emissions, plus tar and water, in mainstream cigarette smoke using the ISO smoking regime11. A comparative chemical analysis was performed for test cigarettes containing three additive levels (low, max, and max-plus) and an additive-free control cigarette. In addition, three mixed cigarette batches with different mixtures of the priority additives were produced and tested.

  • In vitro toxicology testing comprising the Ames test (using total particulate matter, TPM) for mutagenicity, the neutral red uptake assay in the Chinese hamster ovary (CHO) cell line (using TPM and gas vapor phase) for cytotoxicity, and the micronucleus assay (using TPM) for genotoxicity. As for the chemical analysis, comparative testing based on these in vitro assays was performed for the three different test cigarettes, the mixed cigarettes, and an additive-free control cigarette.

  • A clinical study to assess the effect of the additives on the facilitation of inhalation and nicotine uptake. This was a controlled, double-blind study using a randomized crossover incomplete block design.

  • A sensory study to determine whether priority additives give cigarettes a characterizing flavor other than tobacco. A step-wise procedure comprising different sensory methodologies [clustering, ‘in/out’ test, and CATA (check all that applies) testing] was used in this study.

  • A concluding section providing a summary of the literature review covering chemistry and toxicity, as well as the main findings of each of the four laboratory studies (smoke chemistry, in vitro toxicity, clinical, and sensory).

LIMITATIONS OF THE INDUSTRY REPORTS

Limitations of the overall approach

Lack of integrated discussion

The enhanced reporting obligations for priority additives specified in the TPD require the industry to carry out comprehensive studies. According to the Oxford Dictionary, the meaning of comprehensive is ‘including or dealing with all or nearly all elements or aspects of something’. Thus, comprehensive studies of e.g. toxicity, addictiveness or characterizing flavor, should address a wide range of aspects for each topic. A comprehensive evaluation of a priority additive would require a discussion of these different aspects within each topic (e.g. toxicity), as well as an overall integrated discussion. However, the results from the newly performed laboratory studies are not discussed in light of the literature review in the industry reports. Moreover, the concluding section only contains separate summaries of the main findings from the different sections of the reports rather than an integrated discussion. For instance, in the evaluation of toxicity, the results from the comparative analysis (own data), the pyrolysis experiments (literature) and chemical analysis (literature and own data) are only presented separately and not discussed relative to each other.

Comparative testing approach

We identified several limitations in the overall approach and study design applied by the industry in their reports. Some of these shortcomings seem to arise from the industry’s interpretation of the TPD, as described above. A consequence of the industry’s choice to rely on Article 7 is that the first part of Article 6.2.a, i.e. whether the priority additives contribute to the toxicity of the products, is not addressed in the industry reports. Thus, the submitted reports are mainly based on comparative testing and do not cover the assessment of inhalation toxicity of the additives, new pyrolysis experiments, or a toxicological evaluation of the identified pyrolysis products. In our opinion, these data are required to fulfil the reporting obligations specified in Article 6 in terms of the additive’s contribution to toxicity. This is a significant limitation in the approach chosen by the industry.

Lack of attractiveness assessment

Several of the priority additives and/or their pyrolysis products are known flavorings and/or sweeteners (including carob bean, cocoa, fenugreek, fig, guaiacol and licorice, but also diacetyl, geraniol, guar gum, menthol, and sorbitol)12. Moreover, priority additives may increase humidity and/or palatability (e.g. propylene glycol, glycerol, sorbitol, menthol, and geraniol)12. Thus, priority additives may increase a product’s attractiveness in several ways. Generally, additives that increase attractiveness may lead to brand preference or increased tobacco product consumption13,14. The industry reports do not contain any information about the potential increase of attractiveness of cigarettes and RYO due to the addition of the priority additives. Even though a quantitative assessment is not required, the TPD recognizes the concern of a tobacco product’s attractiveness in the introduction and Article 19.1 (a). For a comprehensive assessment of priority additives, attractiveness should be included. In our opinion, such an assessment should be required by the TPD and included in Article 7 to allow for regulatory action.

Deviations from SCHEER recommendations

In 2016, the SCHEER proposed a guidance and a template for the drafting of priority additive reports9,10. Overall, the industry reports do not follow these available SCHEER guidelines. That is, the industry consortium followed some of the recommendations (e.g. use of the WHO list for chemical assessment instead of the outdated Hoffmann list, choice of in vitro tests), while several other recommendations were dismissed without any explanation (e.g. the recommended molecular-level approach for chemicals (quantitative structure-activity relationship; QSAR) was not applied, no new pyrolysis studies were performed, and a comparative testing approach was applied for toxicity even though it lacks discriminative power for testing of cigarettes). Some of the inconsistencies between the industry’s approach and the SCHEER recommendations are described below.

Literature reviews

The industry reports generally present two literature reviews: one non-systematic literature review regarding toxicity and addictiveness of the additive in general and a systematic literature review performed by a contract partner regarding the additive’s toxicological effects when used in tobacco. No argument is presented explaining why only toxicity and/or addictiveness were covered in these literature reviews, while other outcomes such as inhalation facilitation and characterizing flavor were omitted.

Although the strategy applied for the systematic literature review is provided in the appendix (databases queried, review equations, keywords used, exclusion criteria, etc.), both reviews seem biased. In the non-systematic review, few independent studies are presented, although our literature searches identified several relevant independent studies. For instance, critical publications concerning menthol’s intrinsic properties that facilitate inhalation are not mentioned in the non-systematic review (see Havermans et al.8 for details). Moreover, no argument was presented explaining why this literature review was non-systematic and omitted several relevant publications.

The exclusion criteria used by the industry in their systematic literature review are listed in one of the appendices of the industry reports but not described in sufficient detail. We also question the relevance of some of the listed criteria. For instance, the criterion ‘additive not contained in cigarettes or RYO tobacco’ leads to the exclusion of mechanistic studies concerning cellular or molecular mechanisms of the additive, studies of the additive’s effect on nicotine uptake, as well as studies concerning inhalation toxicity of the additive per se (see examples in Havermans et al.8). Whether and how the quality of the studies to be included in the systematic literature review was assessed is not reported, and a risk of bias assessment is not included either.

Overall, we conclude that the literature reviews are biased and incomplete. The literature review that covers the effects of additives per se is non-systematic, and since our literature searches identified several relevant independent studies that were not included, it is not comprehensive either. The systematic review is only focused on the toxicity of the additives when added to tobacco and consequently excludes relevant literature concerning additives per se. Therefore, important information is not considered in the discussion in the industry reports. Altogether, the reported literature does not cover all relevant areas of research. For instance, inhalation toxicity of the additives and their pyrolysis products is not sufficiently addressed in the literature review. These aspects significantly limit the usefulness of the literature reviews for the industry’s evaluation of additives and represent a major limitation of the submitted reports.

Comparative testing experiments

As discussed above, the main criticism of comparative testing is that it lacks discriminative power9,10. In addition, we identified further methodological limitations in the industry reports for both the chemistry and toxicity comparative testing.

Composition of cigarettes and reference products

The presence of humectants varied between the control cigarettes, the test cigarettes and the reference cigarettes used by the industry for the benchmark criterion (Figure 1). The impact of humectants on combustion conditions and MSS chemistry was not considered in the industry’s experimental design. Humectants are technically necessary and present in all commercially available cigarettes. Their purpose is to maintain the humidity of the tobacco product by retaining water, thereby avoiding the generation of an unpleasant harsh smoke14. Humectants greatly influence combustion conditions and consequently the emission of most compounds, including significantly decreased levels of organics like phenol, cresol and formaldehyde15,16.

Figure 1

Overview of the properties of humectants and the presence and absence of humectants in the test and reference cigarettes

https://www.tobaccopreventioncessation.com/f/fulltexts/150361/TPC-8-28-g001_min.jpg

In the comparative testing, no humectant was added to the control cigarettes. However, in the statistical analysis, the difference between mean values of test cigarettes and additive-free control cigarettes was compared to the historical variability of the reference cigarette 3R4F containing humectants (see description of statistical analysis)17. Since humectants affect the combustion conditions and content of organics in MSS significantly, the variability could differ between data originating from cigarettes with humectants (historical 3R4F variability) and without humectants (control and test cigarette variability). Thus, these variability data should be presented to justify the use of the variability of the reference cigarette 3R4F in the statistical analysis of control and test cigarettes. Moreover, the role of the humectants on the combustion process is not included in the discussion of the results, and neither is their impact on MSS chemical composition and toxicity15.

Smoke generation

In the comparative experiments reported by the industry, smoke was generated using the ISO method, which is a standard developed with the tobacco industry’s involvement. In the last two decades, there has been discussion within the WHO and independent research institutes on the ISO method’s relevance for how smokers use the cigarette in real life and whether the MSS acquired through this method is comparable to what smokers are exposed to.

The current consensus is that the ISO method underestimates the levels of compounds present in MSS compared to actual human smoking behavior. There are several reasons for this. Firstly, the low puff frequency and puff volume lead to smaller amounts of smoke ‘inhaled’ from one cigarette. Secondly, the low intensity of the smoking regime might result in lower temperatures, leading to the generation of lower amounts of compounds in MSS. Finally, ventilation holes in cigarette filters allow the influx of ambient air, diluting the smoke, whereas smokers tend to (unintentionally) cover these holes with their lips and fingers. Thus, in real life, greater quantities of harmful substances end up in the smoke due to more intense smoking and closing of the ventilation holes6.

Another smoke generation method was developed by Health Canada and validated and recommended by WHO. It represents more intense puffing and combustion conditions and considers the covering of ventilation holes. This method produces higher emission levels, which are closer to human exposure, although still not fully representative18. The industry does not explain why an intense smoking regime was not included for comparison with the ISO regime. In our opinion, data resulting from both ISO and WHO intense smoking regimes would be needed for a comprehensive risk assessment.

Statistical approach

The industry used the same statistical approach to analyze the chemical and toxicological comparative testing. First, a statistical equivalence approach was applied using historical variability of the 3R4F reference cigarette as a benchmark (i.e. values generated over an extended period, generally at least 12 months2). The difference between the mean values of test cigarettes (low, max, max-plus) and additive-free control cigarettes was compared to this benchmark to determine significant differences. Accordingly, differences between test and control cigarettes were not considered relevant when smaller than the historical variability of the 3R4F reference cigarette. When the variability of the 3R4F reference cigarette was exceeded, three additional statistical tests were applied [analysis of variance (ANOVA), Dunnett’s and linear trend test]. Only findings that passed all the individual significance tests were considered relevant by the industry.

In the benchmark approach, the industry reports used a 99.7% confidence interval, which allows a 0.3% chance of false positives, i.e. detecting a significant difference when there is no actual difference. However, 95% confidence intervals are typically used in the scientific literature. Setting the confidence in the variability of the 3R4F reference cigarette to 99.7% leads to a 1.5-fold wider range [3 standard deviations (3 SD)] than a 95% confidence requirement (2 SD). Consequently, application of the 99.7% confidence benchmark allows larger differences between test and control cigarettes to fall into this range and subsequently to be regarded as non-significant. This can lead to false-negative results, i.e. not detecting a significant difference where there is an actual difference.
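The relation between the benchmark width and the chance of a false negative can be illustrated with a minimal Python sketch. It assumes normally distributed historical values; the function names and the example difference of 2.5 SD are hypothetical and not taken from the industry reports. Note that 2 SD corresponds to roughly 95.4% coverage (exactly 95% would be 1.96 SD).

```python
from statistics import NormalDist

def coverage(k_sd: float) -> float:
    """Two-sided coverage of a mean +/- k_sd * SD range under a normal model."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k_sd) - std_normal.cdf(-k_sd)

def exceeds_benchmark(difference: float, historical_sd: float, k_sd: float) -> bool:
    """True if a test-vs-control difference falls outside the k_sd * SD benchmark."""
    return abs(difference) > k_sd * historical_sd

cov_2sd = coverage(2.0)   # about 0.954
cov_3sd = coverage(3.0)   # about 0.997
width_ratio = 3.0 / 2.0   # the 3 SD range is 1.5-fold wider than the 2 SD range

# Hypothetical example: a true difference of 2.5 historical SDs.
flag_2sd = exceeds_benchmark(2.5, 1.0, 2.0)  # flagged under a 2 SD benchmark
flag_3sd = exceeds_benchmark(2.5, 1.0, 3.0)  # not flagged under a 3 SD benchmark

print(f"coverage 2 SD: {cov_2sd:.4f}, coverage 3 SD: {cov_3sd:.4f}")
print(f"flagged at 2 SD: {flag_2sd}, flagged at 3 SD: {flag_3sd}")
```

In this hypothetical case, the same difference would be regarded as relevant under a 2 SD benchmark but as non-significant under the 3 SD benchmark.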

For the analysis of the cigarette smoke composition and the smoke toxicity analysis, data were compared to a historical variation over a more extended period (the industry report notes ‘time period: 2013–2015’), based on either the ISO (chemical composition) or Canadian intense (toxicity analysis) regime. Thus, for the toxicity data, the historical variation is based on a different smoking regime (Canadian intense) than the one used in the study itself (ISO). Moreover, due to comparison with historical variability, any deviations from the additive-free control cigarette are less likely to be statistically significant since historical data will show much higher variation than can be expected within the study itself19. Examples of factors that are likely to contribute to higher variability over time include batch variations in chemicals, altered instrument performance and differences between laboratory staff.
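To illustrate why a historical benchmark is harder to exceed than a within-study benchmark, the following Python sketch combines a within-study SD with an additional long-term component (batches, instruments, staff). It assumes the two sources of variation are independent, so that their variances add; all numbers are hypothetical and chosen only for illustration.

```python
import math

def historical_sd(within_study_sd: float, between_period_sd: float) -> float:
    """SD of historical data, assuming independent within-study and long-term components."""
    return math.sqrt(within_study_sd ** 2 + between_period_sd ** 2)

# Hypothetical values in arbitrary units.
within_sd = 5.0    # variation expected within a single study
between_sd = 10.0  # extra variation accumulated over an extended period

hist_sd = historical_sd(within_sd, between_sd)

# Smallest difference that would exceed a 3 SD benchmark in each case.
threshold_within = 3 * within_sd
threshold_hist = 3 * hist_sd

true_difference = 20.0
flag_within = true_difference > threshold_within  # exceeds the within-study benchmark
flag_hist = true_difference > threshold_hist      # stays inside the historical benchmark

print(f"historical SD: {hist_sd:.2f}")
print(f"3 SD thresholds: within-study {threshold_within:.1f}, historical {threshold_hist:.1f}")
```

Under these assumptions, a difference that clearly exceeds the within-study variation would still fall inside the historical benchmark.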

In the subsequent statistical evaluation, three additional tests were performed. The mean concentrations for each of the three test cigarettes and the additive-free control cigarette were compared using analysis of variance (ANOVA) with significance evaluated at p=0.05. If the ANOVA showed a statistically significant effect, mean analyte concentrations of the three test cigarettes were compared to the control cigarette using Dunnett’s test (with a family-wise error rate of α=0.1). This was followed by a linear trend analysis for a consistent additive-concentration related decrease or increase. Differences were considered significant only if statistical significance was found for all three of these tests (as a follow-up to exceeding the 3R4F variation). This approach can lead to false-negative results if a significant outcome from one test is disregarded when the outcome from another test is non-significant.
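The effect of requiring every test to be passed can be sketched as a simple decision rule in Python. This is an illustration of the logic described above, not the industry’s actual analysis code. The class and function names are hypothetical, the p-values are invented, and the significance level for the trend test is an assumption because it is not stated in the description above.

```python
from dataclasses import dataclass

@dataclass
class GateResults:
    exceeds_3r4f_benchmark: bool  # difference larger than historical 3R4F variability
    anova_p: float
    dunnett_p: float
    trend_p: float

def relevant_all_gates(r: GateResults,
                       anova_alpha: float = 0.05,
                       dunnett_alpha: float = 0.1,
                       trend_alpha: float = 0.05) -> bool:
    """A finding counts as relevant only if it passes every gate (trend_alpha is assumed)."""
    return (r.exceeds_3r4f_benchmark
            and r.anova_p < anova_alpha
            and r.dunnett_p < dunnett_alpha
            and r.trend_p < trend_alpha)

def joint_power_upper_bound(powers: list[float]) -> float:
    """The chance of passing all gates cannot exceed the power of the weakest gate."""
    return min(powers)

# Hypothetical finding: clear group difference, but no linear dose-response.
non_linear_effect = GateResults(True, anova_p=0.01, dunnett_p=0.03, trend_p=0.20)
linear_effect = GateResults(True, anova_p=0.01, dunnett_p=0.03, trend_p=0.01)

print(relevant_all_gates(non_linear_effect))  # discarded despite two significant tests
print(relevant_all_gates(linear_effect))
print(joint_power_upper_bound([0.8, 0.7, 0.6]))
```

The bound in `joint_power_upper_bound` holds regardless of how the tests are correlated, which is why adding gates can only lower, never raise, the probability of detecting a real effect.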

The aspects of the statistical analysis described above (application of a 99.7% confidence interval, comparison with historic reference data and requirement for results to pass several tests) all increase the chance of false-negative results. Thus, the industry’s statistical approach appears to be in favor of discovering null findings. This is a major methodological limitation.

Chemical assessment

In addition to the methodological concerns regarding the comparative testing in general, several problems were identified regarding the chemical assessment.

Transfer experiments

Transfer into MSS was generally not determined in the industry experiments for additives described as having a ‘complex chemical composition and non-volatile nature’. However, for cocoa (which is also complex and non-volatile), the transfer rate of theobromine was determined since this is considered to be the predominant biochemically active compound of cocoa. Thus, the selection of priority additives for which the transfer into MSS was determined appears to be somewhat arbitrary. Assessment of the transfer of active compounds of other complex and non-volatile additives should have been included in the industry reports as well.

Pyrolysis products

The industry’s evaluation of pyrolysis experiments is mainly based on two publications from 2005 by Baker and Bishop20. In these pyrolysis experiments, besides the five most abundant pyrolysis products, the inclusion of hazardous components was based on the Hoffmann list, developed in the 1990s21-23. However, a more updated list of compounds, like the one developed by Talhout et al.24, is required since it also includes compounds that affect respiratory and cardiovascular endpoints in addition to carcinogens. Moreover, the experimental conditions in Baker and Bishop20 do not align fully with the approach proposed in the SCHEER opinion, e.g. in terms of the pyrolysis conditions, the applied list of analytes and the number of experimental parallels (triplicates rather than duplicates).

While SCHEER highlights the importance of evaluating the pyrolysis products of the priority additives and requests new pyrolysis experiments, the industry argues against using data from pyrolysis experiments in general3. They point out that pyrolysis experiments can only approximate the combustion of a burning cigarette in the case of non-volatile additives and may not realistically reflect thermal decomposition during smoking. The industry further states that there is no correlation between the results of pyrolysis experiments and smoke chemistry (other than for volatile compounds that have been added in small amounts). Based on the study of Hahn and Schaub25, they conclude that ‘to assess how tobacco additives influence the quantitative levels of toxic substances in whole smoke, i.e. mainstream smoke, the pyrolysis of additives has been deemed not suitable as an assessment criterion’. However, the authors of that study state that ‘pyrolysis of additives itself is not sufficient as an assessment criterion’, but should be part of their suggested four-step model for toxicological assessment of tobacco additives, including a toxicological evaluation of pyrolysis products as the second step25. Although we agree with the industry that pyrolysis experiments cannot predict MSS chemistry, these experiments still provide useful information about compounds that may be found in MSS. Thus, an evaluation of pyrolysis products in accordance with the SCHEER opinion should have been included in the industry reports.

The industry reports do not include an evaluation of the potential impact of pyrolysis products on toxicity (including CMR properties), addictiveness, inhalation facilitation or flavoring, as suggested by SCHEER. To at least assess one aspect of the impact of the pyrolysis products, we performed an independent evaluation of the CMR properties of the pyrolysis products presented by the industry20. This evaluation is given in Annex III in our report7 and included in our evaluation of the individual additives presented in Havermans et al.8 (2022).

List of analytes in comparative experiments

In accordance with the SCHEER recommendations, the analytes used for the comparative chemical analyses in the industry reports were based on the WHO list of 39 priority emissions (plus TPM, tar, and water). However, this analyte list has been proposed for regulatory purposes and for monitoring of toxicants in tobacco products over time26. To rule out priority additives’ contribution to toxicity and formation of CMR substances, a more elaborate list of analytes would be more appropriate, such as the list provided by Talhout et al.24. In addition, previously identified pyrolysis products of each additive based on the literature review were not added to the analyte list. This is a significant shortcoming, since assessing pyrolysis products in the comparative chemical analysis of MSS is required to evaluate their toxicity. Moreover, several of these previously identified pyrolysis products have been classified as compounds with CMR properties (e.g. furfural), as described in Annex III of our report7.

Data quality of chemical analysis

The quality of the provided data on MSS composition varies throughout the industry reports for the 14 priority additives, limiting their usefulness. This is most apparent for the analysis of carbonyl compounds. The presented comparative MSS experiments were performed in two separate sets. The first set, consisting of carob bean, cocoa, fenugreek, fig, glycerol, licorice and menthol, seems to have been performed under relatively acceptable conditions, resulting in standard deviations of approximately 10% (Figure 2a). The second set, consisting of geraniol, guaiacol, guar gum, maltol, propylene glycol and sorbitol, resulted in 3–4 times higher standard deviations for carbonyl compounds (Figure 2b). The example provided in Figure 2 demonstrates that the variability in both additive-free control and test cigarettes was much higher for geraniol than for carob bean. In fact, for the same control cigarette that was measured in both sets, the standard deviations were reported to range from 7–14% for set 1, and 33–55% for set 2. This implies inconsistencies in the laboratory procedures. Interestingly, previous studies from both industry and independent researchers achieved low standard deviations of approximately 10% in levels of emitted carbonyls27-31.
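The percentages above are standard deviations expressed relative to the mean. A minimal Python sketch of this calculation is given below; the replicate values are hypothetical and were chosen only so that the two results fall within the ranges reported for set 1 and set 2. They are not measurements from the industry reports.

```python
from statistics import mean, stdev

def relative_sd_percent(values: list[float]) -> float:
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical replicate carbonyl yields for the same control cigarette (arbitrary units).
set1_replicates = [100.0, 110.0, 90.0, 105.0, 95.0]
set2_replicates = [100.0, 150.0, 50.0, 130.0, 70.0]

rsd_set1 = relative_sd_percent(set1_replicates)
rsd_set2 = relative_sd_percent(set2_replicates)

print(f"set 1 relative SD: {rsd_set1:.1f}%")
print(f"set 2 relative SD: {rsd_set2:.1f}%")
```

Because both hypothetical series share the same mean, the difference in relative SD reflects measurement scatter alone, which is the pattern that points to inconsistent laboratory procedures.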

Figure 2

Example of acceptable and high standard deviations in control (additive-free reference) cigarettes

https://www.tobaccopreventioncessation.com/f/fulltexts/150361/TPC-8-28-g002_min.jpg

Overall, the data resulting from the second experimental series showed large variability and do not appear suitable for regular statistical analysis. To allow for the extraction of useful information from these data, despite the poor data quality, we performed an independent stepwise evaluation of the carbonyl data. This included evaluation of: 1) the overlap of the standard deviations for test and control cigarettes, 2) increases with increasing application levels of the additive in the test cigarette, and 3) plausibility of the observed effect, which took the absolute application level of the additive into account (e.g. higher application levels are more likely to significantly increase carbonyl compounds that are formed during the pyrolysis of the additive)7. Based on this evaluation, we raise concerns regarding carbonyl formation resulting from the application of guar gum and sorbitol (Havermans et al.8).
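The three evaluation steps can be sketched in Python as follows. This is a simplified illustration, not the procedure documented in our report7: the function names are hypothetical, the data are invented, the way the steps are combined into a single outcome is an assumption, and the plausibility judgment (step 3) is passed in as an expert decision rather than computed.

```python
def sd_intervals_overlap(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> bool:
    """Step 1: do the mean +/- 1 SD ranges of two cigarettes overlap?"""
    return (mean_a - sd_a) <= (mean_b + sd_b) and (mean_b - sd_b) <= (mean_a + sd_a)

def increases_with_level(means_by_level: list[float]) -> bool:
    """Step 2: are the means strictly increasing from control to the highest level?"""
    return all(later > earlier
               for earlier, later in zip(means_by_level, means_by_level[1:]))

def raises_concern(control: tuple[float, float],
                   tests: list[tuple[float, float]],
                   plausible: bool) -> bool:
    """Assumed combination of steps 1-3; tests are (mean, sd) ordered low, max, max-plus."""
    top_mean, top_sd = tests[-1]
    no_overlap = not sd_intervals_overlap(control[0], control[1], top_mean, top_sd)
    trend = increases_with_level([control[0]] + [m for m, _ in tests])
    return no_overlap and trend and plausible

# Hypothetical carbonyl yields (mean, SD) with large scatter.
control = (100.0, 40.0)
rising = [(130.0, 45.0), (190.0, 50.0), (260.0, 60.0)]
flat = [(105.0, 45.0), (98.0, 50.0), (110.0, 60.0)]

print(raises_concern(control, rising, plausible=True))  # clear, plausible increase
print(raises_concern(control, flat, plausible=True))    # no consistent increase
```

A rule of this kind does not replace a formal statistical test, but it allows a transparent reading of data whose variability is too large for regular statistical analysis.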

Toxicological assessment

In addition to the methodological concerns regarding the comparative testing in general, several problems were identified regarding the toxicological assessment.

Pyrolysis product toxicity

A toxicological assessment of the pyrolysis products reported by the industry is not included in their reports. However, our independent evaluation identified several compounds with CMR properties among these pyrolysis products (Havermans et al.8). Pyrolysis products with CMR properties will contribute to the toxicity of the cigarettes if they are present in MSS. Therefore, verification of their presence is required for a meaningful risk-assessment. However, previously identified pyrolysis products were not added to the analyte list in the comparative chemical experiments. Moreover, a complete evaluation of the toxicity of the already identified pyrolysis products is warranted, as well as identification and evaluation of novel pyrolysis products based on a more updated analyte list than the Hoffmann list.

Selection of assays and endpoints

For two of the three in vitro assays used in the comparative testing, cell lines originating from Chinese hamsters were used, while the last assay uses bacteria (Ames test). For the cell line assays, the main parameters required to assess the adequacy of the model for the endpoint tested were neither presented nor discussed. For instance, the differences between these two models concerning relevance, tissue of origin, and sensitivity are not discussed. Neither are the consistencies or discrepancies in results obtained by these different models32. We also question whether the in vitro models applied are sufficient for extrapolating the data to the human situation. Moreover, the overall approach used by the industry is far from the recommendations of SCHEER. For example, SCHEER also recommended applying in silico methods like QSAR, mode of action (MOA), and adverse outcome pathway (AOP) approaches, which were not considered.

Finally, the presented toxicity data focus solely on CMR properties. Although this is understandable since carcinogenic effects are the best-characterized adverse effects of smoking cigarettes, other adverse effects have also been reported to be associated with cigarette smoking and can interfere with or facilitate CMR properties. In particular, irritation is a known factor favoring carcinogenicity. Thus, other relevant adverse effects induced by additives, such as irritation, sensitization and cardiovascular effects, should also have been included in the reports.

Comparative toxicity testing

The main limitation of the comparative testing approach is that these studies lack discriminative power due to the high background toxicity of tobacco products9,10. Another limitation of the comparative toxicity testing performed by the industry is that only in vitro tests were included. Although the selection of in vitro tests was in line with the recommendations from SCHEER9,10, they are of limited value in the assessment of CMR properties as in vivo testing is currently unavoidable to establish certain aspects of CMR properties, in particular for non-genotoxic carcinogens. However, to achieve a sufficient discriminative power with comparative testing, a very large number of animals would be required. Moreover, the application of animal testing for tobacco products is considered unethical. It is prohibited in some countries, e.g. in Germany, and by the EU policy that bans animal studies for chemicals to be used in voluntary products12,33. Based on the scientific literature and our shared knowledge, there is currently no suitable scientific method for assessing the increased effect size of an additive on toxicity as requested by the TPD. Thus, a revision of the TPD may be required. In such a revision, the possible use of assessment methodologies that do not necessarily require animal studies, such as MOA and AOP, should be considered.
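The link between high background variability and the number of animals required can be illustrated with a standard sample-size approximation. The sketch below is not taken from the industry reports or from SCHEER; it assumes a two-sided, two-sample z-test on group means, and the coefficient of variation and effect sizes are hypothetical values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(rel_increase, cv, alpha=0.05, power=0.80):
    """Approximate group size for a two-sided, two-sample z-test on means.

    rel_increase: difference to detect, as a fraction of the control mean
    cv: coefficient of variation (SD / mean) of the response (hypothetical)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) * cv / rel_increase) ** 2)


# Hypothetical scenario: response with 30% variability between animals
for increase in (0.20, 0.10, 0.05):
    print(f"detect a {increase:.0%} increase: {n_per_group(increase, cv=0.30)} per group")
```

Under these assumptions, the required group size grows with the square of the ratio between variability and the effect to be detected, so halving the detectable increase roughly quadruples the number of animals.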

Inhalation toxicity is not assessed

Toxicity is mostly evaluated based on existing data for the oral route. Although the industry reports include some information regarding the toxicity via inhalation for some additives (glycerol, propylene glycol and TiO2), they lack a comprehensive evaluation of the inhalation toxicity of the additives per se and their pyrolysis products. Since tobacco smoke is inhaled, experiments and assessments specific for inhalation should be included in a toxicological evaluation of the consequences of the inclusion of the additive in the tobacco (see the assessment framework for toxicity outlined in Supplementary file). Likewise, assessment of the additives’ pyrolysis products with regard to inhalation toxicity is essential for a comprehensive toxicity evaluation. It is also pointed out in Article 6.3 of the TPD that ‘Those studies shall take into account the intended use of the products concerned’, implying the importance of the inhalation route, ‘and examine in particular the emissions resulting from the combustion process involving the additive concerned’, acknowledging the importance of including the pyrolysis products in the evaluation. The lack of evaluation of the inhalation toxicity of the additives per se and their pyrolysis products is a major shortcoming of the industry reports.

Lack of exposure assessment

The application levels and transfer rates of additives are presented in most reports but not applied in the toxicological evaluation. In fact, the industry reports include no exposure assessment at all for the priority additives, although this is required according to standard risk assessment procedures34. Another critical factor that is not considered is that mixtures of compounds may produce additive or synergistic toxicity at concentrations where the individual components are of lower concern.

Inhalation facilitation, nicotine uptake, and addictiveness

The industry performed a clinical study measuring plasma nicotine pharmacokinetics as a measure of nicotine uptake and smoker puffing behavior as a measure of cigarette smoke inhalation. In this study, diacetyl and TiO2 were not included, and geraniol, guar gum, and maltol were only studied in a mixture. In contrast, the 10 remaining priority additives were analyzed both as single additives and in a mixture. However, a suitable rationale for studying the three additives only in a mixture was not provided.

The industry concluded that none of the additives facilitated nicotine uptake or altered smoking behavior in their assessment. However, this conclusion is not in line with the substantial evidence from the literature for the effect of menthol on inhalation facilitation (Havermans et al.8), which is not discussed in the industry report either. Moreover, the clinical study assessing nicotine uptake and puffing behavior had methodological limitations and a suboptimal study design. For instance, the combination of high variability in the participants’ dependence (i.e. Fagerström scores from 1 to 9) and 4 hours between cigarettes could have caused craving and withdrawal ranging from none to very strong, which may have contributed to unnecessarily high variability in the nicotine levels. In addition, no statistical assessment (beyond descriptive statistics) was performed for the smoking behavior study, on the grounds that no significant differences were found in the nicotine uptake study. However, not finding significant differences in one test does not exclude the possibility of significant differences in the other. Finally, this study has very little relevance to assessing the additives’ impact on inhalation facilitation during smoking initiation since all study participants had a smoking history of at least three years and a mean smoking history of 16 years.

Based on a discussion regarding the currently available scientific tests, the industry states that ‘by the current state of scientific knowledge, it was concluded that the clinical study gave no circumstantial indications of increased addictiveness for cigarettes containing priority additives’. While this may be true, it should be noted that this study was designed to study nicotine uptake and puffing behavior as endpoints for inhalation facilitation and not addictiveness. The discussion highlights that there are no validated tests for addictiveness and that most of the available tests are developed to study the addictiveness of nicotine rather than additives. Moreover, it is pointed out that the endpoints included in the clinical study are in line with the SCHEER recommendations. However, the overall approach applied by the industry is far from the recommendations of SCHEER, as a stepwise approach was proposed including in silico, in vitro, ex vivo and in vivo methods, the last only in exceptional cases. In contrast, the in vivo clinical study was the only approach completed by the industry, and no tests were conducted to assess the mode of action, which is of particular importance according to SCHEER. Moreover, evidence from existing studies regarding mode of action and mechanistic effects of additives on nicotine addiction (Havermans et al.8) was not included in the discussion regarding addictiveness. In particular, monoamine oxidase inhibiting effects of aldehydes and the various effects of menthol on the neurobiology of nicotine addiction should have been discussed35-38.

Characterizing flavors

To assess the additives’ characterizing flavor properties, the industry performed sensory testing using several consumer panels. This assessment was performed for only 8 out of 14 additives and did not include glycerol, guar gum, maltol, propylene glycol, sorbitol, and TiO2. This is somewhat surprising since SCHEER recognizes guar gum and sorbitol as flavorings12. Moreover, there were limitations in the sensory panel composition, the methodology applied in the evaluation of characterizing flavors and the interpretation of data, as specified in the following sections.

General limitations

Characterizing flavor is not a characteristic of a substance per se but of a product containing a specific combination of ingredients. However, the industry reports only assess the characterizing flavor of individual additives separately. The conclusions based on such experiments are not valid for products on the market where multiple ingredients are combined.

In the industry reports, there is no explanation provided for the rationale behind the choice of method applied in the sensory assessment. Moreover, the industry did not follow any international standard, such as the established recommendations for identifying characterizing flavors in tobacco products from the Health Effects Tobacco Composition (HETOC) consortium.

After the industry reports were submitted, the European Independent Advisory Panel (IAP) on characterizing tobacco flavors was established, and one of its goals was ‘to specify and, as appropriate, update the methodology for the technical assessment of test products’ (Commission Implementing Decision 2016/786). As this methodology was not available at the time of the industry tests, we acknowledge that it was not possible for the industry to follow these guidelines.

Further, it is important to note that some additives are pure compounds or mixtures that can undergo alteration during several stages: the production process of the extract or powder, the processing phase of the extract, the cigarette manufacturing phase, and during storage and/or smoking. Thus, time-dependent changes may affect both flavor character and intensity. Information regarding the type of source material, the age of the material, the conditions under which it has been stored and storage time, the way in which the additive was incorporated in the tobacco product, as well as the quantity of the remaining additive during the sensory tests is not included in the reports. However, these factors could have a determining impact on whether or not the additive would impart a characterizing flavor.

Limitations regarding the panel composition and size

The sensory analysis consists of a number of sequential steps, and different panels of consumers or trained panelists were used for each step. First, three trained panelists were used to identify reference products, then 15 consumers performed the clustering of products. New panels of 10 and 40 consumers, respectively, completed the ‘in/out’ test and the sensory analysis using CATA (check all that applies) testing. In contrast, in most independent research, panels include at least ten trained experts39. For consumer panels, panelist numbers vary based on, for example, consumer characteristics, the purpose of the test, time-frame and cost, yet the commonly required minimum is 40–100 participants40-42. Thus, the sizes of the sensory panels in the industry study were insufficient. Also, no explanation is given for why different sensory panels consisting of different numbers of consumers and experts were used for the different stages of the process.
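The effect of panel size on the precision of a sensory result can be illustrated with the margin of error of an observed proportion. The sketch below is an illustration only and is not part of the industry analysis. It assumes a simple random sample and uses the normal approximation, which is crude for very small panels; the worst-case proportion of 0.5 is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion.

    n: number of panelists; p: assumed true proportion (0.5 is the worst case).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


for n in (10, 40, 100):
    print(f"n={n:3d}: +/- {100 * margin_of_error(n):.0f} percentage points")
```

Under these assumptions, a panel of ten yields an interval roughly twice as wide as a panel of 40 and about three times as wide as a panel of 100.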

A strong argument can be made that the primary target consumer for the tobacco industry in the use of characterizing flavors in tobacco products is: 1) young, and 2) a non-smoker. However, the ten consumers recruited for the industry study were smokers aged 18–65 years. Studies that recruit older consumers and restrict participation to smokers will underestimate the impact of a substance on the flavor of tobacco, since these individuals will be less sensitive to the characterizing odor than many other individuals (as their sensory capability is modified)43-45.

Thus, the industry used assessors who were likely to have lower sensitivity to the odors of priority additives in tobacco products than either the population at large, or the specific cohort at risk on account of their age and smoking habits.

Finally, the screening methods for the inclusion of sensory panel participants were of limited value, as they primarily focused on evaluating taste rather than odor. The screening was based on beverages (different brands of cola, water taste, sweetness ranking) and odor recognition of smells mainly easy to detect and irrelevant to tobacco products and additives (e.g. vinegar, lemon, grass).

Limitations regarding study design

The sensory analysis consists of several sequential steps (identification of reference products; screening using an ‘in/out’ test; sensory analysis using CATA (check all that applies) testing; analysis of the CATA data). The choice of cut-offs for the ‘in/out’ tests was not explained. Only products for which six or more of the ten consumers identified the test product as ‘out’ of the reference product range were subsequently tested for characterizing flavors (CATA). This cut-off criterion of 6 out of 10 consumers is both arbitrary and high. Consequently, the products selected for CATA had a high probability of being false negatives due to previous steps. Each step restricts which products and parameters are evaluated in the following steps, which significantly increases the chance of false-negative results.
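How demanding a cut-off of 6 out of 10 is can be illustrated with a simple binomial calculation. The sketch below is not the industry's method. It assumes that the ten consumers judge independently and that each rates the product as ‘out’ with the same probability `p_out`, a hypothetical parameter introduced here for illustration.

```python
from math import comb


def prob_passes_screen(p_out, n=10, cutoff=6):
    """P(at least `cutoff` of `n` independent consumers rate the product 'out')."""
    return sum(
        comb(n, k) * p_out**k * (1 - p_out) ** (n - k)
        for k in range(cutoff, n + 1)
    )


# Hypothetical per-consumer probabilities of rating the product 'out'
for p in (0.3, 0.5, 0.7):
    print(f"p_out={p:.1f}: P(advance to CATA)={prob_passes_screen(p):.3f}")
```

Under these assumptions, a product that half of all consumers would individually rate as ‘out’ advances to CATA in only about 38% of screenings (386 of the 1024 equally likely outcomes), so it would usually be screened out.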

DISCUSSION

As outlined in the TPD, our main task was to evaluate the industry reports with regard to their comprehensiveness, methodology and conclusions. This article describes the significant methodological limitations identified in the industry reports.

The overall approach chosen by the industry has several limitations, including lack of: 1) an integrated discussion of laboratory results and literature review, and 2) assessment of whether the priority additives contribute to the toxicity of the product. The toxicological and chemical evaluation in the industry reports is mainly based on comparative testing, which lacks discriminative power for products with extremely high toxicity and variability, like cigarettes, as pointed out in the SCHEER opinion. The industry’s overall approach relies partly on their interpretation of the TPD. They chose to use Article 7 as an argument for their comparative testing approach and only addressed the second part of Article 6.2.a. That is, they only assessed whether the priority additives increased the effect size of the endpoint in question to a significant degree, instead of also evaluating whether they contribute to the effect. However, as the work required from both the industry and the review panel is described solely in Article 6 of the TPD, the industry reports should, in our opinion, have been based on Article 6, including both measures of effect, namely increase and contribution.

Methodological limitations were identified in all sections of the industry reports. The literature reviews did not include relevant publications independent from the tobacco industry. The comparative chemical studies did not assess previously identified pyrolysis products of the additives. The toxicological evaluation did not include the assessment of pyrolysis products, including genotoxic and carcinogenic potential. The inhalation route was generally not included in the toxicological assessment, which is a significant limitation in the assessment of tobacco products intended for inhalation. For both chemistry and toxicity testing, the statistical approach applied to test the difference between the additive-free control cigarette and the products containing the additives had several serious limitations. Three different aspects of the statistical approach used in the industry reports increased the chance of false-negative results: application of a 99.7% confidence interval instead of the commonly used 95%, comparison with variability in historical reference data rather than in the current experiments, and the requirement for results to pass several statistical tests rather than relying on one test. The provided clinical study had limitations with regard to study design and statistical analysis, and addictiveness was not assessed. Moreover, the possible potentiation of addictive effects of nicotine by priority additives, for example through the process of monoamine oxidase inhibition, was not addressed. Finally, the methodology used to assess the characterizing flavor of the additives was flawed, particularly regarding the size and composition of the consumer panel and the selection of cut-offs in the sensory analysis.
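The influence of the confidence level, and of requiring several tests to be passed, on the false-negative rate can be illustrated numerically. The sketch below is a simplified illustration and not a reanalysis of the industry data. It assumes a two-sided z-test, a hypothetical true difference of 2.5 standard errors, and, for the combined requirement, statistically independent tests of equal power.

```python
from statistics import NormalDist


def power_two_sided(effect_in_se, confidence):
    """Power of a two-sided z-test for a true difference of `effect_in_se` standard errors."""
    z = NormalDist()
    crit = z.inv_cdf(1 - (1 - confidence) / 2)  # critical value of the test
    return (1 - z.cdf(crit - effect_in_se)) + z.cdf(-crit - effect_in_se)


effect = 2.5  # hypothetical true difference, in standard errors
for confidence in (0.95, 0.997):
    p = power_two_sided(effect, confidence)
    print(f"{confidence:.1%} interval: power={p:.2f}, false-negative rate={1 - p:.2f}")

# Requiring a difference to pass k tests: under the independence assumption,
# the joint power is the single-test power raised to the k-th power.
single = power_two_sided(effect, 0.997)
for k in (1, 2, 3):
    print(f"{k} test(s) at 99.7%: joint power={single ** k:.2f}")
```

Under these assumptions, moving from a 95% to a 99.7% interval substantially lowers the probability of detecting the same true difference, and every additional test that must be passed lowers it further.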

CONCLUSIONS

There are serious limitations both in the overall approach chosen in the reports and in many specific methodological aspects. Due to these limitations, we conclude that the industry reports are of insufficient quality. The identified methodological limitations contributed significantly to the overall conclusions of our review: that the industry reports are not comprehensive, and the conclusions presented in the reports are not warranted. Consequently, the reporting obligations of the industry as stated in TPD Article 6 have not been fulfilled. Overall, the provided reports demonstrate that the tobacco industry cannot be considered an unbiased party in assessing its own products.