According to the 2020 Surgeon General’s Report on smoking cessation, approximately 50% of US adults who smoke attempt to quit each year, and three in five adults who have ever smoked have successfully quit1. The percentage of US adults who currently smoke has decreased steadily over the past 50 years and today reflects a 67% decline from the 1965 prevalence rate. Nevertheless, 34 million Americans currently smoke, increasing their risk for comorbidities such as cardiovascular and respiratory diseases, diabetes, and cancer2,3. Additionally, annual US healthcare costs related to smoking are estimated at more than $170 billion1.

Given the morbidity, mortality and healthcare costs associated with smoking, clinicians should continue to address smoking cessation with their patients. Studies have shown that even brief advice on smoking cessation from a physician increases quit rates4. Beyond in-office conversations, physicians have an array of options for assisting patients in quitting, ranging from pharmacological therapies to behavioral interventions and, more recently, smartphone apps5,6. With new approaches to treatment emerging, physicians should strive to stay educated on the most effective options available – a feat they may in part achieve by reading published research on smoking cessation.

One valuable research source that a physician may refer to is systematic reviews. Systematic reviews aim to critically appraise and summarize the available research on a topic to answer a specific question, and they represent the highest level of research evidence7. Physicians prefer concise, easily understandable summaries of information when applying research to clinical decision-making, which may be found in abstracts of systematic reviews8. Furthermore, other studies have demonstrated that physicians often use abstracts in answering clinical questions9,10.

Because of the direct effects that abstracts of systematic reviews may have on patient care, the information presented within them should be without spin. Spin has been defined by Yavchitz et al.11 as ‘a specific way of reporting, intentional or not, to highlight that the beneficial effect of the experimental treatment in terms of efficacy or safety is greater than that shown by the results’. Several studies have shown spin to be present in the abstracts of randomized controlled trials12–16; however, comparatively few studies have looked at spin in the abstracts of systematic reviews17,18. Thus, the objective of this study is to assess the abstracts of systematic reviews regarding smoking cessation treatments for the presence of spin, and to evaluate whether particular study characteristics are associated with spin.


Oversight, transparency, reproducibility, and reporting

This manuscript was drafted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)19 and the guidelines of Murad and Wang20 for metaepidemiological studies. The protocol for this study was uploaded to Open Science Framework to foster transparency and reproducibility. This study was conducted simultaneously with other studies evaluating the presence of spin in the abstracts of systematic reviews; thus, these methods are also described in those manuscripts. No human participants were involved in this study. As a result, it was not subject to institutional review board oversight per the US Code of Federal Regulations.

Search strategy

A systematic review librarian, DNW, constructed the search strategies for the MEDLINE (Ovid) and Embase (Ovid) databases to locate systematic reviews and meta-analyses centered on treatment modalities of smoking cessation (Figure 1). On 2 June 2020, these searches and the resulting records were uploaded to Rayyan, a screening platform for systematic reviews. There were no time restrictions with regard to search strategy. After duplicates were removed, the remaining studies were screened by title and abstract by two of the investigators (MG and TK) to determine eligibility. This process was performed in a masked, duplicate manner. MG and TK met to resolve any discrepancies. If discrepancies could not be resolved, an arbitrator was used for final decisions.

Figure 1

Search strategies to obtain systematic reviews

Eligibility criteria

Articles were included if they met the following criteria: 1) must be a systematic review with or without a meta-analysis, 2) must be focused on smoking cessation interventions or treatments, 3) must be conducted on human subjects only, and 4) must be available in English. The PRISMA definition of systematic reviews and meta-analyses was used19. Articles were excluded if the above criteria were not met.

Eligible systematic reviews were then uploaded to Stata for randomization, and the first 200 articles in the randomized order were selected for data extraction.
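The sampling step can be sketched as follows. The randomization in this study was performed in Stata; the Python sketch below is only an illustration of the same idea, and the function name `draw_random_sample`, the seed value, and the placeholder record IDs are assumptions rather than details taken from the study.

```python
import random

def draw_random_sample(record_ids, sample_size=200, seed=2020):
    """Shuffle the eligible records reproducibly and keep the first `sample_size`."""
    rng = random.Random(seed)    # hypothetical fixed seed so the order can be reproduced
    shuffled = list(record_ids)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:sample_size]

# Placeholder IDs standing in for the eligible reviews (364 in this study).
eligible = [f"review_{i:03d}" for i in range(1, 365)]
sample = draw_random_sample(eligible)
print(len(sample))  # 200
```

Fixing the seed is a design choice that lets a second investigator regenerate the same randomized order from the same list of eligible records.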

Statistical analysis

We conducted an a priori power analysis using G*Power with the following assumptions and parameters. One previous investigation of spin in abstracts of systematic reviews on acne vulgaris suggested that spin was present in 31% of abstracts. Assuming that 1) 20% of PRISMA-compliant systematic reviews and 40% of non-PRISMA-compliant systematic reviews contain spin, 2) a type I error rate of 0.05 (2-tailed), 3) power of 0.80, and 4) a multiple coefficient of determination of 0.10, a total of 185 systematic reviews would be needed. The overall frequency of spin and its subtypes was characterized using descriptive statistics, and we report the results as frequencies and percentages. To evaluate the association between study characteristics and the presence of spin in abstracts, Fisher’s exact tests were conducted using Stata 16.1 (StataCorp, LLC, College Station, TX). Statistical significance was defined as p<0.05.
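As an illustration of the test named above, the following is a minimal pure-Python sketch of a two-sided Fisher’s exact test for a 2×2 table, applied to the PRISMA-related counts reported in Table 2. The analyses in this study were run in Stata; the function name `fisher_exact_two_sided` is hypothetical, and the sketch covers only 2×2 tables, so comparisons with more than two categories (intervention type, funding source, AMSTAR-2 rating) would need an r×c extension that is not shown here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 events fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Counts from Table 2, given as (with spin, without spin) for each row.
p_prisma_mentioned = fisher_exact_two_sided(4, 136, 3, 57)     # article: no mention vs. mention
p_journal_recommends = fisher_exact_two_sided(4, 70, 3, 123)   # journal: no recommendation vs. recommendation
print(round(p_prisma_mentioned, 3), round(p_journal_recommends, 3))
```

Both values should land close to the p-values reported for these two characteristics in Table 2 (0.43 and 0.427), well above the 0.05 threshold.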


Before title and abstract screening commenced, two investigators (MG and TK) completed an online training course on systematic reviews and meta-analyses by Li and Dickersin21. The investigators then completed two days of online and in-person training on the definition and interpretation of the nine most severe types of spin in systematic review abstracts as defined by Yavchitz et al.11. Finally, the investigators were trained in the revised A Measurement Tool to Assess systematic Reviews (AMSTAR-2) to determine the methodological quality of each systematic review and meta-analysis included in the study. A detailed outline of the training regimen can be found in our study protocol.

Data extraction

Data extraction was completed by two investigators (MG and TK) in a masked, duplicate fashion using a pilot-tested Google form (Supplementary file). The investigators pilot tested the form on numerous papers known to contain spin to ensure that it contained all the items needed for data extraction and that it worked correctly. The included systematic reviews and meta-analyses were examined for the presence of the nine most severe types of spin in their abstracts. These nine types are defined in Table 1. AMSTAR-2, a validated 16-item scale measuring the methodological quality of systematic reviews and meta-analyses22, was then used to evaluate the methodological quality of each systematic review and meta-analysis. In prior studies, the inter-rater reliability of AMSTAR-2 scores has been moderate to high, with high construct validity coefficients associated with the original AMSTAR instrument (r=0.91) and the Risk of Bias in Systematic Reviews instrument (r=0.8429). Based on the AMSTAR-2 appraisal, the methodological quality of each review was rated as high, moderate, low, or critically low22. The investigators also extracted each review’s intervention type, PRISMA adherence, funding source, and publication year, as well as the publishing journal’s recommendation of adherence to PRISMA and 5-year Impact Factor (IF).
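The mapping from item-level AMSTAR-2 responses to an overall rating is not spelled out in this manuscript. The sketch below follows the scheme described in the AMSTAR-2 guidance as we understand it: items 2, 4, 7, 9, 11, 13, and 15 are treated as critical domains, and the rating depends on the number of critical flaws and non-critical weaknesses. The function name `amstar2_overall_rating`, the response labels, and the decision to count only a "no" response as a weakness are assumptions made for illustration.

```python
# Items treated as critical domains in the AMSTAR-2 guidance (assumed here,
# not stated in this manuscript).
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}

def amstar2_overall_rating(responses):
    """Map AMSTAR-2 item responses to an overall quality rating.

    `responses` maps item number (1-16) to "yes", "partial yes", "no", or
    "n/a" (e.g. meta-analysis items for reviews without a meta-analysis).
    Only "no" is counted as a weakness; this is an assumption.
    """
    flawed = {item for item, answer in responses.items() if answer == "no"}
    critical_flaws = len(flawed & CRITICAL_ITEMS)
    noncritical_weaknesses = len(flawed - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"

# Hypothetical review: every item satisfied except two non-critical items.
example = {item: "yes" for item in range(1, 17)}
example[3] = "no"
example[10] = "no"
print(amstar2_overall_rating(example))  # moderate
```

Under this scheme a single unmet critical item is enough to lower a review to "low", which helps explain why few reviews reach a "high" rating.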

Table 1

Spin types and frequencies (%) in abstracts (N=200)

| Most severe types of spin | Number containing the spin, n (%) |
|---|---|
| 1. Conclusion contains recommendations for clinical practice not supported by the findings. | 1 (0.5) |
| 2. Title claims or suggests a beneficial effect of the experimental intervention not supported by the findings. | 1 (0.5) |
| 3. Selective reporting of or overemphasis on efficacy outcomes or analysis favoring the beneficial effect of the experimental intervention. | 5 (2.5) |
| 4. Conclusion claims safety based on non-statistically significant results with a wide confidence interval. | 0 (0.0) |
| 5. Conclusion claims the beneficial effect of the experimental treatment despite high risk of bias in primary studies. | 0 (0.0) |
| 6. Selective reporting of or overemphasis on harm outcomes or analysis favoring the safety of the experimental intervention. | 1 (0.5) |
| 7. Conclusion extrapolates the review’s findings to a different intervention (i.e. claiming efficacy of one specific intervention although the review covers a class of several interventions). | 0 (0.0) |
| 8. Conclusion extrapolates the review’s findings from a surrogate marker or a specific outcome to the global improvement of the disease. | 0 (0.0) |
| 9. Conclusion claims the beneficial effect of the experimental treatment despite reporting bias. | 0 (0.0) |

Seven abstracts contained spin, and 1 abstract contained 2 types of spin.


Sample characteristics

Our searches returned 3501 systematic reviews that would undergo title and abstract screening. After the removal of 1013 duplicates, an additional 2010 articles were removed for not satisfying inclusion criteria, leaving 478 that were retained for data extraction. Of these, 114 were excluded, resulting in 364 articles that met inclusion criteria. Before full-text analysis, these 364 articles underwent randomization and the first 200 systematic reviews and meta-analyses comprised our final sample from which data were extracted. Figure 2 illustrates our screening process with rationales for exclusions and random assignment.
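The counts in this screening flow can be checked with simple arithmetic. The short sketch below only re-derives the numbers reported in the paragraph above; the variable names are illustrative.

```python
# Counts reported in the screening description.
records_identified = 3501
duplicates_removed = 1013
excluded_at_screening = 2010
excluded_after_screening = 114
sample_size = 200

after_deduplication = records_identified - duplicates_removed  # records left to screen
retained = after_deduplication - excluded_at_screening         # retained after screening
eligible = retained - excluded_after_screening                 # met inclusion criteria
not_sampled = eligible - sample_size                           # eligible but outside the random sample

print(after_deduplication, retained, eligible, not_sampled)    # 2488 478 364 164
```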

Figure 2

Flow diagram of study selection

The largest proportion of systematic reviews focused on combined treatment (i.e. pharmacological and non-pharmacological interventions) (72/200; 36.0%), followed by non-pharmacological interventions (67/200; 33.5%). Most systematic reviews did not report adherence to PRISMA guidelines (140/200; 70.0%), even though 126 of them were published in journals that recommend that authors of systematic reviews and meta-analyses adhere to PRISMA guidelines (126/200; 63.0%). Of the 122 systematic reviews reporting funding (122/200; 61.0%), public funding was the most frequent (77/122; 63.1%), followed by private (36/122; 29.5%) and industry funding (9/122; 7.4%); 26.5% of studies did not mention a funding source (53/200) and 12.5% stated that there was no funding involved (25/200). The mean 5-year IF for journals included in our sample was 6.10 (SD=7.37), with the largest 5-year IF being 59.1 and the smallest 0.8. The dates the systematic reviews were received by their publishing journals ranged from 1987 to 2020 (Table 2).

Table 2

General characteristics of systematic reviews and meta-analyses (N=200)

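| Characteristic | Total, n (%) | With spin, n | p |
|---|---|---|---|
| Intervention type | | | 0.683 |
| Education | 13 (6.5) | 1 | |
| Combined treatment | 72 (36.0) | 2 | |
| Non-pharmacological | 67 (33.5) | 2 | |
| Pharmacological | 48 (24.0) | 2 | |
| Article mentions adherence to PRISMA | | | 0.43 |
| No | 140 (70.0) | 4 | |
| Yes | 60 (30.0) | 3 | |
| Publishing journal recommends adherence to PRISMA | | | 0.427 |
| No | 74 (37.0) | 4 | |
| Yes | 126 (63.0) | 3 | |
| Funding source | | | 0.369 |
| Not funded | 25 (12.5) | 2 | |
| Industry | 9 (4.5) | 0 | |
| Not mentioned | 53 (26.5) | 3 | |
| Private | 36 (18.0) | 0 | |
| Public | 77 (38.5) | 2 | |
| AMSTAR-2 rating | | | 0.484 |
| High | 14 (7.0) | 0 | |
| Moderate | 74 (37.0) | 2 | |
| Low | 37 (18.5) | 3 | |
| Critically low | 75 (37.5) | 2 | |

| Characteristic | Total, mean ± SD | With spin, mean ± SD | OR (95% CI) |
|---|---|---|---|
| Journal Impact Factor | 6.10 ± 7.37 | 3.75 ± 2.83 | 0.80 (0.52–1.21)* |
| Publication Year (1987–2020) | | | 1.08 (0.93–1.25)* |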

* OR: unadjusted odds ratio.

Spin in abstracts of systematic reviews and meta-analyses

Spin occurred in seven (7/200; 3.5%) of the systematic review abstracts included in our sample, and one abstract contained more than one spin type. Spin type 3 (Table 1) occurred most frequently (5/200; 2.5%). Spin type 1, considered the most severe spin type because the abstract makes a clinical recommendation not supported by the findings of the review, was present in one abstract (1/200; 0.5%). No abstracts contained spin types 4, 5, 7, 8 or 9 (Table 1). None of the systematic review characteristics, including the journal’s 5-year IF and publication year, was significantly associated with the presence of spin (Table 2).
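The frequencies above can be re-derived from the per-type counts in Table 1. The sketch below is only an arithmetic check using illustrative variable names. It also shows why the number of spin instances (8) exceeds the number of abstracts with spin (7).

```python
# Number of abstracts containing each of the nine spin types (Table 1).
spin_counts = {1: 1, 2: 1, 3: 5, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0}
n_abstracts = 200
abstracts_with_spin = 7

total_instances = sum(spin_counts.values())                 # one abstract is counted under two types
pct_abstracts_with_spin = 100 * abstracts_with_spin / n_abstracts
pct_type3_of_abstracts = 100 * spin_counts[3] / n_abstracts
pct_type3_of_instances = 100 * spin_counts[3] / total_instances

print(total_instances, pct_abstracts_with_spin, pct_type3_of_abstracts, pct_type3_of_instances)
# 8 3.5 2.5 62.5
```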

AMSTAR-2 ratings

After critical appraisal of the methodological quality of the systematic reviews with AMSTAR-2, 14 systematic reviews were rated as high quality (14/200; 7.0%), 74 were rated as moderate quality (74/200; 37.0%), 37 were of low quality (37/200; 18.5%), and 75 were considered of critically low quality (75/200; 37.5%) (Table 2). The methodological quality of a systematic review was not significantly associated with spin (Table 2). All but two systematic reviews formulated their research questions using the Population, Intervention, Comparator, Outcome (PICO) method (198/200; 99.0%). All 16 items that comprise the AMSTAR-2 appraisal instrument, and the frequency of responses, are found in Table 3.

Table 3

AMSTAR-2 items and frequency of responses (N=200)

| AMSTAR-2 Item | Yes, n (%) | No, n (%) | Partial Yes, n (%) |
|---|---|---|---|
| 1. Did the research questions and inclusion criteria for the review include the elements of PICO? | 198 (99.0) | 2 (1.0) | 0 (0) |
| 2. Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol? | 53 (26.5) | 108 (54.0) | 39 (19.5) |
| 3. Did the review authors explain their selection of the study designs for inclusion in the review? | 56 (28.0) | 144 (72.0) | 0 (0) |
| 4. Did the review authors use a comprehensive literature search strategy? | 54 (27.0) | 74 (37.0) | 72 (36.0) |
| 5. Did the review authors perform study selection in duplicate? | 111 (55.5) | 89 (44.5) | 0 (0) |
| 6. Did the review authors perform data extraction in duplicate? | 119 (59.5) | 81 (40.5) | 0 (0) |
| 7. Did the review authors provide a list of excluded studies and justify the exclusions? | 59 (29.5) | 97 (48.5) | 44 (22.0) |
| 8. Did the review authors describe the included studies in adequate detail? | 105 (52.5) | 11 (5.5) | 84 (42.0) |
| 9. Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review? | 113 (56.5) | 77 (38.5) | 9 (4.5) |
| 10. Did the review authors report on the sources of funding for the studies included in the review? | 30 (15.0) | 170 (85.0) | 0 (0) |
| 11. If meta-analysis was performed, did the review authors use appropriate methods for statistical combination of results? | 117 (58.5) | 9 (4.5) | 0 (0) |
| 12. If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis? | 101 (50.5) | 25 (12.5) | 0 (0) |
| 13. Did the review authors account for RoB in primary studies when interpreting/discussing the results of the review? | 140 (70.0) | 60 (30.0) | 0 (0) |
| 14. Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review? | 125 (62.5) | 75 (37.5) | 0 (0) |
| 15. If they performed quantitative synthesis did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review? | 60 (30.0) | 66 (33.0) | 0 (0) |
| 16. Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review? | 99 (49.5) | 101 (50.5) | 0 (0) |

Seventy-four articles did not perform a meta-analysis and thus were excluded from this criterion. PICO: Population, Intervention, Comparator, Outcome.


Principal findings

From the sample of systematic review and meta-analysis abstracts focused on smoking cessation treatments and interventions, we found that 96.5% (193/200) were free of spin. While this is encouraging, the seven abstracts that contained spin most often did so by selectively reporting or overemphasizing efficacy outcomes that favored the intervention: spin type 3 (Table 1) accounted for 62.5% (5/8) of all instances of spin. An example of this type of spin occurred in a study by Wagena et al.23, who recommended the prescription of ‘nortriptyline as a first-line therapy for smoking cessation’. The selective reporting is evident in the full text, which reported that no significant differences in efficacy were found between bupropion sustained release (SR) and nortriptyline after a 12-month follow-up period, and that nortriptyline resulted in higher prolonged abstinence rates after at least 6 months only when compared to placebo, not when compared to bupropion SR. Furthermore, bupropion SR resulted in higher prolonged abstinence rates. Additionally, this abstract contains the only instance of spin type 1 (Table 1), as it recommends a clinical treatment that is not sufficiently supported by the statistical results; the recommendation rests on cost-effectiveness rather than on the drug’s superior efficacy.

Another example of spin type 3 was found in a systematic review24 whose purpose was ‘to identify the most effective smoking cessation methods used’. First, the description of their purpose is misleading, as they reported which types of interventions have the highest number of positive outcome studies published, which does not necessarily correlate with intervention effectiveness. For example, they did not evaluate whether nicotine replacement therapy (NRT) studies had better outcomes than other interventions, such as Zyban, Champix, or counseling, but looked only at which intervention had a better positive to negative ratio. This methodology does not consider the magnitude of the outcomes of the studies reviewed, which makes it difficult to determine which intervention is truly the most effective and can lead to inaccurate rankings and conclusions. Second, the abstract results stated that NRT, Champix, and Zyban are the most supported methods, which differed from the ranking of their studied interventions found in their results. This also differed from the abstract conclusion, which reported that NRT and Champix in combination with educational interventions are recommended. In summary, it was not clear which interventions were found to be the most effective and the methodology that was used to make that determination did not take into consideration all the factors necessary to create an accurate efficacy ranking list for smoking cessation interventions.

Selective or biased reporting of results within the abstracts of systematic reviews, as in the examples we found, may lead to misinterpretation of the research findings. For example, a previous study showed that, due to limited resources, physicians are sometimes restricted to viewing only abstracts rather than the full text of articles that require journal subscription10. Furthermore, Lazarus et al.26 investigated the effects of spin within abstracts of randomized controlled trials among physicians and found that abstracts are more likely to lean towards the beneficial effects of medications, which may affect physicians’ interpretations of study results26. Although our study emphasizes the importance of eliminating spin in the abstracts of manuscripts, the presence of spin in any section of a manuscript is problematic and therefore needs to be addressed.

Although our study found few instances of spin within this sample, it is necessary to place our findings in the broader context of the literature on spin. To our knowledge, no other study has investigated the presence of spin in systematic reviews in the field of smoking cessation treatment, but several studies have found spin to be present in the abstracts of randomized controlled trials. For example, a cross-sectional review of clinical trials25 found that more than 50% of trials published in top psychiatry and psychology journals contained spin in the abstract, which further indicates the relevance of the discussion of spin in this particular discipline. Their findings raise important questions about the accuracy and objectivity of reporting in the primary research studies that comprise systematic reviews, which is of consequence because systematic reviews have a strong influence on clinical decision-making and patient outcomes7. Nevertheless, the articles in our survey were comparatively free of spin. This may be because smoking cessation is a heavily studied topic, which leaves little room for incorrect reporting practices such as spin. Systematic reviews help form the basis for developing practice guidelines and can provide information when gaps in knowledge exist26. Taken together, these findings suggest that abstracts that use spin to emphasize or diminish a certain outcome, intentionally or not, can be harmful. Our results, which revealed a low occurrence of spin among smoking cessation papers, may indicate that clinicians in this field are less likely to misinterpret results than clinicians in other fields.


Because the portrayal of results in the abstracts of systematic reviews may have a direct impact on patient care, we recommend that journals, peer reviewers, and authors be held to a high standard when reporting their findings. Lazarus et al.27 demonstrated that peer reviewers failed to identify spin in abstract conclusions in 76% of the reports reviewed. Even more concerning, the same study found that more than 15% of peer reviewers requested that the authors add some form of spin27. By increasing awareness, education, and training regarding the concept of spin, we can begin to minimize its presence. Moreover, guidelines have already been established to mitigate misleading claims in scientific research, and encouraging their use can help limit the occurrence of spin28. PRISMA is an evidence-based set of guidelines for reporting in systematic reviews that is widely used by journals, authors, and peer reviewers. It establishes a standard of reporting that values transparency. Although our study found no association between the presence of spin in the abstracts of systematic reviews and a study’s adherence to PRISMA guidelines, we believe that adding a requirement to the PRISMA guidelines that specifically addresses spin would lead to more transparent writing in abstracts. Ultimately, we recommend that medical providers such as physicians be trained to identify spin and to read beyond the abstract, as their interpretation of a systematic review can directly influence their patient care decisions.

Strengths and limitations

A strength of our investigation was that a full protocol was uploaded to OSF29 prior to beginning data extraction to promote transparency and reproducibility. Screening and data extraction were conducted in accordance with The Cochrane Collaboration guidelines, in which researchers worked independently before coming together to resolve any discrepancies30. Investigators were also required to complete in-depth training on the concept of spin and its subtypes in order to best standardize the identification of spin across publications. Limitations of our study include the inherent subjectivity in the classification of spin, which was managed through careful training and collaboration exercises. Another limitation was that, based on previous findings of spin in systematic reviews, we had intended to use logistic regression to determine associations between the presence of spin and study characteristics and estimated that evaluation of 185 systematic reviews would be needed to sufficiently power this study. Although we included more studies than the target sample size, the infrequent occurrence of spin in this sample required a deviation in statistical methods, and we therefore used Fisher’s exact tests. Furthermore, while our evaluation approach was established by expert methodologists in systematic reviews and has face validity, we are not aware of psychometric studies beyond the original development article. With so few abstracts containing spin, it is unlikely that the relative distribution of the types of spin (1–9) is generalizable. There may also be an association between study characteristics and spin that our sample was too small to detect. Our results should be interpreted in light of these limitations. Additionally, even though we searched the two largest bibliographic databases, MEDLINE (Ovid) and Embase (Ovid), it is possible that our search strategy was not entirely comprehensive.

An additional consideration is that AMSTAR-2 was introduced in 2017 with improved and more complete criteria for appraising systematic reviews. This may be a limitation of our study because authors of reviews published before 2017 could not have known about the new AMSTAR-2 criteria, which may have led to lower ratings for those reviews in our assessment.


Systematic reviews on smoking cessation may help physicians stay up-to-date on the latest treatment options and help guide their clinical decisions. As a result, the abstracts of these systematic reviews should be completely free of spin. Our study is the start of an important conversation on the way to improving the scientific accuracy of research and acknowledging the factors that play a role in influencing the language used to convey published scientific research.