1.4. Evidence base

In 2021, WHO commissioned a systematic review of published and unpublished data on the new class of tests for TB infection not previously reviewed by WHO.

The overarching policy question was:

Should Mtb antigen-based skin tests (TBSTs) for TB infection be used as an alternative to the tuberculin skin test (TST) or WHO-endorsed interferon-gamma release assays (IGRAs) to identify individuals most at risk of progression from TB infection to TB disease?

Based on the overarching policy question, four domains for evidence search and generation were included: diagnostic accuracy, safety, economic aspects and qualitative aspects.

For each domain, specific population, intervention, comparator and outcome (PICO) or research questions were defined.

Domain 1 – Diagnostic accuracy (PICO question): Do TBSTs have similar or better diagnostic performance than the TST or IGRAs to detect TB infection?

BCG: bacille Calmette-Guérin; CXR: chest X-ray; HIV: human immunodeficiency virus; IGRA: interferon-gamma release assay; Mtb: Mycobacterium tuberculosis; PICO: population, intervention, comparator and outcome; People with HIV: people living with HIV; TB: tuberculosis; TBST: Mtb antigen-based skin test; TNF: tumour necrosis factor; TPT: TB preventive treatment; TST: tuberculin skin test.

a 100/100 000 population.

b For estimation of specificity, the ideal population is one with very low likelihood of prior exposure to Mtb.

c TB disease is used as a proxy diagnosis for TB infection.

Domain 2 – Safety: Do TBSTs for TB infection cause more adverse reactions than the TST or IGRAs?

  • What is the risk of adverse events of TBSTs compared with the current TST or IGRAs? 
  • Consider data on both local and systemic reactions graded by type, severity and seriousness, and stratified by subgroup. 
  • Compute relative risks where possible; however, if there is no control group receiving a comparator test, report frequency (%) of adverse events. 
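The reporting rule in the last bullet can be sketched in Python as follows. This is a minimal illustration, not the method used in the commissioned review: the function name `adverse_event_summary`, its parameters and the counts in the usage note are hypothetical, and no confidence intervals are computed.

```python
from typing import Dict, Optional


def adverse_event_summary(
    events_test: int,
    n_test: int,
    events_comp: Optional[int] = None,
    n_comp: Optional[int] = None,
) -> Dict[str, Optional[float]]:
    """Summarize adverse events for a test arm (hypothetical helper).

    Always reports the frequency (%) of adverse events in the test arm.
    Reports a relative risk only when a comparator arm is supplied and
    that arm has at least one event (otherwise the ratio is undefined).
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")

    summary: Dict[str, Optional[float]] = {
        "frequency_pct": 100 * events_test / n_test,
        "relative_risk": None,
    }

    has_comparator = events_comp is not None and n_comp is not None
    if has_comparator and n_comp > 0 and events_comp > 0:
        risk_test = events_test / n_test
        risk_comp = events_comp / n_comp
        summary["relative_risk"] = risk_test / risk_comp

    return summary
```

For example, with made-up counts of 10 events among 200 people in the test arm and 5 events among 200 people in the comparator arm, `adverse_event_summary(10, 200, 5, 200)` gives a frequency of 5% and a relative risk of 2. Without a comparator arm, only the frequency is returned.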

Domain 3 – Cost–effectiveness analysis: What are the economic considerations of TBSTs compared with the TST or IGRAs?

  • How large are the resource requirements (costs)?
  • What is the certainty of the evidence on resource requirements (costs)?
  • Does the cost–effectiveness of the intervention favour the intervention or the comparison?

Domain 4 – User perspective: What are end-user⁴ views and perspectives on the use of novel skin-based in vivo tests for TB infection?

  • Is there important uncertainty about, or variability in, how much end-users value the main outcomes?
  • What would be the impact on health equity?
  • Is the intervention acceptable to key stakeholders?
  • Is it feasible to implement the intervention?

The certainty of the evidence from the pooled studies was assessed systematically for each PICO question, using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (2, 3). The GRADE approach produces an overall assessment of the quality (or certainty) of evidence and provides a framework for translating evidence into recommendations. Under this approach, diagnostic accuracy studies start as high-quality evidence even if they are of observational design.

GRADEpro Guideline Development Tool software (4) was used to generate summary of findings tables. The quality of evidence was rated as high (not downgraded), moderate (downgraded one level), low (downgraded two levels) or very low (downgraded more than two levels), based on five factors: risk of bias, indirectness, inconsistency, imprecision and other considerations. The quality (certainty) of evidence was downgraded by one level when a serious issue was identified and by two levels when a very serious issue was identified in any of these factors. For qualitative data from the systematic reviews, the GRADE-CERQual tool was used. This tool assesses the quality of data from qualitative research studies by examining the methodological limitations of the included studies, the coherence of each review finding, the adequacy of the data supporting a review finding and the relevance of the included studies to the review research questions.
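The downgrading rule described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the names `grade_certainty`, `LEVELS` and `DOWNGRADE` are hypothetical, the judgement labels are chosen for this example, and real GRADE ratings rest on expert judgement rather than a mechanical tally. It does not reproduce GRADEpro.

```python
from typing import Dict

# Rating scale from the text: evidence starts as "high" and moves
# down one step per level of downgrading.
LEVELS = ["high", "moderate", "low", "very low"]

# Levels subtracted per judgement (labels are assumptions for this sketch).
DOWNGRADE = {"not serious": 0, "serious": 1, "very serious": 2}


def grade_certainty(judgements: Dict[str, str]) -> str:
    """Map per-factor judgements to an overall rating.

    `judgements` maps each factor (for example "risk of bias") to one of
    the keys of DOWNGRADE. More than two levels of downgrading in total
    yields "very low", as stated in the text.
    """
    total = sum(DOWNGRADE[judgement] for judgement in judgements.values())
    return LEVELS[min(total, len(LEVELS) - 1)]
```

For example, a body of evidence judged "serious" for risk of bias and "not serious" for the other four factors would be rated "moderate" by this sketch, while one judged "serious" for risk of bias and "very serious" for imprecision would be rated "very low".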

Data synthesis was structured around the preset PICO questions, as outlined above. The following web annexes provide additional information on evidence synthesis and analysis:

Web Annex A. Accuracy of Mycobacterium tuberculosis antigen-based skin tests: a systematic review and meta-analysis
Web Annex B. Safety of Mycobacterium tuberculosis antigen-based skin tests: a systematic review and meta-analysis
Web Annex C. GRADE profiles of Mycobacterium tuberculosis antigen-based skin tests
Web Annex D. Cost–effectiveness of Mycobacterium tuberculosis antigen-based skin tests: a systematic review
Web Annex E. Modelling for economic evidence for the use of Mycobacterium tuberculosis antigen-based skin tests
Web Annex F. Qualitative evidence for the use of Mycobacterium tuberculosis antigen-based skin tests
Web Annex G. Mycobacterium tuberculosis antigen-based skin tests: evidence-to-decision table

⁴ End-users are health care providers, laboratory technicians and managers, programme staff, community workers, people being offered the test and family.
