3.1. Mycobacterium tuberculosis antigen-based skin tests for the diagnosis of TB infection

Since 2011, the World Health Organization (WHO) has issued recommendations on the use of interferon-gamma release assays (IGRAs) for the diagnosis of TB infection. In 2018, WHO updated the recommendations to stipulate that the tuberculin skin test (TST) or IGRAs (or both) can be used to test for TB infection in low- and middle-income countries (LMIC). The TST is a widely used point-of-care test that involves intradermal injection of purified protein derivative (PPD), a crude mixture of different mycobacterial antigens, which stimulates a delayed-type hypersensitivity response and causes induration at the injection site within 48–72 hours. The TST has relatively low specificity in people with recent bacille Calmette-Guérin (BCG) vaccination and low sensitivity in immunosuppressed individuals (e.g. people living with HIV [PLHIV]); hence, interpretive cut-offs must be adapted for these populations. A follow-up clinic visit is required after the TST is placed, and results must be read within the suggested time frame to be valid. In contrast, IGRAs are in vitro tests that measure the release of interferon-gamma (IFN-γ) by T-cells following stimulation with the early secretory antigenic target 6 kDa protein (ESAT-6) and culture filtrate protein 10 (CFP-10), antigens that are specific to Mycobacterium tuberculosis (Mtb). Unlike the TST, IGRAs are, with few exceptions, not affected by prior BCG vaccination or by infection with nontuberculous mycobacteria (NTM). However, IGRA platforms are more expensive to run and require specialized facilities and trained personnel; consequently, the TST is the most commonly used test for TB infection globally. Recent global shortages of PPD have underscored the need for alternatives.

In addition to the TST and IGRAs previously recommended by WHO, Mtb antigen-based skin tests (TBSTs) have recently been developed that use the same ESAT-6 and CFP-10 antigens as IGRAs; these tests combine the simpler skin-test platform with the specificity of IGRAs. TBSTs include Cy-Tb (Serum Institute of India, India), Diaskintest® (Generium, Russian Federation) and C-TST (formerly known as ESAT6-CFP10 test, Anhui Zhifei Longcom, China). All of these tests use intradermal injection of antigen and, like the TST, are read after 48–72 hours by measuring induration in millimetres, using the Mantoux technique. Emerging evidence suggests that these tests may have specificity similar to that of IGRAs and may provide more reliable results than the TST in children and adolescents and in PLHIV. However, this evidence had not been systematically reviewed.

In 2021, WHO commissioned a systematic review of published and unpublished data on this new class of tests for TB infection, which had not previously been reviewed by WHO. The systematic review included data on diagnostic accuracy, safety and economic aspects, as well as qualitative evidence on feasibility, acceptability, equity, and end-user values and preferences. WHO convened a Guideline Development Group (GDG) from 31 January to 3 February 2022 to discuss the findings of the systematic reviews and to make recommendations on this class of diagnostic technologies for TB infection.

The following technologies were included in the evaluation:

  • Cy-Tb (Serum Institute of India, India);
  • Diaskintest (Generium, Russian Federation); and
  • C-TST (formerly known as ESAT6-CFP10 test, Anhui Zhifei Longcom, China).

Table 3.1.1 PICO questions for assessment of TBSTs

The current recommendations are based on data for the tests included in the present evaluation. The findings cannot be extrapolated to other brand-specific tests, and any new technologies in this class will need to be evaluated specifically by WHO.

Guidelines are disseminated through the WHO Global TB Programme (WHO/GTB) listservs to WHO regional offices, Member States, the Stop TB Partnership and other stakeholders (e.g. the Global Laboratory Initiative and the TB Supranational Reference Laboratory Network); they are also published on the websites of the WHO/GTB and Global Laboratory Initiative. The updated policy is incorporated into the WHO TB Knowledge Sharing Platform – an online reference resource for global TB policies and derivative products.

Recommendation

Evidence base

In 2021, WHO commissioned a systematic review of published and unpublished data on the new class of tests for TB infection not previously reviewed by WHO. The overarching policy question was: Should Mtb antigen-based skin tests (TBSTs) for TB infection be used as an alternative to the tuberculin skin test (TST) or WHO-endorsed interferon-γ release assays (IGRAs) to identify individuals most at risk of progression from TB infection to TB disease? Based on this overarching policy question, four domains for evidence search and generation were included: diagnostic accuracy, safety, economic aspects and qualitative aspects. For each domain, specific population, intervention, comparator and outcome (PICO) or research questions were defined.

Domain 1 – Diagnostic accuracy (PICO question): Do TBSTs have similar or better diagnostic performance than the TST or IGRAs to detect TB infection?

Domain 2 – Safety: Do TBSTs for TB infection cause more adverse reactions than the TST or IGRAs?

  • What is the risk of adverse events of TBSTs compared with the current TST or IGRAs?
  • Consider data on both local and systemic reactions graded by type, severity and seriousness, and stratified by subgroup.
  • Compute relative risks where possible; however, if there is no control group receiving a comparator test, report frequency (%) of adverse events.

Domain 3 – Cost–effectiveness analysis: What are economic considerations of TBSTs compared with the TST or IGRAs?

  • How large are the resource requirements (costs)?
  • What is the certainty of the evidence on resource requirements (costs)?
  • Does the cost–effectiveness of the intervention favour the intervention or the comparison?

Domain 4 – User perspective: What are end-user⁴ views and perspectives on the use of novel skin-based in vivo tests for TB infection?

  • Is there important uncertainty about, or variability in, how much end-users value the main outcomes?
  • What would be the impact on health equity?
  • Is the intervention acceptable to key stakeholders?
  • Is it feasible to implement the intervention?

The certainty of the evidence of the pooled studies was assessed systematically through PICO questions, using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (2, 3). The GRADE approach produces an overall quality assessment (or certainty) of evidence, and has a framework for translating evidence into recommendations; also, under this approach, even if diagnostic accuracy studies are of observational design, they start as high-quality evidence.

GRADEpro Guideline Development Tool software (4) was used to generate summary of findings tables. The quality of evidence was rated as high (not downgraded), moderate (downgraded one level), low (downgraded two levels) or very low (downgraded more than two levels), based on five factors: risk of bias, indirectness, inconsistency, imprecision and other considerations. The quality (certainty) of evidence was downgraded by one level when a serious issue was identified and by two levels when a very serious issue was identified in any of the factors used to judge the quality of evidence. For data from the systematic reviews that were of a qualitative nature, the GRADE-CERQual tool was used. The tool examines the methodological limitations of the included studies, the coherence of each review finding, the adequacy of the data in support of a review finding and the relevance of the included studies to the review research questions; it is used to assess data quality from qualitative research studies.
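
The downgrading rule described above amounts to simple arithmetic on an ordered scale. The sketch below is a hypothetical illustration, not GRADEpro code: it assumes each of the five factors is judged "none", "serious" (one level down) or "very serious" (two levels down), that diagnostic accuracy evidence starts at high, and that certainty cannot fall below very low. The function and dictionary names are illustrative.

```python
# Hypothetical sketch of the GRADE downgrading arithmetic described above.
# Not GRADEpro code; factor names and severity labels are assumptions.

LEVELS = ["very low", "low", "moderate", "high"]  # ordered from lowest to highest
FACTORS = {
    "risk of bias",
    "indirectness",
    "inconsistency",
    "imprecision",
    "other considerations",
}
LEVELS_DEDUCTED = {"none": 0, "serious": 1, "very serious": 2}


def grade_certainty(judgements: dict[str, str], start: str = "high") -> str:
    """Return the certainty rating after applying the downgrades in `judgements`.

    `judgements` maps a GRADE factor to "none", "serious" or "very serious".
    Diagnostic accuracy evidence starts at "high" under the approach described above.
    """
    total = 0
    for factor, severity in judgements.items():
        if factor not in FACTORS:
            raise ValueError(f"unknown GRADE factor: {factor!r}")
        total += LEVELS_DEDUCTED[severity]
    # Three or more levels of downgrading all map to "very low".
    return LEVELS[max(LEVELS.index(start) - total, 0)]


# Serious risk of bias plus serious inconsistency: two levels down from high.
print(grade_certainty({"risk of bias": "serious", "inconsistency": "serious"}))  # low
# Very serious imprecision alone (e.g. a very small sample): also two levels down.
print(grade_certainty({"imprecision": "very serious"}))  # low
```

The two printed cases correspond to downgrading patterns reported in the diagnostic accuracy subsections of this section.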

Data synthesis was structured around the preset PICO questions, as outlined above. The following web annexes provide additional information on evidence synthesis and analysis:

Diagnostic accuracy

Diagnostic accuracy studies evaluating the sensitivity, specificity and concordance (agreement) of TBSTs were identified. No studies were identified on the efficacy of TB preventive treatment (TPT) based on diagnostic test results, on the predictive value of the tests for progression to TB disease or on the proportion of people started on TPT.

The assessed evidence for Cy-Tb and C-TST used the manufacturer-recommended cut-off of induration of at least 5 mm. According to the Diaskintest instructions for use, induration of any size is considered a positive response. However, the assessed evidence also included some Diaskintest studies that used a cut-off of induration of at least 5 mm; these are specified where applicable.
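
The positivity rules above can be summarized as a per-test threshold. The sketch below is a hypothetical illustration, not product labelling: the test keys (including "Diaskintest-5mm" for the alternative cut-off used in some studies) and the lookup table are assumptions introduced for clarity.

```python
# Hypothetical sketch: apply the positivity cut-offs described above to a reading.
# Test keys and the lookup table are illustrative assumptions, not product labelling.
from typing import Optional

# Positivity threshold in millimetres of induration.
# None means "any induration (greater than 0 mm) is positive".
CUTOFFS_MM: dict[str, Optional[float]] = {
    "Cy-Tb": 5.0,            # manufacturer-recommended cut-off used in the evidence
    "C-TST": 5.0,            # manufacturer-recommended cut-off used in the evidence
    "Diaskintest": None,     # instructions for use: any induration size
    "Diaskintest-5mm": 5.0,  # alternative cut-off applied in some studies
}


def is_positive(test: str, induration_mm: float) -> bool:
    """Classify a TBST reading (measured at 48-72 hours) as positive or negative."""
    if induration_mm < 0:
        raise ValueError("induration cannot be negative")
    if test not in CUTOFFS_MM:
        raise KeyError(f"no cut-off defined for {test!r}")
    threshold = CUTOFFS_MM[test]
    if threshold is None:
        return induration_mm > 0
    return induration_mm >= threshold


print(is_positive("Diaskintest", 3.0))      # True: any induration counts
print(is_positive("Diaskintest-5mm", 3.0))  # False: below the 5 mm cut-off
print(is_positive("Cy-Tb", 5.0))            # True: at the cut-off
```

Keeping the threshold in a table rather than in branching logic makes explicit that the same 3 mm reading can be positive or negative depending on which Diaskintest cut-off a study applied.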

Sensitivity

A total of 20 studies involving 1627 participants provided data for evaluating the sensitivity of TBSTs in people with microbiologically confirmed TB, which was used as a proxy for sensitivity to diagnose TB infection. Of these, six studies with 539 participants were head-to-head comparisons with the TST or IGRAs (or both); 17 studies included 1276 participants who were HIV-negative or whose HIV status was unknown; five studies included 317 PLHIV; and four studies included 34 participants aged under 18 years. Of the included studies, 14 evaluated Diaskintest, four Cy-Tb and three C-TST, as shown in Figs. 3.1.1.1–3.1.1.2.

Fig. 3.1.1.1 Sensitivity of TBSTs in head-to-head studies

The pooled sensitivity against the microbiological reference standard for TB disease in six head-to-head studies (Fig. 3.1.1.1) was 78.1% (95% confidence interval [CI]: 70.6–84.1%). The evidence was considered to be of high certainty and was not downgraded. Starshinova 2018 (5) and Starshinova 2019 (6) evaluated Diaskintest with a cut-off of induration of at least 5 mm; the remaining head-to-head studies evaluated Cy-Tb, all with a cut-off of at least 5 mm. In four studies (7–10), the TST cut-off was 5 mm for PLHIV and 15 mm for people who were HIV-negative. Only studies on Diaskintest and Cy-Tb were included in this analysis.
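
Pooled estimates such as the one above combine study-level proportions. This section does not state which meta-analytic model was used (bivariate models are common for diagnostic accuracy), so the sketch below substitutes a simpler, widely used technique: univariate DerSimonian–Laird random-effects pooling of logit-transformed sensitivities. The function name and the study counts in the example are hypothetical and do not reproduce the data behind Fig. 3.1.1.1.

```python
import math


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pool_sensitivity_dl(true_pos: list[int], diseased: list[int], z: float = 1.96):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    true_pos[i] -- people with confirmed TB who tested positive in study i
    diseased[i] -- people with confirmed TB tested in study i
    Returns (pooled, lower, upper) as proportions.
    """
    estimates, variances = [], []
    for x, n in zip(true_pos, diseased):
        if x == 0 or x == n:  # continuity correction keeps the logit finite
            x, n = x + 0.5, n + 1.0
        estimates.append(_logit(x / n))
        variances.append(1.0 / x + 1.0 / (n - x))  # approximate variance of a logit

    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(w) - 1)) / c) if c > 0 else 0.0  # between-study variance

    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return _expit(pooled), _expit(pooled - z * se), _expit(pooled + z * se)


# Hypothetical studies: test-positive / microbiologically confirmed TB.
est, lo, hi = pool_sensitivity_dl([40, 55, 30], [50, 70, 42])
print(f"pooled sensitivity {est:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Pooling on the logit scale keeps the back-transformed estimate and CI within 0–100%. The between-study variance term widens the interval when studies disagree, which is the kind of heterogeneity the GRADE inconsistency judgements in this section refer to.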

Fig. 3.1.1.2 Sensitivity of TBSTs in all studies in individuals with HIV-negative or unknown status

The pooled sensitivity among participants who were HIV-negative or of unknown HIV status in the 17 studies presented in Fig. 3.1.1.2 was 76.0% (95% CI: 70.3–80.8%). Sensitivity estimates were lower in the studies using Diaskintest (any induration size). The reason for this is unclear; it may reflect differences in study populations or study quality. As a result, the certainty of the evidence was downgraded one level for inconsistency and another level for imprecision, and was considered very low. Despite the manufacturer's recommendation to treat induration of any size as a positive result, sensitivity in studies using a Diaskintest cut-off of at least 5 mm was more closely aligned with that of the other tests in the class, which all use a cut-off of at least 5 mm.

Risk of bias was considered serious because the index test results were interpreted with knowledge of the reference standard results. In most Diaskintest studies, the selection of participants and the reference standard were unclear; hence, the certainty of the evidence was downgraded one level for risk of bias. Sensitivity ranged from 55% to 100% (the reasons for this heterogeneity are unknown); consequently, the certainty of the evidence was downgraded one level for inconsistency. Thus, the overall certainty of the evidence was considered low.

Fig. 3.1.1.3 Sensitivity of TBSTs in PLHIV

Only studies on Diaskintest and Cy-Tb were included in the analysis presented in Fig. 3.1.1.3. The pooled sensitivity among PLHIV in five studies was 63.5% (95% CI: 52.6–73.2%). Risk of bias was considered serious for Diaskintest studies because the index test results were interpreted with knowledge of the reference standard results; hence, the certainty of the evidence was downgraded one level for risk of bias. The sensitivity estimate was lowest (39.8%) in the one study that used Diaskintest (any induration size). The reason for this low sensitivity is unclear, and the certainty of the evidence was downgraded one level for inconsistency. Certainty was also downgraded one level for imprecision. Consequently, the certainty of the evidence was considered very low.

Fig. 3.1.1.4 Sensitivity of TBSTs in children and adolescents

Sensitivity of TBSTs among children and adolescents is shown in Fig. 3.1.1.4. The pooled sensitivity in four studies for this class of tests was 97.1% (95% CI: 81.9–99.6%). The number of participants included in this analysis was small (only 34 participants in four studies); hence, the evidence was downgraded two levels for imprecision, and its certainty was considered low. Only studies on Diaskintest were available for this analysis. Aggerbeck (7) estimated the sensitivity of Cy-Tb in 12 children and adolescents with TB; only two of them had bacteriologically confirmed TB, and these data were not included in the figure.

Specificity

A total of 14 studies involving 3792 participants provided data for evaluating specificity of TBSTs (including difference in specificity compared with the reference test); three of the studies included 1104 children and adolescents and three included 587 BCG-vaccinated individuals. Specificity was measured in healthy individuals with negative IGRA results. Difference in specificity was used as an alternative specificity measure, and was calculated as the difference in the proportion of negative results between TBSTs and the TST or IGRAs in healthy populations.
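
Both specificity measures defined above reduce to simple proportions. The sketch below computes specificity among IGRA-negative healthy participants and the difference in the proportion of negative results between two tests, with a normal-approximation (Wald) 95% CI. The Wald interval assumes two independent groups (for example, randomized arms); paired designs in which each participant receives both tests would need a paired method. Function names and counts are hypothetical.

```python
import math


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people without infection (here: IGRA-negative) who test negative."""
    return true_neg / (true_neg + false_pos)


def difference_in_negatives(neg_a: int, n_a: int, neg_b: int, n_b: int, z: float = 1.96):
    """Difference in the proportion negative (test A minus test B) in healthy people.

    Uses a Wald (normal-approximation) CI and assumes independent groups.
    Returns (difference, lower, upper) as proportions.
    """
    p_a, p_b = neg_a / n_a, neg_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical example: 198 of 200 IGRA-negative people test TBST-negative.
print(f"specificity {specificity(198, 2):.1%}")  # 99.0%
# Hypothetical arms: 180/200 negative on a TBST vs 120/200 negative on the TST.
d, lo, hi = difference_in_negatives(180, 200, 120, 200)
print(f"difference {d:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

A positive difference means the TBST produced more negative results than the comparator in a population assumed to be largely uninfected, which is why it serves as a proxy for higher specificity.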

Fig. 3.1.1.5 Specificity in healthy individuals with negative IGRA results

The specificity assessed in the five studies presented in Fig. 3.1.1.5 was high for all three tests in the TBST class: 99.1% (95% CI: 93.6–99.9%) for Diaskintest and 98.0% (95% CI: 92.6–99.5%) for Cy-Tb, both compared with QuantiFERON-TB (QFT), and 95.5% (95% CI: 92.6–97.3%) for C-TST, compared with T-SPOT.TB (T-Spot). During the GDG meeting, participants noted that, considering the totality of evidence (which included studies of very low quality), the overall certainty of the evidence on the tests' specificity was very low.

Specificity in children and adolescents (2 studies, 176 patients), as determined in individuals with negative IGRA results, was high. For Diaskintest with a cut-off of at least 5 mm it was 99.1% (95% CI: 94.9–99.9%), as compared with QFT, and for Cy-Tb it was 91.4% (95% CI: 82.2–96.1%), as compared with QFT. Specificity in BCG-vaccinated individuals (3 studies, 292 patients), as determined in healthy individuals with negative IGRA results, was also high, being 97–99% (depending on the test), with a pooled value of 99.0% (95% CI: 96.9–99.7%). More details can be found in Web Annex A.

Fig. 3.1.1.6 Difference in specificity – TBSTs versus the TST

The overall pooled difference in specificity between TBSTs and the TST in 14 studies (Fig. 3.1.1.6) was 33.5% (95% CI: 18.2–48.8%) in favour of TBSTs. In studies of Diaskintest and C-TST done in high TB incidence settings, the difference in specificity was greater for Diaskintest versus the TST with both tests using a cut-off of at least 5 mm (57.3%, 95% CI: 40.2–74.3%) than for Diaskintest (any induration size) versus the TST with a cut-off of at least 5 mm (29.9%, 95% CI: –3.66 to 63.5%). For C-TST versus the TST with a cut-off of at least 5 mm, the difference in specificity was 39.9% (95% CI: 34.0–45.8%). In contrast, in studies of Cy-Tb undertaken in low TB incidence settings, the difference in specificity between Cy-Tb and the TST was less prominent; it was greater against the TST with a cut-off of at least 15 mm (4.61%, 95% CI: –28.6 to 37.9%) than against the TST with a cut-off of 5 or 15 mm (–2.0%, 95% CI: –12.3 to 8.3%). These differences may be explained by the background level of BCG vaccination in the study populations or by the cut-offs used. Fig. 3.1.1.7 gives more details on the specificity of TBSTs versus the TST in BCG-vaccinated people. Overall risk of bias was considered serious because test allocation by arm was not blinded in any of the studies except those of Cy-Tb. In most Diaskintest studies, the selection of participants and the reference standard diagnosis were unclear. The certainty of the evidence was therefore downgraded one level for risk of bias. The difference in specificity ranged from –2% to 72%; hence, the certainty of the evidence was downgraded a further level for inconsistency. Consequently, the certainty of the evidence for the difference in specificity between TBSTs and the TST was low.

Fig. 3.1.1.7 Difference in specificity – TBSTs versus the TST in BCG-vaccinated population

Two studies (three analyses) provided data on difference in specificity in BCG-vaccinated populations, which was even higher for this population than in populations where only some people had received BCG vaccination; the pooled difference in specificity was 67.4% (95% CI: 24.0–110.7%). Overall risk of bias was considered serious because test allocation by arm was not blinded; hence, the certainty of the evidence was downgraded one level for risk of bias. The CI was broad, ranging from 24.0% to 110.7%, so the certainty of the evidence was downgraded one more level for imprecision. Consequently, certainty of the evidence for difference in specificity between TBSTs and the TST in BCG-vaccinated populations was low.

The pooled difference in specificity between TBSTs and IGRAs in six studies was small, at 2.3% (95% CI: –1.6 to 6.2%), meaning that TBSTs and IGRAs had similar specificity.

Agreement

Overall, 16 studies involving 3198 participants (among which four studies with 1307 participants recruited people aged under 18 years) were included to assess agreement of the index tests with comparator tests (the TST or IGRAs, or both).

In participants without TB disease, agreement with QFT was high (≥90%) for Cy-Tb and for Diaskintest, whether read at any induration size or with a 5 mm cut-off (Fig. 3.1.1.8). Agreement was slightly lower, at 85.5% (95% CI: 75.7–91.7%), for C-TST compared with T-Spot. In one study that evaluated Diaskintest with a cut-off of at least 7 mm against T-Spot, agreement was considerably lower, at 60.9% (95% CI: 54.3–67.2%). Risk of bias was considered serious because test allocation was not blinded in five studies; hence, the certainty of the evidence was downgraded one level for risk of bias. Agreement ranged widely (from 61% to 97%) across tests and studies, so the certainty of the evidence was downgraded one level for inconsistency. Consequently, the certainty of the evidence for agreement between TBSTs and IGRAs was low.
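
Overall (percent) agreement counts the concordant results in a 2 × 2 cross-tabulation of the index test against the comparator. The sketch below computes it together with Cohen's kappa, a chance-corrected agreement statistic that this section does not report and that is included only for context. The function name and cell counts are hypothetical.

```python
def agreement(both_pos: int, index_only: int, comparator_only: int, both_neg: int):
    """Return (overall agreement, Cohen's kappa) for two binary tests.

    both_pos        -- positive on both the TBST and the IGRA
    index_only      -- positive on the TBST only
    comparator_only -- positive on the IGRA only
    both_neg        -- negative on both tests
    """
    n = both_pos + index_only + comparator_only + both_neg
    observed = (both_pos + both_neg) / n
    p_index = (both_pos + index_only) / n      # TBST positivity
    p_comp = (both_pos + comparator_only) / n  # IGRA positivity
    expected = p_index * p_comp + (1 - p_index) * (1 - p_comp)  # chance agreement
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa


obs, kappa = agreement(both_pos=40, index_only=5, comparator_only=5, both_neg=50)
print(f"agreement {obs:.1%}, kappa {kappa:.2f}")  # agreement 90.0%, kappa 0.80
```

Percent agreement can look high when most participants are negative on both tests, which is one reason chance-corrected statistics are often reported alongside it.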

Fig. 3.1.1.8 Agreement of TBSTs versus IGRAs in all studies including participants without active TB

In participants with TB disease, agreement between TBSTs and IGRAs was high (85.7%) (Fig. 3.1.1.9). Agreement varied between tests: 79.6% (95% CI: 76.3–82.6%) for Cy-Tb (5 mm) compared with QFT; 97.3% (95% CI: 72.7–99.8%) for Diaskintest (any induration size) compared with QFT; and 97.0% (95% CI: 92.3–98.9%) for Diaskintest (5 mm induration) compared with QFT. Agreement was 85.4% (95% CI: 72.4–92.9%) for C-TST compared with T-Spot. Risk of bias was considered serious because allocation of tests by arm was not blinded in four studies; hence, the certainty of the evidence was downgraded one level for risk of bias. Agreement ranged from 75% to 100% across tests and studies, so the certainty of the evidence was downgraded one level for inconsistency. The overall certainty of the evidence for agreement between TBSTs and IGRAs in people with TB disease was considered low.

Fig. 3.1.1.9 Agreement of TBSTs versus IGRAs in all studies including people with active TB

Safety

A systematic review was undertaken of studies reporting the outcomes of interest, namely local reactions (injection site reactions [ISR]) and systemic adverse events from TBSTs. The following databases were searched for studies from inception until 30 July 2021: Medline, Embase, e-library, the Chinese Biomedical Literature Database and the China National Knowledge Infrastructure Database. The test manufacturers were contacted for individual studies, and further studies were identified through a public call for data by WHO. Longitudinal and case–control studies in humans reporting adverse events of the index tests, alone or compared with recognized comparator tests (e.g. QFT, T-Spot and the TST), were included with no language restrictions. Screening of titles, abstracts and full-text articles, and the assessment of quality, were performed in duplicate by two investigators. A meta-analysis was conducted using a random-effects model, pooling studies that were considered clinically homogeneous.

Overall, seven studies for Cy-Tb, five for C-TST and 11 for Diaskintest were identified. Characteristics of studies were as follows:

  • Cy-Tb: clinical trials – three studies in South Africa and four in Europe. Most participants were adults; in studies in South Africa, 20–40% of participants were PLHIV. Five of seven studies included random allocation of Cy-Tb versus the TST into two arms and thus allowed comparison of ISR. All five studies were included in the pooled evidence assessment on any ISR. Only one study provided comparable data on systemic reactions. This study was also included in the pooled evidence assessment on systemic reactions.
  • C-TST: all five studies were conducted in China and included only HIV-negative adults. All of them included non-random allocation of C-TST versus the TST into two arms; thus, no study evaluating C-TST was included in the pooled evidence assessment on any ISR. Also, no studies including any comparable data on systemic reactions were available.
  • Diaskintest: cross-sectional studies using routinely collected data mostly in the Russian Federation, and one in Ukraine, including various populations (adults, children and adolescents – healthy, contacts of TB patients and with TB). Two studies on Diaskintest provided comparable data on ISR; however, one of them provided no information about the number of participants who experienced any ISR; thus, only one study on Diaskintest was included in the meta-analysis.

Fig. 3.1.1.10 Any injection site reactions

Proportion of PLHIV: Aggerbeck 2018 (7) (25%), Aggerbeck 2019 (8) (20%); Hoff 2016 (10) (39.5%). Other studies included HIV-negative individuals. Aggerbeck 2018 (7) included children aged under 5 years (20%) and aged 5–17 years (31%); Ruhwald 2017 (9) included children aged under 5 years (3.5%) and aged 5–17 years (8.8%). Other studies included adults. Hoff 2016 (10), Aggerbeck 2019 (8) and Streltsova 2011 (11) included people with TB only.

The pooled risk of any ISR due to Cy-Tb (n=2878, 5 studies) and Diaskintest (n=53, 1 study) presented in Fig. 3.1.1.10 was not significantly different from the TST (risk ratio [RR] 1.09; 95% CI: 0.74–1.61). The risk of any systemic reaction was only analysable in one study (Cy-Tb) that allowed such comparison, and was not significantly different from the TST (RR 0.84; 95% CI: 0.60–1.10). The Diaskintest study was considered to have high risk of bias, while the overall certainty of evidence from the randomized controlled trials for any ISR was judged as high. For any systemic reactions, overall certainty of evidence was judged to be moderate because of the small sample size and wide CI.
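
For a single randomized comparison, such as the one available for systemic reactions, a risk ratio and its 95% CI can be computed from the event counts in each arm using the standard log (Katz) method; a CI that includes 1 corresponds to "not significantly different". The function name and counts below are hypothetical and do not reproduce the reported RR values.

```python
import math


def risk_ratio(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Risk ratio of arm A (e.g. TBST) versus arm B (e.g. TST) with a log-scale CI.

    Requires at least one event in each arm; returns (rr, lower, upper).
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("zero events: use a continuity correction or an exact method")
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


rr, lo, hi = risk_ratio(33, 300, 30, 300)  # hypothetical injection site reactions
significant = not (lo <= 1.0 <= hi)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f}); significant: {significant}")
```

The interval is built on the log scale because the sampling distribution of log(RR) is closer to normal, which is also why reported RR intervals are asymmetric around the point estimate.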

At the request of GDG members for post-marketing surveillance data on Diaskintest, the manufacturer reported that more than 55.7 million Diaskintest tests were performed in 2019–2021, with 27 serious and 30 non-serious adverse events. Based on the totality of data, the GDG rated the certainty of evidence as high.
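
To put the post-marketing figures on a common scale, the reported counts can be converted to rates per million tests. This is simple arithmetic on the figures above; it treats 55.7 million as the denominator, and because the reported number of tests is a lower bound, the resulting rates are upper bounds.

```python
TESTS_PERFORMED = 55.7e6  # reported as more than 55.7 million, so rates are upper bounds
SERIOUS_EVENTS = 27
NON_SERIOUS_EVENTS = 30


def per_million(events: int, tests: float) -> float:
    """Reported events per one million tests performed."""
    return events / tests * 1e6


print(f"serious:     {per_million(SERIOUS_EVENTS, TESTS_PERFORMED):.2f} per million")  # 0.48
print(f"non-serious: {per_million(NON_SERIOUS_EVENTS, TESTS_PERFORMED):.2f} per million")  # 0.54
print(f"all:         {per_million(SERIOUS_EVENTS + NON_SERIOUS_EVENTS, TESTS_PERFORMED):.2f} per million")  # 1.02
```

That is roughly one reported adverse event of any kind per million tests. Passive surveillance may under-report events, so these rates are not directly comparable with trial-based ISR frequencies.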

Based on the data presented at the GDG meeting, it was concluded that the safety profile of novel TBSTs is similar to that of the TST, and is associated with mostly mild ISR such as itching and pain. From the reviewed studies, there appears to be no safety signal that might affect the choice between specific TBSTs and the TST. However, the group also noted that this was not a full safety review covering product safety, animal or preclinical studies. Regulatory assessment for safety is needed before any of the TBST products are implemented.

Cost and cost–effectiveness analysis

Two reviews following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were carried out to look at costs and cost–effectiveness of:

  • novel TBSTs, such as Diaskintest, C-TST and Cy-Tb (primary review); and
  • the TST and IGRAs (secondary review).

The searches targeted articles presenting economic evaluations (costs and cost–effectiveness) of diagnostic tests for TB infection in humans from a health provider perspective. Articles written in English, Chinese or Russian and published in the Medline, OVID, Chinese Biomedical Literature, China National Knowledge Infrastructure and Russian e-library databases were reviewed. Study quality was assessed using Drummond's checklist.

In addition, a Markov-chain model was developed for the purposes of the GDG meeting, to study the cost–effectiveness of TBSTs versus the currently available tests, the TST and IGRAs. When simulating a cohort of individuals transitioning among different states and steps along the TB cascade of care, the model took into consideration the following parameters:

  • prevalence of TB infection in TB-negative individuals, percentage;
  • people completing treatment after initiation following a positive TB infection result, percentage;
  • people not initiating treatment after testing positive for TB infection, percentage;
  • people interrupting treatment after initiation following a positive TB infection test result, percentage;
  • progression from TB infection to active TB, probability;
  • efficacy of TB infection treatment;
  • active TB treatment coverage;
  • recovery from active TB (treated + untreated);
  • death from active TB (treated + untreated);
  • probability of a true positive test result if the patient has TB infection (sensitivity); and
  • probability of a true negative test result if the patient does not have TB infection (specificity).
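
The cohort simulation can be illustrated with a deliberately simplified Markov sketch. Its assumptions are illustrative and are not those of the model developed for the GDG meeting: a yearly cycle; six hypothetical states; testing once at the start; completed TPT modelled as moving a fraction (the efficacy) of treated true positives into a "protected" state with no further progression; recovery and death from active TB pooled over treated and untreated cases, as in the list above; and no costs or QALYs, which a full model would attach to each state. Specificity is used only to count false-positive treatment starts. All names and parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    prevalence: float    # prevalence of TB infection among people tested
    sensitivity: float   # P(positive | TB infection)
    specificity: float   # P(negative | no TB infection)
    initiation: float    # P(start TPT | positive result)
    completion: float    # P(complete TPT | started)
    tpt_efficacy: float  # fraction of completed courses that prevent progression
    progression: float   # yearly P(TB infection -> active TB)
    tb_recovery: float   # yearly P(active TB -> recovered), treated + untreated
    tb_death: float      # yearly P(active TB -> death), treated + untreated


def initial_state(p: Params) -> dict[str, float]:
    """Cohort distribution right after testing and any TPT."""
    protected = p.prevalence * p.sensitivity * p.initiation * p.completion * p.tpt_efficacy
    return {
        "uninfected": 1.0 - p.prevalence,
        "infection": p.prevalence - protected,
        "protected": protected,
        "active": 0.0,
        "recovered": 0.0,
        "dead": 0.0,
    }


def step(s: dict[str, float], p: Params) -> dict[str, float]:
    """Advance the cohort by one yearly cycle."""
    progressing = s["infection"] * p.progression
    recovering = s["active"] * p.tb_recovery
    dying = s["active"] * p.tb_death
    return {
        "uninfected": s["uninfected"],
        "infection": s["infection"] - progressing,
        "protected": s["protected"],
        "active": s["active"] + progressing - recovering - dying,
        "recovered": s["recovered"] + recovering,
        "dead": s["dead"] + dying,
    }


def simulate(p: Params, years: int) -> dict[str, float]:
    s = initial_state(p)
    for _ in range(years):
        s = step(s, p)
    return s


def false_positive_starts(p: Params) -> float:
    """Share of the cohort started on TPT after a false-positive result."""
    return (1.0 - p.prevalence) * (1.0 - p.specificity) * p.initiation


base = dict(prevalence=0.25, specificity=0.98, initiation=0.8, completion=0.7,
            tpt_efficacy=0.9, progression=0.01, tb_recovery=0.6, tb_death=0.1)
for sens in (0.6, 0.78):
    params = Params(sensitivity=sens, **base)
    out = simulate(params, years=20)
    print(f"sensitivity {sens:.2f}: dead {out['dead']:.4f}, "
          f"FP starts {false_positive_starts(params):.3f}")
```

In a full cost–effectiveness model, each state would also accrue costs and quality-of-life weights per cycle, and the outputs would be summed into expected cost and QALYs per patient for each testing strategy.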

Model parameters, unit costs and estimates of diagnostic test accuracy were sourced from the literature, including from the systematic reviews mentioned above. The manufacturers of novel TBSTs were also contacted to source costs of the new tests. However, only Generium, the manufacturer of Diaskintest, provided estimated test costs, including delivery costs, for different delivery volumes. Consequently, the modelling study focused on Diaskintest as the representative of the TBST class of tests.

The model was parameterized to three countries: Brazil, South Africa and the United Kingdom. Three testing strategies were considered in this analysis: Diaskintest (index); the TST; and QuantiFERON-TB IGRAs, either Gold In-Tube or Gold Plus (comparator tests). Outcomes reported included unit cost (in US dollars)⁵ per patient, incremental cost–effectiveness ratio (ICER) and incremental net benefit per quality-adjusted life year (QALY) gained. Unit costs considered in each country included test kit, staff time, laboratory and disposable costs. Costs were considered from a health system perspective and did not reflect patient or societal costs.
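
For reference, the standard health-economic definitions of the two incremental outcomes, for an index strategy compared with a comparator, are given below. Here ΔC is the difference in expected cost per patient (US dollars), ΔE the difference in expected QALYs per patient, and λ a willingness-to-pay threshold per QALY, which is not specified in this section.

```latex
\begin{aligned}
\Delta C &= C_{\text{index}} - C_{\text{comparator}}, \qquad
\Delta E = E_{\text{index}} - E_{\text{comparator}},\\
\mathrm{ICER} &= \frac{\Delta C}{\Delta E}, \qquad
\mathrm{INB}(\lambda) = \lambda\,\Delta E - \Delta C .
\end{aligned}
```

A strategy with ΔC < 0 and ΔE > 0 is cost saving and dominates the comparator, in which case an ICER is not usually reported; INB(λ) > 0 indicates that the index strategy is cost-effective at the threshold λ.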

Given that only information on Diaskintest was available, a univariate sensitivity analysis on TBST unit costs, together with a comparison of the results of the three strategies, was performed to identify the maximum unit cost at which a new TBST strategy would remain cost saving or cost-effective, without specifying a particular type of TBST.
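
The threshold logic of such a univariate analysis can be sketched under a hypothetical simplification: the expected per-patient cost of the TBST strategy equals the TBST unit cost plus all other downstream costs, and the unit cost does not affect QALYs. Setting the incremental net benefit (willingness to pay per QALY × QALYs gained − incremental cost) to zero then gives the maximum unit cost directly; a willingness to pay of zero gives the cost-saving threshold. Function names and all figures are illustrative.

```python
def max_unit_cost(comparator_cost: float, other_costs: float,
                  delta_qaly: float, wtp: float = 0.0) -> float:
    """Largest TBST unit cost at which incremental net benefit is still >= 0.

    comparator_cost -- expected cost per patient of the comparator strategy
    other_costs     -- expected TBST-strategy cost per patient excluding the test itself
    delta_qaly      -- QALYs gained per patient by the TBST strategy
    wtp             -- willingness to pay per QALY; 0 gives the cost-saving threshold
    """
    return comparator_cost - other_costs + wtp * delta_qaly


def one_way_sweep(unit_costs, comparator_cost, other_costs, delta_qaly, wtp):
    """Incremental cost and net benefit across a range of TBST unit costs."""
    rows = []
    for unit in unit_costs:
        delta_cost = unit + other_costs - comparator_cost
        rows.append((unit, delta_cost, wtp * delta_qaly - delta_cost))
    return rows


# Hypothetical inputs (US dollars per patient).
print(max_unit_cost(100.0, 80.0, 0.02))              # cost-saving threshold: 20.0
print(max_unit_cost(100.0, 80.0, 0.02, wtp=1000.0))  # cost-effective threshold: 40.0
for unit, d_cost, inb in one_way_sweep([10, 30, 50], 100.0, 80.0, 0.02, 1000.0):
    print(f"unit cost {unit:>3}: delta cost {d_cost:+.1f}, net benefit {inb:+.1f}")
```

The sweep shows the sign of the net benefit changing at the computed threshold. In a real model, the unit cost would enter a full cohort simulation rather than this closed form, but the interpretation of the threshold is the same.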

The conclusions were based on the predefined research questions outlined below.

How large are the resource requirements (costs)?

In the eight studies that assessed Diaskintest, most estimated a cost of $1.60 per test. One study evaluated the unit costs considering staff time, consumables and laboratory costs, resulting in a cost of $5.07. This study, using the same costing factors, also estimated the unit cost of C-TST as $9.96. The 29 studies on IGRAs or the TST (or both) estimated an average cost of $37.84 for the TST and $89.33 for IGRAs (accounting for different ingredients). The cost–effectiveness of the tests varied among and within risk groups, with no clear economic consensus around the cost–effectiveness of comparison tests.

What is the certainty of the evidence of resource requirements (costs)?

Based on Drummond’s scores, the quality of studies that have assessed cost–effectiveness of C-TST and Diaskintest in this review was concerning; only one out of eight studies was of high quality. However, the quality of the studies that assessed cost–effectiveness of the TST and IGRAs was generally high.

Does the cost–effectiveness of the intervention favour the intervention or the comparison?

Based on the systematic review results, there was insufficient evidence regarding both the cost and the cost–effectiveness of novel TBSTs. The quality of the studies was concerning according to Drummond's checklist for economic evaluations. More high-quality studies that consider different health settings and risk populations are needed to estimate the cost–effectiveness and likely economic impact of these tests.

According to the Markov-chain model developed for the GDG meeting, Diaskintest is cost saving compared with both the TST and IGRAs in Brazil. Compared with the TST, Diaskintest saves $5.60, with an incremental gain of 0.02 QALYs per patient; compared with IGRAs, it saves $8.40, with an incremental gain of 0.01 QALYs. In South Africa, Diaskintest is also cost saving compared with both the TST and IGRAs: compared with the TST, it saves $4.39, with an incremental gain of 0.02 QALYs; compared with IGRAs, it saves $64.41, with an incremental gain of 0.01 QALYs. In the United Kingdom, Diaskintest is cost saving compared with the TST but not compared with IGRAs. Compared with the TST, Diaskintest saves $73.33, with an incremental gain of 0.04 QALYs; compared with IGRAs, it increases cost by $15.80 but still yields an incremental gain of 0.03 QALYs.
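
The results above fall into standard quadrants of the cost–effectiveness plane. The sketch below classifies each reported pair of incremental cost and incremental QALYs, reading "cost saving at $X" as an incremental cost of –$X per patient, and computes an ICER only for the one comparison that is more costly and more effective. That ICER is derived here from rounded published figures for illustration; it was not reported by the model. The function name is hypothetical.

```python
def classify(delta_cost: float, delta_qaly: float) -> str:
    """Place an index-versus-comparator result on the cost-effectiveness plane."""
    if delta_cost <= 0 and delta_qaly >= 0:
        return "dominant"  # cost saving and at least as effective
    if delta_cost >= 0 and delta_qaly <= 0:
        return "dominated"
    if delta_cost > 0:
        return "more costly, more effective"  # an ICER is needed to judge value
    return "less costly, less effective"


# Diaskintest versus each comparator, as reported above (US dollars, QALYs per patient).
results = {
    ("Brazil", "TST"): (-5.60, 0.02),
    ("Brazil", "IGRA"): (-8.40, 0.01),
    ("South Africa", "TST"): (-4.39, 0.02),
    ("South Africa", "IGRA"): (-64.41, 0.01),
    ("United Kingdom", "TST"): (-73.33, 0.04),
    ("United Kingdom", "IGRA"): (15.80, 0.03),
}

for (country, comparator), (d_cost, d_qaly) in results.items():
    label = classify(d_cost, d_qaly)
    line = f"{country} vs {comparator}: {label}"
    if label == "more costly, more effective":
        line += f" (ICER about ${d_cost / d_qaly:,.0f} per QALY gained)"
    print(line)
```

Five of the six comparisons are dominant; for the United Kingdom comparison with IGRAs, whether Diaskintest is cost-effective depends on the willingness-to-pay threshold applied.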

In summary, the modelling and univariate sensitivity analysis results show that, in Brazil and South Africa, use of Diaskintest would potentially save costs per patient and result in greater health gains (QALYs per patient) compared with the TST and IGRAs. In the United Kingdom, Diaskintest results in health gains but is more expensive in terms of expected cost per patient than IGRAs. The results also show that, in Brazil and South Africa, IGRAs are more costly to implement than the TST but would result in health gains. However, in the United Kingdom, IGRAs are cheaper to implement and are more cost-effective than the TST.

User perspective

User perspectives on the value, feasibility, usability and acceptability of diagnostic technologies are important in the implementation of such technologies. If the perspectives of laboratory personnel, clinicians, patients and TB programme personnel are not considered, the technologies risk being inaccessible to and underused by those for whom they are intended.

To address questions related to user perspective, the following activities were undertaken:

  • Two systematic reviews, which synthesized the qualitative research evidence on end-user values and preferences for the use of specific TBSTs for TB infection, compared with existing tests (IGRAs and the TST). Study quality and confidence in the evidence were evaluated using the GRADE-CERQual approach.
  • Twenty semi-structured interviews with a diverse range of clinicians, laboratory staff, programme officers and individuals living with TB infection (referred to as “consumers” throughout this report).
  • A discrete choice experiment (DCE) survey, drawing on themes derived from the systematic reviews and semi-structured interviews. DCE methodology was used to elicit participants' (end-users') stated values and preferences without asking them directly to state their preferred options.
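This section does not describe how the DCE responses were analysed. As a general illustration of the technique, DCE choices are commonly analysed with a conditional (multinomial) logit model. In that model, each test profile's utility is a weighted sum of its attribute levels, and the probability of choosing a profile is a softmax over the profiles in one choice task. The estimated weights then show how respondents trade one attribute against another. The Python sketch below assumes such a model. The attribute names, their coding and the coefficient values in `BETA` are hypothetical and are not estimates from this DCE.

```python
import math
from typing import Dict, List

# Hypothetical part-worth coefficients for illustration only; these are
# NOT estimates from the DCE described in this report.
BETA: Dict[str, float] = {
    "false_result_pct": -0.10,    # per percentage point of false positive/negative results
    "needs_return_visit": -0.80,  # 1 if a second visit is needed to read the result, else 0
    "needs_specialist": -0.50,    # 1 if specialist staff or equipment are required, else 0
    "cost_usd": -0.01,            # per US$ of test cost
}


def utility(profile: Dict[str, float]) -> float:
    """Deterministic utility: weighted sum of a test profile's attribute levels."""
    return sum(BETA[name] * profile[name] for name in BETA)


def choice_probabilities(profiles: List[Dict[str, float]]) -> List[float]:
    """Conditional logit: softmax of utilities across the profiles in one choice task."""
    utils = [utility(p) for p in profiles]
    top = max(utils)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # One hypothetical choice task with two test profiles.
    test_a = {"false_result_pct": 10, "needs_return_visit": 1, "needs_specialist": 0, "cost_usd": 25}
    test_b = {"false_result_pct": 5, "needs_return_visit": 0, "needs_specialist": 1, "cost_usd": 50}
    for name, p in zip(["A", "B"], choice_probabilities([test_a, test_b])):
        print(f"Test {name}: predicted choice share {p:.2f}")
```

In this made-up task, test B is predicted to be chosen more often. Under the assumed weights, its lower false-result rate and the absence of a return visit outweigh its higher cost and specialist requirement. Fitting the weights to real responses (for example, by maximum likelihood) is what turns observed choices into estimates of stated preferences.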

Across the two systematic reviews, four studies met the inclusion criteria. The review on specific TBSTs identified only one data source (from the Russian Federation), which came from a WHO public call for data on the feasibility and acceptability of TBSTs. Its participants were parents of children and adolescents with TB infection. In the review on current IGRAs and the TST, three peer-reviewed articles met the inclusion criteria; these were from the Netherlands, South Africa and the United States of America (USA). Participants included a range of health professionals involved in TB care (Netherlands, South Africa and USA) and PLHIV (South Africa). Based on the GRADE-CERQual assessments, overall confidence in the evidence was low to moderate because the data lacked richness: most studies reported only summaries of participant quotes or a limited number of direct quotes. All studies were conducted in specific subgroups (e.g. PLHIV, or parents of children and adolescents with TB infection).

For the user interviews, 20 participants were recruited: 13 TB health care providers (8 from low- and middle-income countries [LMIC]) and 7 people affected by TB (3 from LMIC). Health care providers included programme executives and decision-makers, public health practitioners and advocates, physicians, researchers, laboratory technicians and a nurse.

A total of 234 participants (186 providers and 48 consumers) completed the DCE. Overall, 59% of respondents were female and 56% were aged 36–55 years. Respondents were mainly based in India (26%), the USA (16%), South Africa (9%), Pakistan (8%) and Zimbabwe (7%).

The conclusions were based on the predefined research questions outlined below.

Is there important uncertainty about or variability in how much end-users value the main outcomes?

Qualitative data from the systematic reviews and end-user interviews, together with quantitative data from the DCE, indicated that health care consumers and providers had similar values and preferences regarding TB infection tests. Key end-user values included test accuracy, convenience, a positive patient experience, cost and resource requirements. In particular, end-users valued highly accurate tests (i.e. tests with low false positive and false negative rates), such as TBSTs and IGRAs. Such tests reduce the risk of downstream consequences of false results, such as anxiety, additional testing or unnecessary treatment. End-users also preferred tests that were convenient to administer and access, meaning tests that:
  • can be accessed in a community or primary care setting;
  • do not require a follow-up visit to read the results; and
  • can be administered without developing additional systems or infrastructure.
These findings first emerged as themes in the systematic reviews and end-user interviews, and were confirmed by the DCE.

The qualitative data from the reviews and interviews showed that every TB infection test option has strengths and limitations in terms of convenience. End-users valued a positive consumer experience, so they preferred tests with fewer psychological effects (e.g. anxiety, stigma and stress) and physical consequences (e.g. discomfort). More accurate tests tended to be associated with a better consumer experience. However, some aspects of the consumer experience were worse with skin tests than with non-skin-based tests (e.g. stigma from the welt, and discomfort). Low-cost tests (e.g. TBSTs and the TST) were generally preferred because they are more accessible in resource-limited contexts. Tests with lower resource requirements (e.g. TBSTs and the TST) were also preferred in resource-limited settings, although this appeared to matter less in high-income countries. End-users favoured TB infection tests that used the existing infrastructure in their health care setting. Data from the DCE confirmed two important end-user preferences for TB testing: no in-person follow-up appointment, and no need for specialist staff or equipment to administer or interpret the test.

What would be the impact on health equity?

Qualitative evidence from the reviews and end-user interviews indicates that specific TBSTs are unlikely to create new equity issues. Rather, TBSTs are likely to improve health equity by providing a more accurate, low-cost test for resource-limited settings where the TST is already in use. Moreover, their portability and low cost make them suitable for large-scale screening programmes in vulnerable, hard-to-reach communities. However, TBSTs may not improve health equity in low-resource settings that do not already use the TST. In these settings, barriers to accessing skin tests and other health care tests would need to be addressed first, regardless of the type of TB test available. In terms of accessibility, the DCE found that consumers strongly preferred testing in community and primary care settings over hospital locations. This finding could have health equity implications.

Is the intervention acceptable to key stakeholders?

Qualitative data from the systematic reviews and end-user interviews suggest that TBSTs were perceived to have greater specificity and sensitivity than the TST. Greater test accuracy was considered desirable because it avoids the negative consequences of false positive or false negative results. However, compared with IGRAs, TBSTs were expected to share many of the patient-experience limitations of skin tests (e.g. the need for a return visit, discomfort, a welt on the arm and stigma). IGRAs were considered the preferred option in countries that already use them. There, the supporting infrastructure is already in place, and TBSTs, with comparable accuracy and performance, would not add value. There were also broader concerns about skin tests, which were viewed as a dated, basic technology subject to human error and variable interpretation. Suggestions for improving the acceptability of TBSTs included careful communication during implementation, with endorsement by health care providers and organizations (e.g. WHO). Data from the DCE showed strong and consistent preferences among both health care providers and consumers for tests that minimize false positive and false negative results. The DCE also found that consumers strongly preferred testing in community and primary care settings over hospital locations.

Is the intervention feasible to implement?

Findings from the qualitative evidence synthesis (reviews and end-user interviews) support the feasibility of TBSTs. However, this applies only in settings where the TST is already in use and the required resourcing and training are in place. TBSTs are likely to be low-cost, portable tests that are well suited to low-resource health care settings, which may be unable to support IGRAs because of their greater cost and resource requirements. Where health care settings already have the infrastructure needed for IGRAs, IGRAs are a more feasible option than any skin test. This is because IGRAs do not require a return visit to read the result, a step at which patients may be lost to follow-up. The DCE found that two preferences would influence feasibility: no in-person follow-up appointment, and no need for specialist staff or equipment to administer or interpret the test. There was some suggestion that providers preferred more expensive tests (when offered a choice between hypothetical costs of $50 and $25), although test cost was the least important determinant of test choice.

Implementation considerations

Considerations for implementation were as follows:

  • regulatory approval from national regulatory authorities or other relevant bodies is required before implementation of in vivo diagnostic tests;
  • appropriate communication on the new class of tests is necessary, highlighting the difference between the TST and TBSTs;
  • implementation of TBSTs requires a cold chain;
  • well-trained, skilled staff are needed to administer and interpret this class of tests;
  • multiuse vials will require effective operational planning and batching; hence, single-use vials or vials with fewer doses to match daily needs are preferred;
  • procurement and stock management aspects will have to be considered, as with implementing any new class of tests;
  • because reading TBST results requires a second patient visit, linkage to care must be reinforced to reduce loss to follow-up;
  • global market availability and necessary volumes of the new class of tests must be considered; and
  • measurement and interpretation of the TBST reaction size must be standardized.

Monitoring and evaluation

Factors that will require monitoring and evaluation are as follows:

  • adverse event monitoring is a gap in current TST use; thus, recording and reporting systems for results and adverse events need to be introduced when implementing the new tests; and
  • the linkage between results of the new class of tests and the number of people placed on TPT needs to be monitored.

Research priorities

Research priorities are as follows:

  • specificity of Diaskintest and C-TST in populations with a low prevalence of TB infection, and direct head-to-head comparisons of all three TBSTs;
  • assessing the barriers for implementation and patient access;
  • additional accuracy studies on high-risk groups: children aged under 5 years, children (aged 5–10 years) and adolescents (aged 10–18 years), PLHIV, prisoners and migrants;
  • studies evaluating the epidemiologic and economic impact of TBST use in the TB infection diagnosis and TPT cascade;
  • longitudinal studies to assess the predictive value for active TB compared with current tests;
  • economic studies (e.g. cost and cost–effectiveness of TBSTs under different scenarios); and
  • studies evaluating the use of digital tools for reading of results, to avoid return patient visits.
