TY - JOUR
T1 - Using genetic data to identify transmission risk factors
T2 - Statistical assessment and application to tuberculosis transmission
AU - Goldstein, Isaac H.
AU - Bayer, Damon
AU - Barilar, Ivan
AU - Kizito, Balladiah
AU - Matsiri, Ogopotse
AU - Modongo, Chawangwa
AU - Zetola, Nicola M.
AU - Niemann, Stefan
AU - Minin, Volodymyr M.
AU - Shin, Sanghyuk S.
N1 - Funding Information:
This work was supported by the National Institutes of Health (R01AI147336 to IG, BK, OM, CM, SN, VMN, SSS; R01AI097045 to NMZ). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Xavier Didelot for helping us resolve issues using TransPhylo and Paul Bastide and Sebastian Lequime for their help using nosoi. We also thank Caroline Colijn, Jessica Stockdale and Vijay Jeevanantham Naidu for sharing their code using TransPhylo and BEAST2.
Publisher Copyright:
© 2022 Goldstein et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2022/12
Y1 - 2022/12
AB - Identifying host factors that influence infectious disease transmission is an important step toward developing interventions to reduce disease incidence. Recent advances in methods for reconstructing infectious disease transmission events using pathogen genomic and epidemiological data open the door for investigation of host factors that affect onward transmission. While most transmission reconstruction methods are designed to work with densely sampled outbreaks, these methods are making their way into surveillance studies, where the fraction of sampled cases with sequenced pathogens could be relatively low. Surveillance studies that use transmission event reconstruction then use the reconstructed events as response variables (i.e., infection source status of each sampled case) and use host characteristics as predictors (e.g., presence of HIV infection) in regression models. We use simulations to study estimation of the effect of a host factor on the probability of being an infection source via this multi-step inferential procedure. Using TransPhylo (a widely used method for Bayesian estimation of infectious disease transmission events) and logistic regression, we find that low sensitivity of identifying infection sources leads to dilution of the signal, biasing logistic regression coefficients toward zero. We show that increasing the proportion of sampled cases improves sensitivity and some, but not all, properties of the logistic regression inference. Application of these approaches to real-world data from a population-based TB study in Botswana fails to detect an association between HIV infection and the probability of being a TB infection source. We conclude that application of a pipeline, where one first uses TransPhylo and sparsely sampled surveillance data to infer transmission events and then estimates effects of host characteristics on probabilities of these events, should be accompanied by a realistic simulation study to better understand biases stemming from imprecise transmission event inference.
UR - http://www.scopus.com/inward/record.url?scp=85144587363&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144587363&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1010696
DO - 10.1371/journal.pcbi.1010696
M3 - Article
C2 - 36469509
AN - SCOPUS:85144587363
SN - 1553-734X
VL - 18
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 12
M1 - e1010696
ER -