Model for comparative analysis of antigen receptor repertoires

Grzegorz A. Rempala, Michał Seweryn, Leszek Ignatowicz

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

In modern molecular biology, one of the standard ways of analyzing a vertebrate immune system is to sequence and compare the counts of specific antigen receptor clones (either immunoglobulins or T-cell receptors) derived from various tissues under different experimental or clinical conditions. The resulting statistical challenges are difficult and do not fit readily into the standard statistical framework of contingency tables, primarily because the receptor populations are seriously under-sampled. This under-sampling is caused, on the one hand, by the extreme diversity of antigen receptor repertoires maintained by the immune system and, on the other, by the high cost and labor intensity of the receptor data collection process. In most of the recent immunological literature, the differences across antigen receptor populations are examined via non-parametric statistical measures of species overlap and diversity borrowed from ecological studies. While this approach is robust in a wide range of situations, it seems to provide little insight into the underlying clonal size distribution and the overall mechanism differentiating the receptor populations. As a possible alternative, the current paper presents a parametric method that adjusts for the data under-sampling and provides a unifying approach to the simultaneous comparison of multiple receptor groups by means of the modern statistical tools of unsupervised learning. The parametric model is based on a flexible multivariate Poisson-lognormal distribution and is seen to be a natural generalization of the univariate Poisson-lognormal models used in ecological studies of biodiversity patterns. The procedure for evaluating a model's fit is described along with the public domain software developed to perform the necessary diagnostics. The model-driven analysis is seen to compare favorably with traditional methods when applied to data from T-cell receptors in transgenic mouse populations.
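To make the under-sampling issue concrete, the following is a minimal sketch of drawing clone counts from a bivariate Poisson-lognormal distribution: each clone gets a pair of correlated lognormal rates, and the observed counts in the two samples are Poisson draws given those rates. It is an illustration only and not the authors' software; the function names, the parameter values (`mu`, `sigma`, `rho`) and the number of clones are assumptions chosen for the example.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_bivariate_pln(n_clones, mu, sigma, rho, seed=0):
    """Return a list of (count_1, count_2) pairs, one pair per clone.

    The log-rates are bivariate normal with means mu, standard deviations
    sigma and correlation rho (all hypothetical illustration values).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_clones):
        # Correlated standard normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Lognormal rates for the same clone in the two samples.
        lam1 = math.exp(mu[0] + sigma[0] * z1)
        lam2 = math.exp(mu[1] + sigma[1] * z2)
        # Given the rates, the two counts are independent Poisson draws.
        counts.append((sample_poisson(rng, lam1), sample_poisson(rng, lam2)))
    return counts


def unseen_fraction(counts):
    """Fraction of clones with zero count in both samples (never observed)."""
    return sum(1 for a, b in counts if a == 0 and b == 0) / len(counts)


counts = sample_bivariate_pln(2000, mu=(-1.0, -1.0), sigma=(1.0, 1.0), rho=0.8)
print(f"clones simulated: {len(counts)}")
print(f"fraction unseen in both samples: {unseen_fraction(counts):.2f}")
```

With low mean rates, a sizeable share of clones never appears in either sample, so an observed contingency table omits them entirely. This is the kind of under-sampling the abstract says a parametric model has to adjust for.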

Original language: English (US)
Pages (from-to): 1-15
Number of pages: 15
Journal: Journal of Theoretical Biology
Volume: 269
Issue number: 1
DOI: 10.1016/j.jtbi.2010.10.001
State: Published - Jan 21 2011

Fingerprint

  • Antigen Receptors
  • Antigens
  • Comparative Analysis
  • Receptors
  • T-cells
  • Immune System
  • Sampling
  • T-Cell Antigen Receptor
  • Population
  • Poisson Distribution
  • Molecular Biology
  • Unsupervised Learning
  • Public Sector
  • Biodiversity
  • Model
  • Transgenic Mice

Keywords

  • Computational immunology
  • Lognormal distribution
  • Poisson abundance models
  • Species diversity estimation
  • T-cell antigen receptors

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Modeling and Simulation
  • Immunology and Microbiology(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • Applied Mathematics

Cite this

Model for comparative analysis of antigen receptor repertoires. / Rempala, Grzegorz A.; Seweryn, Michał; Ignatowicz, Leszek.

In: Journal of Theoretical Biology, Vol. 269, No. 1, 21.01.2011, p. 1-15.

@article{e7396b6c906048448706a265c49e0127,
title = "Model for comparative analysis of antigen receptor repertoires",
keywords = "Computational immunology, Lognormal distribution, Poisson abundance models, Species diversity estimation, T-cell antigen receptors",
author = "Rempala, {Grzegorz A.} and Michał Seweryn and Leszek Ignatowicz",
year = "2011",
month = "1",
day = "21",
doi = "10.1016/j.jtbi.2010.10.001",
language = "English (US)",
volume = "269",
pages = "1--15",
journal = "Journal of Theoretical Biology",
issn = "0022-5193",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR

T1 - Model for comparative analysis of antigen receptor repertoires

AU - Rempala, Grzegorz A.

AU - Seweryn, Michał

AU - Ignatowicz, Leszek

PY - 2011/1/21

Y1 - 2011/1/21

KW - Computational immunology

KW - Lognormal distribution

KW - Poisson abundance models

KW - Species diversity estimation

KW - T-cell antigen receptors

UR - http://www.scopus.com/inward/record.url?scp=77958138360&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958138360&partnerID=8YFLogxK

U2 - 10.1016/j.jtbi.2010.10.001

DO - 10.1016/j.jtbi.2010.10.001

M3 - Article

VL - 269

SP - 1

EP - 15

JO - Journal of Theoretical Biology

JF - Journal of Theoretical Biology

SN - 0022-5193

IS - 1

ER -