    VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires

    No full text
    <div><p>Despite the growing number of immune repertoire sequencing studies, the field still lacks software for analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoire post-analysis tasks, provides detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets from a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients, we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at <a href="https://github.com/mikessh/vdjtools" target="_blank">https://github.com/mikessh/vdjtools</a>.</p></div>

    Overlap and clustering of TCR repertoires.

    <p><b>A.</b> Hierarchical clustering of healthy donor and multiple sclerosis (MS) patient samples using the F pairwise similarity metric (the geometric mean of the total frequencies of overlapping clonotypes in the first and second sample of a pair). <b>B.</b> Multi-dimensional scaling (MDS) plot. Samples were projected onto a two-dimensional plane based on pairwise similarities (F metric). <b>C.</b> Permutation testing for closeness of samples coming from the same group, based on the MDS plot. The plot shows observed (dashed red lines) and permuted (histograms) average within-group sample distances. In contrast to the control group, the MS group displays highly dissimilar T-cell repertoires. N = 10,000 permutations of group labels were performed. <b>D.</b> Hierarchical clustering of samples based on the Euclidean distance between Variable segment frequency vectors. The clustering separates the two sample groups well (Control and MS, P = 0.013, Fisher’s exact test).</p>
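The F metric from panel A and the label-permutation test from panel C can be sketched in Python as below. This is a minimal illustration under stated assumptions, not VDJtools code: each sample is assumed to be a dict mapping a clonotype to its frequency, and the distance matrix given to the permutation test is assumed to be any symmetric sample-to-sample distance (for example, Euclidean distances between MDS coordinates). All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def f_similarity(sample_a, sample_b):
    """F metric: geometric mean of the total frequency that the overlapping
    clonotypes occupy in sample A and in sample B.

    Each sample is a dict mapping a clonotype key (assumed here to be the
    CDR3 nucleotide sequence) to its frequency within that sample.
    """
    shared = sample_a.keys() & sample_b.keys()
    total_a = sum(sample_a[c] for c in shared)
    total_b = sum(sample_b[c] for c in shared)
    return math.sqrt(total_a * total_b)


def within_group_distance(dist, labels, group):
    """Mean pairwise distance between samples carrying the label `group`."""
    members = [i for i, label in enumerate(labels) if label == group]
    pairs = list(combinations(members, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def permutation_test(dist, labels, group, n_perm=10_000, seed=0):
    """One-sided test of whether samples in `group` are closer to each other
    than expected under random relabelling.

    Returns the observed within-group distance and a p-value: the fraction of
    label permutations whose within-group distance is <= the observed one,
    with the usual +1 correction so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = within_group_distance(dist, labels, group)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if within_group_distance(dist, shuffled, group) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value means the group's members sit closer together than random groupings of the same size; for a group of mutually dissimilar repertoires the observed distance falls in the upper tail of the permutation histogram instead.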

    TCR nucleotide sequences shared between twins are statistically different from sequences shared between unrelated individuals.

    <p>Distribution of log<sub>10</sub> <i>P</i><sub>gen</sub>, where <i>P</i><sub>gen</sub> is the probability that a sequence is generated by the VJ recombination process, for out-of-frame TCR alpha clonotypes shared between one individual and the other five. While the distribution of sequences shared between unrelated individuals (red curves) is well explained by coincidental convergent recombination, as predicted by our stochastic model (blue), sequences shared between the two twins (green) show an excess of low-probability sequences: 31 sequences with log<sub>10</sub> <i>P</i><sub>gen</sub> &lt; −10. For comparison, the distribution of <i>P</i><sub>gen</sub> for regular (not necessarily shared) sequences is shown in black.</p>
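To see why low-Pgen sequences are unexpected among coincidentally shared sequences, a simplified chance-sharing calculation helps. The sketch below assumes each repertoire is N independent draws from the recombination process, so a sequence with generation probability p appears at least once with probability 1 - (1 - p)^N. It does not reproduce the authors' stochastic model, and the Pgen values are assumed to be supplied from elsewhere; the function names are hypothetical.

```python
import math


def prob_present(pgen, n_recombinations):
    """Probability that a sequence with generation probability `pgen` occurs
    at least once among `n_recombinations` independent recombination events."""
    # 1 - (1 - p)^N, computed via log1p/expm1 so tiny p does not round to 0
    return -math.expm1(n_recombinations * math.log1p(-pgen))


def prob_shared_by_chance(pgen, n1, n2):
    """Probability that the sequence is found in both of two unrelated
    repertoires purely through convergent recombination."""
    return prob_present(pgen, n1) * prob_present(pgen, n2)


def count_below(log10_pgens, threshold=-10.0):
    """Number of shared sequences whose log10 Pgen lies below `threshold`."""
    return sum(1 for v in log10_pgens if v < threshold)
```

With hypothetical values p = 1e-11 and N = 1e5 recombinations per repertoire, `prob_shared_by_chance` is roughly 1e-12, so under this simplified model sharing of such sequences by coincidence alone is very unlikely.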

    Analysis of autologous HSCT-driven changes in T-cell repertoire.

    <p><b>A.</b> Stacked clonotype frequency plot detailing the overlap between sample MS8 (before autologous HSCT) and MS8-HSCT (after HSCT). The top 100 clonotypes by average frequency in the two samples are shown; other clonotypes observed in both samples are marked “Not shown”. The frequency of the remaining clonotypes is marked “Not in overlap”. <b>B.</b> HSCT-induced changes in Variable-Joining segment pairing in CDR3 junctions. A chord diagram is used for visualization; ribbons connecting segment pairs are scaled by the corresponding V-J pair frequency. The “TRB” prefix is stripped from segment names for simplicity.</p>
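The bookkeeping behind panel A (top shared clonotypes, the remaining shared ones, and the non-overlapping remainder) can be expressed as a short Python sketch. Input samples are assumed to be dicts mapping clonotype to frequency; the function name and the tie-breaking rule are illustrative choices, not taken from VDJtools.

```python
def overlap_breakdown(sample_a, sample_b, top_n=100):
    """Partition each sample's total frequency the way a stacked clonotype
    frequency plot does.

    Returns a dict with
      "top":            [(clonotype, freq_a, freq_b), ...] for the `top_n`
                        shared clonotypes ranked by their average frequency,
      "not_shown":      (freq_a, freq_b) summed over the other shared clonotypes,
      "not_in_overlap": (freq_a, freq_b) summed over clonotypes found in only
                        one of the two samples.
    """
    shared = sample_a.keys() & sample_b.keys()
    # Rank by mean frequency across both samples; ties are broken by the
    # clonotype key so the order is deterministic.
    ranked = sorted(shared, key=lambda c: (-(sample_a[c] + sample_b[c]) / 2, c))
    top = [(c, sample_a[c], sample_b[c]) for c in ranked[:top_n]]
    rest = ranked[top_n:]
    not_shown = (sum(sample_a[c] for c in rest), sum(sample_b[c] for c in rest))
    not_in_overlap = (
        sum(f for c, f in sample_a.items() if c not in shared),
        sum(f for c, f in sample_b.items() if c not in shared),
    )
    return {"top": top, "not_shown": not_shown, "not_in_overlap": not_in_overlap}
```

For each sample, the frequencies in "top", "not_shown" and "not_in_overlap" add up to that sample's total frequency, which is what makes the bars stack to full height.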

    Sharing of alpha out-of-frame TCR clonotypes as a function of clonal abundance.

    <p>The normalized number of out-of-frame alpha CDR3 nucleotide sequences shared between two individuals is shown as a function of clonotype abundance (e.g. normalized sharing for the 2000 most abundant clones from both repertoires, the 4000 most abundant, etc.) and compared with the amount of sharing expected by chance (blue curve), taking into account the variable fraction of zero-insertion clonotypes as a function of their abundance. Data and predictions show excellent quantitative agreement (inset), with one fitting parameter. Error bars show one standard deviation.</p>
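The empirical side of this comparison, counting sequences shared between the k most abundant clonotypes of two repertoires for increasing k, might look like the following sketch. The normalization used in the figure is not specified here, so the function returns raw counts; inputs are assumed to be dicts mapping CDR3 nucleotide sequence to read count, and the function name is hypothetical.

```python
def sharing_by_abundance(counts_a, counts_b, step=2000):
    """Number of sequences shared between the k most abundant clonotypes of
    two repertoires, for k = step, 2 * step, ... (no normalization applied).

    counts_a / counts_b map a CDR3 nucleotide sequence to its read count.
    """
    # Rank each repertoire by decreasing abundance (ties broken by sequence).
    rank_a = sorted(counts_a, key=lambda s: (-counts_a[s], s))
    rank_b = sorted(counts_b, key=lambda s: (-counts_b[s], s))
    seen_a, seen_b = set(), set()
    shared = 0
    curve = []
    for i in range(min(len(rank_a), len(rank_b))):
        seq_a, seq_b = rank_a[i], rank_b[i]
        # A shared sequence is counted exactly once: when it enters the second
        # of the two growing top-k sets.
        seen_a.add(seq_a)
        if seq_a in seen_b:
            shared += 1
        seen_b.add(seq_b)
        if seq_b in seen_a:
            shared += 1
        if (i + 1) % step == 0:
            curve.append((i + 1, shared))
    return curve
```

Growing both top-k sets in a single pass avoids recomputing a set intersection for every k.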

    CDR3 junction features.

    <p>The MS patient-derived repertoire is enriched for TCR sequences with long VJ inserts, partially due to the high abundance of specific Variable segment regions. <b>A</b>. Length of the Variable and Joining segment germline parts within CDR3 (V-germ and J-germ) and of the VJ insert (VJ-junc), compared between MS donors and healthy controls. <b>B</b>. Average length of VJ junctions among all V segments and among selected V segments (TRBV5-6, 5-1, 5-8, 7-6 and 20-1, shown to be over-expressed in MS patients compared to controls, see main text) in TCR sequences from healthy donor repertoires. <b>C</b>. Comparison of VJ insert lengths between control and MS donors for clonotypes with TRBV5-6, 5-1, 5-8, 7-6 and 20-1 segments. P-values were computed using a two-tailed unpaired t-test (A, C) and a paired t-test (B).</p>
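For reference, the two test statistics named in the caption can be computed with the standard library alone, as in this sketch. It returns only the t statistic and degrees of freedom; turning these into two-tailed p-values requires the t-distribution CDF (for example via SciPy's `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel`). The inputs, such as per-donor mean VJ insert lengths, and the function names are assumptions.

```python
import math
from statistics import mean, variance


def unpaired_t(x, y):
    """Student's two-sample t statistic (pooled variance) and its degrees of
    freedom, as used for an unpaired t-test."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


def paired_t(x, y):
    """Paired t statistic on the per-item differences x[i] - y[i]."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / math.sqrt(variance(diffs) / n), n - 1
```

The unpaired version assumes equal variances in the two groups; if that assumption is doubtful, Welch's variant is the usual alternative.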

    Estimation of repertoire diversity using multinomial model.

    <p><b>A.</b> Rarefaction analysis of repertoire samples from healthy donors and multiple sclerosis patients. The number of unique clonotypes in a sub-sample is plotted against its size (number of T-cell receptor cDNA molecules, TRBM). Solid and dashed lines are diversity estimates computed by interpolation and extrapolation, respectively, using a multinomial model [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004503#pcbi.1004503.ref029" target="_blank">29</a>]. Rarefaction curves for MS samples generally lie below those of control donors. The post-HSCT sample (MS8-HSCT) displays the lowest diversity. <b>B.</b> Comparison of repertoire diversity using the normalized Chao1 estimate. Normalization is performed by down-sampling datasets to the size of the smallest dataset and computing the estimate for the resulting datasets (the mean estimate from n = 3 re-samples is used). The MS8-HSCT sample is excluded from the calculations. *: P = 0.022, two-tailed t-test; effect size estimated by Cohen’s d is 0.98.</p>
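Panel A's interpolation and extrapolation and panel B's down-sampled Chao1 can be sketched as follows. The formulas are one standard form of individual-based rarefaction and Chao1-based extrapolation under a multinomial sampling model; they are assumed here and may differ in detail from ref. 29 and from VDJtools. `counts` is assumed to be a list of per-clonotype molecule counts, and all function names are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, variance


def chao1(counts):
    """Classic Chao1 richness estimate S_obs + f1^2 / (2 f2), falling back to
    S_obs + f1 (f1 - 1) / 2 when there are no doubletons."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def _log_comb(a, b):
    # log of the binomial coefficient C(a, b), via log-gamma to avoid huge ints
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def rarefy(counts, m):
    """Expected number of distinct clonotypes in a random sub-sample of m
    molecules drawn without replacement from n = sum(counts) (m <= n)."""
    n = sum(counts)
    log_total = _log_comb(n, m)
    expected = 0.0
    for c in counts:
        if c <= 0:
            continue
        if n - c < m:
            expected += 1.0  # every sub-sample of size m contains this clonotype
        else:
            expected += 1.0 - math.exp(_log_comb(n - c, m) - log_total)
    return expected


def extrapolate(counts, m_extra):
    """Expected number of distinct clonotypes after m_extra additional
    molecules, with the number of unseen clonotypes estimated by Chao1."""
    n = sum(counts)
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f0 = chao1(counts) - s_obs
    if f0 == 0:
        return float(s_obs)
    return s_obs + f0 * (1 - (1 - f1 / (n * f0 + f1)) ** m_extra)


def downsampled_chao1(counts, size, n_resamples=3, seed=0):
    """Mean Chao1 over n_resamples sub-samples of `size` molecules each."""
    rng = random.Random(seed)
    molecules = [i for i, c in enumerate(counts) for _ in range(c)]
    estimates = []
    for _ in range(n_resamples):
        sub_counts = Counter(rng.sample(molecules, size)).values()
        estimates.append(chao1(list(sub_counts)))
    return mean(estimates)


def cohens_d(x, y):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)
```

Working with log-gamma keeps `rarefy` usable for repertoires with millions of molecules, where exact binomial coefficients would be impractically large. As `m_extra` grows, `extrapolate` approaches the Chao1 estimate, which ties the dashed curves to the richness estimate used in panel B.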

    Overview of VDJtools software package.

    <p>VDJtools analysis routines can be grouped into 6 modules and are applicable to results produced by commonly used immune repertoire sequencing processing software. The basic statistics and segment usage module includes general statistics (clonotype and read counts, number and frequency of non-coding clonotypes, convergent recombination of CDR3 amino acid sequences, insert size statistics, etc.), spectratyping (distribution of clonotype frequency by CDR3 length), Variable and Joining segment usage profiles, and their pairing frequency in re-arranged receptor junction sequences. The repertoire overlap module includes routines for computing sets of overlapping clonotypes and their characteristics, and scatter plots of clonotype frequencies. Diversity analysis includes routines for visualizing the clonotype frequency distribution and for computing repertoire diversity estimates and rarefaction plots. The fourth set of routines can be used to create clonotype abundance profiles and to track clonotypes over the time course of vaccination, myeloablation and blood cell transplantation. Sample clustering is implemented based on computed repertoire similarity measures and can be used to distinguish various biological conditions, cell subsets and tissues. Auxiliary routines provide means for clonotype table filtering (e.g. by segment usage or non-coding CDR3 sequence) as well as annotation with custom or pre-built pathogen-specific clonotype databases. VDJtools can be incorporated into Java-based pipelines, as demonstrated by the VDJviz clonotype browser.</p>
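As an illustration of the kind of output the basic statistics module produces, the sketch below computes a spectratype and a few general statistics from a list of clonotype records. The record keys ("cdr3nt", "cdr3aa", "count", "freq"), the non-coding rule (nucleotide length not divisible by 3, or a '*' stop in the translation), the use of nucleotide length for the spectratype, and the convergence summary are all assumptions for illustration, not VDJtools' own definitions or API.

```python
from collections import defaultdict


def is_coding(cdr3_nt, cdr3_aa):
    """Assumed rule: a CDR3 is coding when its nucleotide length is a multiple
    of 3 and its translation contains no stop codon ('*')."""
    return len(cdr3_nt) % 3 == 0 and "*" not in cdr3_aa


def spectratype(clonotypes):
    """Total clonotype frequency per CDR3 nucleotide length."""
    by_length = defaultdict(float)
    for c in clonotypes:
        by_length[len(c["cdr3nt"])] += c["freq"]
    return dict(sorted(by_length.items()))


def basic_stats(clonotypes):
    """A few general statistics for records with the hypothetical keys
    "cdr3nt", "cdr3aa", "count" and "freq"."""
    noncoding = [c for c in clonotypes if not is_coding(c["cdr3nt"], c["cdr3aa"])]
    # Convergence: mean number of distinct nucleotide variants per coding
    # amino-acid CDR3 sequence.
    variants = defaultdict(set)
    for c in clonotypes:
        if is_coding(c["cdr3nt"], c["cdr3aa"]):
            variants[c["cdr3aa"]].add(c["cdr3nt"])
    convergence = (sum(len(v) for v in variants.values()) / len(variants)
                   if variants else 0.0)
    return {
        "clonotypes": len(clonotypes),
        "reads": sum(c["count"] for c in clonotypes),
        "noncoding": len(noncoding),
        "noncoding_freq": sum(c["freq"] for c in noncoding),
        "convergence": convergence,
    }
```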

    The number of inserted nucleotides in in-frame TCR beta clonotypes depends on their abundance.

    <p><b>A</b>. Mean numbers of insertions, obtained by analysing groups of 3000 sequences of decreasing abundance. Clonotypes from cord blood (black) show a strong dependence on abundance, with high-abundance clones having far fewer insertions than low-abundance ones. Clonotypes in a young adult naive repertoire (blue) show a similar but less marked trend. Naive clonotypes in older adults (violet and green) show an even weaker trend. Adult memory samples of all ages show no dependence at all (red, yellow and maroon). Error bars show 2 standard errors. <b>B</b>. Probability distributions of the number of insertions in two rank classes for young naive and cord blood samples (ranks 1-3000 on top, ranks 45001-48000 on bottom). For high-ranking sequences, the probability of having zero insertions is high for both adult naive and cord blood samples. For middle-ranking sequences, the probability of 0 insertions is much lower, and the distributions are similar between adult naive and cord blood samples. <b>C</b>. Fraction of clonotypes with zero insertions for different abundance classes. Error bars show one standard deviation. We present the analysis for independently published cord blood donors and for different bin sizes in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005572#pcbi.1005572.s012" target="_blank">S11</a> and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005572#pcbi.1005572.s011" target="_blank">S10</a> Figs, respectively.</p>
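The binning in panels A and C can be sketched as below: clonotypes are ranked by abundance, split into consecutive bins of fixed size, and each bin is summarized by its mean insertion count, the standard error of that mean, and its zero-insertion fraction. The input format, pairs of (abundance, insertion count), and the function name are assumptions.

```python
import math
from statistics import mean, stdev


def insertions_by_abundance(clonotypes, bin_size=3000):
    """Bin clonotypes by decreasing abundance and summarize insertions.

    clonotypes: iterable of (abundance, n_insertions) pairs.
    Returns one row per full bin: (first rank in bin, mean insertions,
    standard error of the mean, fraction with zero insertions).
    A trailing partial bin is dropped.
    """
    ranked = sorted(clonotypes, key=lambda c: c[0], reverse=True)
    rows = []
    for start in range(0, len(ranked) - bin_size + 1, bin_size):
        ins = [n_ins for _, n_ins in ranked[start:start + bin_size]]
        sem = stdev(ins) / math.sqrt(bin_size) if bin_size > 1 else 0.0
        zero_frac = sum(1 for n_ins in ins if n_ins == 0) / bin_size
        rows.append((start + 1, mean(ins), sem, zero_frac))
    return rows
```

Dropping the trailing partial bin keeps every row based on the same number of clonotypes, so the rows stay comparable.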

    Lifetime of abundant in-frame TCR beta clonotypes with zero insertions.

    <p>The fraction of zero-insertion clonotypes among the 2000 most abundant clonotypes in the unpartitioned repertoire as a function of age (black circles) is well fitted by an exponentially decaying function of time (black curve). This decay is faster than would be predicted from the decay of the naive compartment alone (red curve), indicating a slow decay of zero-insertion clonotypes of fetal origin. Red diamonds show the percentage of naive T-cells measured using flow cytometry (see [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005572#pcbi.1005572.ref023" target="_blank">23</a>] for details). The scale of the red axis was chosen so that the two decay curves start at the same point at age 0 and have the same long-time limit. We present the analysis for different bin sizes in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005572#pcbi.1005572.s011" target="_blank">S10 Fig</a>.</p>
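A simple way to fit such a decay is linear least squares on the logarithm of the data, sketched below. It assumes the long-time limit (`offset`) is known in advance, with 0 giving a pure exponential; the authors' actual fitting procedure is not described here, and the function name is hypothetical.

```python
import math


def fit_exponential_decay(ages, fractions, offset=0.0):
    """Fit f(t) = offset + a * exp(-t / tau) by linear least squares on
    log(f - offset), for a fixed, known long-time limit `offset`.

    Returns (a, tau). Requires every fraction to exceed `offset`.
    """
    ys = [math.log(f - offset) for f in fractions]
    n = len(ages)
    mean_t = sum(ages) / n
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in ages)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ages, ys))
    slope = s_ty / s_tt
    if slope >= 0:
        raise ValueError("data do not decay; cannot fit an exponential decay")
    a = math.exp(mean_y - slope * mean_t)
    return a, -1.0 / slope
```

The fitted lifetime tau converts to a half-life of tau * ln 2. Fitting on the log scale weights small fractions more heavily than a nonlinear fit in the original scale would, which matters when late-age points are noisy.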