Vaccination shapes evolutionary trajectories of SARS-CoV-2
The large-scale evolution of the SARS-CoV-2 virus has been marked by rapid
turnover of genetic clades. New variants show intrinsic changes, notably
increased transmissibility, as well as antigenic changes that reduce the
cross-immunity induced by previous infections or vaccinations. How this
functional variation shapes the global evolutionary dynamics has remained
unclear. Here we show that selection induced by vaccination impacts on the
recent antigenic evolution of SARS-CoV-2; other relevant forces include
intrinsic selection and antigenic selection induced by previous infections. We
obtain these results from a fitness model with intrinsic and antigenic fitness
components. To infer model parameters, we combine time-resolved sequence data,
epidemiological records, and cross-neutralisation assays. This model accurately
captures the large-scale evolutionary dynamics of SARS-CoV-2 in multiple
geographical regions. In particular, it quantifies how recent vaccinations and
infections affect the speed of frequency shifts between viral variants. Our
results show that timely neutralisation data can be harvested to identify
hotspots of antigenic selection and to predict the impact of vaccination on
viral evolution.
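As a rough illustration of how a fitness model with intrinsic and antigenic components translates into frequency shifts between variants, the sketch below uses a textbook two-variant logistic dynamics. It is not the authors' model: the additive form in `variant_fitness`, the coupling constant `gamma`, and all numerical values are assumptions chosen only for illustration.

```python
import math

def variant_fitness(intrinsic, cross_immunity, immune_fraction, gamma=0.1):
    """Hypothetical additive fitness: an intrinsic term minus an antigenic cost.

    cross_immunity: fraction of existing immunity that still neutralises
    the variant (0 = full escape, 1 = no escape). immune_fraction: share of
    the population that is vaccinated or recovered. gamma is an assumed
    scale factor, not a fitted parameter.
    """
    return intrinsic - gamma * cross_immunity * immune_fraction

def frequency_after(x0, s, t):
    """Frequency of a variant with constant selection coefficient s after
    time t, starting from frequency x0 (standard logistic solution)."""
    growth = x0 * math.exp(s * t)
    return growth / (growth + (1.0 - x0))

# Illustrative numbers only: the newer variant is slightly more
# transmissible and escapes more of the population immunity.
f_old = variant_fitness(0.00, cross_immunity=0.9, immune_fraction=0.6)
f_new = variant_fitness(0.02, cross_immunity=0.5, immune_fraction=0.6)
s = f_new - f_old                      # selection coefficient per day
x_30 = frequency_after(0.01, s, 30)    # frequency after 30 days
```

In this toy setting, raising `immune_fraction` increases the selection coefficient of the escape variant, which is the qualitative effect the abstract attributes to vaccination and prior infection.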
Fierce selection and interference in B-cell repertoire response to chronic HIV-1
During chronic infection, HIV-1 engages in a rapid coevolutionary arms race
with the host's adaptive immune system. While it is clear that HIV exerts
strong selection on the adaptive immune system, the characteristics of the
somatic evolution that shape the immune response are still unknown. Traditional
population genetics methods fail to distinguish chronic immune response from
healthy repertoire evolution. Here, we infer the evolutionary modes of B-cell
repertoires and identify complex dynamics with a constant production of better
B-cell receptor mutants that compete, maintaining large clonal diversity and
potentially slowing down adaptation. A substantial fraction of mutations that
rise to high frequencies in pathogen engaging CDRs of B-cell receptors (BCRs)
are beneficial, in contrast to many such changes in structurally relevant
frameworks that are deleterious and circulate by hitchhiking. We identify a
pattern where BCRs in patients who experience larger viral expansions undergo
stronger selection with a rapid turnover of beneficial mutations due to clonal
interference in their CDR3 regions. Using population genetics modeling, we show
that the extinction of these beneficial mutations can be attributed to the rise
of competing beneficial alleles and clonal interference. The picture is of a
dynamic repertoire, where better clones may be outcompeted by new mutants
before they fix.
Significance analysis and statistical mechanics: an application to clustering
This paper addresses the statistical significance of structures in random
data: Given a set of vectors and a measure of mutual similarity, how likely
does a subset of these vectors form a cluster with enhanced similarity among
its elements? The computation of this cluster p-value for randomly distributed
vectors is mapped onto a well-defined problem of statistical mechanics. We
solve this problem analytically, establishing a connection between the physics
of quenched disorder and multiple testing statistics in clustering and related
problems. In an application to gene expression data, we find a remarkable link
between the statistical significance of a cluster and the functional
relationships between its genes. Comment: to appear in Phys. Rev. Lett.
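The notion of a cluster p-value can be made concrete with a brute-force simulation. The sketch below is not the analytical solution described in the abstract: it estimates by Monte Carlo how often a fixed group of k random vectors reaches a given score. The score (sum of pairwise dot products), the Gaussian background, and all function names are assumptions. Because it does not search over subsets, it omits the multiple-testing aspect that the paper treats.

```python
import random

def cluster_score(vectors):
    """Sum of pairwise dot products among the vectors (assumed similarity)."""
    score = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            score += sum(a * b for a, b in zip(vectors[i], vectors[j]))
    return score

def random_vectors(k, dim, rng):
    """Draw k vectors with independent standard-normal components
    (assumed background distribution)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

def empirical_p_value(observed_score, k, dim, n_samples=2000, seed=0):
    """Fraction of random groups of k vectors scoring at least observed_score.

    The add-one correction keeps the estimate strictly positive, since a
    finite simulation cannot certify a p-value of exactly zero.
    """
    rng = random.Random(seed)
    hits = sum(
        cluster_score(random_vectors(k, dim, rng)) >= observed_score
        for _ in range(n_samples)
    )
    return (hits + 1) / (n_samples + 1)

# Usage: score a tight group of three vectors and compare with the background.
group = [[1.0, 1.1, 0.9, 1.0], [0.9, 1.0, 1.0, 1.2], [1.1, 0.8, 1.0, 1.0]]
p = empirical_p_value(cluster_score(group), k=3, dim=4)
```

Simulation resolves p-values only down to about 1/n_samples, which is one practical reason an analytical result is attractive for the very small p-values typical of strong clusters.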
Identification of unique neoantigen qualities in long-term survivors of pancreatic cancer
Pancreatic ductal adenocarcinoma is a lethal cancer with fewer than 7% of patients surviving past 5 years. T-cell immunity has been linked to the exceptional outcome of the few long-term survivors [1,2], yet the relevant antigens remain unknown. Here we use genetic, immunohistochemical and transcriptional immunoprofiling, computational biophysics, and functional assays to identify T-cell antigens in long-term survivors of pancreatic cancer. Using whole-exome sequencing and in silico neoantigen prediction, we found that tumours with both the highest neoantigen number and the most abundant CD8+ T-cell infiltrates, but neither alone, stratified patients with the longest survival. Investigating the specific neoantigen qualities promoting T-cell activation in long-term survivors, we discovered that these individuals were enriched in neoantigen qualities defined by a fitness model, and neoantigens in the tumour antigen MUC16 (also known as CA125). A neoantigen quality fitness model conferring greater immunogenicity to neoantigens with differential presentation and homology to infectious disease-derived peptides identified long-term survivors in two independent datasets, whereas a neoantigen quantity model ascribing greater immunogenicity to increasing neoantigen number alone did not. We detected intratumoural and lasting circulating T-cell reactivity to both high-quality and MUC16 neoantigens in long-term survivors of pancreatic cancer, including clones with specificity to both high-quality neoantigens and predicted cross-reactive microbial epitopes, consistent with neoantigen molecular mimicry. Notably, we observed selective loss of high-quality and MUC16 neoantigenic clones on metastatic progression, suggesting neoantigen immunoediting. Our results identify neoantigens with unique qualities as T-cell targets in pancreatic ductal adenocarcinoma. More broadly, we identify neoantigen quality as a biomarker for immunogenic tumours that may guide the application of immunotherapies.
Cluster-Statistik und Genexpressionanalyse (Cluster statistics and gene expression analysis)
Clustering, which involves dividing data elements into classes based on their
observed properties, is one of the main tools in exploratory data analysis. It
is used widely in the analysis of gene expression, where one searches for
structures related to the underlying biological mechanisms. Clusters of gene
expression patterns are a signature of a common regulatory process of the
involved genes. Clusters of experimental conditions, e.g. tissues in an
organism, imply similar states of cell differentiation. The latter property is
used in the tumour sample classification. This thesis establishes a
statistical grounding for cluster analysis in high-dimensional data. The
methods used in the thesis are strongly influenced by solutions from the field
of statistical mechanics. The basic concepts and computational methods of
statistical mechanics are summarised in Chapter 2. In Chapter 3, we propose
probabilistic models for vectors in high-dimensional real space. Motivated by
the characteristics of gene expression data, we discuss different properties
defining a cluster: point density, positional bias, and directional density
(defined in Chapter 3). These properties are related to different choices of a
similarity measure and of a background distribution for unclustered vectors.
We consider several combinations of such background distributions and
similarity measures, and we arrive at well-defined scoring schemes for
clusters. Clusters in data usually arise due to an underlying functional
mechanism. However, even unrelated vectors drawn from the background
distribution can form agglomerations which by chance resemble clusters and
yield high cluster scores. In Chapter 4, we address the problem of the
statistical significance of clusters. For the scoring schemes proposed in
Chapter 3, we compute the cluster score p-value, which tells how likely it is
to observe a group of random vectors with the same or higher score. Our
analytical solution is based on a mapping to a problem from the statistical
mechanics of disordered systems. In an application to yeast gene expression
data, we show that the cluster score p-value is in agreement with the
biological significance of clustered genes, as measured by enrichment of
considered clusters in gene ontology terms (i.e. known functional annotations
of genes). In Chapter 5, we focus on another important aspect of the
statistics of high-dimensional data: dependencies between vector components.
Such dependencies are prevalent in gene expression data, for example between
subsequent time points in time-course experiments. Correct estimation of such
dependencies is crucial both for clustering of experimental conditions, and
for computation of similarities of gene expression vectors. Here, we show that
the estimation of vector-component dependencies requires accounting for an
important confounding factor: the presence of clusters of data vectors. We
propose a mixture-model-based inference method, which disentangles the
spurious effect of clusters from the true signal. We successfully apply our
method to the problem of tumour sample classification. In Chapter 6, we
propose the significance-based clustering algorithm. The algorithm seeks the
best representation of data as a mixture of the background and of clusters
characterised by a statistically significant score. In the implementation of
this approach, we draw from all concepts discussed in the preceding chapters
of this thesis: In the process of finding clusters of vectors, the algorithm
estimates the metric which accounts for dependencies between components of the
vectors. Further, using the probabilistic framework of the mixture-model, it
assigns low prior probability, and effectively penalises, clusters with high
cluster score p-value. In application to gene-expression data of yeast and
human, we show that the significance-constraint improves the biological
significance of resulting clusters.
Clustering, the grouping of data points on the basis of their observed
properties, is one of the most important tools in data analysis. It is
frequently used in the analysis of gene expression data to identify genes
that have similar biological functions. Clusters of gene expression patterns
often point to a common regulatory process of the genes involved. Clusters of
experimental conditions, e.g. of different tissues in an organism, indicate a
similar state of cell differentiation. The latter property is frequently used
for the classification of tumour data. This dissertation establishes
statistical foundations for clustering in high-dimensional data. The newly
introduced methods are based to a large extent on insights from statistical
mechanics. Chapter 2 therefore first introduces basic concepts and algorithms
of statistical mechanics. In Chapter 3, a new probabilistic model for clusters
in high-dimensional real space is proposed. Motivated by the characteristics
of gene expression data, different observables of a cluster are defined: point
density, positional bias, and directional density. These observables measure
similarities between data points in different ways and describe the background
distribution of random data points. From this, so-called score functions for
clusters are derived. Although genes with similar function form clusters in
gene expression data with high probability, randomly distributed data vectors
can also form clusters and obtain high cluster scores. Chapter 4 therefore
treats the statistical significance of clusters. For the score functions from
Chapter 3, methods for computing a so-called p-value are presented. The
function p(S) gives the probability that random vectors obtain a cluster score
of at least S. This problem is treated with methods from the statistical
mechanics of disordered systems, which lead to an analytical solution. In an
application to gene expression data from yeast, it is shown that cluster-score
p-values reflect the biological significance of co-expressed genes; biological
significance is measured here by gene ontology parameters in the clusters
considered. This shows that genes with similar biological functions are indeed
identified as significant clusters. Chapter 5 treats a further important
aspect of statistical methods for high-dimensional data: dependencies between
vector components. Such dependencies are common in gene expression data,
caused for example by temporally successive experiments in time-course
studies. Correct estimation of such dependencies is crucial both for
clustering experimental conditions and for computing similarities between
genes. Estimating dependencies between vector components requires accounting
for an important confounding factor: the presence of clusters of data vectors.
We propose an inference method based on a mixture distribution, which
separates the chance occurrence of clusters from the true signal. In our
approach we use the probabilistic models for clusters from Chapter 3. We apply
this method to the problem of tumour sample classification. Chapter 6 presents
the algorithm for computing significance-based clustering. The algorithm seeks
the best decomposition of the data as a mixture of random data vectors (from
the background distribution) and statistically significant clusters in the
sense of our theory. While finding clusters of data vectors, the algorithm
estimates which similarity measure best describes the dependencies between
vector components. Furthermore, the probabilistic mixture distribution allows
the use of prior probabilities that penalise clusters with large p-values. In
an application to gene expression data from yeast and human, it is shown that
this mixture-distribution approach increases the biological significance of
the clusters obtained.
Can we read the future from a tree?
Overall view, front from above; This marble bust of the lover of the emperor Hadrian (ruled 117-138) was present in the Louvre's Salle des Antiques in 1793. It was long confused with a bronze sculpture confiscated from the Château d'Écouen in the same year, then transferred to Versailles before later joining the Louvre. (Antiquities from the royal residences were confiscated with the property of the Crown in 1792 and put on display at the Louvre.) Both sculptures reproduce a bust found during the Renaissance that probably came from the Villa Hadriana. The Louvre bust comes from the French royal collections and is an 18th-century copy of the Roman work. Source: Louvre Museum [website]; http://www.louvre.fr/ (accessed 4/15/2011)
Two-Stage Model-Based Clustering for Liquid Chromatography Mass Spectrometry Data Analysis
Proteomic mass spectrometry is playing a growing role in diagnostics and in studies of protein complexes and biological systems. This experimental technology produces high-throughput data that are inherently noisy and may contain various errors. Mathematical processing can help remove them.
Predictive Modeling of Influenza Shows the Promise of Applied Evolutionary Biology
Seasonal influenza is controlled through vaccination campaigns. Evolution of influenza virus antigens means that vaccines must be updated to match novel strains, and vaccine effectiveness depends on the ability of scientists to predict nearly a year in advance which influenza variants will dominate in upcoming seasons. In this review, we highlight a promising new surveillance tool: predictive models. Developed through data-sharing and close collaboration between the World Health Organization and academic scientists, these models use surveillance data to make quantitative predictions regarding influenza evolution. Predictive models demonstrate the potential of applied evolutionary biology to improve public health and disease control. We review the state of influenza predictive modeling and discuss next steps and recommendations to ensure that these models deliver on their considerable biomedical promise.
A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy
Checkpoint blockade immunotherapies enable the host immune system to recognize and destroy tumour cells. Their clinical activity has been correlated with activated T-cell recognition of neoantigens, which are tumour-specific, mutated peptides presented on the surface of cancer cells. Here we present a fitness model for tumours based on immune interactions of neoantigens that predicts response to immunotherapy. Two main factors determine neoantigen fitness: the likelihood of neoantigen presentation by the major histocompatibility complex (MHC) and subsequent recognition by T cells. We estimate these components using the relative MHC binding affinity of each neoantigen to its wild type and a nonlinear dependence on sequence similarity of neoantigens to known antigens. To describe the evolution of a heterogeneous tumour, we evaluate its fitness as a weighted effect of dominant neoantigens in the subclones of the tumour. Our model predicts survival in anti-CTLA-4-treated patients with melanoma and anti-PD-1-treated patients with lung cancer. Importantly, low-fitness neoantigens identified by our method may be leveraged for developing novel immunotherapies. By using an immune fitness model to study immunotherapy, we reveal broad similarities between the evolution of tumours and rapidly evolving pathogens.
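The structure described in this abstract can be sketched in a few lines: a presentation factor from relative MHC binding affinity, a nonlinear recognition factor from sequence similarity, and a frequency-weighted contribution of the dominant neoantigen in each subclone. This is a minimal sketch under stated assumptions, not the published model: the logistic form and its `midpoint` and `slope`, the exponential weighting with `tau`, and all function names are hypothetical.

```python
import math

def recognition_probability(similarity, midpoint=26.0, slope=1.0):
    """Assumed logistic (nonlinear) dependence of T-cell recognition on a
    sequence-similarity score to known antigens. midpoint and slope are
    illustrative values, not fitted parameters."""
    return 1.0 / (1.0 + math.exp(-slope * (similarity - midpoint)))

def neoantigen_quality(kd_wildtype, kd_mutant, similarity):
    """Presentation factor times recognition factor.

    The presentation factor is the ratio of wild-type to mutant
    dissociation constants, so a mutant peptide that binds MHC more
    tightly than its wild type (lower Kd) scores higher.
    """
    amplitude = kd_wildtype / kd_mutant
    return amplitude * recognition_probability(similarity)

def tumour_relative_size(clones, tau=1.0):
    """Frequency-weighted effect of the dominant neoantigen in each clone.

    clones: list of (frequency, [(kd_wildtype, kd_mutant, similarity), ...]).
    Each clone is assumed to shrink as exp(-tau * quality of its dominant
    neoantigen); clones without neoantigens are left unchanged.
    """
    total = 0.0
    for frequency, neoantigens in clones:
        dominant = max((neoantigen_quality(*n) for n in neoantigens), default=0.0)
        total += frequency * math.exp(-tau * dominant)
    return total

# Usage with made-up values: two subclones at frequencies 0.7 and 0.3.
clones = [
    (0.7, [(500.0, 50.0, 30.0), (200.0, 180.0, 10.0)]),
    (0.3, [(100.0, 90.0, 20.0)]),
]
predicted = tumour_relative_size(clones)
```

Taking the maximum within each clone reflects the abstract's emphasis on dominant neoantigens; a lower predicted relative size corresponds to stronger predicted immune pressure on the tumour.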