16 research outputs found

    Vaccination shapes evolutionary trajectories of SARS-CoV-2

    The large-scale evolution of the SARS-CoV-2 virus has been marked by rapid turnover of genetic clades. New variants show intrinsic changes, notably increased transmissibility, as well as antigenic changes that reduce the cross-immunity induced by previous infections or vaccinations. How this functional variation shapes the global evolutionary dynamics has remained unclear. Here we show that selection induced by vaccination impacts the recent antigenic evolution of SARS-CoV-2; other relevant forces include intrinsic selection and antigenic selection induced by previous infections. We obtain these results from a fitness model with intrinsic and antigenic fitness components. To infer model parameters, we combine time-resolved sequence data, epidemiological records, and cross-neutralisation assays. This model accurately captures the large-scale evolutionary dynamics of SARS-CoV-2 in multiple geographical regions. In particular, it quantifies how recent vaccinations and infections affect the speed of frequency shifts between viral variants. Our results show that timely neutralisation data can be harvested to identify hotspots of antigenic selection and to predict the impact of vaccination on viral evolution.

    Fierce selection and interference in B-cell repertoire response to chronic HIV-1

    During chronic infection, HIV-1 engages in a rapid coevolutionary arms race with the host's adaptive immune system. While it is clear that HIV exerts strong selection on the adaptive immune system, the characteristics of the somatic evolution that shape the immune response are still unknown. Traditional population genetics methods fail to distinguish chronic immune response from healthy repertoire evolution. Here, we infer the evolutionary modes of B-cell repertoires and identify complex dynamics with a constant production of better B-cell receptor mutants that compete, maintaining large clonal diversity and potentially slowing down adaptation. A substantial fraction of mutations that rise to high frequencies in pathogen-engaging CDRs of B-cell receptors (BCRs) are beneficial, in contrast to many such changes in structurally relevant frameworks that are deleterious and circulate by hitchhiking. We identify a pattern where BCRs in patients who experience larger viral expansions undergo stronger selection with a rapid turnover of beneficial mutations due to clonal interference in their CDR3 regions. Using population genetics modeling, we show that the extinction of these beneficial mutations can be attributed to the rise of competing beneficial alleles and clonal interference. The picture is of a dynamic repertoire, where better clones may be outcompeted by new mutants before they fix.

    Significance analysis and statistical mechanics: an application to clustering

    This paper addresses the statistical significance of structures in random data: Given a set of vectors and a measure of mutual similarity, how likely does a subset of these vectors form a cluster with enhanced similarity among its elements? The computation of this cluster p-value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes. Comment: to appear in Phys. Rev. Lett.

    Identification of unique neoantigen qualities in long-term survivors of pancreatic cancer

    Pancreatic ductal adenocarcinoma is a lethal cancer with fewer than 7% of patients surviving past 5 years. T-cell immunity has been linked to the exceptional outcome of the few long-term survivors [1,2], yet the relevant antigens remain unknown. Here we use genetic, immunohistochemical and transcriptional immunoprofiling, computational biophysics, and functional assays to identify T-cell antigens in long-term survivors of pancreatic cancer. Using whole-exome sequencing and in silico neoantigen prediction, we found that tumours with both the highest neoantigen number and the most abundant CD8+ T-cell infiltrates, but neither alone, stratified patients with the longest survival. Investigating the specific neoantigen qualities promoting T-cell activation in long-term survivors, we discovered that these individuals were enriched in neoantigen qualities defined by a fitness model, and neoantigens in the tumour antigen MUC16 (also known as CA125). A neoantigen quality fitness model conferring greater immunogenicity to neoantigens with differential presentation and homology to infectious disease-derived peptides identified long-term survivors in two independent datasets, whereas a neoantigen quantity model ascribing greater immunogenicity to increasing neoantigen number alone did not. We detected intratumoural and lasting circulating T-cell reactivity to both high-quality and MUC16 neoantigens in long-term survivors of pancreatic cancer, including clones with specificity to both high-quality neoantigens and predicted cross-reactive microbial epitopes, consistent with neoantigen molecular mimicry. Notably, we observed selective loss of high-quality and MUC16 neoantigenic clones on metastatic progression, suggesting neoantigen immunoediting. Our results identify neoantigens with unique qualities as T-cell targets in pancreatic ductal adenocarcinoma.
More broadly, we identify neoantigen quality as a biomarker for immunogenic tumours that may guide the application of immunotherapies.

    Cluster-Statistik und Genexpressionanalyse

    Clustering, which involves dividing data elements into classes based on their observed properties, is one of the main tools in exploratory data analysis. It is used widely in the analysis of gene expression, where one searches for structures related to the underlying biological mechanisms. Clusters of gene expression patterns are a signature of a common regulatory process of the involved genes. Clusters of experimental conditions, e.g. tissues in an organism, imply similar states of cell differentiation. The latter property is used in the tumour sample classification. This thesis establishes a statistical grounding for cluster analysis in high-dimensional data. The methods used in the thesis are strongly influenced by solutions from the field of statistical mechanics. The basic concepts and computational methods of statistical mechanics are summarised in Chapter 2. In Chapter 3, we propose probabilistic models for vectors in high-dimensional real space. Motivated by the characteristics of gene expression data, we discuss different properties defining a cluster: point density, positional bias, and directional density (defined in Chapter 3). These properties are related to different choices of a similarity measure and of a background distribution for unclustered vectors. We consider several combinations of such background distributions and similarity measures, and we arrive at well-defined scoring schemes for clusters. Clusters in data usually arise due to an underlying functional mechanism. However, even unrelated vectors drawn from the background distribution can form agglomerations which by chance resemble clusters and yield high cluster scores. In Chapter 4, we address the problem of the statistical significance of clusters. For the scoring schemes proposed in Chapter 3, we compute the cluster score p-value, which tells how likely it is to observe a group of random vectors with the same or higher score. 
Our analytical solution is based on a mapping to a problem from the statistical mechanics of disordered systems. In an application to yeast gene expression data, we show that the cluster score p-value is in agreement with the biological significance of clustered genes, as measured by enrichment of considered clusters in gene ontology terms (i.e. known functional annotations of genes). In Chapter 5, we focus on another important aspect of the statistics of high-dimensional data: dependencies between vector components. Such dependencies are prevalent in gene expression data, for example between subsequent time points in time-course experiments. Correct estimation of such dependencies is crucial both for clustering of experimental conditions, and for computation of similarities of gene expression vectors. Here, we show that the estimation of vector-component dependencies requires accounting for an important confounding factor: the presence of clusters of data vectors. We propose a mixture-model-based inference method, which disentangles the spurious effect of clusters from the true signal. We successfully apply our method to the problem of tumour sample classification. In Chapter 6, we propose the significance-based clustering algorithm. The algorithm seeks the best representation of data as a mixture of the background and of clusters characterised by a statistically significant score. In the implementation of this approach, we draw from all concepts discussed in the preceding chapters of this thesis: In the process of finding clusters of vectors, the algorithm estimates the metric which accounts for dependencies between components of the vectors. Further, using the probabilistic framework of the mixture-model, it assigns low prior probability, and effectively penalises, clusters with high cluster score p-value. 
In application to gene-expression data of yeast and human, we show that the significance constraint improves the biological significance of the resulting clusters.

    Can we read the future from a tree?

    Overall view, front from above; This marble bust of the lover of the emperor Hadrian (ruled 117-138) was present in the Louvre’s Salle des Antiques in 1793. It was long confused with a bronze sculpture confiscated from the Château d’Écouen in the same year, then transferred to Versailles before later joining the Louvre. (Antiquities from the royal residences were confiscated with the property of the Crown in 1792 and put on display at the Louvre.) Both sculptures reproduce a bust found during the Renaissance that probably came from the Villa Hadriana. The Louvre bust comes from the French royal collections and is an 18th-century copy of the Roman work. Source: Louvre Museum [website]; http://www.louvre.fr/ (accessed 4/15/2011)

    Two-Stage Model-Based Clustering for Liquid Chromatography Mass Spectrometry Data Analysis

    Proteomic mass spectrometry plays an increasing role in diagnostics and in studies of protein complexes and biological systems. This experimental technology produces high-throughput data that are inherently noisy and may contain various errors. Mathematical processing can help remove them.

    Predictive Modeling of Influenza Shows the Promise of Applied Evolutionary Biology

    Seasonal influenza is controlled through vaccination campaigns. Evolution of influenza virus antigens means that vaccines must be updated to match novel strains, and vaccine effectiveness depends on the ability of scientists to predict nearly a year in advance which influenza variants will dominate in upcoming seasons. In this review, we highlight a promising new surveillance tool: predictive models. Developed through data-sharing and close collaboration between the World Health Organization and academic scientists, these models use surveillance data to make quantitative predictions regarding influenza evolution. Predictive models demonstrate the potential of applied evolutionary biology to improve public health and disease control. We review the state of influenza predictive modeling and discuss next steps and recommendations to ensure that these models deliver upon their considerable biomedical promise.