19,636 research outputs found

Using genotype abundance to improve phylogenetic inference

Author: DeWitt III William S.
Matsen IV Frederick A.
Mesin Luka
Minin Vladimir N.
Victora Gabriel D.
Publication venue: Oxford University Press (OUP)
Publication date: 20/02/2018

Modern biological techniques enable very dense genetic sampling of unfolding evolutionary histories, and thus frequently sample some genotypes multiple times. This motivates strategies to incorporate genotype abundance information in phylogenetic inference. In this paper, we synthesize a stochastic process model with standard sequence-based phylogenetic optimality, and show that tree estimation is substantially improved by doing so. Our method is validated with extensive simulations and an experimental single-cell lineage tracing study of germinal center B cell receptor affinity maturation.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Consistency and convergence rate of phylogenetic inference via regularization

Author: Dinh Vu
Ho Lam Si Tung
Matsen IV Frederick A.
Suchard Marc A.
Publication date: 05/01/2018

It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.
Comment: 34 pages, 5 figures. To appear in The Annals of Statistics.
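
As a reading aid, the penalized estimator described in this abstract can be written schematically as below. This is a sketch under assumed notation, not the paper's own formula: the symbols $\ell_k$ (log-likelihood of a candidate gene tree $T$ given $k$ alignment sites), $T_0$ (the species or guide tree), $\lambda_k$ (the regularization weight), and the $1/k$ normalization are all illustrative choices that do not appear in the abstract.

```latex
\hat{T}_k \;=\; \operatorname*{arg\,max}_{T}
\left\{ \frac{1}{k}\,\ell_k(T) \;-\; \lambda_k \, d_{\mathrm{BHV}}(T, T_0)^{2} \right\}
```

Here $d_{\mathrm{BHV}}$ is the Billera-Holmes-Vogtmann geodesic distance named in the abstract. A larger $\lambda_k$ pulls the estimate toward the guide tree, while $\lambda_k = 0$ recovers ordinary maximum likelihood.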

arXiv.org e-Print Archive

eScholarship - University of California

Phylogenetic Analysis of Cell Types using Histone Modifications

Author: Bucher Philipp
Lin Yu
Moret Bernard M. E.
Nair Nishanth Ulhas
Publication date: 07/07/2013

In cell differentiation, a cell of a less specialized type becomes one of a more specialized type, even though all cells have the same genome. Transcription factors and epigenetic marks like histone modifications can play a significant role in the differentiation process. In this paper, we present a simple analysis of cell types and differentiation paths using phylogenetic inference based on ChIP-Seq histone modification data. We propose new data representation techniques and new distance measures for ChIP-Seq data and use these together with standard phylogenetic inference methods to build biologically meaningful trees that indicate how diverse types of cells are related. We demonstrate our approach on H3K4me3 and H3K27me3 data for 37 and 13 types of cells respectively, using the dataset to explore various issues surrounding replicate data, variability between cells of the same type, and robustness. The promising results we obtain point the way to a new approach to the study of cell differentiation.
Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
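
To make the idea of a "data representation" and a "distance measure" for ChIP-Seq data concrete, here is a minimal Python sketch. It assumes one plausible choice that the abstract does not specify: each cell type's peak list is turned into a binary vector over fixed-size genome windows, and two cell types are compared by the fraction of windows on which they disagree. The function names, window size, and toy peak coordinates are all hypothetical.

```python
from typing import List, Tuple


def window_vector(peaks: List[Tuple[int, int]],
                  genome_length: int, window: int) -> List[int]:
    """Mark each fixed-size window with 1 if any peak overlaps it.

    Peaks are half-open intervals [start, end) in base-pair coordinates.
    """
    n_windows = (genome_length + window - 1) // window
    vec = [0] * n_windows
    for start, end in peaks:
        for w in range(start // window, (end - 1) // window + 1):
            vec[w] = 1
    return vec


def hamming_fraction(u: List[int], v: List[int]) -> float:
    """Fraction of windows on which the two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


# Toy peak lists for two hypothetical cell types on a 1000 bp "genome".
cell1 = window_vector([(50, 150), (400, 450)], genome_length=1000, window=100)
cell2 = window_vector([(120, 180), (400, 450), (900, 950)],
                      genome_length=1000, window=100)
d = hamming_fraction(cell1, cell2)
print(d)
```

Pairwise distances computed this way for all cell types form a distance matrix, which is the kind of input a standard distance-based tree-building method would take.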

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

Author: Eugene I. Shakhnovich
Jesse D. Bloom
Matthew J. Glassman
Publication venue: International Society for Computational Biology
Publication date: 01/04/2009

One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Uncommon Problems in Phylogenetic Inference

Author: Bettisworth Ben
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 13/09/2023

Phylogenetics is the study of the evolution of life on Earth. Uncovering ancient evolutionary relationships between living species is of great value, as it has led to important discoveries in biology, such as the development of new drugs, the tracking of the dynamics of a global pandemic, and insights into the origin of humankind. Today, phylogenetic analyses are typically carried out with statistical models that take sequence data, usually molecular sequences, as input. Based on these statistical models, the most likely explanation for the input data is computed. That is, the (presumably) correct phylogenetic tree is the tree that is most likely under a given model of sequence evolution. The rapid growth in available data in recent years enables considerably more complicated phylogenetic analyses. Paradoxically, this massive increase in the data available for analysis has not in all cases led to a definitive conclusion; that is, the model used is uncertain about the most likely conclusion. This can be due to a variety of factors, such as highly complex models, noise in some or all of the data, and physical processes that the model does not adequately account for. Difficulties due to uncertainty are new neither to phylogenetics nor to science in general, but the development of more complicated analysis methods calls for new methods for reporting, analyzing, and integrating uncertainty.
This thesis presents three contributions to improving the assessment of uncertainty. The first contribution concerns determining the root of unrooted phylogenetic trees. Phylogenetic trees are either oriented with respect to time, in which case they are called rooted, or they have no orientation, in which case they are unrooted. Most programs for inferring phylogenetic trees produce an unrooted phylogenetic tree for computational reasons. I developed the open-source software tool RootDigger, which computes both an unrooted phylogenetic tree and a distribution of the likely roots. In addition, RootDigger has a distributed-memory parallelization scheme, which also allows the analysis of large datasets, such as determining a phylogenetic tree from 8736 SARS-CoV-2 virus sequences. My second contribution in this thesis is the open-source software tool Phylourny for computing the most likely winner of a knock-out tournament. The algorithm in Phylourny is modeled on the Felsenstein pruning algorithm, a dynamic programming algorithm for computing the likelihood of a phylogenetic tree. Using this algorithm considerably speeds up the computation compared with standard tournament simulations. With this accelerated method, Phylourny also explores the parameter space of the model using an MCMC method in order to evaluate and summarize outcomes that have a similar probability of occurring. These outcomes often differ substantially from the most likely outcome. In this thesis, I present the performance of Phylourny on two real football and basketball tournaments.
The final contribution of this thesis is the redesign and reimplementation of a well-known tool for historical biogeography, which can be used to draw inferences about the distribution of ancestral ranges. A main interest of biogeography is determining the ranges of species. Historical biogeography is therefore often concerned with inferring the range of the ancestors of living species. These ancestral range distributions are a common result of biogeographic studies and are often inferred with a model that has numerous similarities to models of sequence evolution. My new version, Lagrange-NG, computes results up to 50 times faster than the previous version and up to two orders of magnitude faster than the popular analogous tool BioGeoBEARS. In addition, I developed a new distance metric that makes it possible to compare the results of alternative tools and algorithms.
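
The connection between a knock-out tournament and the Felsenstein pruning algorithm can be illustrated with a short Python sketch. This is not Phylourny's implementation; it is a minimal example of the same dynamic programming idea, assuming a fixed bracket and a table of pairwise win probabilities. The names `win_distribution`, `strengths`, and `beats`, the team labels, and the Bradley-Terry-style win rule are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple, Union

# A bracket is either a team name (leaf) or a pair of sub-brackets (a match).
Bracket = Union[str, Tuple["Bracket", "Bracket"]]


def win_distribution(bracket: Bracket,
                     beats: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return {team: probability that the team wins this sub-bracket}.

    Works bottom-up like pruning on a tree: each internal node combines
    the distributions of its two children exactly once.
    """
    if isinstance(bracket, str):
        return {bracket: 1.0}  # a lone team trivially "wins" its leaf
    left = win_distribution(bracket[0], beats)
    right = win_distribution(bracket[1], beats)
    result: Dict[str, float] = {}
    for a, pa in left.items():
        # a must emerge from the left side and beat whoever emerges right.
        result[a] = pa * sum(pb * beats[a][b] for b, pb in right.items())
    for b, pb in right.items():
        result[b] = pb * sum(pa * beats[b][a] for a, pa in left.items())
    return result


# Hypothetical team strengths; P(a beats b) = s_a / (s_a + s_b) is assumed.
strengths = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 2.0}
beats = {a: {b: sa / (sa + sb) for b, sb in strengths.items() if b != a}
         for a, sa in strengths.items()}

dist = win_distribution((("A", "B"), ("C", "D")), beats)
for team, p in sorted(dist.items()):
    print(f"{team}: {p:.3f}")
```

Because every match node is visited once, the exact winner distribution is obtained without simulating individual tournaments, which is the source of the speed-up over simulation that the abstract mentions.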

KITopen