2,040 research outputs found

Fast and scalable inference of multi-sample cancer lineages.

Author: Batzoglou Serafim
Hajirasouliha Iman
Kashef-Haghighi Dorna
Popic Victoria
Salari Raheleh
West Robert B
Publication venue: eScholarship, University of California
Publication date: 30/12/2014

Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee

arXiv.org e-Print Archive

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.

Author: Chan Michelle M
Hussmann Jeffrey A
Jones Matthew G
Khodaverdian Alex
Quinn Jeffrey J
Wang Robert
Weissman Jonathan S
Xu Chenling
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia

eScholarship - University of California

MAVID: Constrained ancestral alignment of multiple sequences

Author: Bray Nicolas
Pachter Lior
Publication date: 13/11/2003

We describe a new global multiple alignment program capable of aligning a large number of genomic regions. Our progressive alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region which consists of 1.8Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments: an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse and rat genomes

arXiv.org e-Print Archive

PubMed Central

Caltech Authors

Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study

Author: Bentley SD
Colijn C
Harris SR
Kendall M
Lees JA
Parkhill J
Publication venue: F1000 Research Ltd
Publication date: 01/01/2018

Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons
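The "genetic distance based methods" mentioned in this abstract start from a matrix of pairwise distances between aligned sequences. The following is a minimal sketch of that first step only, assuming the plain proportion of differing sites (p-distance) and the standard Jukes-Cantor (JC69) correction; the abstract does not say which distances the authors used, and the function names here are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences carry an unambiguous base,
    # so gaps ('-') and ambiguity codes ('N') are skipped.
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: dict) -> dict:
    """All pairwise JC69 distances for a {name: aligned_sequence} mapping."""
    names = sorted(alignment)
    return {(a, b): jc69_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Assumption: a tiny made-up alignment, used only to exercise the functions.
toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TACGA"}
toy_matrix = distance_matrix(toy_alignment)
```

A matrix like `toy_matrix` is the input a distance-based tree builder such as neighbour joining would take; the tree-building step itself is not shown here.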

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Computational phylogenetics and the classification of South American languages

Author: Chousou‐Polydouri Natalia
Michael Lev
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

In recent years, South Americanist linguists have embraced computational phylogenetic methods to resolve the numerous outstanding questions about the genealogical relationships among the languages of the continent. We provide a critical review of the methods and language classification results that have accumulated thus far, emphasizing the superiority of character-based methods over distance-based ones and the importance of developing adequate comparative datasets for producing well-resolved classifications.

Crossref

eScholarship - University of California

ZORA

How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

Author: Hitchcock E.
Darwin C.
Edwards A.W.F.
Sneath P.H.A.
Saitou N.
Salemi M.
Lespinats S.
Jolliffe I.
Kuhner M.K.
Zaretsky K.
Cavalli-Sforza L.L.
Matsuda H.
Swofford D.L.
Li J.
Press W.H.
Glover F.
Goldberg D.E.
Reeves C.R.
Dowsland K.A.
Chalmers M.
Gromov M.
Milman V.D.
Bulmer M.
Demartines P.
Fleiss J.L.
Publication venue: Libertas Academica
Publication date: 01/01/2011

Whatever the phylogenetic method, genetic sequences are often described as strings of characters; molecular sequences can thus be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the amazing features of high-dimensional spaces, such as the concentration of measure phenomenon.
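The concentration effect this abstract refers to can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes uniformly random, unrelated DNA strings and the Hamming distance, and the function names and parameters are hypothetical. It shows that the spread of pairwise distances, relative to their mean, shrinks as sequences get longer, so all pairs begin to look roughly equidistant.

```python
import random
import statistics
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def relative_spread(length: int, n_seqs: int = 30, seed: int = 0) -> float:
    """Std. deviation of pairwise Hamming distances divided by their mean.

    Assumption: sequences are drawn uniformly at random over A, C, G, T,
    which is a stand-in for high-dimensional points, not an evolutionary model.
    """
    rng = random.Random(seed)
    seqs = ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n_seqs)]
    dists = [hamming(a, b) for a, b in combinations(seqs, 2)]
    return statistics.pstdev(dists) / statistics.mean(dists)


# The ratio drops as the dimension (sequence length) grows.
for length in (10, 100, 1000):
    print(length, round(relative_spread(length), 3))
```

Under this toy model each distance is binomial, so the ratio falls roughly as one over the square root of the sequence length.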

Crossref

Hal - Université Grenoble Alpes

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

Warwick Research Archives Portal Repository

Online Research Database In Technology

The Qphyl System: a web-based interactive system for phylogenetic analysis

Author: Zhen Zhao
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2008

Phylogenetic tree reconstruction is a prominent problem in computational biology. Currently, all computational methods have their limitations and work well only for simple problems of small size. No existing method can guarantee that the trees it constructs for large and complex real-world problems are true phylogenetic trees, mainly because the existing computational models are not very biologically realistic. This has become a serious issue for many important real-life applications, which often require accurate results from phylogenetic analysis. Thus, it is crucial to effectively incorporate multi-disciplinary analyses and synthesize results from various sources when answering real-life questions. In this thesis, a novel web-based phylogeny reconstruction system with a real-time interactive environment, called Qphyl (short for quartet-based phylogenetic analysis), is introduced. The Qphyl system uses a new interactive approach that enables biologists to greatly improve the final results through dynamic interaction with the computation, e.g., by moving the computation back and forth between stages so that users can check intermediate results, compare results from different methods, and carry out manual refinements, using their biological domain-specific knowledge to decide how a tree should be reconstructed. Currently, the alpha version of this web-based interactive system has been released and is accessible through the URL: http://ww-test.it.usyd.edu.au/sogrid/qphyl/

Sydney eScholarship

Unfolding Latent Tree Structures using 4th Order Tensors

Author: Ishteva Mariya
Park Haesun
Song Le
Publication date: 03/10/2012

Discovering the latent structure from many observed variables is an important yet challenging learning task. Existing approaches for discovering latent structures often require the unknown number of hidden states as an input. In this paper, we propose a quartet based approach which is agnostic to this number. The key contribution is a novel rank characterization of the tensor associated with the marginal distribution of a quartet. This characterization allows us to design a nuclear norm based test for resolving quartet relations. We then use the quartet test as a subroutine in a divide-and-conquer algorithm for recovering the latent tree structure. Under mild conditions, the algorithm is consistent and its error probability decays exponentially with increasing sample size. We demonstrate that the proposed approach compares favorably to alternatives. In a real world stock dataset, it also discovers meaningful groupings of variables, and produces a model that fits the data better.
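The rank property behind such a quartet test can be checked on a toy model. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it builds the exact joint distribution of four leaves a, b, c, d in a made-up latent tree (hidden H is the parent of a and b, hidden G is the parent of c and d, and H is linked to G), unfolds the 4th-order tensor in the three possible pairings, and compares exact matrix ranks. All parameter values and names are hypothetical, and the nuclear norm test on sampled data described in the abstract is not reproduced.

```python
from fractions import Fraction as F
from itertools import product

# Assumption: toy parameters with two hidden states and three observed states.
JOINT_HG = [[F(48, 100), F(12, 100)],
            [F(12, 100), F(28, 100)]]  # P(H = h, G = g)
EMIT = {  # P(leaf = state | hidden parent); rows index the hidden state
    "a": [[F(7, 10), F(2, 10), F(1, 10)], [F(1, 10), F(3, 10), F(6, 10)]],
    "b": [[F(6, 10), F(3, 10), F(1, 10)], [F(2, 10), F(2, 10), F(6, 10)]],
    "c": [[F(5, 10), F(4, 10), F(1, 10)], [F(1, 10), F(2, 10), F(7, 10)]],
    "d": [[F(8, 10), F(1, 10), F(1, 10)], [F(2, 10), F(3, 10), F(5, 10)]],
}
STATES = range(3)


def joint(a, b, c, d):
    """Exact P(a, b, c, d), summing out the two hidden variables."""
    return sum(JOINT_HG[h][g] * EMIT["a"][h][a] * EMIT["b"][h][b]
               * EMIT["c"][g][c] * EMIT["d"][g][d]
               for h in range(2) for g in range(2))


def unfolding(row_vars, col_vars):
    """Matricise the tensor: rows index one pair of leaves, columns the other."""
    matrix = []
    for r in product(STATES, repeat=2):
        row = []
        for c in product(STATES, repeat=2):
            v = dict(zip(row_vars + col_vars, r + c))
            row.append(joint(v["a"], v["b"], v["c"], v["d"]))
        matrix.append(row)
    return matrix


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk


SPLITS = {"ab|cd": ("ab", "cd"), "ac|bd": ("ac", "bd"), "ad|bc": ("ad", "bc")}
ranks = {name: rank(unfolding(tuple(r), tuple(c)))
         for name, (r, c) in SPLITS.items()}
inferred = min(ranks, key=ranks.get)  # pairing with the lowest-rank unfolding
print(ranks, inferred)
```

In this toy model the unfolding that matches the true pairing has rank equal to the number of hidden states, while the other two have strictly higher rank. With finite samples the ranks are no longer exact, which is presumably where a nuclear norm relaxation becomes useful.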

arXiv.org e-Print Archive

CiteSeerX

FlatNJ: A novel network-based approach to visualize evolutionary and biogeographical relationships

Author: Balvočiūtė Monika
Moulton Vincent
Spillner Andreas
Publication venue: Oxford University Press (OUP)
Publication date: 17/01/2014

Split networks are a type of phylogenetic network that allows visualization of conflict in evolutionary data. We present a new method for constructing such networks called FlatNetJoining (FlatNJ). A key feature of FlatNJ is that it produces networks that can be drawn in the plane in which labels may appear inside of the network. For complex data sets that involve, for example, non-neutral molecular markers, this can allow additional detail to be visualized as compared to previous methods such as split decomposition and NeighborNet. We illustrate the application of FlatNJ by applying it to whole HIV genome sequences, where recombination has taken place, fluorescent proteins in corals, where ancestral sequences are present, and mitochondrial DNA sequences from gall wasps, where biogeographical relationships are of interest. We find that the networks generated by FlatNJ can facilitate the study of genetic variation in the underlying molecular sequence data and, in particular, may help to investigate processes such as intra-locus recombination. FlatNJ has been implemented in Java and is freely available at www.uea.ac.uk/computing/software/flatnj

Dryad Digital Repository (Duke University)

University of East Anglia digital repository