Reconstructing Biological and Digital Phylogenetic Trees in Parallel

Author: Afshar Ramtin
Goodrich Michael T.
Matias Pedro
Osegueda Martha C.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual European Symposium on Algorithms (ESA 2020)
Publication date: 01/01/2020

In this paper, we study the parallel query complexity of reconstructing biological and digital phylogenetic trees from simple queries involving their nodes. This is motivated by computational biology, data protection, and computer security settings, which can be abstracted in terms of two parties: a responder, Alice, who must correctly answer queries of a given type regarding a degree-d tree, T, and a querier, Bob, who issues batches of queries, with each query in a batch being independent of the others, so as to eventually infer the structure of T. We show that a querier can efficiently reconstruct an n-node degree-d tree, T, with a logarithmic number of rounds and quasilinear number of queries, with high probability, for various types of queries, including relative-distance queries and path queries. Our results are all asymptotically optimal and improve the asymptotic (sequential) query complexity for one of the problems we study. Moreover, through an experimental analysis using both real-world and synthetic data, we provide empirical evidence that our algorithms provide significant parallel speedups while also improving the total query complexities for the problems we study.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.

Author: Chan Michelle M
Hussmann Jeffrey A
Jones Matthew G
Khodaverdian Alex
Quinn Jeffrey J
Wang Robert
Weissman Jonathan S
Xu Chenling
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.

eScholarship - University of California

An entropy based heuristic model for predicting functional sub-type divisions of protein families

Author: Bakış Yasin
Sezerman Uğur
Yörükoğlu Deniz
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/07/2009

Multiple sequence alignments of protein families are often used to locate residues that lie far apart in the sequence and are considered influential in determining the functional specificity of proteins towards various substrates, ligands, DNA and other proteins. In this paper, we propose an entropy-score based heuristic algorithm model for predicting functional sub-family divisions of protein families, given only the multiple sequence alignment of the protein family as input, without any functional sub-type or key site information for any protein sequence. Two of the test cases we experimented with are reported in this paper. The first test case is the Nucleotidyl Cyclase protein family, consisting of guanylate and adenylate cyclases. The second test case is a dataset of proteins taken from six superfamilies in the Structure-Function Linkage Database (SFLD). Results from these test cases are reported in terms of confirmed sub-type divisions with phylogeny relations from former studies in the literature.
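To make the idea of an entropy score over an alignment concrete, here is a minimal sketch of per-column Shannon entropy, a standard way to measure how variable an alignment column is. This is an assumption for illustration only: the abstract does not specify the scoring function, so the paper's actual entropy score may differ. The function names `column_entropy` and `alignment_entropies` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the symbols in one alignment column.

    Assumption: gap characters are treated as ordinary symbols.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(alignment: list[str]) -> list[float]:
    """Entropy of every column of an alignment given as equal-length rows."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Toy alignment (made-up data): column 0 is fully conserved,
# column 1 splits into two groups, column 2 is fully variable.
toy_alignment = ["ACD", "ACE", "AGF", "AGH"]
entropies = alignment_entropies(toy_alignment)
```

In this toy case, the middle column has low but non-zero entropy and splits the sequences cleanly into two groups, which is the kind of signal a sub-family heuristic could plausibly look for.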

Sabanci University Research Database

Multivariate Approaches to Classification in Extragalactic Astronomy

Author: Chattopadhyay Asis Kumar
Fraix-Burnet Didier
Thuillard Marc
Publication venue: Frontiers Media SA
Publication date: 01/01/2015

Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is not an exception and is now facing a deluge of data. For galaxies, the century-old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono- or bivariate classifications most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We emphasize the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies. Comment: Open Access paper. http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract. <10.3389/fspas.2015.00003>

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Frontiers - Publisher Connector

HAL Descartes

HAL-INSU

HAL Université de Savoie

A target enrichment method for gathering phylogenetic information from hundreds of loci: An example from the Compositae.

Author: Burke John M
Dikow Rebecca B
Funk Vicki A
Kozik Alex
Mandel Jennifer R
Masalia Rishi R
Michelmore Richard W
Rieseberg Loren H
Staton S Evan
Publication venue: eScholarship, University of California
Publication date: 01/02/2014

Premise of the study: The Compositae (Asteraceae) are a large and diverse family of plants, and the most comprehensive phylogeny to date is a meta-tree based on 10 chloroplast loci that has several major unresolved nodes. We describe the development of an approach that enables the rapid sequencing of large numbers of orthologous nuclear loci to facilitate efficient phylogenomic analyses. Methods and results: We designed a set of sequence capture probes that target conserved orthologous sequences in the Compositae. We also developed a bioinformatic and phylogenetic workflow for processing and analyzing the resulting data. Application of our approach to 15 species from across the Compositae resulted in the production of phylogenetically informative sequence data from 763 loci and the successful reconstruction of known phylogenetic relationships across the family. Conclusions: These methods should be of great use to members of the broader Compositae community, and the general approach should also be of use to researchers studying other families.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

A New Genus of Miniaturized and Pug-Nosed Gecko from South America (Sphaerodactylidae: Gekkota)

Author: Bauer Aaron M.
Colli Guarino R.
Daza Juan D.
Gamble Tony
Vitt Laurie J.
Publication venue: e-Publications@Marquette
Publication date: 25/11/2011

Sphaerodactyl geckos comprise five genera distributed across Central and South America and the Caribbean. We estimated phylogenetic relationships among sphaerodactyl genera using both separate and combined analyses of seven nuclear genes. Relationships among genera were incongruent at different loci and phylogenies were characterized by short, in some cases zero-length, internal branches and poor phylogenetic support at most nodes. We recovered a polyphyletic Coleodactylus, with Coleodactylus amazonicus being deeply divergent from the remaining Coleodactylus species sampled. The C. amazonicus lineage possessed unique codon deletions in the genes PTPN12 and RBMX while the remaining Coleodactylus species had unique codon deletions in RAG1. Topology tests could not reject a monophyletic Coleodactylus, but we show that short internal branch lengths decreased the accuracy of topology tests because there were not enough data along these short branches to support one phylogenetic hypothesis over another. Morphological data corroborated results of the molecular phylogeny, with Coleodactylus exhibiting substantial morphological heterogeneity. We identified a suite of unique craniofacial features that differentiate C. amazonicus not only from other Coleodactylus species, but also from all other geckos. We describe this novel sphaerodactyl lineage as a new genus, Chatogekko gen. nov. We present a detailed osteology of Chatogekko, characterizing osteological correlates of miniaturization that provide a framework for future studies in sphaerodactyl systematics and biology.

epublications@Marquette

PubMed Central

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Computational phylogenetics and the classification of South American languages

Author: Chousou‐Polydouri Natalia
Michael Lev
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

In recent years, South Americanist linguists have embraced computational phylogenetic methods to resolve the numerous outstanding questions about the genealogical relationships among the languages of the continent. We provide a critical review of the methods and language classification results that have accumulated thus far, emphasizing the superiority of character-based methods over distance-based ones and the importance of developing adequate comparative datasets for producing well-resolved classifications.

Crossref

eScholarship - University of California

ZORA

Clustering by compression

Author: Cilibrasi Rudi
Vitanyi Paul
Publication date: 09/04/2004

We present a new method for clustering based on compression. The method doesn't use subject-specific features or background knowledge, and works as follows: First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric and normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis. Comment: LaTeX, 27 pages, 20 figures
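The first step described above, computing a distance from compressed lengths, can be sketched as follows. The NCD of two byte strings x and y is (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. This is a minimal illustration, not the authors' public software: the choice of `zlib` as the compressor and the names `compressed_len` and `ncd` are assumptions made here, and the quartet-based dendrogram step is not shown.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up inputs: a repetitive text and an unrelated byte sequence.
text = b"the quick brown fox jumps over the lazy dog " * 20
unrelated = bytes(range(256))

same_distance = ncd(text, text)        # near 0: the second copy adds little
diff_distance = ncd(text, unrelated)   # near 1: the inputs share nothing
```

Values close to 0 indicate that one input is almost fully described by the other, while values close to 1 indicate little shared structure. With a real compressor the result can fall slightly outside the interval [0, 1].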

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications