CO-phylum: An Assembly-Free Phylogenomic Approach for Close Related Organisms

Author: Blanchette
Cannon
Chen
Cole
Dalquen
Darling
Domazet-Loso
Edgar
Elias
Foster
Glenn
Hohl
Hu
Huang
Huiguang Yi
Jun
Li
Li Jin
Loytynoja
Ma
Otu
Peterlongo
Qi
Ratan
Saitou
Snel
Stuart
Touchon
Ulitsky
Wagner
Wang
Wiens
Wong
Zhou
Publication venue: 'Oxford University Press (OUP)'
Publication date: 06/04/2011

Phylogenomic approaches developed thus far are either too time-consuming or lack a solid evolutionary basis. Moreover, no phylogenomic approach can construct a tree directly from unassembled raw sequencing data. A new phylogenomic method, CO-phylum, is developed to address these shortcomings. CO-phylum can generate a high-resolution, highly accurate tree from complete genomes or unassembled sequencing data of closely related organisms. In addition, the CO-phylum distance is almost linear with the p-distance.
Comment: 21 pages, 6 figures
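The abstract measures the CO-phylum distance against the p-distance. As a point of reference only, the sketch below computes the p-distance (the proportion of differing sites) between two aligned sequences. It does not reproduce the CO-phylum distance itself; the function name `p_distance` and the assumption of gap-free, equal-length input are ours.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ.

    Assumes gap-free sequences already aligned to the same length.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal (aligned) length")
    if not seq1:
        raise ValueError("sequences must be non-empty")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


print(p_distance("ACGTACGT", "ACGTTCGA"))  # 2 differences / 8 sites -> 0.25
```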

arXiv.org e-Print Archive

CiteSeerX

Crossref

Reverse-Safe Data Structures for Text Indexing

Author: Gabriele Fici
Giulia Bernardini
Grigorios Loukides
Huiping Chen
Solon P. Pissis
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2020

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure of size O(n) that optimally answers pattern matching queries of length at most d, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications causes insignificant or no loss of data utility. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
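To make the z-reverse-safe definition concrete, here is a brute-force toy check. It assumes that the answer to a pattern matching query is the pattern's number of occurrences in the text, and that the dataset is a single string over a small known alphabet; the helper names (`answers`, `count_consistent_texts`, `is_z_reverse_safe`, `max_safe_d`) are hypothetical. Because it enumerates every candidate text, it is exponential in n and is not the paper's O(n^ω log d) construction.

```python
from collections import Counter
from itertools import product


def answers(text: str, d: int) -> Counter:
    """Assumed query answers: occurrence count of every pattern of length 1..d."""
    counts: Counter = Counter()
    for length in range(1, d + 1):
        for i in range(len(text) - length + 1):
            counts[text[i:i + length]] += 1
    return counts


def count_consistent_texts(text: str, d: int, alphabet: str) -> int:
    """Count texts of the same length whose answers equal those of `text`."""
    target = answers(text, d)
    return sum(
        1
        for candidate in product(alphabet, repeat=len(text))
        if answers("".join(candidate), d) == target
    )


def is_z_reverse_safe(text: str, d: int, z: int, alphabet: str) -> bool:
    """True if at least z texts (including `text`) share the same answers."""
    return count_consistent_texts(text, d, alphabet) >= z


def max_safe_d(text: str, z: int, alphabet: str) -> int:
    """Largest d keeping the answers z-reverse-safe (0 if none).

    Answers for d + 1 include those for d, so the number of consistent
    texts can only shrink as d grows; we stop at the first failure.
    """
    best = 0
    for d in range(1, len(text) + 1):
        if not is_z_reverse_safe(text, d, z, alphabet):
            break
        best = d
    return best


print(count_consistent_texts("abab", 1, "ab"))  # 6 arrangements of two a's and two b's
print(max_safe_d("ab", 2, "ab"))  # 1
```

The trade-off in the abstract is visible even at this scale: raising d makes the stored answers more useful but shrinks the set of indistinguishable texts.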

Archivio istituzionale della ricerca - Università di Trieste

Crossref

CWI's Institutional Repository

University of Birmingham Research Portal

Archivio istituzionale della ricerca - Università di Palermo

Alignment-free Phylogeny Reconstruction Based On Quartet Trees

Author: Dencker Thomas
Publication date: 04/03/2020

Georg-August-University Göttingen

RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison

Author: Hahn Lars
Leimeister Chris-André
Lonardi Stefano
Morgenstern Burkhard
Ounit Rachid
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 20/07/2016

Many algorithms for sequence analysis rely on word matching or word statistics. These approaches can often be improved by using binary patterns of match and don't-care positions as a filter, so that only the word positions corresponding to the patterns' match positions are considered. The performance of such approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set, introduced by Ilie and Ilie, is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short-read classification with CLARK-S; here too, sensitivity improved compared with the program's default patterns. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de
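The match/don't-care filter described above can be illustrated with a short sketch. It extracts spaced words from a sequence using a binary pattern ('1' = match position, '0' = don't-care) and counts spaced-word matches between two sequences, optionally summed over a pattern set. The function names are hypothetical, and this shows only the filtering idea, not rasbhari's hill-climbing optimization of pattern sets.

```python
from collections import Counter


def spaced_words(seq: str, pattern: str) -> list[str]:
    """Spaced words of `seq`: at each offset keep only the pattern's '1' positions."""
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    return [
        "".join(seq[start + p] for p in match_positions)
        for start in range(len(seq) - len(pattern) + 1)
    ]


def count_spaced_matches(s1: str, s2: str, pattern: str) -> int:
    """Number of position pairs (i, j) whose spaced words agree."""
    words1 = Counter(spaced_words(s1, pattern))
    words2 = Counter(spaced_words(s2, pattern))
    return sum(n * words2[w] for w, n in words1.items())


def count_pattern_set_matches(s1: str, s2: str, patterns: list[str]) -> int:
    """Total spaced-word matches summed over a set of patterns."""
    return sum(count_spaced_matches(s1, s2, p) for p in patterns)


print(spaced_words("ACGTA", "101"))  # ['AG', 'CT', 'GA']
```

A mismatch at a don't-care position does not break a spaced-word match, which is why the choice of patterns affects how the match count behaves across related sequences.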

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Estimating evolutionary distances between genomic sequences from spaced-word matches

Publication venue: BioMed Central
Publication date: 11/02/2015

Springer - Publisher Connector

Phylogenetic Tree Construction for Starfish and Primate Genomes via Alignment Free Methods

Author: Krishnan Ambujam
Publication venue: LSU Digital Commons
Publication date: 01/01/2015

A phylogenetic tree is a tree-like diagram showing the evolutionary relationships among species based on differences or similarities in their physical or genetic makeup. Genetic similarity is traditionally measured as the pairwise distance between gene sequences computed with sequence alignment methods. Advances in next-generation sequencing technologies have made huge datasets of partially or completely sequenced genomes available. These massive datasets require faster comparison methods than traditional alignment-based approaches, so alignment-free approaches have gained popularity in recent years. In this thesis, we compare alignment-based and various alignment-free methods for phylogenetic tree construction. The alignment-free methods we study are based on k-mer frequency, Average Common Substring (ACS), and ACS with position restrictions and mismatches; position-restricted ACS is a novel contribution of this thesis. To evaluate the alignment-free approaches, we applied them to phylogeny reconstruction on DNA (27 primate mitochondrial genomes) and protein (Starfish RNA-seq) sequence sets. Phylogenetic trees are constructed by applying Neighbor Joining to the distance matrices obtained with these alignment-free methods, and the resulting trees are compared with the reference tree using the Branch Score Distance. Both Neighbor Joining and the Branch Score Distance are computed with the programs neighbor and treedist from the PHYLIP package.
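Two of the alignment-free measures compared in the thesis can be sketched briefly. Assuming plain, non-empty strings at least k characters long, the code below computes a Euclidean distance between relative k-mer frequency profiles (one of several possible k-mer distances; the thesis may use another) and the raw ACS statistic L(A, B) as commonly defined: the average, over start positions in A, of the longest substring starting there that also occurs in B. Converting L(A, B) into a symmetric ACS distance, and the position-restricted variant, are not shown. The function names are hypothetical, and the naive substring search is slow and meant only for short toy inputs.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Counts of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between relative k-mer frequency vectors (one possible choice)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    na, nb = sum(pa.values()), sum(pb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must be at least k characters long")
    kmers = set(pa) | set(pb)
    return math.sqrt(sum((pa[w] / na - pb[w] / nb) ** 2 for w in kmers))


def average_common_substring(a: str, b: str) -> float:
    """ACS statistic L(A, B): mean, over start positions i in A, of the
    length of the longest substring starting at A[i] that also occurs in B."""
    total = 0
    for i in range(len(a)):
        length = 0
        # Extend the match one character at a time while it still occurs in B;
        # once an extension fails, every longer one fails too.
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        total += length
    return total / len(a)


print(average_common_substring("ABC", "ABC"))  # (3 + 2 + 1) / 3 = 2.0
```

A matrix of such pairwise distances is the kind of input a Neighbor Joining program such as PHYLIP's neighbor consumes.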

Louisiana State University

Filtered spaced-word matches: a novel approach to fast and accurate sequence comparison

Author: Leimeister Chris-Andre
Publication date: 12/12/2018

Georg-August-University Göttingen