In search of lost introns

Author: Adachi; Aldous; Altschul; Bieri; Blum; Carmel; Collins; Coulombe-Huntington; Csűrös; Devroye; Durbin; Edgar; Felsenstein; Friedman; Guindon; Harding; Heard; Hubbard; Igor B. Rogozin; IHBSC; J. Andrew Holey; Jeffares; Kececioglu; Kosakovsky Pond; Larget; Ma; Marchler-Bauer; McDiarmid; McKenzie; Miklós Csűrös; Müller; Nguyen; Nielsen; Nixon; Press; Pruitt; Raible; Rogozin; Rosenberg; Roy; Stamatakis; Steel; Sverdlov; Tatusov; Vaňácová; Zhang
Publication date: 03/02/2007

Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns, and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.
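
The abstract does not spell out the compression method, so the following is only an illustrative sketch of one common way such compression can work, not the authors' algorithm. It assumes a rooted binary tree and one state string per leaf, and it gives sites that look identical within a subtree the same pattern id at that subtree's root. A likelihood evaluation would then need one conditional-likelihood computation per distinct pattern per node rather than one per site. The function name `compress_site_patterns`, the tuple-based tree encoding, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a tree is a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def compress_site_patterns(tree: Tree, data: Dict[str, str]) -> Tuple[List[int], List[int]]:
    """Assign a pattern id to every site at every node of the tree.

    data maps each leaf name to a string of L states (for example '0'/'1'
    for intron absence/presence). Two sites receive the same id at a node
    exactly when they agree on all leaves below that node.

    Returns the per-site ids at the root and the number of distinct
    patterns at each node, listed in post-order.
    """
    sizes: List[int] = []

    def visit(node: Tree) -> List[int]:
        if isinstance(node, str):
            # At a leaf, the pattern of a site is just its observed state.
            keys = list(data[node])
        else:
            # At an internal node, a pattern is the pair of child pattern ids.
            left = visit(node[0])
            right = visit(node[1])
            keys = list(zip(left, right))
        table: dict = {}
        # A key seen for the first time gets the next unused id.
        ids = [table.setdefault(key, len(table)) for key in keys]
        sizes.append(len(table))
        return ids

    root_ids = visit(tree)
    return root_ids, sizes


# Toy example with four taxa and four sites (made-up data).
example_tree: Tree = (("A", "B"), ("C", "D"))
example_data = {"A": "0101", "B": "0101", "C": "0011", "D": "0011"}
root_ids, sizes = compress_site_patterns(example_tree, example_data)
print("distinct patterns per node (post-order):", sizes)
print("entries to evaluate:", sum(sizes), "instead of", len(sizes) * 4)
```

In this sketch the preprocessing pass touches every node and site once, and the cost of each later evaluation is proportional to `sum(sizes)`, which is where savings could come from when many sites share subtree patterns.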

arXiv.org e-Print Archive

Crossref

College of Saint Benedict and Saint John’s University: DigitalCommons@CSB/SJU

Progressive alignment with Cactus: a multiple-genome aligner for the thousand-genome era [preprint]

Author: Armstrong Joel; Karlsson Elinor K.; Paten Benedict
Publication venue: eScholarship@UMassChan
Publication date: 26/08/2019

Cactus, a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequence. We describe progressive extensions to Cactus that enable reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We show that Cactus is capable of scaling to hundreds of genomes and beyond by describing results from an alignment of over 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment yet created. Further, we show improvements in orthology resolution leading to downstream improvements in annotation

eScholarship@UMMS

Duplication-Loss Genome Alignment: Complexity and Algorithm

Author: A. Bergeron; B. Moret; D. Sankoff; G. Ausiello; G. Bourque; G. Fertin; J. Ma; M. Marron; N. El-Mabrouk; P. Alimonti; P. Holloway; S. Hannenhalli
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Crossref

PhyloSift: Phylogenetic analysis of genomes and metagenomes

Author: Bik HM; Darling AE; Eisen JA; Jospin G; Lowe E; Matsen FA
Publication venue: PeerJ
Publication date: 01/01/2014

Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454). © 2014 Darling et al

OPUS - University of Technology Sydney

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Integration of Alignment and Phylogeny in the Whole-Genome Era

Author: Sun Hongtao
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2015

With the development of new sequencing techniques, whole genomes of many species have become available. This huge amount of data gives rise to new opportunities and challenges. These new sequences provide valuable information on relationships among species, e.g. genome recombination and conservation. One of the principal ways to investigate such information is multiple sequence alignment (MSA). Currently, there is a large amount of MSA data on the internet, such as the UCSC genome database, but how to use this information effectively to solve classical and new problems remains underexplored. In this thesis, we explored how to use this information in four problems: the sequence orthology search problem, the multiple alignment improvement problem, the short read mapping problem, and the genome rearrangement inference problem. For the first problem, we developed an EM algorithm to iteratively align a query with a multiple alignment database, using the information from a phylogeny relating the query species and the species in the multiple alignment. We also infer the query's location in the phylogeny. We showed that by doing alignment and phylogeny inference together, we can improve the accuracy of both. For the second problem, we developed an optimization algorithm to iteratively refine the multiple alignment quality. Experimental results showed that our algorithm is very stable in terms of the resulting alignments. The results showed that our method is more accurate than existing methods, i.e. Mafft, Clustal-O, and Mavid, on test data from three sets of species from the UCSC genome database. For the third problem, we developed a model, PhyMap, to align a read to a multiple alignment allowing mismatches and indels. PhyMap computes local alignments of a query sequence against a fixed multiple-genome alignment of closely related species. PhyMap uses a known phylogenetic tree on the species in the multiple alignment to improve the quality of its computed alignments while also estimating the placement of the query on this tree. Both theoretical computation and experimental results show that our model can differentiate between orthologous and paralogous alignments better than other popular short read mapping tools (BWA, BOWTIE and BLAST). For the fourth problem, we gave a simple genome recombination model which can express insertions, deletions, inversions, translocations and inverted translocations on aligned genome segments. We also developed an MCMC algorithm to infer the order of the query segments. We proved that using any Euclidean metric to measure the distance between two sequence orders in the tree optimization objective function leads to a degenerate solution in which the inferred order is the order of one of the leaf nodes. We also gave a graph-based formulation of the problem which can represent the probability distribution of the order of the query sequences.

Washington University St. Louis: Open Scholarship

A bioinformatics analysis of contributors to false discovery for a mouse genotyping array

Author: Patel Nisha
Publication venue: Scholarship@Western
Publication date: 10/08/2018

Microarray experiments employing massively parallel hybridization are valuable for the study of genetic variation; however, errors during hybridization and limitations of single-species design must be considered for use within and across species. The Mouse Diversity Genotyping Array (MDGA) is a low-cost, high-resolution microarray with probes that bind to target DNA for variant detection. Errors associated with probe design and incomplete protein removal from target DNA lead to false discovery and thus necessitate examination of probe suitability and target DNA availability. Bioinformatics methods were used to carry out confirmation of probe annotations, assessment of DNA accessibility for hybridization to probes, and prediction of the theoretical ability of MDGA probes to hybridize cross-species to naked mole-rat genomic DNA. The results are a filtered probe list demonstrated to reduce false discovery, a suggested approach to assess biases arising from protein-bound DNA, and predictions for cross-species application of the MDGA to naked mole-rat samples.

Scholarship@Western