531 research outputs found

Algorithms in comparative genomics

Author: Chikkagoudar Satish
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2010

The field of comparative genomics abounds with problems of interest to computer scientists. In this thesis, the author presents solutions to three contemporary problems: obtaining better alignments for phylogeny reconstruction, identifying related RNA sequences in genomes, and ranking Single Nucleotide Polymorphisms (SNPs) in genome-wide association studies (GWAS). Sequence alignment is a basic and widely used task in bioinformatics. Its applications include identifying protein structure, RNAs and transcription factor binding sites in genomes, and phylogeny reconstruction. Phylogenetic descriptions depend not only on the employed reconstruction technique, but also on the underlying sequence alignment. The author has studied and established a simple prescription for obtaining a better phylogeny by improving the underlying alignments used in phylogeny reconstruction. This was achieved by improving upon Gotoh's iterative heuristic by iterating with maximum parsimony guide-trees. This approach has shown an improvement in accuracy over standard alignment programs. A novel alignment algorithm named Probalign-RNAgenome that can identify non-coding RNAs in genomic sequences was also developed. Non-coding RNAs play critical roles in the cell, such as gene regulation. It is thought that many such RNAs lie undiscovered in the genome. To date, alignment-based approaches have been shown to be more accurate than thermodynamic methods for identifying such non-coding RNAs. Probalign-RNAgenome employs a probabilistic consistency-based approach for aligning a query RNA sequence to its homolog in a genomic sequence. Results show that this approach is more accurate on real data than the widely used BLAST and Smith-Waterman algorithms. Within the realm of comparative genomics are also a large number of recently conducted GWAS. GWAS aim to identify regions in the genome that are associated with a given disease. The support vector machine (SVM) provides a discriminative alternative to the widely used chi-square statistic in GWAS. A novel hybrid strategy that combines the chi-square statistic with the SVM was developed and implemented. Its performance was studied on simulated data and the Wellcome Trust Case Control Consortium (WTCCC) studies. Results presented in this thesis show that the hybrid strategy ranks causal SNPs in simulated data significantly higher than the chi-square test and SVM alone. The results also show that the hybrid strategy ranks previously replicated SNPs and associated regions (where applicable) of type 1 diabetes, rheumatoid arthritis, and Crohn's disease higher than the chi-square, SVM, and SVM Recursive Feature Elimination (SVM-RFE).
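
The chi-square ranking that this abstract uses as a baseline can be illustrated with a minimal sketch. It assumes each SNP is summarized by a 2x2 table of allele counts (minor and major allele, in cases and in controls). The function names and SNP identifiers are hypothetical, and the sketch does not attempt the thesis's hybrid chi-square and SVM strategy.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        # A zero margin means the table carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_snps(allele_counts):
    """Rank SNPs by decreasing chi-square statistic.

    allele_counts maps a SNP id to the tuple
    (case_minor, case_major, control_minor, control_major).
    """
    scores = {snp: chi_square_2x2(*counts) for snp, counts in allele_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = {"snp_a": (10, 20, 20, 10), "snp_b": (5, 5, 5, 5)}
    for snp, score in rank_snps(counts):
        print(snp, round(score, 3))
```

SNPs whose allele frequencies differ most between cases and controls receive the largest statistic and therefore appear first in the ranking.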

Digital Commons @ New Jersey Institute of Technology (NJIT)

Molecular Evolution & Phylogeny: What, When, Why & How?

Author: Mohan Kale
Pandurang Kolekar
Urmila Kulkarni-Kale
Publication venue: 'IntechOpen'
Publication date: 02/09/2011

IntechOpen

Characterization of multiple sequence alignment errors using complete-likelihood score and position-shift map

Author: Kiyoshi Ezawa
Publication venue: Springer Nature
Publication date: 01/01/2016

Springer - Publisher Connector

Simultaneous phylogeny reconstruction and multiple sequence alignment

Author: BME Moret
C Notredame
D Higgins
D Huson
D Powell
D Sankoff
E Myers
F Yue
Feng Yue
G Lancia
J Hein
J Stoye
J Strugnell
J Thompson
Jian Shi
Jijun Tang
K Wong
L Wang
M Vingron
N Goldman
N Saitou
O Gotoh
R Robinson
S Henikoff
T Jiang
T Ogden
U Roshan
W Pearson
Publication venue: BioMed Central
Publication date: 01/01/2009

Crossref

Springer - Publisher Connector

PubMed Central

Evolutionary Inference via the Poisson Indel Process

Author: Alexandre Bouchard-Côté
Buiculescu
Cox
Dreyer
Hein
Huelsenbeck
Michael I. Jordan
Miklós
Nelesen
Roshan
Saitou
Searls
Wheeler
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 18/01/2013

We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classical evolutionary process, the TKF91 model, is a continuous-time Markov chain model comprising insertion, deletion and substitution events. Unfortunately this model gives rise to an intractable computational problem: the computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a new stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The new model is closely related to the TKF91 model, differing only in its treatment of insertions, but the new model has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared to separate inference of phylogenies and alignments.

arXiv.org e-Print Archive

Crossref

Phylogenetic Trees and Their Analysis

Author: Ford Eric
Publication venue: CUNY Academic Works
Publication date: 01/02/2014

Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. As several steps in this process are NP-Hard when using popular, biologically-motivated optimality criteria, significant amounts of resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with perfect data, while some others do not. We further characterize conditions for which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually-incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem, or GTAP: Sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.
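
To make the notion of a tree's parsimony cost concrete, here is a minimal sketch of the standard Fitch small-parsimony count on a fixed rooted binary tree. The nested-tuple tree encoding, the function names, and the toy sequences are assumptions made for illustration; the sketch scores a given tree and does not search tree space, which is the hard part the abstract addresses.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one character on a fixed tree.

    tree is a leaf name (str) or a pair (left, right) of subtrees;
    leaf_states maps each leaf name to its observed state.
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree, sequences):
    """Sum the Fitch score over all columns of equal-length aligned sequences."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in sequences.items()})
        for i in range(length)
    )


if __name__ == "__main__":
    # Toy data, for illustration only.
    seqs = {"A": "GA", "B": "GA", "C": "TA", "D": "TC"}
    print(parsimony_score((("A", "B"), ("C", "D")), seqs))
    print(parsimony_score((("A", "C"), ("B", "D")), seqs))
```

Under maximum parsimony, the tree with the lower score is preferred; an exact method must in principle compare this score across all candidate topologies.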

City University of New York

Improved methods for phylogenetics

Author: Nelesen Serita Marie
Publication venue
Publication date: 13/08/2010

Phylogenetics is the study of evolutionary relationships. It is a scientific endeavour to discover history, and it is not easy. Massive amounts of data together with computationally difficult optimization problems mean that heuristics are prevalent, and ever better techniques are sought. New approaches are valuable if they are more accurate, but are considered even more so if they are faster than pre-existing methods. Improvements to existing algorithms, whether in terms of space requirements, or faster running times, are also worthwhile. This dissertation explores three new techniques, each of which is valuable according to the previous definitions. The first contribution is TASPI, a system for storing collections of phylogenetic trees, and performing post-tree analyses. TASPI stores collections of trees more compactly than the previous method, and this compact structure lends itself to post-tree analyses. This results in the ability to compute strict and majority consensus trees faster than common alternatives. As an added benefit, TASPI is written in ACL2, which allows properties of the algorithms and data structures to be formally verified. The second contribution is an improved method to generate phylogenetic trees. A common methodology involves two steps, first estimating a Multiple Sequence Alignment (MSA), and then estimating a tree using that MSA. This method changes the way in which the MSA is estimated, and this leads to improved accuracy of the resultant trees. Also, in some cases, the time required is also reduced. The third contribution is BLuTGEN, a method by which a phylogenetic tree is estimated from sequence data, but without ever generating an MSA for the full dataset. BLuTGEN is as accurate as one of the best published tree estimation techniques (SATé), but takes a novel approach which allows it to be applied to much larger datasets.
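
The strict and majority consensus computations mentioned above can be sketched by counting how often each cluster (the set of taxa below an internal node) occurs across a collection of trees. This is a plain illustration of the textbook definitions, assuming each tree is already given as a set of clusters; it is not TASPI's compact representation, and all names are hypothetical.

```python
from collections import Counter


def cluster_counts(trees):
    """Count in how many trees each cluster appears.

    trees is a list; each tree is an iterable of frozensets of taxon names.
    """
    counts = Counter()
    for clusters in trees:
        counts.update(set(clusters))  # set() guards against repeats within a tree
    return counts


def strict_consensus(trees):
    """Clusters present in every tree."""
    counts = cluster_counts(trees)
    return {c for c, k in counts.items() if k == len(trees)}


def majority_consensus(trees):
    """Clusters present in strictly more than half of the trees."""
    counts = cluster_counts(trees)
    return {c for c, k in counts.items() if 2 * k > len(trees)}


if __name__ == "__main__":
    # Toy trees over taxa A-D, for illustration only.
    t1 = {frozenset("AB"), frozenset("ABC")}
    t2 = {frozenset("AB"), frozenset("ABD")}
    t3 = {frozenset("AC"), frozenset("ABC")}
    print(majority_consensus([t1, t2, t3]))
    print(strict_consensus([t1, t2, t3]))
```

Clusters that each occur in more than half of the trees are pairwise compatible, so the majority set always defines a valid tree.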

Texas ScholarWorks

Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference

Author: Dessimoz Christophe
Gil Manuel
Goldman Nick
Herrero Javier
Ledergerber Christian
Muffato Matthieu
Tan Ge
Publication venue: Oxford University Press
Publication date: 01/06/2015

Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms
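
As a rough illustration of what alignment filtering involves, the sketch below drops alignment columns whose gap fraction exceeds a threshold and reports the fraction of positions removed. The gap-fraction criterion, the threshold value, and the function name are assumptions chosen for simplicity; the sketch does not reproduce any of the filtering tools evaluated in the study.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of '-' characters exceeds the threshold.

    alignment maps sequence names to aligned strings of equal length.
    Returns (filtered alignment, kept column indices, fraction removed).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    kept = [
        i for i in range(length)
        if sum(alignment[n][i] == "-" for n in names) / len(names) <= max_gap_fraction
    ]
    filtered = {n: "".join(alignment[n][i] for i in kept) for n in names}
    removed_fraction = (length - len(kept)) / length if length else 0.0
    return filtered, kept, removed_fraction


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    msa = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTG-"}
    filtered, kept, removed = filter_gappy_columns(msa)
    print(filtered, kept, removed)
```

Tracking the removed fraction makes it easy to check whether a given setting stays within the "light filtering" range of up to 20% of positions that the abstract describes as having little impact on tree accuracy.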

PubMed Central

UCL Discovery

ZHAW digitalcollection

Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

Author: Dessimoz Christophe
Gil Manuel
Goldman Nick
Herrero Javier
Ledergerber Christian
Muffato Matthieu
Tan Ge
Publication venue
Publication date: 02/08/2017

Repository for Publications and Research Data

RERO DOC Digital Library

Novel approaches for large-scale phylogenetics and applications in the context of the amphibian tree of life

Author: Siu Ting Salvatierra Karen
Publication venue
Publication date: 01/01/2014

During this thesis, I addressed some problems associated with large-scale phylogenetic analyses by tackling issues related to missing data and the careful handling and addition of novel data in large-scale reconstructions, presenting an application of this approach in the context of amphibian phylogenetics. I developed a method (called “Concatabominations”) building on the original Safe Taxonomic Reduction method (Wilkinson 1995) as an alternative approach to the issue of identifying rogue taxa. The safe removal of rogue taxa due to missing data can potentially reduce the terraces in tree space search and improve resolution in the final consensus tree. From a pragmatic point of view, the new method can help in targeting taxa that require further sampling during research design. Novel sequence data for the rediscovered Ericabatrachus baleensis allowed me to explore its placement in the amphibian tree of life. I tested the inclusion of novel data using a backbone alignment from a previous work (de novo analysis) and a backbone phylogenetic tree (constrained analysis), after careful curation of the gene partitions to include in the analysis. I found that constrained phylogenetic inference using a previously accepted tree seems to be a practical solution for the rapid phylogenetic placement of a taxon in cases of well-supported relationships. However, a de novo analysis might ensure an optimal alignment and avoid risks introduced when adding new data. Finally, I investigated the evolutionary relationships of the three lineages of extant amphibians (Anura, Caudata and Gymnophiona) using an independent source of evidence: miRNAs, recently used to help resolve difficult phylogenetic problems. The analyses yielded a high number of shared miRNAs using the Xenopus tropicalis genome, contrasting with a lower number of miRNAs discovered using the Axolotl transcriptome. This suggests that validating miRNAs without genomic data is not ideal. Nevertheless, in spite of the limitations, I was able to find two potential novel miRNAs: one supporting the monophyly of Lissamphibia, and another supporting the Batrachia hypothesis. Overall, I hope the work developed in this thesis contributes new insights into large-scale phylogenetics and in particular into amphibian phylogenetics.

MURAL - Maynooth University Research Archive Library