6 research outputs found
A framework for orthology assignment from gene rearrangement data
Abstract. Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.
A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segments reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results come eventually very close to cytogenetics studies. It suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data aquisition
Linear programming for phylogenetic reconstruction based on gene rearrangements
Phylogenetic reconstruction from gene rearrangements has attracted increasing attention from biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but has been limited to small genomes because the running time of its scoring algorithm grows exponentially with the number of genes in the genome. We report here on a new method to compute a tight lower bound on the score of a given tree, using a set of linear constraints generated through selective applications of the triangle inequality. Our method generates an integer linear program with a carefully limited number of constraints, rapidly solves its relaxed version, and uses the result to provide a tight lower bound. Since this bound is very close to the optimal tree score, it can be used directly as a selection criterion, thereby enabling us to bypass entirely the expensive scoring procedure. We have implemented this method within our GRAPPA software and run several series of experiments on both biological and simulated datasets to assess its accuracy. Our results show that using the bound as a selection criterion yields excellent trees, with error rates below 5 % up to very large evolutionary distances, consistently beating the baseline Neighbor-Joining. Our new method enables us to extend the range of applicability of the direct optimization method to chromosomes of size comparable to those of bacteria, as well as to datasets with complex combinations of evolutionary events.