187 research outputs found
GLADX: An Automated Approach to Analyze the Lineage-Specific Loss and Pseudogenization of Genes
A well-established ancestral gene can usually be found, in one or multiple copies, in different descendant species. Sometimes during the course of evolution, all the representatives of a well-established ancestral gene disappear in specific lineages; such gene losses may occur in the genome by deletion of a DNA fragment or by pseudogenization. The loss of an entire gene family in a given lineage may reflect an important phenomenon, and could be due either to adaptation, or to a relaxation of selection that leads to neutral evolution. Therefore, the lineage-specific gene loss analyses are important to improve the understanding of the evolutionary history of genes and genomes. In order to perform this kind of study from the increasing number of complete genome sequences available, we developed a unique new software module called GLADX in the DAGOBAH framework, based on a comparative genomic approach. The software is able to automatically detect, for all the species of a phylum, the presence/absence of a representative of a well-established ancestral gene, and by systematic steps of re-annotation, confirm losses, detect and analyze pseudogenes and find novel genes. The approach is based on the use of highly reliable gene phylogenies, of protein predictions and on the analysis of genomic mutations. All the evidence associated to evolutionary approach provides accurate information for building an overall view of the evolution of a given gene in a selected phylum. The reliability of GLADX has been successfully tested on a benchmark analysis of 14 reported cases. It is the first tool that is able to fully automatically study the lineage-specific losses and pseudogenizations. GLADX is available at http://ioda.univ-provence.fr/IodaSite/gladx/
Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient
Abstract
Background
Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great “Tree of Life” (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user’s needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces.
Results
With the aim of building such a “phylotastic” system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (
http://www.phylotastic.org
), and a server image.
Conclusions
Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.http://deepblue.lib.umich.edu/bitstream/2027.42/112888/1/12859_2013_Article_5897.pd
Recommended from our members
Bioinformatics Lab: A Course-Based Undergraduate Research Experience Curriculum
This is a computer lab course intended to give students technical and practical experience in analyzing sequencing data. We focus on bacterial genomes from recently isolated bacteria, so there is a possibility of discovery of new microbial diversity. Collaborative learning will be encouraged, but students will work individually to assemble, annotate, and analyze a bacterial genome.
Each week will begin with a review on the topic to be covered for that week. When applicable, there will be pre-recorded lectures to introduce lab topics, and students are expected to have watched these before coming to class. Our time in class will be spent as a computer lab, where students will work collaboratively and at their own pace to complete their genome sequencing and analysis. The guided analysis will transition into independent study. Students will produce a final capstone report in the style of the Microbiology Resource Announcements published by the American Society of Microbiology.
This course fulfils the Microbiology Major requirement as a Major Laboratory Elective, and the Microbiology Graduate requirement as a core graduate-level course in microbial science.https://scholarworks.umass.edu/micro_ed_materials/1002/thumbnail.jp
Evolutionary genomics : statistical and computational methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering of the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward
Algorithms to Explore the Structure and Evolution of Biological Networks
High-throughput experimental protocols have revealed thousands of relationships amongst genes and proteins under various conditions. These putative associations are being aggressively mined to decipher the structural and functional architecture of the cell. One useful tool for exploring this data has been computational network analysis. In this thesis, we propose a collection of novel algorithms to explore the structure and evolution of large, noisy, and sparsely annotated biological networks.
We first introduce two information-theoretic algorithms to extract interesting patterns and modules embedded in large graphs. The first, graph summarization, uses the minimum description length principle to find compressible parts of the graph. The second, VI-Cut, uses the variation of information to non-parametrically find groups of topologically cohesive and similarly annotated nodes in the network. We show that both algorithms find structure in biological data that is consistent with known biological processes, protein complexes, genetic diseases, and operational taxonomic units. We also propose several algorithms to systematically generate an ensemble of near-optimal network clusterings and show how these multiple views can be used together to identify clustering dynamics that any single solution approach would miss.
To facilitate the study of ancient networks, we introduce a framework called ``network archaeology'') for reconstructing the node-by-node and edge-by-edge arrival history of a network. Starting with a present-day network, we apply a probabilistic growth model backwards in time to find high-likelihood previous states of the graph. This allows us to explore how interactions and modules may have evolved over time. In experiments with real-world social and biological networks, we find that our algorithms can recover significant features of ancestral networks that have long since disappeared.
Our work is motivated by the need to understand large and complex biological systems that are being revealed to us by imperfect data. As data continues to pour in, we believe that computational network analysis will continue to be an essential tool towards this end
- …