659 research outputs found

    OrthoSelect: a protocol for selecting orthologous groups in phylogenomics

    Background: Phylogenetic studies using expressed sequence tags (EST) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically. Results: We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OG), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences and offers the possibility to further process this alignment in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as to bioinformatic experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences. Conclusion: OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. 
Our tool was not only able to rebuild the data set with a specificity of 98%, but it also detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X.

    The complex hybrid origins of the root knot nematodes revealed through comparative genomics

    Meloidogyne root knot nematodes (RKN) can infect most of the world's agricultural crop species and are among the most important of all plant pathogens. As yet however we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and use a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species' genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species' convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally

    Unravelling the Origins and Evolution of the Animal Kingdom using Genomics

    There are ~35 classified phyla/sub-phyla within the Animal Kingdom; some of which have unresolved relationships. The advent of genomics has made it possible to study new aspects of animal evolution, including comparative genomics (e.g., gene loss/gain, non-coding regions, synteny, etc.), gene family evolution, and their evolutionary relationships using genome-wide data. No study to date has compared all the wealth of genomic data available to understand the evolution of the Animal Kingdom. Using a core bioinformatics pipeline and dataset to infer Homology Groups (HGs), the losses and novelties of these HGs were proven integral to the diversification of the animal kingdom. The same core pipeline was used to extract homeobox gene HGs, a key family used to understand origin and diversification in animals. Gene trees were inferred from the core dataset HGs to determine the evolution of a gene family iconic in the study of animal body plans. Conserved animal genes were also mined using the same pipeline and dataset. Animal phylogenomics is one of the most controversial areas in modern evolutionary science. Whilst many new methods have been developed, no study to date has tried to assess the impact of gene age in the reconstruction of evolutionary trees. The phyla with the largest count of HG losses also had the highest counts of HG novelties. Not all of these were strictly de novo, but the numbers suggest a re-manufacturing of the genetic material from the genes reduced to those that were more recently diverged. A comprehensive classification of all the diversity of animal homeobox genes is lacking. The gene trees showed complex patterns, with similar homeobox expansions between more distant species, and interlapping homeobox families. The highly conserved HGs recovered, for the animal phylogenies, well-established relationships between some phyla using maximum likelihood and Bayesian inference methods.
Ctenophora was consistently recovered as sister to all other animals, and interesting relationships between ecdysozoans and lophotrochozoans. However, it was proven that it takes more than a highly conserved set of genes to infer a stable and correct phylogeny. Each of the additional methods used to extend the core bioinformatics pipeline revealed a pattern of correlation, particularly among the fast evolving species, such as platyhelminthes, nematodes and tardigrades. These HG losers and gainers also had lineage specific homeobox clades, and caused artefactual problems in the phylogenies

    The Asian arowana (Scleropages formosus) genome provides new insights into the evolution of an early lineage of teleosts

    The Asian arowana (Scleropages formosus), one of the world’s most expensive cultivated ornamental fishes, is an endangered species. It represents an ancient lineage of teleosts: the Osteoglossomorpha. Here, we provide a high-quality chromosome-level reference genome of a female golden-variety arowana using a combination of deep shotgun sequencing and high-resolution linkage mapping. In addition, we have also generated two draft genome assemblies for the red and green varieties. Phylogenomic analysis supports a sister group relationship between Osteoglossomorpha (bonytongues) and Elopomorpha (eels and relatives), with the two clades together forming a sister group of Clupeocephala, which includes all the remaining teleosts. The arowana genome retains the full complement of eight Hox clusters, unlike the African butterfly fish (Pantodon buchholzi), another bonytongue fish, which possesses only five Hox clusters. Differential gene expression among three varieties provides insights into the genetic basis of colour variation. A potential heterogametic sex chromosome is identified in the female arowana karyotype, suggesting that the sex is determined by a ZW/ZZ sex chromosomal system. The high-quality reference genome of the golden arowana and the draft assemblies of the red and green varieties are valuable resources for understanding the biology, adaptation and behaviour of Asian arowanas

    The study of plant genome evolution by means of phylogenomics


    Ultra-fast sequence clustering from similarity networks with SiLiX

    Background: The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to be able to handle this huge amount of sequences in order to build families of homologous sequences in a reasonable time. Results: We present the software package SiLiX that implements a novel method which reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions: Comparing state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
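
For readers unfamiliar with single linkage clustering viewed as a graph problem, the following is a minimal sketch of the general idea, not SiLiX's actual algorithm or interface. It assumes the similarity hits have already been filtered (for example, BLAST hits passing some identity and coverage thresholds) and reduced to pairs of sequence IDs; families are then the connected components of the resulting graph, computed here with a union-find structure. The function name `single_linkage_families` and the input format are hypothetical.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group sequence IDs into families.

    Two IDs end up in the same family if a chain of accepted
    similarity pairs connects them (connected components).
    `pairs` is an iterable of (id_a, id_b) tuples; filtering by
    similarity threshold is assumed to have happened upstream.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    families = defaultdict(set)
    for seq_id in list(parent):
        families[find(seq_id)].add(seq_id)
    # Sorted output makes the result deterministic.
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    hits = [("seqA", "seqB"), ("seqB", "seqC"), ("seqD", "seqE")]
    for family in single_linkage_families(hits):
        print(family)
```

Because each pair is processed once and never stored, this style of approach only needs memory proportional to the number of sequences rather than the number of hits, which is one reason graph-based formulations suit very large hit lists.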

    From Genes to Ecosystems: Resource Availability and DNA Methylation Drive the Diversity and Abundance of Restriction Modification Systems in Prokaryotes

    Together, prokaryotic hosts and their viruses numerically dominate the planet and are engaged in an eternal struggle of hosts evading viral predation and viruses overcoming defensive mechanisms employed by their hosts. Prokaryotic hosts have been found to carry several viral defense systems in recent years, with Restriction Modification systems (RMs) being the first discovered, in the 1950s. While we have biochemically elucidated many of these systems in the last 70 years, we still struggle to understand what drives their gain and loss in prokaryotic genomes. In this work, we take a computational approach to understand the underlying evolutionary drivers of RMs by assessing ‘big data’ signals of RMs in prokaryotic genomes and incorporating molecular data in trait-based mathematical models. Focusing on the Cyanobacteria, we found a large discrepancy in the frequency of RMs per genome in different environmental contexts, where Cyanobacteria that live in oligotrophic nutrient conditions have few to no RMs and those in nutrient-rich conditions consistently have many RMs. While our models agree with the observation that increased nutrient inputs make the selective pressure of RMs more intense, they were unable to reconcile the high numbers of RMs per genome with their potent defensive properties, a situation of apparent overkill. By incorporating viral methylation, an unavoidable effect of RMs, we were able to explain how organisms could carry over 15 RMs. With this discovery, we then tried to reassess the distribution of methyltransferases, an essential component of RMs that can also have alternate physiological roles in the cell. We expand on conventional wisdom, that methyltransferases that are widely phylogenetically conserved are associated with global cellular regulation. However, we also find that organisms with high numbers of RMs also have a surprising amount of conservation in the methyltransferases that they carry.
This data suggests caution should be used in associating phylogenetic signals with functional roles in methyltransferases, as different functional roles seem to overlap in their phylogenetic signal. Indeed, we suggest trait-based modeling may be the best tool in elucidating why organisms with a high selective pressure to maintain RMs appear to have conserved methyltransferases