
    Computing Phylo-k-mers

    Phylogenetically informed k-mers, or phylo-k-mers for short, are k-mers that are predicted to appear within a given genomic region at predefined locations of a fixed phylogeny. Given a reference alignment for this genomic region and assuming a phylogenetic model of sequence evolution, we can compute a probability score for any given k-mer at any given tree node. The k-mers with sufficiently high probabilities can later be used to perform alignment-free phylogenetic classification of new sequences, a procedure recently proposed for the phylogenetic placement of metabarcoding reads and the detection of novel virus recombinants. While computing phylo-k-mers, we need to consider large numbers of k-mers at each tree node, which warrants the development of efficient enumeration algorithms. We consider a formal definition of the problem of phylo-k-mer computation: how can we efficiently find all k-mers whose probability lies above a user-defined threshold for a given tree node? We describe and analyze algorithms for this problem, relying on branch-and-bound and divide-and-conquer techniques. We exploit the redundancy of adjacent windows of the alignment and the structure of the probability matrix to save on computation. Besides computational complexity analyses, we provide an empirical evaluation of the relative performance of their implementations on real-world and simulated data. The divide-and-conquer algorithms, which to the best of our knowledge are novel, are found to be clear improvements over the branch-and-bound approach, especially when a large number of phylo-k-mers are found.
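
    The branch-and-bound idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, as the abstract does not state, that the score of a k-mer in one alignment window is the product of independent per-column probabilities taken from a k-by-4 probability matrix; the function name `branch_and_bound_kmers` and the example matrix are hypothetical.

```python
from typing import Iterator, List, Tuple

ALPHABET = "ACGT"  # column order assumed for every row of the window


def branch_and_bound_kmers(
    window: List[List[float]], threshold: float
) -> Iterator[Tuple[str, float]]:
    """Yield (k-mer, score) pairs whose score lies strictly above threshold.

    window[j][i] is the assumed probability of letter ALPHABET[i] at
    position j; the score of a k-mer is the product over its positions.
    """
    k = len(window)

    # best_suffix[j] = largest product achievable over positions j..k-1.
    best_suffix = [1.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        best_suffix[j] = max(window[j]) * best_suffix[j + 1]

    def extend(j: int, prefix: str, score: float) -> Iterator[Tuple[str, float]]:
        if j == k:
            yield prefix, score
            return
        for letter, p in zip(ALPHABET, window[j]):
            new_score = score * p
            # Bound: even the best possible completion cannot exceed the
            # threshold, so the whole subtree under this prefix is skipped.
            if new_score * best_suffix[j + 1] <= threshold:
                continue
            yield from extend(j + 1, prefix + letter, new_score)

    yield from extend(0, "", 1.0)


# Hypothetical 3-column window (k = 3) and threshold.
example_window = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.6, 0.2, 0.1],
]
for kmer, score in branch_and_bound_kmers(example_window, 0.05):
    print(kmer, score)
```

    The precomputed suffix maxima give an upper bound on any completion of a prefix, so prefixes that cannot reach the threshold are discarded without enumerating their extensions. The divide-and-conquer algorithms and the reuse of work across adjacent windows described in the abstract are not covered by this sketch.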

    OrthoInspector: comprehensive orthology analysis and visual exploration

    Background: The accurate determination of orthology and inparalogy relationships is essential for comparative sequence analysis, functional gene annotation and evolutionary studies. Various methods have been developed based on either simple BLAST all-versus-all pairwise comparisons and/or time-consuming phylogenetic tree analyses. Results: We have developed OrthoInspector, a new software system incorporating an original algorithm for the rapid detection of orthology and inparalogy relations between different species. In comparisons with existing methods, OrthoInspector improves detection sensitivity, with a minimal loss of specificity. In addition, several visualization tools have been developed to facilitate in-depth studies based on these predictions. The software has been used to study the orthology/in-paralogy relationships for a large set of 940,855 protein sequences from 59 different eukaryotic species. Conclusion: OrthoInspector is a new software system for orthology/paralogy analysis. It is made available as an independent software suite that can be downloaded and installed for local use. Command line querying facilitates the integration of the software in high throughput processing pipelines and a graphical interface provides easy, intuitive access to results for the non-expert.

    Controversies in modern evolutionary biology: the imperative for error detection and quality control

    Background: The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies. It has been suggested that part of the conflict may be due to errors in the initial sequences. Most gene sequences are predicted by bioinformatics programs and a number of quality issues have been raised, concerning DNA sequencing errors or badly predicted coding regions, particularly in eukaryotes. Results: We investigated the impact of these errors on evolutionary studies and specifically on the identification of important genetic events. We focused on the detection of asymmetric evolution after duplication, which has been the subject of controversy recently. Using the human genome as a reference, we established a reliable set of 688 duplicated genes in 13 complete vertebrate genomes, where significantly different evolutionary rates are observed. We estimated the rates at which protein sequence errors occur and are accumulated in the higher-level analyses. We showed that the majority of the detected events (57%) are in fact artifacts due to the putative erroneous sequences and that these artifacts are sufficient to mask the true functional significance of the events. Conclusions: Initial errors are accumulated throughout the evolutionary analysis, generating artificially high rates of event predictions and leading to substantial uncertainty in the conclusions. This study emphasizes the urgent need for error detection and quality control strategies in order to efficiently extract knowledge from the new genome data.

    The mitogenome of Hydropsyche pellucidula (Hydropsychidae): first gene arrangement in the insect order Trichoptera

    We describe the mitochondrial genome of Hydropsyche pellucidula Curtis 1834, the first described for the suborder Annulipalpia and the first in the order Trichoptera to show a non-canonical gene order. The mitogenome was obtained by de novo assembly of shotgun-sequenced total genomic DNA using Illumina MiSeq technology, which produced an average coverage of 115× and a minimum coverage of 48×. The mitochondrial genome includes 13 protein-coding genes, 2 rRNAs and 22 tRNAs. The genome is characterized by a rearrangement in the relative position of protein-coding and ribosomal genes. This mitogenome sequence will be useful for studying the family Hydropsychidae, which is commonly used for freshwater pollution biomonitoring.

    The mitochondrial genome of Iberobaenia (Coleoptera: Iberobaeniidae): first rearrangement of protein-coding genes in the beetles

    The complete mitochondrial genome of the recently discovered beetle family Iberobaeniidae is described and compared with known coleopteran mitogenomes. The mitochondrial sequence was obtained by shotgun metagenomic sequencing using Illumina MiSeq technology, which resulted in an average coverage of 130× and a minimum coverage of 35×. The mitochondrial genome of Iberobaeniidae includes 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and 1 putative control region, and shows a unique rearrangement of protein-coding genes. This is the first rearrangement affecting the relative position of protein-coding and ribosomal genes reported for the order Coleoptera.

    EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data

    Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large-scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (e.g., sequence conservation, orthology, synteny, …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes

    Metagenome skimming of insect specimen pools: potential for comparative genomics

    Metagenomic analyses are challenging in metazoans, but high-copy number and repeat regions can be assembled from low-coverage sequencing by “genome skimming,” which is applied here as a new way of characterizing metagenomes obtained in an ecological or taxonomic context. Illumina shotgun sequencing reads from two pools of Coleoptera (beetles) of approximately 200 species each were assembled into tens of thousands of scaffolds. Repeated low-coverage sequencing recovered similar scaffold sets consistently, although approximately 70% of scaffolds could not be identified against existing genome databases. Identifiable scaffolds included mitochondrial DNA, conserved sequences with hits to expressed sequence tag and protein databases, and known repeat elements of high and low complexity, including numerous copies of rRNA and histone genes. Assemblies of histones captured a diversity of gene order and primary sequence in Coleoptera. Scaffolds with similarity to multiple sites in available coleopteran genome sequences for Dendroctonus and Tribolium revealed high specificity of scaffolds to either of these genomes, in particular for high-copy number repeats. Numerous “clusters” of scaffolds mapped to the same genomic site revealed intra- and/or intergenomic variation within a metagenome pool. In addition to the effect of the taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The “metagenome skimming” approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics.

    Toward community standards in the quest for orthologs

    The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

    Uncovering trophic interactions in arthropod predators through DNA shotgun-sequencing of gut contents

    Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been made to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution were achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads, accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun sequencing to characterize arthropod trophic networks.
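
    The filtering step described in this abstract, keeping only read matches with high overlap length and identity, might look roughly like the Python sketch below. The abstract gives no cutoffs or file format, so the `Hit` record, the `filter_hits` function, and the threshold values in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One read-to-reference match (hypothetical record layout)."""
    read_id: str
    reference_id: str
    overlap_length: int      # aligned length in bases
    percent_identity: float  # 0 to 100


def filter_hits(hits: List[Hit], min_overlap: int, min_identity: float) -> List[Hit]:
    """Keep hits that meet both the overlap and the identity cutoffs."""
    return [
        h for h in hits
        if h.overlap_length >= min_overlap and h.percent_identity >= min_identity
    ]


# Illustrative values only; these are not the cutoffs used in the study.
example_hits = [
    Hit("read1", "refA", 120, 99.0),
    Hit("read2", "refA", 40, 100.0),   # overlap too short
    Hit("read3", "refB", 150, 85.0),   # identity too low
]
for hit in filter_hits(example_hits, min_overlap=100, min_identity=98.0):
    print(hit.read_id, hit.reference_id)
```

    Requiring both criteria at once reduces the chance that short or divergent matches are counted as prey, which matters given the abstract's warning about false positives from low numbers of matching reads.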