15 research outputs found

Evaluation of genomic island predictors using a comparative genomics approach

Authors: Brinkman Fiona SL; Hsiao William WL; Langille Morgan GI
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches.

Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset), and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools.

Conclusion: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation of the accuracy of several sequence composition-based GI predictors using diverse and independent datasets that were not artificially constructed. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
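The precision, recall and accuracy comparison described in the abstract above can be illustrated with a minimal sketch. It assumes that predictions and the positive (island) and negative (non-island) reference datasets are given as inclusive `(start, end)` coordinate intervals and that scoring is done per position. The interval format, the function names `covered` and `score`, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
def covered(intervals):
    """Expand inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def score(predicted, positives, negatives):
    """Per-position precision, recall and accuracy of a set of predictions.

    Assumption: positions outside both reference datasets are ignored,
    because their true status is unknown.
    """
    pred = covered(predicted)
    pos = covered(positives)
    neg = covered(negatives)

    tp = len(pred & pos)   # predicted and in the positive dataset
    fp = len(pred & neg)   # predicted but in the negative dataset
    fn = len(pos - pred)   # positive positions that were missed
    tn = len(neg - pred)   # negative positions correctly left unpredicted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Toy example with made-up coordinates.
print(score([(1, 10)], [(6, 15)], [(1, 5), (16, 25)]))
```

In this toy case the predictor covers half of the positive region and spills into the negative region by the same amount, so precision and recall are both 0.5. A tool that predicts more aggressively would typically raise recall at the cost of precision, which is the kind of trade-off the abstract reports.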

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences.

Author: Langille Morgan GI
Publication date: 06/11/2018

Ezid

BioTorrents: A File Sharing Service for Scientific Data

Author: Langille Morgan GI
Publication date: 19/05/2020

Ezid

MicrobeDB: a locally maintainable database of microbial genomic sequences

Author: Langille Morgan GI
Publication date: 19/05/2020

Ezid

Bioinformatic detection of horizontally transferred DNA in bacterial genomes

Authors: Fiona SL Brinkman; Morgan GI Langille
Publication venue: Faculty Opinions Ltd

Crossref

Which is more important for classifying microbial communities: who's there or what they can do?

Authors: Knight Rob; Langille Morgan GI; Malmer Daniel; Way Samuel F; Xu Zhenjiang
Publication venue: eScholarship, University of California
Publication date: 29/08/2014

PubMed Central

eScholarship - University of California

Which is more important for classifying microbial communities: who's there or what they can do?

Authors: Knight Rob; Langille Morgan GI; Malmer Daniel; Way Samuel F; Xu Zhenjiang
Publication venue: eScholarship, University of California
Publication date: 01/12/2014

eScholarship - University of California

Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource

Authors: Eisen Jonathan A; Jospin Guillaume; Langille Morgan GI; Pollard Katherine S; Sharpton Thomas J; Wu Dongying
Publication venue: eScholarship, University of California
Publication date: 13/10/2012

Abstract

Background: New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences.

Results: We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as “Sifting Families,” or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology–based analyses.

Conclusions: We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
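The iterative strategy described in the abstract above, in which homologs of known families are sifted out before the search for new families, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: a k-mer Jaccard score stands in for global alignment similarity, de novo clustering is a simple greedy pass, and the names `similarity`, `greedy_cluster`, `update_families` and the 0.5 threshold are all hypothetical.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (toy stand-in for alignment similarity)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def greedy_cluster(seqs, threshold):
    """De novo clustering: each sequence joins the first matching
    representative, or becomes the representative of a new cluster."""
    clusters = {}
    for seq in seqs:
        for rep in clusters:
            if similarity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


def update_families(families, new_seqs, threshold=0.5):
    """Sift new sequences against known families, then cluster the rest.

    `families` maps a representative sequence to its list of members.
    """
    leftovers = []
    for seq in new_seqs:
        match = next(
            (rep for rep in families if similarity(seq, rep) >= threshold),
            None,
        )
        if match is None:
            leftovers.append(seq)           # no known family: keep for de novo step
        else:
            families[match].append(seq)     # sifted out into a known family
    # Only the leftovers take part in the search for new families.
    families.update(greedy_cluster(leftovers, threshold))
    return families


# Toy example with made-up peptide strings.
known = {"MKTAYIAK": ["MKTAYIAK"]}
print(update_families(known, ["MKTAYIAR", "GGSGGSGG", "GGSGGSGA"]))
```

The point of the sketch is the ordering: the all-against-all comparison in the de novo step runs only on sequences that failed to match an existing family, which is where the reduction in computational cost mentioned in the abstract would come from.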

Crossref

PubMed Central

eScholarship - University of California