19 research outputs found
Array2BIO: from microarray expression data to functional annotation of co-regulated genes
BACKGROUND: There are several isolated tools for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for the analysis of Affymetrix microarray expression data we have developed Array2BIO, an application that couples several analytical methods into a single web based utility. RESULTS: Array2BIO converts raw intensities into probe expression values, automatically maps those to genes, and subsequently identifies groups of co-expressed genes using two complementary approaches: (1) comparative analysis of signal versus control and (2) clustering analysis of gene expression across different conditions. The identified genes are assigned to functional categories based on Gene Ontology classification and KEGG protein interaction pathways. Array2BIO reliably handles low-expressor genes and provides a set of statistical methods for quantifying expression levels, including Benjamini-Hochberg and Bonferroni multiple testing corrections. An automated interface with the ECR Browser provides evolutionary conservation analysis for the identified gene loci while the interconnection with Crème allows prediction of gene regulatory elements that underlie observed expression patterns. CONCLUSION: We have developed Array2BIO – a web based tool for rapid comprehensive analysis of Affymetrix microarray expression data, which also allows users to link expression data to Dcode.org comparative genomics tools and integrates a system for translating co-expression data into mechanisms of gene co-regulation. Array2BIO is publicly available a
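The two multiple testing corrections named in this abstract can be illustrated with a minimal sketch. This is not Array2BIO's code; the function names and the example p-values are assumptions made purely for illustration.

```python
# Illustrative sketch only: textbook Bonferroni and Benjamini-Hochberg
# adjustments. Function names and inputs are hypothetical, not Array2BIO's.

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    example = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
    print(bonferroni(example))
    print(benjamini_hochberg(example))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, which is often preferred when many genes are tested at once.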
Artificial Polyploidy Improves Bacterial Single Cell Genome Recovery
BACKGROUND: Single cell genomics (SCG) is a combination of methods whose goal is to decipher the complete genomic sequence from a single cell and has been applied mostly to organisms with smaller genomes, such as bacteria and archaea. Prior single cell studies showed that a significant portion of a genome could be obtained. However, breakages of genomic DNA and amplification bias have made it very challenging to acquire a complete genome from single cells. We investigated an artificial method to induce polyploidy in Bacillus subtilis ATCC 6633 by blocking cell division and have shown that we can significantly improve the performance of genomic sequencing from a single cell. METHODOLOGY/PRINCIPAL FINDINGS: We inhibited the bacterial cytoskeleton protein FtsZ in B. subtilis with an FtsZ-inhibiting compound, PC190723, resulting in larger undivided single cells with multiple copies of their genome. qPCR assays of these larger, sorted cells showed higher DNA content, less amplification bias, and greater genomic recovery than untreated cells. SIGNIFICANCE: The method presented here shows the potential to obtain a nearly complete genome sequence from a single bacterial cell. With millions of uncultured bacterial species in nature, this method holds tremendous promise to provide insight into the genomic novelty of yet-to-be discovered species, and given the temporary effects of artificial polyploidy coupled with the ability to sort and distinguish differences in cell size and genomic DNA content, may allow recovery of specific organisms in addition to their genomes
Identification of mobile genetic elements with geNomad
Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad
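The benchmark figures above are Matthews correlation coefficients (MCC) expressed as percentages. As a reminder of what that metric measures, here is a minimal sketch of the standard binary MCC formula; it is not geNomad's evaluation code, and the function name and confusion-matrix counts are made up for illustration.

```python
# Illustrative sketch only: the standard binary MCC formula computed from
# confusion-matrix counts. The counts below are hypothetical, not geNomad's.
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Return the MCC in [-1, 1]; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when the coefficient is undefined
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical classifier: 90 true positives, 85 true negatives,
    # 15 false positives, 10 false negatives.
    mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
    print(f"MCC = {mcc:.3f} ({mcc * 100:.1f}%)")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when classes are imbalanced, which is typical when plasmid or viral sequences are a small fraction of a dataset.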
Discovery of an Antarctic Ascidian-Associated Uncultivated Verrucomicrobia with Antimelanoma Palmerolide Biosynthetic Potential
The Antarctic marine ecosystem harbors a wealth of biological and chemical innovation that has risen in concert over millennia since the isolation of the continent and formation of the Antarctic circumpolar current. Scientific inquiry into the novelty of marine natural products produced by Antarctic benthic invertebrates led to the discovery of a bioactive macrolide, palmerolide A, that has specific activity against melanoma and holds considerable promise as an anticancer therapeutic. While this compound was isolated from the Antarctic ascidian Synoicum adareanum, its biosynthesis has since been hypothesized to be microbially mediated, given structural similarities to microbially produced hybrid nonribosomal peptide-polyketide macrolides. Here, we describe a metagenome-enabled investigation aimed at identifying the biosynthetic gene cluster (BGC) and palmerolide A-producing organism. A 74-kbp candidate BGC encoding the multimodular enzymatic machinery (hybrid type I-trans-AT polyketide synthase-nonribosomal peptide synthetase and tailoring functional domains) was identified and found to harbor key features predicted as necessary for palmerolide A biosynthesis. Surveys of ascidian microbiome samples targeting the candidate BGC revealed a high correlation between palmerolide gene targets and a single 16S rRNA gene variant (R = 0.83 to 0.99). Through repeated rounds of metagenome sequencing followed by binning contigs into metagenome-assembled genomes, we were able to retrieve a nearly complete genome (10 contigs) of the BGC-producing organism, a novel verrucomicrobium within the Opitutaceae family that we propose here as "Candidatus Synoicihabitans palmerolidicus." The refined genome assembly harbors five highly similar BGC copies, along with structural and functional features that shed light on the host-associated nature of this unique bacterium. IMPORTANCE Palmerolide A has potential as a chemotherapeutic agent to target melanoma. 
We interrogated the microbiome of the Antarctic ascidian, Synoicum adareanum, using a cultivation-independent high-throughput sequencing and bioinformatic strategy. The metagenome-encoded biosynthetic machinery predicted to produce palmerolide A was found to be associated with the genome of a member of the S. adareanum core microbiome. Phylogenomic analysis suggests the organism represents a new deeply branching genus, "Candidatus Synoicihabitans palmerolidicus," in the Opitutaceae family of the Verrucomicrobia phylum. The Ca. Synoicihabitans palmerolidicus 4.29-Mb genome encodes a repertoire of carbohydrate-utilizing and transport pathways, a chemotaxis system, flagellar biosynthetic capacity, and other regulatory elements enabling its ascidian-associated lifestyle. The palmerolide producer's genome also contains five distinct copies of the large palmerolide biosynthetic gene cluster that may provide structural complexity of palmerolide variants
Comparative metagenomics reveals impact of contaminants on groundwater microbiomes.
To understand patterns of geochemical cycling in pristine versus contaminated groundwater ecosystems, pristine shallow groundwater (FW301) and contaminated groundwater (FW106) samples from the Oak Ridge Integrated Field Research Center (OR-IFRC) were sequenced and compared to determine phylogenetic and metabolic differences between the communities. Proteobacteria (e.g., Burkholderia, Pseudomonas) are the most abundant lineages in the pristine community, though a significant proportion (>55%) of the community is composed of poorly characterized low abundance (individually <1%) lineages. The phylogenetic diversity of the pristine community contributed to a broader diversity of metabolic networks than the contaminated community. In addition, the pristine community encodes redundant and mostly complete geochemical cycles distributed over multiple lineages and appears capable of a wide range of metabolic activities. In contrast, many geochemical cycles in the contaminated community appear truncated or minimized due to decreased biodiversity and dominance by Rhodanobacter populations capable of surviving the combination of stresses at the site. These results indicate that the pristine site contains a more robust community that encodes more functional redundancy than the stressed community, which contributes to more efficient nutrient cycling and greater adaptability
A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics
Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we describe its structure and terminology, explain how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example
Challenges in Bioinformatics Workflows for Processing Microbiome Omics Data at Scale
The nascent field of microbiome science is transitioning from a descriptive approach of cataloging taxa and functions present in an environment to applying multi-omics methods to investigate microbiome dynamics and function. A large number of new tools and algorithms have been designed and used for very specific purposes on samples collected by individual investigators or groups. While these developments have been quite instructive, the ability to compare microbiome data generated by many groups of researchers is impeded by the lack of standardized application of bioinformatics methods. Additionally, there are few examples of broad bioinformatics workflows that can process metagenome, metatranscriptome, metaproteome and metabolomic data at scale, and no central hub that allows processing, or provides varied omics data that are findable, accessible, interoperable and reusable (FAIR). Here, we review some of the challenges that exist in analyzing omics data within the microbiome research sphere, and provide context on how the National Microbiome Data Collaborative has adopted a standardized and open access approach to address such challenges
Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource
Accessible and easy-to-use standardized bioinformatics workflows are necessary to advance microbiome research from observational studies to large-scale, data-driven approaches. Standardized multi-omics data enables comparative studies, data reuse, and applications of machine learning to model biological processes. To advance broad accessibility of standardized multi-omics bioinformatics workflows, the National Microbiome Data Collaborative (NMDC) has developed the Empowering the Development of Genomics Expertise (NMDC EDGE) resource, a user-friendly, open-source web application (https://nmdc-edge.org). Here, we describe the design and main functionality of the NMDC EDGE resource for processing metagenome, metatranscriptome, natural organic matter, and metaproteome data. The architecture relies on three main layers (web application, orchestration, and execution) to ensure flexibility and expansion to future workflows. The orchestration and execution layers leverage best practices in software containers and accommodate high-performance computing and cloud computing services. Further, we have adopted a robust user research process to collect feedback for continuous improvement of the resource. NMDC EDGE provides an accessible interface for researchers to process multi-omics microbiome data using production-quality workflows to facilitate improved data standardization and interoperability