44 research outputs found

    PhagePhisher: a Pipeline for the Discovery of Covert Viral Sequences in Complex Genomic Datasets

    Obtaining meaningful viral information from large sequencing datasets presents unique challenges distinct from prokaryotic and eukaryotic sequencing efforts. The difficulties surrounding this issue can be ascribed in part to the genomic plasticity of viruses themselves as well as the scarcity of existing information in genomic databases. The open-source software PhagePhisher (http://www.putonti-lab.com/phagephisher) has been designed as a simple pipeline to extract relevant information from complex and mixed datasets, and will improve the examination of bacteriophages, viruses, and virally related sequences, in a range of environments. Key aspects of the software include speed and ease of use; PhagePhisher can be used with limited operator knowledge of bioinformatics on a standard workstation. As a proof-of-concept, PhagePhisher was successfully implemented with bacteria–virus mixed samples of varying complexity. Furthermore, viral signals within microbial metagenomic datasets were easily and quickly identified by PhagePhisher, including those from prophages as well as lysogenic phages, an important and often neglected aspect of examining phage populations in the environment. PhagePhisher resolves viral-related sequences which may be obscured by or imbedded in bacterial genomes

    A Polyglot Approach to Bioinformatics Data Integration: a Phylogenetic Analysis of HIV-1

    As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences, revealing that spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest

    virMine: automated detection of viral sequences from complex metagenomic samples

    Metagenomics has enabled sequencing of viral communities from a myriad of different environments. Viral metagenomic studies routinely uncover sequences with no recognizable homology to known coding regions or genomes. Nevertheless, complete viral genomes have been constructed directly from complex community metagenomes, often through tedious manual curation. To address this, we developed the software tool virMine to identify viral genomes from raw reads representative of viral or mixed (viral and bacterial) communities. virMine automates sequence read quality control, assembly, and annotation. Researchers can easily refine their search for a specific study system and/or feature(s) of interest. In contrast to other viral genome detection tools that often rely on the recognition of viral signature sequences, virMine is not restricted by the insufficient representation of viral diversity in public data repositories. Rather, viral genomes are identified through an iterative approach, first omitting non-viral sequences. Thus, both relatives of previously characterized viruses and novel species can be detected, including both eukaryotic viruses and bacteriophages. Here we present virMine and its analysis of synthetic communities as well as metagenomic data sets from three distinctly different environments: the gut microbiota, the urinary microbiota, and freshwater viromes. Several new viral genomes were identified and annotated, thus contributing to our understanding of viral genetic diversity in these three environments

    Bacteriophages isolated from Lake Michigan demonstrate broad host-range across several bacterial phyla

    BACKGROUND: The study of bacteriophages continues to generate key information about microbial interactions in the environment. Many phenotypic characteristics of bacteriophages cannot be examined by sequencing alone, further highlighting the necessity for isolation and examination of phages from environmental samples. While much of our current knowledge base has been generated by the study of marine phages, freshwater viruses are understudied in comparison. Our group has previously conducted metagenomics-based studies of samples collected from Lake Michigan; the data presented in this study relate to four phages that were extracted from the same samples. FINDINGS: Four phages were extracted from Lake Michigan on the same bacterial host, exhibiting similar morphological characteristics as shown under transmission electron microscopy. Growth characteristics of the phages were unique to each isolate. Each phage demonstrated a host-range spanning several phyla of bacteria; to date, such a broad host-range has yet to be reported. Genomic data reveal genomes of a similar size and close similarities between the Lake Michigan phages and the Pseudomonas phage PB1; however, the majority of annotated genes present were ORFans and little insight was offered into mechanisms for host-range. CONCLUSIONS: The phages isolated from Lake Michigan are capable of infecting several bacterial phyla, and demonstrate varied phenotypic characteristics despite similarities in host preference and at the genomic level. We propose that such a broad host-range is likely related to the oligotrophic nature of Lake Michigan, and the competitive benefit that this characteristic may lend to phages in nature

    Egr-1 induces DARPP-32 expression in striatal medium spiny neurons via a conserved intragenic element.

    DARPP-32 (dopamine and adenosine 3', 5'-cyclic monophosphate cAMP-regulated phosphoprotein, 32 kDa) is a striatal-enriched protein that mediates signaling by dopamine and other first messengers in the medium spiny neurons. The transcriptional mechanisms that regulate striatal DARPP-32 expression remain enigmatic and are a subject of much interest in the efforts to induce a striatal phenotype in stem cells. We report the identification and characterization of a conserved region, also known as H10, in intron IV of the gene that codes for DARPP-32 (Ppp1r1b). This DNA sequence forms multiunit complexes with nuclear proteins from adult and embryonic striata of mice and rats. Purification of proteins from these complexes identified early growth response-1 (Egr-1). The interaction between Egr-1 and H10 was confirmed in vitro and in vivo by super-shift and chromatin immunoprecipitation assays, respectively. Importantly, brain-derived neurotrophic factor (BDNF), a known inducer of DARPP-32 and Egr-1 expression, enhanced Egr-1 binding to H10 in vitro. Moreover, overexpression of Egr-1 in primary striatal neurons induced the expression of DARPP-32, whereas a dominant-negative Egr-1 blocked DARPP-32 induction by BDNF. Together, this study identifies Egr-1 as a transcriptional activator of the Ppp1r1b gene and provides insight into the molecular mechanisms that regulate medium spiny neuron maturation

    Pseudomonas Diversity Within Urban Freshwaters

    Freshwater lakes are home to bacterial communities with thousands of interdependent species. Numerous high-throughput 16S rRNA gene sequence surveys have provided insight into the microbial taxa found within these waters. Prior surveys of Lake Michigan waters have identified bacterial species common to freshwater lakes as well as species likely introduced from the urban environment. We cultured bacterial isolates from samples taken from the Chicago nearshore waters of Lake Michigan in an effort to look more closely at the genetic diversity of species found therein. The most abundant genus detected was Pseudomonas, whose presence in freshwaters is often attributed to storm water or runoff. Whole genome sequencing was conducted for 15 Lake Michigan Pseudomonas strains, representative of eight species and three isolates that could not be resolved with named species. These genomes were examined specifically for genes encoding functionality which may be advantageous in their urban environment. Antibiotic resistance, among other known virulence factors and defense mechanisms, was identified in the genome annotations and verified in the lab. We also tested the Lake Michigan Pseudomonas strains for siderophore production and resistance to the heavy metals mercury and copper. As the study presented here shows, a variety of pseudomonads have inhabited the urban coastal waters of Lake Michigan

    The FunGenES Database: A Genomics Resource for Mouse Embryonic Stem Cell Differentiation

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the “Functional Genomics in Embryonic Stem Cells” consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in “Expression Waves” and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells

    Assessment of microbial populations within Chicago area nearshore waters and interfaces with river systems

    The Chicago area locks separate and control water flow between the freshwaters of Lake Michigan and the network of Illinois waterways. Under extreme storm conditions, however, the locks are opened and storm waters, untreated waste, and runoff are released directly into the lake. These combined sewer overflow (CSO) events introduce microbes, viruses, and nutrients such as nitrogen and phosphorus into nearshore waters, which likely affect the native species. We collected surface water samples from four Chicago area beaches – Gillson Park, Montrose Beach, 57th Street Beach, and Calumet Beach – every two weeks from May 13 through August 5, 2014. Sampling was conducted with four biological replicates for each sampling date and location, resulting in 112 samples. Each community was surveyed through targeted sequencing of the V4 16S rRNA gene. Technical replicates were also sequenced and are included in this dataset. Taxa were identified using Mothur. Raw sequence data are available via NCBI's SRA database (part of BioProject PRJNA245802)
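
The sample count stated in the abstract above can be checked with a short calculation. The following is a minimal sketch, assuming that "every two weeks" means a fixed 14-day interval starting on May 13 and that every beach was sampled on every date; the function and variable names are illustrative and do not come from the dataset.

```python
from datetime import date, timedelta


def sampling_dates(start, end, step_days=14):
    """Return every date from start to end (inclusive) at a fixed interval."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


# Sites and replicate count as stated in the abstract.
beaches = ["Gillson Park", "Montrose Beach", "57th Street Beach", "Calumet Beach"]
replicates_per_visit = 4

# Assumption: a strict 14-day schedule between the two stated end dates.
dates = sampling_dates(date(2014, 5, 13), date(2014, 8, 5))

total_samples = len(beaches) * len(dates) * replicates_per_visit
print(f"{len(dates)} sampling dates, {total_samples} samples")
```

Under these assumptions the schedule yields seven sampling dates, and 4 beaches × 7 dates × 4 replicates gives the 112 samples reported.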