7 research outputs found

    Experimental Incubations Elicit Profound Changes in Community Transcription in OMZ Bacterioplankton

    Sequencing of microbial community RNA (metatranscriptome) is a useful approach for assessing gene expression in microorganisms from the natural environment. This method has revealed transcriptional patterns in situ, but can also be used to detect transcriptional cascades in microcosms following experimental perturbation. Unambiguously identifying differential transcription between control and experimental treatments requires constraining effects that are simply due to sampling and bottle enclosure. These effects remain largely uncharacterized for “challenging” microbial samples, such as those from anoxic regions that require special handling to maintain in situ conditions. Here, we demonstrate substantial changes in microbial transcription induced by sample collection and incubation in experimental bioreactors. Microbial communities were sampled from the water column of a marine oxygen minimum zone by a pump system that introduced minimal oxygen contamination and subsequently incubated in bioreactors under near in situ oxygen and temperature conditions. Relative to the source water, experimental samples became dominated by transcripts suggestive of cell stress, including chaperone, protease, and RNA degradation genes from diverse taxa, with strong representation from SAR11-like alphaproteobacteria. In tandem, transcripts matching facultative anaerobic gammaproteobacteria of the Alteromonadales (e.g., Colwellia) increased 4- to 13-fold, reaching up to 43% of coding transcripts, and encoded a diverse gene set suggestive of protein synthesis and cell growth. We interpret these patterns as taxon-specific responses to combined environmental changes in the bioreactors, including shifts in substrate or oxygen availability, and minor temperature and pressure changes during sampling with the pump system.
Whether such changes confound analysis of transcriptional patterns may vary with the design of the experiment, the taxonomic composition of the source community, and the metabolic linkages between community members. These data highlight the impressive capacity for transcriptional changes within complex microbial communities, underscoring the need for caution when inferring in situ metabolism based on transcript abundances in experimental incubations.

    BiobankUniverse: Automatic matchmaking between datasets for biobank data discovery and integration

    Motivation: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. Results: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers upload their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks/data elements matching their research. They can also curate matches and define personalized data-universes.
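
The combination of lexical comparison and query expansion described in this abstract can be illustrated with a minimal Python sketch. It is not the BiobankUniverse algorithm: the bigram Dice score, the `SYNONYMS` table (a stand-in for ontology-based expansion), the 0.5 threshold, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple


def bigrams(text: str) -> Set[str]:
    """Return the character bigrams of a lower-cased, space-normalised label."""
    cleaned = " ".join(text.lower().split())
    return {cleaned[i:i + 2] for i in range(len(cleaned) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, in the range [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical synonym table standing in for ontology-based query expansion.
SYNONYMS: Dict[str, List[str]] = {
    "hypertension": ["high blood pressure"],
}


def expand(variable: str) -> List[str]:
    """Return the variable plus variants with known terms swapped for synonyms."""
    lowered = variable.lower()
    variants = [variable]
    for term, alternatives in SYNONYMS.items():
        if term in lowered:
            variants.extend(lowered.replace(term, alt) for alt in alternatives)
    return variants


def shortlist(
    variable: str, elements: List[str], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank data elements by their best similarity to any expanded variant."""
    scored = [(e, max(dice(v, e) for v in expand(variable))) for e in elements]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])


# Illustrative data elements a biobank might upload (invented for this sketch).
elements = ["History of high blood pressure", "Body mass index", "Smoking status"]
matches = shortlist("history of hypertension", elements)
print(matches)
```

In this toy data the purely lexical score between "history of hypertension" and "History of high blood pressure" falls below the threshold, and the pair is shortlisted only once the synonym is substituted, which is the kind of gap that expansion is meant to close.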

    DataSHIELD: taking the analysis to the data, not the data to the analysis

    Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK's proposed 'care.data' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data. Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC. Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach. 
DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property: the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but the physical size of the data precludes direct transfer to a new site for analysis.
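
The pattern described in this abstract, in which an analysis computer combines non-disclosive summaries returned by several data computers, can be sketched in a few lines of Python. This is a toy illustration and not the DataSHIELD implementation, which the abstract says is built on R and Opal; the class names, the `MIN_RECORDS` threshold, and the choice of a pooled mean and variance are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Summary:
    """Aggregate statistics that leave a site; no individual values are included."""
    n: int
    total: float
    total_sq: float


class DataComputer:
    """Holds individual-level data locally and releases only aggregates."""

    MIN_RECORDS = 5  # hypothetical disclosure threshold, not a DataSHIELD setting

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)

    def summarise(self) -> Summary:
        # Refuse to answer when the aggregate could identify individuals.
        if len(self._values) < self.MIN_RECORDS:
            raise PermissionError("too few records to release a summary")
        return Summary(
            n=len(self._values),
            total=sum(self._values),
            total_sq=sum(v * v for v in self._values),
        )


def pooled_mean_and_variance(sites: List[DataComputer]) -> Tuple[float, float]:
    """Analysis-computer side: combine per-site summaries into pooled statistics."""
    summaries = [site.summarise() for site in sites]
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance


# Two invented sites; the analysis side never sees the raw lists.
sites = [DataComputer([1, 2, 3, 4, 5]), DataComputer([6, 7, 8, 9, 10])]
mean, variance = pooled_mean_and_variance(sites)
print(mean, variance)
```

Because counts, sums, and sums of squares add across sites, the pooled result equals what a central analysis of the combined records would give, while each site's individual-level data stays behind its own firewall.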