Design and implementation of a generalized laboratory data model
Background: Investigators in the biological sciences continue to exploit laboratory automation methods and have dramatically increased the rates at which they can generate data. In many environments, the methods themselves also evolve in a rapid and fluid manner. These observations point to the importance of robust information management systems in the modern laboratory. Designing and implementing such systems is non-trivial, and it appears that in many cases a database project ultimately proves unserviceable. Results: We describe a general modeling framework for laboratory data and its implementation as an information management system. The model utilizes several abstraction techniques, focusing especially on the concepts of inheritance and meta-data. Traditional approaches commingle event-oriented data with regular entity data in ad hoc ways. Instead, we define distinct regular entity and event schemas, but fully integrate these via a standardized interface. The design allows straightforward definition of a "processing pipeline" as a sequence of events, obviating the need for separate workflow management systems. A layer above the event-oriented schema integrates events into a workflow by defining "processing directives", which act as automated project managers of items in the system. Directives can be added or modified in an almost trivial fashion, i.e., without the need for schema modification or re-certification of applications. Association between regular entities and events is managed via simple "many-to-many" relationships. We describe the programming interface, as well as techniques for handling input/output, process control, and state transitions. Conclusion: The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments.
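To make the separation of entity and event data more concrete, here is a minimal Python sketch under stated assumptions. The class names (`Entity`, `Event`, `Store`, `Directive`) and the step names are hypothetical and are not the Genome Sequencing Center's actual schema or programming interface. The many-to-many association is modelled as a set of (entity id, event id) pairs, and a "processing directive" is plain data, an ordered list of event types, so defining or changing a pipeline needs no schema change.

```python
from dataclasses import dataclass, field
from itertools import count

# One shared counter, so ids also reflect creation order (hypothetical design).
_ids = count(1)


@dataclass
class Entity:
    kind: str  # e.g. "plate" (hypothetical entity kind)
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class Event:
    event_type: str  # e.g. "prep" (hypothetical event type)
    id: int = field(default_factory=lambda: next(_ids))


class Store:
    """Entities and events live in separate collections, linked many-to-many."""

    def __init__(self):
        self.entities = {}
        self.events = {}
        self.links = set()  # (entity_id, event_id) pairs

    def add_entity(self, entity):
        self.entities[entity.id] = entity
        return entity

    def record_event(self, event_type, entities):
        """Record one event and associate it with every entity it touched."""
        event = Event(event_type)
        self.events[event.id] = event
        for entity in entities:
            self.links.add((entity.id, event.id))
        return event

    def history(self, entity):
        """Event types recorded for an entity, oldest first."""
        event_ids = sorted(ev for ent, ev in self.links if ent == entity.id)
        return [self.events[i].event_type for i in event_ids]


@dataclass
class Directive:
    """Hypothetical processing directive: a pipeline as an ordered list of event types."""
    name: str
    steps: list

    def next_step(self, store, entity):
        """Return the first pipeline step not yet recorded, or None when complete."""
        done = store.history(entity)
        for step in self.steps:
            if step not in done:
                return step
        return None


store = Store()
plate = store.add_entity(Entity("plate"))
pipeline = Directive("sequencing", ["pick", "prep", "sequence"])
store.record_event("pick", [plate])
print(pipeline.next_step(store, plate))  # prep
```

Because a single event can be linked to several entities (for example, one preparation run covering many plates), the association set, rather than a foreign key on either table, carries the relationship in this sketch.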
Data Descriptor: A European Multi Lake Survey dataset of environmental variables, phytoplankton pigments and cyanotoxins
Under ongoing climate change and increasing anthropogenic activity, which continuously challenge ecosystem resilience, an in-depth understanding of ecological processes is urgently needed. Lakes, as providers of numerous ecosystem services, face multiple stressors that threaten their functioning. Harmful cyanobacterial blooms are a persistent problem resulting from nutrient pollution and climate-change-induced stressors such as poor transparency, increased water temperature and enhanced stratification. Consistency in data collection and analysis methods is necessary to achieve fully comparable datasets and statistical validity, avoiding issues linked to disparate data sources. The European Multi Lake Survey (EMLS) in summer 2015 was an initiative among scientists from 27 countries to collect and analyse lake physical, chemical and biological variables in a fully standardized manner. This database includes in-situ lake variables along with nutrient, pigment and cyanotoxin data for 369 lakes in Europe, with samples centrally analysed in dedicated laboratories. Publishing the EMLS methods and dataset might inspire similar initiatives to study lakes across large geographic areas, contributing to a better understanding of lake responses in a changing environment. Peer reviewed
Insights into hominid evolution from the gorilla genome sequence
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating that their sequences diverged on average 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
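The statement that "in 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other" can be illustrated with a deliberately simplified, distance-based Python sketch. It assumes each locus is summarised by three pairwise sequence divergences and simply asks which pair is closest; the study itself relied on model-based (coalescent) inference, so this is only an illustration of the quantity being reported, and the toy divergences are hypothetical, not data from the paper.

```python
from collections import Counter


def closest_pair(d_hc, d_hg, d_cg):
    """Return the most similar species pair at one locus.

    d_hc, d_hg, d_cg are pairwise divergences (human-chimp, human-gorilla,
    chimp-gorilla). Ties resolve in favour of the pair listed first, i.e.
    human-chimp, because min() keeps the first minimum it encounters.
    """
    pairs = {"human-chimp": d_hc, "human-gorilla": d_hg, "chimp-gorilla": d_cg}
    return min(pairs, key=pairs.get)


def discordant_fraction(loci):
    """Fraction of loci where human and chimp are NOT the closest pair."""
    counts = Counter(closest_pair(*locus) for locus in loci)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no loci supplied")
    return (total - counts["human-chimp"]) / total, counts


# Hypothetical toy divergences, not data from the study.
toy_loci = [
    (0.010, 0.016, 0.017),
    (0.012, 0.011, 0.018),
    (0.013, 0.017, 0.012),
    (0.009, 0.015, 0.016),
]
fraction, counts = discordant_fraction(toy_loci)
print(fraction)  # 0.5
print(dict(counts))
```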
Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel
A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved.
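The comparison against validation genotypes rests on a discordance rate. A minimal Python sketch of that metric follows, assuming genotypes are coded as alternate-allele dosages (0, 1, 2) with `None` for missing calls; the function name and the optional non-reference variant (which ignores sites where both genotypes are homozygous reference) are illustrative choices, not necessarily the exact definition used in the paper.

```python
def genotype_discordance(called, truth, non_reference_only=False):
    """Share of compared sites where the called genotype differs from validation.

    Genotypes are alternate-allele dosages (0, 1, 2); None marks a missing
    call and such sites are skipped. With non_reference_only=True, sites
    where both genotypes are homozygous reference (0) are excluded, giving
    a non-reference discordance rate.
    """
    pairs = [(c, t) for c, t in zip(called, truth) if c is not None and t is not None]
    if non_reference_only:
        pairs = [(c, t) for c, t in pairs if not (c == 0 and t == 0)]
    if not pairs:
        raise ValueError("no comparable sites")
    mismatches = sum(1 for c, t in pairs if c != t)
    return mismatches / len(pairs)


# Hypothetical genotypes at six sites for one sample.
called = [0, 1, 2, 0, 1, None]
truth = [0, 1, 1, 0, 2, 0]
print(genotype_discordance(called, truth))  # 0.4
print(genotype_discordance(called, truth, non_reference_only=True))  # 0.6666666666666666
```

The non-reference variant matters at low-frequency variants: most sites there are homozygous reference in most samples, so an overall discordance rate can look small even when rare alleles are called poorly.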
Steroid receptor coactivator-1 modulates the function of Pomc neurons and energy homeostasis
Hypothalamic neurons expressing the anorectic peptide Pro-opiomelanocortin (Pomc) regulate food intake and body weight. Here, we show that Steroid Receptor Coactivator-1 (SRC-1) interacts with a target of leptin receptor activation, phosphorylated STAT3, to potentiate Pomc transcription. Deletion of SRC-1 in Pomc neurons in mice attenuates their depolarization by leptin, decreases Pomc expression and increases food intake, leading to high-fat diet-induced obesity. In humans, fifteen rare heterozygous variants in SRC-1 found in severely obese individuals impair leptin-mediated Pomc reporter activity in cells, whilst four variants found in non-obese controls do not. In a knock-in mouse model of a loss-of-function human variant (SRC-1L1376P), leptin-induced depolarization of Pomc neurons and Pomc expression are significantly reduced, and food intake and body weight are increased. In summary, we demonstrate that SRC-1 modulates the function of hypothalamic Pomc neurons, and suggest that targeting SRC-1 may represent a useful therapeutic strategy for weight loss. Peer reviewed
Temperature Effects Explain Continental Scale Distribution of Cyanobacterial Toxins
Insight into how environmental change determines the production and distribution of cyanobacterial toxins is necessary for risk assessment. Management guidelines currently focus on hepatotoxins (microcystins). Increasing attention is being given to other classes, such as neurotoxins (e.g., anatoxin-a) and cytotoxins (e.g., cylindrospermopsin), due to their potency. Most studies examine the relationship between individual toxin variants and environmental factors, such as nutrients, temperature and light. In summer 2015, we collected samples across Europe to investigate the effect of nutrient and temperature gradients on the variability of toxin production at a continental scale. Direct and indirect effects of temperature were the main drivers of the spatial distribution of the toxins produced by the cyanobacterial community, the toxin concentrations and the toxin quota. Generalized linear models showed that a Toxin Diversity Index (TDI) increased with latitude, while it decreased with water stability. Increases in TDI were explained by a significant increase in toxin variants such as MC-YR, anatoxin and cylindrospermopsin, accompanied by a decreasing presence of MC-LR. As global warming continues, the direct and indirect effects of increased lake temperatures will drive changes in the distribution of cyanobacterial toxins in Europe, potentially promoting selection of a few highly toxic species or strains. Peer reviewed
Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression
Recent developments in stem cell biology have enabled the study of cell fate decisions in early human development that are impossible to study in vivo. However, how development varies across individuals, and in particular the influence of common genetic variants during this process, has not been characterised. Here, we exploit human iPS cell lines from 125 donors, a pooled experimental design, and single-cell RNA-sequencing to study population variation in endoderm differentiation. We identify molecular markers that are predictive of the differentiation efficiency of individual lines, and utilise heterogeneity in the genetic background across individuals to map hundreds of expression quantitative trait loci that influence expression dynamically during differentiation and across cellular contexts.
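The idea of an expression quantitative trait locus whose effect changes during differentiation can be sketched in a few lines of Python. This is a minimal illustration, assuming one SNP coded as donor genotype dosages (0/1/2) and one expression value per donor at each stage; it fits an ordinary least-squares slope per stage, so a slope that changes across stages hints at a dynamic effect. The stage labels and values are hypothetical, and real eQTL mapping (including this study's) involves covariates, more elaborate models and multiple-testing control that are not shown here.

```python
from statistics import mean


def eqtl_effect(dosages, expression):
    """Ordinary least-squares slope of expression on genotype dosage (0/1/2)."""
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


def effects_by_stage(dosages, expression_by_stage):
    """Per-stage genotype effect; a slope that changes across stages suggests a dynamic eQTL."""
    return {stage: eqtl_effect(dosages, expr) for stage, expr in expression_by_stage.items()}


dosages = [0, 0, 1, 1, 2, 2]  # hypothetical donor genotypes at one SNP
expression = {  # hypothetical mean expression per donor at two stages
    "day0": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "day3": [1.0, 1.2, 2.0, 2.2, 3.0, 3.2],
}
print(effects_by_stage(dosages, expression))
```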
Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences
Background: Population differentiation has proved effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing large allele frequency differences between populations. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes. Results: We demonstrate that sites with low differentiation reflect sampling effects rather than balancing selection, whereas sites showing extremely high population differentiation are enriched for positive selection events, and that half of them may be the result of classic selective sweeps. Among these, we rediscover known examples, in which we identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively. Conclusions: We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.
Data availability: The 1000 Genomes phase I integrated callset used in this study is publicly available from The 1000 Genomes Project (http://www.1000genomes.org/). For a list of samples used in this study, refer to Table S1 in Additional file 1. Additional files: Additional file 1 contains supplementary Tables ST1 to ST4 (https://static-content.springer.com/esm/art%3A10.1186%2Fgb-2014-15-6-r88/MediaObjects/13059_2014_3364_MOESM1_ESM.xlsx). Additional file 2 contains supplementary Figures F1 to F16 (https://static-content.springer.com/esm/art%3A10.1186%2Fgb-2014-15-6-r88/MediaObjects/13059_2014_3364_MOESM2_ESM.pdf). Additional file 3 contains the full list of participants and institutions in the 1000 Genomes Project (https://static-content.springer.com/esm/art%3A10.1186%2Fgb-2014-15-6-r88/MediaObjects/13059_2014_3364_MOESM3_ESM.pdf).
Copyright © 2014 Colonna et al.
Funding: The Wellcome Trust (098051); an Italian National Research Council (CNR) short-term mobility fellowship from the 2013 program to VC; and an EMBO Short Term Fellowship ASTF 324–2010 to VC.
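Differentiation at a single site is often summarised either by the raw allele frequency difference between two populations or by an F_ST estimate. The Python sketch below shows both for one biallelic site, using Hudson's estimator in its commonly written single-site form; which statistic and thresholds the study applied is not stated here, so treat this as a generic illustration rather than the authors' method. Frequencies and sample sizes in the example are hypothetical.

```python
def allele_freq_difference(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def hudson_fst(p1, n1, p2, n2):
    """Hudson's F_ST estimator for one biallelic site.

    p1, p2: allele frequencies in populations 1 and 2.
    n1, n2: numbers of sampled chromosomes (must exceed 1).
    The sample-size corrections can make the estimate slightly negative
    when the populations do not differ.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two chromosomes per population")
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("site is fixed for the same allele in both samples")
    return numerator / denominator


# Hypothetical site: allele at 90% in one population, 10% in another, 100 chromosomes each.
print(allele_freq_difference(0.9, 0.1))
print(hudson_fst(0.9, 100, 0.1, 100))  # roughly 0.778
```

Ranking genome-wide sites by such a statistic and inspecting the extreme tail is the general idea behind scanning for highly differentiated regions.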
Stratification strength and light climate explain variation in chlorophyll a at the continental scale in a European multilake survey in a heatwave summer
To determine the drivers of phytoplankton biomass, we collected standardized morphometric, physical, and biological data in 230 lakes across the Mediterranean, Continental, and Boreal climatic zones of the European continent. Multilinear regression models tested on this snapshot of mostly eutrophic lakes (median total phosphorus [TP] = 0.06 mg L−1 and total nitrogen [TN] = 0.7 mg L−1), and on its subsets (2 depth types and 3 climatic zones), show that light climate and stratification strength were the most significant explanatory variables for chlorophyll a (Chl a) variance. TN was a significant predictor of phytoplankton biomass for shallow and continental lakes, while TP never appeared as an explanatory variable, suggesting that under high TP, light, which partially controls stratification strength, becomes limiting for phytoplankton development. Mediterranean lakes were the warmest yet most weakly stratified and had significantly less Chl a than Boreal lakes, where the temperature anomaly relative to the long-term average during the summer heatwave was highest (+4°C) and showed a significant exponential relationship with stratification strength. This European survey represents a summer snapshot of phytoplankton biomass and its drivers, and supports the view that light and stratification metrics, which are both affected by climate change, are better predictors of phytoplankton biomass in nutrient-rich lakes than nutrient concentrations and surface temperature.
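The multilinear regression step can be sketched without external libraries by solving the normal equations directly. In this minimal Python sketch the two predictors stand in for a light metric and a stratification metric, but the variable names, values and noiseless response are hypothetical; the study's actual predictors, transformations and model selection are not reproduced here.

```python
def ols_fit(X, y):
    """Multiple linear regression by solving the normal equations (X'X) b = X'y.

    X is a list of predictor rows (an intercept column is added here);
    returns [intercept, b1, b2, ...].
    """
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    if len(rows) != len(y) or len(rows) <= k:
        raise ValueError("need more observations than coefficients")
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for i in range(col + 1, k):
            f = a[i][col] / a[col][col]
            for j in range(col, k):
                a[i][j] -= f * a[col][j]
            b[i] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return coef


# Hypothetical predictors per lake: (light metric, stratification metric).
X = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 2.0), (5.0, 1.0), (6.0, 3.0)]
y = [2.0 + 0.5 * light - 1.5 * strat for light, strat in X]  # noiseless toy response
print(ols_fit(X, y))  # approximately [2.0, 0.5, -1.5]
```

With real survey data the response would carry noise, and comparing coefficient significance (not shown) is what identifies the most important explanatory variables.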