
    The Art of Data Science

    To flourish in the new data-intensive environment of 21st century science, we need to evolve new skills. These can be expressed in terms of the systemized framework that formed the basis of mediaeval education - the trivium (logic, grammar, and rhetoric) and quadrivium (arithmetic, geometry, music, and astronomy). However, rather than focusing on number, data is the new keystone. We need to understand what rules it obeys, how it is symbolized and communicated and what its relationship to physical space and time is. In this paper, we will review this understanding in terms of the technologies and processes that it requires. We contend that, at least, an appreciation of all these aspects is crucial to enable us to extract scientific information and knowledge from the data sets which threaten to engulf and overwhelm us. Comment: 12 pages, invited talk at Astrostatistics and Data Mining in Large Astronomical Databases workshop, La Palma, Spain, 30 May - 3 June 2011, to appear in Springer Series on Astrostatistic

    Insight into glucocorticoid receptor signalling through interactome model analysis

    Glucocorticoid hormones (GCs) are used to treat a variety of diseases because of their potent anti-inflammatory effect and their ability to induce apoptosis in lymphoid malignancies through the glucocorticoid receptor (GR). Despite ongoing research, high glucocorticoid efficacy and widespread usage in medicine, resistance, disease relapse and toxicity remain factors that need addressing. Understanding the mechanisms of glucocorticoid signalling and how resistance may arise is highly important for improving therapy. To gain insight into this we undertook a systems biology approach, aiming to generate a Boolean model of the glucocorticoid receptor protein interaction network that encapsulates functional relationships between the GR, its target genes or genes that target GR, and the interactions between the genes that interact with the GR. This model, named GEB052, consists of 52 nodes representing genes or proteins, the model input (GC) and model outputs (cell death and inflammation), connected by 241 logical interactions of activation or inhibition. In silico knockouts uncovered 323 changes in the relationships between model constituents, and steady-state analysis followed by cell-based, genome-wide microarray validation of the model led to an average of 57% correct predictions, which was taken further by assessment of model predictions against patient microarray data. Lastly, semi-quantitative model analysis, in which microarray data were superimposed onto the model with a score flow algorithm, demonstrated significantly higher correct prediction ratios (average of 80%), and the model has been assessed as a predictive clinical tool using published patient microarray data. In summary, we present an in silico simulation of the glucocorticoid receptor interaction network, linked to downstream biological processes, that can be analysed to uncover relationships between GR and its interactants. Ultimately, the model provides a platform for future development, both by directing laboratory research and by allowing the incorporation of further components that encapsulate more of the interactions and genes involved in glucocorticoid receptor signalling.
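To make the idea of a Boolean network with in silico knockouts and steady-state analysis concrete, here is a minimal sketch. It is not the GEB052 model: the six nodes and their rules are hypothetical, chosen only to show an input (GC), activation and inhibition edges, and the two outputs named in the abstract. Synchronous updating is also an assumption; the abstract does not state the update scheme used.

```python
from typing import Callable, Dict, Optional

State = Dict[str, bool]

# Hypothetical toy rules (NOT the GEB052 network): each node's next value
# is a Boolean function of the current state.
RULES: Dict[str, Callable[[State], bool]] = {
    "GC": lambda s: s["GC"],                    # model input, held constant
    "GR": lambda s: s["GC"],                    # GC activates GR
    "PRO_APOPTOTIC": lambda s: s["GR"],         # activated by GR
    "PRO_INFLAMMATORY": lambda s: not s["GR"],  # inhibited by GR
    "cell_death": lambda s: s["PRO_APOPTOTIC"],
    "inflammation": lambda s: s["PRO_INFLAMMATORY"],
}


def step(state: State, knockout: Optional[str] = None) -> State:
    """Apply every rule once (synchronous update); a knocked-out node stays off."""
    new_state = {node: rule(state) for node, rule in RULES.items()}
    if knockout is not None:
        new_state[knockout] = False
    return new_state


def steady_state(gc_present: bool, knockout: Optional[str] = None,
                 max_steps: int = 50) -> State:
    """Iterate from an all-off state until the state stops changing."""
    state = {node: False for node in RULES}
    state["GC"] = gc_present
    if knockout is not None:
        state[knockout] = False
    for _ in range(max_steps):
        nxt = step(state, knockout)
        if nxt == state:
            return nxt
        state = nxt
    raise RuntimeError("no fixed point reached (the network may cycle)")


wild_type = steady_state(gc_present=True)
gr_knockout = steady_state(gc_present=True, knockout="GR")
print("wild type :", wild_type["cell_death"], wild_type["inflammation"])
print("GR knockout:", gr_knockout["cell_death"], gr_knockout["inflammation"])
```

Comparing the two steady states node by node is the kind of check that, repeated over every node of a larger network, would yield a list of knockout-induced changes.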

    Seasonal variations in carbon, nitrogen and phosphorus concentrations and C:N:P stoichiometry in different organs of a Larix principis-rupprechtii Mayr. plantation in the Qinling Mountains, China

    Understanding how concentrations of elements and their stoichiometry change with plant growth and age is critical for predicting plant community responses to environmental change. We used long-term field experiments to explore how the leaf, stem and root carbon (C), nitrogen (N) and phosphorus (P) concentrations and their stoichiometry changed with growth and stand age in a L. principis-rupprechtii Mayr. plantation from 2012–2015 in the Qinling Mountains, China. Our results showed that the C, N and P concentrations and stoichiometric ratios in different tissues of larch stands were affected by stand age, organ type and sampling month and displayed multiple correlations with increased stand age in different growing seasons. Generally, leaf C and N concentrations were greatest in the fast-growing season, but leaf P concentrations were greatest in the early growing season. However, no clear seasonal tendencies in the stem and root C, N and P concentrations were observed with growth. In contrast to N and P, few differences were found in organ-specific C concentrations. Leaf N:P was greatest in the fast-growing season, while C:N and C:P were greatest in the late-growing season. No clear variations were observed in stem and root C:N, C:P and N:P throughout the entire growing season, but leaf N:P was less than 14, suggesting that the growth of larch stands was limited by N in our study region. Compared to global plant element concentrations and stoichiometry, the leaves of larch stands had higher C, P, C:N and C:P but lower N and N:P, and the roots had greater P and C:N but lower N, C:P and N:P. Our study provides baseline information for describing the changes in nutritional elements with plant growth, which will facilitate plantation forest management and restoration, and makes a valuable contribution to the global data pool on leaf nutrition and stoichiometry.
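The ratio-and-threshold reasoning in this abstract can be sketched in a few lines. The only figure taken from the abstract is the leaf N:P cutoff of 14 read as a sign of N limitation. The concentrations below are hypothetical placeholders rather than the study's measurements, and computing the ratios on a mass basis is an assumption.

```python
from typing import Dict

# Threshold quoted in the abstract: leaf N:P below 14 suggests N limitation.
N_LIMITATION_NP = 14.0


def stoichiometry(c: float, n: float, p: float) -> Dict[str, float]:
    """Return C:N, C:P and N:P from concentrations given in the same units."""
    return {"C:N": c / n, "C:P": c / p, "N:P": n / p}


def suggests_n_limitation(n_to_p: float) -> bool:
    """Apply the abstract's N:P < 14 rule of thumb."""
    return n_to_p < N_LIMITATION_NP


# Hypothetical leaf concentrations in mg/g dry mass (illustrative only).
leaf = stoichiometry(c=480.0, n=18.0, p=1.5)
print(leaf, suggests_n_limitation(leaf["N:P"]))
```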

    Quantitative Bias in Illumina TruSeq and a Novel Post Amplification Barcoding Strategy for Multiplexed DNA and Small RNA Deep Sequencing

    Here we demonstrate a method for unbiased multiplexed deep sequencing of RNA and DNA libraries using a novel, efficient and adaptable barcoding strategy called Post Amplification Ligation-Mediated (PALM). PALM barcoding is performed as the very last step of library preparation, eliminating a potential barcode-induced bias and allowing the flexibility to synthesize as many barcodes as needed. We sequenced PALM barcoded microRNA (miRNA) and DNA reference samples and evaluated the quantitative barcode-induced bias in comparison to the same reference samples prepared using the Illumina TruSeq barcoding strategy. The Illumina TruSeq small RNA strategy introduces the barcode during the PCR step using differentially barcoded primers, while the TruSeq DNA strategy introduces the barcode before the PCR step by ligation of differentially barcoded adaptors. Results show virtually no bias between the differentially barcoded miRNA and DNA samples, both for the PALM and the TruSeq sample preparation methods. We also multiplexed miRNA reference samples using a pre-PCR barcode ligation. This barcoding strategy results in significant bias.

    Metabolic Engineering of Potato Carotenoid Content through Tuber-Specific Overexpression of a Bacterial Mini-Pathway

    BACKGROUND: Since the creation of “Golden Rice”, biofortification of plant-derived foods has been a promising strategy for the alleviation of nutritional deficiencies. Potato is the most important staple food for mankind after the cereals rice, wheat and maize, and is extremely poor in provitamin A carotenoids. METHODOLOGY: We transformed potato with a mini-pathway of bacterial origin, driving the synthesis of beta-carotene (provitamin A) from geranylgeranyl diphosphate. Three genes, encoding phytoene synthase (CrtB), phytoene desaturase (CrtI) and lycopene beta-cyclase (CrtY) from Erwinia, under tuber-specific or constitutive promoter control, were used. 86 independent transgenic lines, containing six different promoter/gene combinations, were produced and analyzed. Extensive regulatory effects on the expression of endogenous genes for carotenoid biosynthesis are observed in transgenic lines. Constitutive expression of the CrtY and/or CrtI genes interferes with the establishment of transgenosis and with the accumulation of leaf carotenoids. Expression of all three genes, under tuber-specific promoter control, results in tubers with a deep yellow (“golden”) phenotype without any adverse leaf phenotypes. In these tubers, carotenoids increase approx. 20-fold, to 114 mcg/g dry weight, and beta-carotene 3600-fold, to 47 mcg/g dry weight. CONCLUSIONS: This is the highest carotenoid and beta-carotene content reported for biofortified potato as well as for any of the four major staple foods (the next best event being “Golden Rice 2”, with 31 mcg/g dry weight beta-carotene). Assuming a beta-carotene to retinol conversion of 6:1, this is sufficient to provide 50% of the Recommended Daily Allowance of Vitamin A with 250 g (fresh weight) of “golden” potatoes.
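The closing arithmetic can be followed with a short calculation. The beta-carotene content (47 mcg/g dry weight), the 250 g fresh-weight serving and the 6:1 conversion come from the abstract. The dry matter fraction of the tuber is not given there, so it is left as a parameter, and the value 0.20 used in the example is purely an assumption for illustration. The Recommended Daily Allowance the authors used is likewise not stated, so the sketch stops at retinol equivalents.

```python
# Figures quoted in the abstract.
BETA_CAROTENE_UG_PER_G_DRY = 47.0   # mcg beta-carotene per g dry weight
SERVING_G_FRESH = 250.0             # g fresh weight
CAROTENE_PER_RETINOL = 6.0          # 6:1 beta-carotene to retinol conversion


def retinol_equivalents_ug(dry_matter_fraction: float) -> float:
    """Retinol equivalents (mcg) in one serving, given the tuber's dry matter fraction."""
    if not 0.0 < dry_matter_fraction <= 1.0:
        raise ValueError("dry_matter_fraction must be in (0, 1]")
    dry_weight_g = SERVING_G_FRESH * dry_matter_fraction
    beta_carotene_ug = dry_weight_g * BETA_CAROTENE_UG_PER_G_DRY
    return beta_carotene_ug / CAROTENE_PER_RETINOL


# ASSUMPTION: 20% dry matter, chosen only to illustrate the calculation.
print(round(retinol_equivalents_ug(0.20), 1))
```

Dividing the result by half of whichever daily allowance is assumed shows how sensitive the "50% of the RDA" statement is to the dry matter fraction.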

    Global parameter estimation methods for stochastic biochemical systems

    BACKGROUND: The importance of stochasticity in cellular processes involving low numbers of molecules has resulted in the development of stochastic models such as the chemical master equation. As in other modelling frameworks, the accompanying rate constants are important for end applications such as analyzing system properties (e.g. robustness) or predicting the effects of genetic perturbations. Prior knowledge of kinetic constants is usually limited, and the model identification routine typically includes parameter estimation from experimental data. Although parameter estimation is well established for deterministic models, it is not yet routine for the chemical master equation. In addition, recent advances in measurement technology have made it possible to quantify genetic substrates down to the single-molecule level. Thus, the purpose of this work is to develop practical and effective methods for estimating kinetic model parameters in the chemical master equation and other stochastic models from single-cell and cell-population experimental data. RESULTS: Three parameter estimation methods are proposed, based on maximum likelihood and on density function distance, the latter using either probability or cumulative density functions. Since stochastic models such as chemical master equations are typically solved using a Monte Carlo approach in which only a finite number of Monte Carlo realizations is computationally practical, specific consideration is given to the effect of finite sampling in the histogram binning of the state density functions. Applications to three practical case studies showed that, while the maximum likelihood method can effectively handle low replicate measurements, the density function distance methods, particularly the cumulative density function distance estimation, are more robust in estimating the parameters, with consistently higher accuracy, even for systems showing multimodality. CONCLUSIONS: The parameter estimation methodologies described in this work provide an effective and practical approach to estimating the kinetic parameters of stochastic systems from either sparse or dense cell population data. Nevertheless, as with kinetic parameter estimation in other modelling frameworks, not all parameters can be estimated accurately, a common problem arising from the lack of complete parameter identifiability from the available data.
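A rough sketch of the cumulative density function distance idea follows, under several assumptions that are not taken from the paper: the system is a simple birth-death process (production at rate k, degradation at rate g per molecule), trajectories are generated with the Gillespie stochastic simulation algorithm, the distance is the sum of squared differences between empirical CDFs, and the search is a plain grid over candidate values of k. The "observed" data are themselves simulated, so this only illustrates the workflow, and the paper's actual objective functions and optimiser may differ.

```python
import random
from bisect import bisect_right
from typing import List, Sequence


def simulate_birth_death(k: float, g: float, t_end: float,
                         rng: random.Random) -> int:
    """Gillespie SSA for 0 -> X (rate k) and X -> 0 (rate g*n); copy number at t_end."""
    t, n = 0.0, 0
    while True:
        total = k + g * n                 # total propensity (k > 0 keeps it positive)
        t += rng.expovariate(total)       # waiting time to the next reaction
        if t > t_end:
            return n
        if rng.random() * total < k:      # choose the reaction by its propensity
            n += 1
        else:
            n -= 1


def empirical_cdf(samples: Sequence[int], grid: Sequence[int]) -> List[float]:
    """Fraction of samples less than or equal to each grid point."""
    ordered = sorted(samples)
    return [bisect_right(ordered, x) / len(ordered) for x in grid]


def cdf_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Sum of squared differences between two empirical CDFs on a shared grid."""
    grid = range(0, max(max(a), max(b)) + 1)
    return sum((x - y) ** 2
               for x, y in zip(empirical_cdf(a, grid), empirical_cdf(b, grid)))


def estimate_k(observed: Sequence[int], candidates: Sequence[float], g: float,
               t_end: float, n_runs: int, rng: random.Random) -> float:
    """Return the candidate k whose simulated CDF lies closest to the observed one."""
    def score(k: float) -> float:
        simulated = [simulate_birth_death(k, g, t_end, rng) for _ in range(n_runs)]
        return cdf_distance(observed, simulated)
    return min(candidates, key=score)


rng = random.Random(0)
# Synthetic "observations" generated with k = 10 (a stand-in for single-cell data).
observed = [simulate_birth_death(10.0, 1.0, 8.0, rng) for _ in range(200)]
estimate = estimate_k(observed, [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
                      g=1.0, t_end=8.0, n_runs=200, rng=rng)
print("estimated k:", estimate)
```

Because only a finite number of realizations is simulated per candidate, the score itself is noisy; increasing `n_runs` reduces that noise at the cost of run time, which is the finite-sampling trade-off the abstract alludes to.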