291 research outputs found

    Gene expression trees in lymphoid development

    Background: The regulatory processes that govern cell proliferation and differentiation are central to developmental biology. Particularly well studied in this respect is the lymphoid system due to its importance for basic biology and for clinical applications. Gene expression measured in lymphoid cells in several distinguishable developmental stages helps in the elucidation of underlying molecular processes, which change gradually over time and lock cells in either the B cell, T cell or Natural Killer cell lineages. Large-scale analysis of these gene expression trees requires computational support for tasks ranging from visualization, querying, and finding clusters of similar genes, to answering detailed questions about the functional roles of individual genes. Results: We present the first statistical framework designed to analyze gene expression data as it is collected in the course of lymphoid development through clusters of co-expressed genes and additional heterogeneous data. We introduce dependence trees for continuous variates, which model the inherent dependencies during the differentiation process naturally as gene expression trees. Several trees are combined in a mixture model to allow inference of potentially overlapping clusters of co-expressed genes. Additionally, we predict microRNA targets. Conclusion: Computational results for several data sets from the lymphoid system demonstrate the relevance of our framework. We recover well-known biological facts and identify promising novel regulatory elements of genes and their functional assignments. The implementation of our method (licensed under the GPL) is available at http://algorithmics.molgen.mpg.de/Supplements/ExpLym/.

    Impact of biopower generation on eastern US forests

    Biopower, electricity generated from biomass, is a major source of renewable energy in the US. About ten percent of US non-hydro renewable electricity in 2020 was generated from biomass. Despite significant growth in woody biomass use for electricity in recent decades, a systematic assessment of associated impacts on forest resources is lacking. This study assessed associations between biopower generation and selected timberland structure indicators and carbon stocks across 438 areas surrounding wood-using and coal-burning power plants in the Eastern US from 2005 to 2017. Timberland areas around plants generating biopower were associated with more live and standing-dead trees, and more carbon in their respective stocks, than comparable areas around neighboring plants burning only coal. We also detected an inverse association between the number of biopower plants and the number of live and dead trees and their respective carbon stocks. We discerned an upward temporal trajectory in carbon stocks within live trees with continued biopower generation. We found no significant differences related to the amount of biopower generated (in MWh) within the analysis areas. Net impacts of biopower descriptors on timberland attributes point to a positive trend in selected ecological conditions and carbon balances. The upward temporal trend in carbon stocks with longer generation of wood-based biopower may point to a plausibly sustainable contribution to the decarbonization of the US electricity sector.

    pGQL: A probabilistic graphical query language for gene expression time courses

    Background: Timeboxes are graphical user interface widgets that were proposed to specify queries on time course data. As queries can be very easily defined, an exploratory analysis of time course data is greatly facilitated. While timeboxes are effective, they have no provisions for dealing with noisy data or data with fluctuations along the time axis, both of which are very common in many applications. In particular, this is true for the analysis of gene expression time courses, which are mostly derived from noisy microarray measurements at few unevenly sampled time points. From a data mining point of view, the robust handling of data through a sound statistical model is of great importance. Results: We propose probabilistic timeboxes, which correspond to a specific class of Hidden Markov Models (HMMs), an established method in data mining. Since HMMs are a particular class of probabilistic graphical models, we call our method Probabilistic Graphical Query Language. It is implemented in the free software package pGQL. We evaluate its effectiveness in exploratory analysis on a yeast sporulation data set. Conclusions: We introduce a new approach to define dynamic, statistical queries on time course data. It supports an interactive exploration of reasonably large amounts of data and enables users without expert knowledge to specify fairly complex statistical models with ease. Owing to its statistical nature, our approach is more expressive and more robust with respect to amplitude and frequency fluctuations than the prior, deterministic timeboxes.

    Constrained mixture estimation for analysis and robust classification of clinical time series

    Motivation: Personalized medicine based on molecular aspects of diseases, such as gene expression profiling, has become increasingly popular. However, one faces multiple challenges when analyzing clinical gene expression data; most of the well-known theoretical issues, such as high dimension of feature spaces versus few examples, noise and missing data, apply. Special care is needed when designing classification procedures that support personalized diagnosis and choice of treatment. Here, we particularly focus on classification of interferon-β (IFNβ) treatment response in Multiple Sclerosis (MS) patients, which has attracted substantial attention in the recent past. Half of the patients remain unaffected by IFNβ treatment, which is still the standard. For them, the treatment should be ceased in a timely manner to mitigate the side effects

    The Entropy of a Binary Hidden Markov Process

    The entropy of a binary symmetric Hidden Markov Process is calculated as an expansion in the noise parameter epsilon. We map the problem onto a one-dimensional Ising model in a large field of random signs and calculate the expansion coefficients up to second order in epsilon. Using a conjecture we extend the calculation to 11th order and discuss the convergence of the resulting series
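
    The quantity discussed in this abstract can also be approximated numerically. The sketch below is not the paper's analytic expansion; it is a plain Monte Carlo estimate, assuming the process is a symmetric two-state Markov chain (flip probability `p`) observed through a binary symmetric channel with noise `eps`. The function names and the parameter `p` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_binary_hmp(n, p, eps, rng):
    """Sample n noisy observations of a symmetric two-state Markov chain.

    Assumed model (not taken from the paper's text): the hidden bit flips
    with probability p at each step and is observed with flip noise eps.
    """
    x = rng.randrange(2)
    ys = []
    for _ in range(n):
        ys.append(x ^ (rng.random() < eps))  # noisy observation of x
        x ^= (rng.random() < p)              # hidden-state transition
    return ys


def entropy_rate_estimate(ys, p, eps):
    """Return -(1/n) ln P(y_1..y_n) in nats via the normalised forward pass."""
    belief = [0.5, 0.5]  # P(x_t | y_1..y_{t-1}); the chain starts stationary
    total = 0.0
    for y in ys:
        # Joint weight of each hidden state with the current observation.
        joint = [belief[s] * ((1 - eps) if s == y else eps) for s in (0, 1)]
        prob_y = joint[0] + joint[1]
        total -= math.log(prob_y)
        post = [j / prob_y for j in joint]
        # Push the posterior through the symmetric transition matrix.
        belief = [post[0] * (1 - p) + post[1] * p,
                  post[0] * p + post[1] * (1 - p)]
    return total / len(ys)


if __name__ == "__main__":
    rng = random.Random(0)
    ys = sample_binary_hmp(50_000, 0.3, 0.05, rng)
    print(entropy_rate_estimate(ys, 0.3, 0.05))
```

    Two limiting cases give a quick sanity check under these assumptions: with `eps = 0` the estimate should approach the binary entropy of `p`, and with `eps = 0.5` every observation is a fair coin, so the estimate equals ln 2.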

    Clustering cancer gene expression data: a comparative study

    Background: The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using "classic" clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. Results/Conclusion: We present the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Our results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods. The data sets analyzed in this study are available at http://algorithmics.molgen.mpg.de/Supplements/CompCancer/

    SeagrassDB: An open-source transcriptomics landscape for phylogenetically profiled seagrasses and aquatic plants

    © 2018, The Author(s). Seagrasses and aquatic plants are important clades of higher plants, significant for carbon sequestration and marine ecological restoration. They are valuable because they allow us to understand how plants have developed traits to adapt to high-salinity and photosynthetically challenged environments. Here, we present a large-scale phylogenetically profiled transcriptomics repository covering seagrasses and aquatic plants. SeagrassDB encompasses a total of 1,052,262 unigenes with a minimum and maximum contig length of 8,831 bp and 16,705 bp respectively. SeagrassDB provides access to 34,455 transcription factors, 470,568 PFAM domains, 382,528 prosite models and 482,121 InterPro domains across 9 species. SeagrassDB allows comparative gene mining using BLAST-based approaches and subsequent unigene sequence retrieval with associated features such as expression (FPKM values), gene ontologies, functional assignments, family level classification, InterPro domains, KEGG orthology (KO), transcription factors and prosite information. SeagrassDB is available to the scientific community for exploring the functional genic landscape of seagrass and aquatic plants at: http://115.146.91.129/index.php

    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

    Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns. Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results. Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance

    Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm

    We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper; these are available from https://sites.google.com/site/randomisedbhc/

    Molecular physiology reveals ammonium uptake and related gene expression in the seagrass Zostera muelleri

    © 2016 Elsevier Ltd. Seagrasses are important marine foundation species, which are presently threatened by coastal development and global change worldwide. The molecular mechanisms that drive seagrass responses to anthropogenic stresses, including elevated levels of nutrients such as ammonium, remain poorly understood. Despite the evidence that seagrasses can assimilate ammonium via the glutamine synthetase (GS)/glutamate synthase (glutamine-oxoglutarate amidotransferase or GOGAT) cycle, the regulation of this fundamental metabolic pathway has not yet been studied at the gene expression level in seagrasses. Here, we combine (i) reverse transcription quantitative real-time PCR (RT-qPCR) to measure expression of key genes involved in the GS/GOGAT cycle, and (ii) stable isotope labelling and mass spectrometry to investigate 15N-ammonium assimilation in the widespread Australian species Zostera muelleri subsp. capricorni (Z. muelleri). We demonstrate that exposure to a pulse of ammonium in seawater can induce changes in GS gene expression of Z. muelleri, and further correlate these changes in gene expression with 15N-ammonium uptake rate in above- and below-ground tissue