
    The KM-Algorithm Identifies Regulated Genes in Time Series Expression Data

    We present a statistical method to rank observed genes in gene expression time series experiments according to their degree of regulation in a biological process. The ranking may be used to focus on specific genes or to select meaningful subsets of genes from which gene regulatory networks can be built. Our approach is based on a state space model that incorporates hidden regulators of gene expression. Kalman (K) smoothing and maximum (M) likelihood estimation techniques are used to derive optimal estimates of the model parameters, upon which a proposed regulation criterion is based. The statistical power of the proposed algorithm is investigated, and a real data set is analyzed for the purpose of identifying regulated genes in time-dependent gene expression data. This statistical approach supports the concept that meaningful biological conclusions can be drawn from gene expression time series experiments by focusing on strong regulation rather than large expression values.
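
As a rough illustration of the state space idea in this abstract, the sketch below runs a scalar Kalman filter (the forward pass only, not the full smoother and not the authors' KM-algorithm) over a noisy series. The model form, the function name `kalman_filter` and all parameter names are assumptions made for this example.

```python
from typing import List, Tuple


def kalman_filter(
    observations: List[float],
    transition: float,      # assumed scalar state transition coefficient
    process_var: float,     # assumed variance of the hidden-state noise
    obs_var: float,         # assumed variance of the measurement noise
    init_mean: float,
    init_var: float,
) -> List[Tuple[float, float]]:
    """Return the filtered (mean, variance) of the hidden state at each time step."""
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: propagate the previous estimate through the state equation.
        pred_mean = transition * mean
        pred_var = transition * transition * var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = pred_var / (pred_var + obs_var)
        mean = pred_mean + gain * (y - pred_mean)
        var = (1.0 - gain) * pred_var
        filtered.append((mean, var))
    return filtered


# Hypothetical expression measurements for a single gene.
series = [1.0, 1.4, 0.9, 1.2]
estimates = kalman_filter(series, transition=1.0, process_var=0.1,
                          obs_var=0.5, init_mean=0.0, init_var=1.0)
```

A smoother would add a backward pass over these filtered estimates, and the model parameters would be estimated by maximum likelihood rather than fixed by hand as they are here.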

    Network estimation in State Space Model with L1-regularization constraint

    Biological networks have arisen as an attractive paradigm of genomic science ever since the introduction of large-scale genomic technologies, which carried the promise of elucidating relationships in functional genomics. Microarray technologies, coupled with appropriate mathematical or statistical models, have made it possible to identify dynamic regulatory networks and to measure the time course of the expression levels of many genes simultaneously. However, one limitation is the high-dimensional nature of such data, coupled with the fact that gene expression data are known to include some hidden process. In that regard, we are concerned with deriving a method for inferring a sparse dynamic network in a high-dimensional data setting. We assume that the observations are noisy measurements of gene expression in the form of mRNAs, whose dynamics can be described by some unknown or hidden process. We build an input-dependent linear state space model from these hidden states and demonstrate how an incorporated L1 regularization constraint in an Expectation-Maximization (EM) algorithm can be used to reverse engineer transcriptional networks from gene expression profiling data. This corresponds to estimating the model interaction parameters. The proposed method is illustrated on time-course microarray data obtained from a well-established T-cell data set. At the optimum tuning parameters we found genes TRAF5, JUND, CDK4, CASP4, CD69, and C3X1 to have a higher number of inward-directed connections, and FYB, CCNA2, AKT1 and CASP8 to be genes with a higher number of outward-directed connections. We recommend these genes as objects for further investigation. Caspase 4 is also found to activate the expression of JunD, which in turn represses the cell cycle regulator CDC2. Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
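
To make the role of an L1 penalty concrete, the following sketch applies the standard soft-thresholding operator to a small interaction matrix; this is the usual way an L1 constraint drives small coefficients to exactly zero. It is an illustration under assumed names (`soft_threshold`, `sparsify`) and made-up numbers, not the estimation procedure of the paper.

```python
from typing import List


def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; values within [-penalty, penalty] become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparsify(weights: List[List[float]], penalty: float) -> List[List[float]]:
    """Apply soft-thresholding to every entry of an interaction matrix."""
    return [[soft_threshold(w, penalty) for w in row] for row in weights]


# Hypothetical unpenalized interaction estimates between two genes.
interaction = [[0.9, 0.05], [-0.02, -0.6]]
sparse = sparsify(interaction, penalty=0.1)
# Weak entries (0.05 and -0.02) are set to zero; strong ones are shrunk by 0.1.
```

Larger penalties remove more edges from the inferred network, which is why the tuning parameter has to be chosen with care.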

    GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

    Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines. Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dublin

    Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data

    Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results obtained typically provide only a simplistic view of the network partitioned into disjoint communities and provide no information on the relationships between communities. Here, we present methods to robustly detect coregulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities are significantly enriched for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities. Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1 was not uploaded but is available by contacting the author.
27 pages, 5 figures, 15 supplementary files

    Transcriptional landscape of neuronal and cancer stem cells

    Tumor mass is composed of a heterogeneous cell population, including a subset of “cancer stem cells” (CSC). Oncogenic signals foster CSC by transforming tissue stem cells or by reprogramming progenitor/differentiated cells towards stemness. Thus, CSC share features with cancer and stem cells (e.g. self-renewal, a hierarchical developmental program leading to differentiated cells, epithelial/mesenchymal transition), and the latter are maintained by the constitutive activation of stemness-promoting signals. CSC could trigger tumor formation, drive resistance to conventional therapeutics and underlie patients’ relapse. Indeed, stem cell signatures have been associated with poor prognosis in various cancers. This background makes the identification of CSC molecular features essential for elucidating their survival mechanisms and for designing novel CSC-specific therapeutic strategies. Medulloblastoma (MB) is the most common childhood malignant brain tumor and a leading cause of cancer-related morbidity and mortality. Current multimodal therapies are effective in about 50% of patients but often cause long-term side effects, i.e. developmental, neurological, neuroendocrine and psychosocial deficits (Northcott PA, Nature Rev Cancer 2012). For many years, MB was treated as a single tumor entity despite divergent tumor histology, patient outcome and drug sensitivity, and despite the diversity of the stem cell of origin. Very recently the scenario of human MB has dramatically changed, since its heterogeneous biology has been addressed by high-throughput gene expression analysis (oligonucleotide microarrays) or by powerful genomic next-generation sequencing. These led to the identification of four tumor subgroups (WNT, SHH, Group 3 and Group 4), uncovering highly diverse mutational spectra and gene expression patterns.
However, a quantitative approach has not yet been applied to the transcriptional landscape of medulloblastoma stem cells (MbSC) through RNA next-generation sequencing (RNA-Seq) technology. This is a relevant issue, since RNA-Seq is able to interrogate the genome-wide transcriptome, including new transcripts, alternatively spliced isoforms and non-coding RNAs. Lower rhombic lip progenitors of the dorsal brainstem are considered the trigger cells in WNT tumors; in the SHH subgroup, the initiating cells are Prominin1+ CD15+ stem cells from the subventricular zone, requiring commitment to Math1+ granule cell progenitors [GCP] of the external granule cell layer [EGL]; while Math1+ or Math1- EGL-GCP or Prominin1+/lineage-negative stem cells sustain the MYC-driven Group 3. MbSC derived from SHH tumors and postnatal normal cerebellar stem cells (NcSC) have been reported to share several features. A key signal for both of them is Hedgehog. Furthermore, both NcSC and MbSC display up-regulation of stemness genes (e.g., Sox2, Nestin, Nanog, Prom1). Finally, constitutive activation of the Shh pathway by conditional deletion of the Ptch1 inhibitory receptor in NcSC promotes medulloblastoma in vivo, producing a mouse model of the human SHH tumor. Acquisition of stemness features may therefore represent the first step of oncogenic conversion. Cooperation with additional oncogenic signals is, however, needed to enhance MbSC tumorigenicity. To understand MbSC transcriptional programs, we analyze by RNA-Seq MbSC derived from Ptch1+/- tumors (Ptch1+/- MbSC). This choice of a genetically determined model of MB has allowed us to work with Ptch1+/- MbSC together with the appropriate NcSC counterpart, and to analyze biological replicates for statistical analysis. We identify a number of transcripts (annotated transcripts, novel isoforms, and long non-coding RNAs) characterizing MbSC and/or NcSC.
Some of these genes control stemness or are cancer related and conserved in human medulloblastomas. Interestingly, a subset of them, belonging to the cell stress response, is of prognostic relevance, being significantly related to clinical outcome. Correlation of the expression of genes characterizing MbSC with survival information from our human medulloblastoma database further demonstrates the significance of these findings. Our data suggest that the modulation of normal and cancer stem cell functions observed in vitro is effective in dissecting the transcriptional programs underlying the in vivo behavior of human medulloblastomas.

    Dissecting interferon-induced transcriptional programs in human peripheral blood cells

    Interferons are key modulators of the immune system, and are central to the control of many diseases. The response of immune cells to stimuli in complex populations is the product of direct and indirect effects, and of homotypic and heterotypic cell interactions. Dissecting the global transcriptional profiles of immune cell populations may provide insights into this regulatory interplay. The host transcriptional response may also be useful in discriminating between disease states, and in understanding pathophysiology. The transcriptional programs of cell populations in health therefore provide a paradigm for deconvoluting disease-associated gene expression profiles. We used human cDNA microarrays to (1) compare the gene expression programs in human peripheral blood mononuclear cells (PBMCs) elicited by 6 major mediators of the immune response: interferons alpha, beta, omega and gamma, IL12 and TNFalpha; and (2) characterize the transcriptional responses of purified immune cell populations (CD4+ and CD8+ T cells, B cells, NK cells and monocytes) to IFNgamma stimulation. We defined a highly stereotyped response to type I interferons, while responses to IFNgamma and IL12 were largely restricted to a subset of type I interferon-inducible genes. TNFalpha stimulation resulted in a distinct pattern of gene expression. Cell type-specific transcriptional programs were identified, highlighting the pronounced response of monocytes to IFNgamma, and emergent properties associated with IFN-mediated activation of mixed cell populations. This information provides a detailed view of cellular activation by immune mediators, and contributes an interpretive framework for the definition of host immune responses in a variety of disease settings.

    Bayesian correlated clustering to integrate multiple datasets

    Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods

    Topological Analysis of Metabolic Networks Integrating Co-Segregating Transcriptomes and Metabolomes in Type 2 Diabetic Rat Congenic Series

    Background: The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus is caused by complex organ-specific cellular mechanisms contributing to impaired insulin secretion and insulin resistance. Methods: We used systematic metabotyping by 1H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology strategy to visualise shortest paths between metabolites and genes significantly associated with each genomic block. Results: Despite strong genomic similarities (95-99%) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting metabolic consequences of a series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific metabotypes (mQTL) and genome-wide expression traits (eQTL). Variation in key metabolites like glucose, succinate, lactate or 3-hydroxybutyrate, and second messenger precursors like inositol, was associated with several independent genomic intervals, indicating functional redundancy in these regions. To navigate through the complexity of these association networks, we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing shortest path length drove prioritization of biological validations by gene silencing.
Conclusions: These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulations and to characterize novel functional roles for genes determining tissue-specific metabolism.
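
The shortest path strategy mentioned in this abstract can be pictured with a breadth-first search over an unweighted network linking genes and metabolites. The sketch below is a minimal illustration under stated assumptions: the toy network, its node names and the function `shortest_path` are all hypothetical and are not taken from the study.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], source: str, target: str) -> Optional[List[str]]:
    """Return one shortest path from source to target in an unweighted graph, or None."""
    if source == target:
        return [source]
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbor] = node
            if neighbor == target:
                # Walk the predecessor links back to the source.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical undirected network, stored as adjacency lists.
toy_network = {
    "gene_A": ["enzyme_1"],
    "enzyme_1": ["gene_A", "glucose", "lactate"],
    "glucose": ["enzyme_1", "enzyme_2"],
    "enzyme_2": ["glucose", "succinate"],
    "lactate": ["enzyme_1"],
    "succinate": ["enzyme_2"],
}
path = shortest_path(toy_network, "gene_A", "succinate")
```

Ranking gene and metabolite pairs by the length of such paths is one plausible way to prioritize candidates, with shorter paths suggesting more direct mechanistic links.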