The KM-Algorithm Identifies Regulated Genes in Time Series Expression Data
We present a statistical method to rank observed genes in gene expression time series experiments according to their degree of regulation in a biological process. The ranking may be used to focus on specific genes or to select meaningful subsets of genes from which gene regulatory networks can be built. Our approach is based on a state space model that incorporates hidden regulators of gene expression. Kalman (K) smoothing and maximum (M) likelihood estimation techniques are used to derive optimal estimates of the model parameters upon which a proposed regulation criterion is based. The statistical power of the proposed algorithm is investigated, and a real data set is analyzed for the purpose of identifying regulated genes in time-dependent gene expression data. This statistical approach supports the concept that meaningful biological conclusions can be drawn from gene expression time series experiments by focusing on strong regulation rather than large expression values.
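The Kalman smoothing step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a scalar model, x_t = a*x_(t-1) + w_t and y_t = x_t + v_t, with fixed, made-up parameter values (`a`, `q`, `r`, `m0`, `p0`); the abstract does not give the actual model, and the maximum likelihood estimation of the parameters is not shown here.

```python
def kalman_smooth(ys, a=1.0, q=0.1, r=1.0, m0=0.0, p0=1.0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for a scalar model.

    Assumed model (illustrative only): x_t = a*x_{t-1} + w_t, y_t = x_t + v_t,
    with Var(w_t) = q and Var(v_t) = r. All parameter values are hypothetical.
    """
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    for y in ys:
        # Predict the hidden state one step ahead.
        mp = a * m
        pp = a * a * p + q
        # Correct the prediction with the observed value.
        gain = pp / (pp + r)
        m = mp + gain * (y - mp)
        p = (1.0 - gain) * pp
        m_pred.append(mp)
        p_pred.append(pp)
        m_filt.append(m)
        p_filt.append(p)

    # Backward pass: refine each estimate using later observations.
    m_smooth, p_smooth = list(m_filt), list(p_filt)
    for t in range(len(ys) - 2, -1, -1):
        g = a * p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
        p_smooth[t] = p_filt[t] + g * g * (p_smooth[t + 1] - p_pred[t + 1])
    return m_filt, p_filt, m_smooth, p_smooth
```

Calling `kalman_smooth([0.2, 0.9, 1.1, 1.8, 2.1])` returns filtered and smoothed means and variances; the smoothed variances are never larger than the filtered ones, because the backward pass uses the whole series.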
Unbiased Boolean analysis of public gene expression data for cell cycle gene identification.
Cell proliferation is essential for the development and maintenance of all organisms and is dysregulated in cancer. Using synchronized cells progressing through the cell cycle, pioneering microarray studies defined cell cycle genes based on cyclic variation in their expression. However, the concordance of the small number of synchronized cell studies has been limited, leading to discrepancies in definition of the transcriptionally regulated set of cell cycle genes within and between species. Here we present an informatics approach based on Boolean logic to identify cell cycle genes. This approach used the vast array of publicly available gene expression data sets to query similarity to CCNB1, which encodes the cyclin subunit of the Cdk1-cyclin B complex that triggers the G2-to-M transition. In addition to highlighting conservation of cell cycle genes across large evolutionary distances, this approach identified contexts where well-studied genes known to act during the cell cycle are expressed and potentially acting in nondivision contexts. An accessible web platform enables a detailed exploration of the cell cycle gene lists generated using the Boolean logic approach. The methods employed are straightforward to extend to processes other than the cell cycle
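As a rough illustration of what a Boolean query against a seed gene could look like, the sketch below converts expression values to high/low states and checks whether "seed high" almost always coincides with "gene high". The thresholds, the `max_violation` tolerance, and the function names are assumptions made for this example; the abstract does not describe the thresholding or the statistical test the authors used.

```python
def boolean_state(values, threshold):
    """Convert expression values to True (high) / False (low)."""
    return [v > threshold for v in values]


def high_implies_high(seed, gene, seed_thr, gene_thr, max_violation=0.1):
    """Return True if samples with the seed gene high rarely have `gene` low.

    `max_violation` is a hypothetical tolerance for discordant samples.
    """
    seed_high = boolean_state(seed, seed_thr)
    gene_high = boolean_state(gene, gene_thr)
    n_seed_high = sum(seed_high)
    if n_seed_high == 0:
        return False  # no informative samples
    violations = sum(1 for s, g in zip(seed_high, gene_high) if s and not g)
    return violations / n_seed_high <= max_violation


# Toy data: five samples, values invented for illustration.
seed_expr = [1, 8, 9, 2, 7]
gene_a = [2, 9, 7, 1, 8]   # tracks the seed gene
gene_b = [9, 1, 8, 2, 1]   # does not track the seed gene
```

With these toy values, `high_implies_high(seed_expr, gene_a, 5, 5)` holds and the same call with `gene_b` does not.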
Network estimation in State Space Model with L1-regularization constraint
Biological networks have arisen as an attractive paradigm of genomic science
ever since the introduction of large scale genomic technologies which carried
the promise of elucidating the relationship in functional genomics. Microarray
technologies coupled with appropriate mathematical or statistical models have
made it possible to identify dynamic regulatory networks or to measure time
course of the expression level of many genes simultaneously. However, one of the
few limitations lies in the high-dimensional nature of such data, coupled with
the fact that these gene expression data are known to include some hidden
process. In that regard, we are concerned with deriving a method for inferring
a sparse dynamic network in a high dimensional data setting. We assume that the
observations are noisy measurements of gene expression in the form of mRNAs,
whose dynamics can be described by some unknown or hidden process. We build an
input-dependent linear state space model from these hidden states and
demonstrate how an incorporated regularization constraint in an
Expectation-Maximization (EM) algorithm can be used to reverse engineer
transcriptional networks from gene expression profiling data. This corresponds
to estimating the model interaction parameters. The proposed method is
illustrated on time-course microarray data obtained from a well-established
T-cell data set. At the optimum tuning parameters we found genes TRAF5, JUND, CDK4,
CASP4, CD69, and C3X1 to have a higher number of inward-directed connections and
FYB, CCNA2, AKT1 and CASP8 to be genes with a higher number of outward-directed
connections. We recommend these genes as objects for further investigation.
Caspase 4 is also found to activate the expression of JunD, which in turn
represses the cell cycle regulator CDC2.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
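To show how an L1 penalty yields a sparse network, here is a minimal sketch of soft-thresholding, the standard proximal operator of the L1 norm, applied to a toy interaction matrix. It is not the paper's M-step, which the abstract does not specify. The matrix values, the penalty `lam`, and the convention that entry `[i][j]` is the effect of gene j on gene i are all assumptions.

```python
def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparsify(matrix, lam):
    """Apply soft-thresholding to every entry of an interaction matrix."""
    return [[soft_threshold(v, lam) for v in row] for row in matrix]


def in_degree(matrix):
    """Nonzero entries per row (assumed convention: row i receives edges)."""
    return [sum(1 for v in row if v != 0.0) for row in matrix]


def out_degree(matrix):
    """Nonzero entries per column (assumed convention: column j sends edges)."""
    return [sum(1 for row in matrix if row[j] != 0.0)
            for j in range(len(matrix[0]))]


# Hypothetical 3-gene interaction estimates.
A = [[0.9, -0.05, 0.0],
     [0.2, 0.0, -0.6],
     [0.03, 0.4, 0.0]]
A_sparse = sparsify(A, lam=0.1)
```

Small entries such as `-0.05` and `0.03` are set exactly to zero, so counting the remaining nonzero entries per row or column gives the inward and outward connection counts of the kind reported in the abstract. Self-effects on the diagonal are counted in this toy version.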
GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dubli
Boolean analysis identifies CD38 as a biomarker of aggressive localized prostate cancer.
The introduction of serum Prostate Specific Antigen (PSA) testing nearly 30 years ago has been associated with a significant shift towards localized disease and decreased deaths due to prostate cancer. Recognition that PSA testing has caused overdiagnosis and overtreatment of prostate cancer has generated considerable controversy over its value, and has spurred efforts to identify prognostic biomarkers to distinguish patients who need treatment from those who can be observed. Recent studies show that cancer is heterogeneous and forms a hierarchy of tumor cell populations. We developed a method of identifying prostate cancer differentiation states related to androgen signaling using Boolean logic. Using gene expression data, we identified two markers, CD38 and ARG2, that group prostate cancer into three differentiation states. Cancers with CD38-, ARG2- expression patterns, corresponding to an undifferentiated state, had significantly lower 10-year recurrence-free survival compared to the most differentiated group (CD38+ARG2+). We carried out immunohistochemical (IHC) staining for these two markers in single-institution (Stanford; n = 234) and multi-institution (Canary; n = 1326) cohorts. IHC staining for CD38 and ARG2 in the Stanford cohort demonstrated that combined expression of CD38 and ARG2 was prognostic. In the Canary cohort, low CD38 protein expression by IHC was significantly associated with recurrence-free survival (RFS), seminal vesicle invasion (SVI), and extra-capsular extension (ECE) in univariable analysis. In multivariable analysis, ARG2 and CD38 IHC staining results were not independently associated with RFS, overall survival, or disease-specific survival after adjusting for other factors including SVI, ECE, Gleason score, pre-operative PSA, and surgical margins.
Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data
Determining the functional structure of biological networks is a central goal
of systems biology. One approach is to analyze gene expression data to infer a
network of gene interactions on the basis of their correlated responses to
environmental and genetic perturbations. The inferred network can then be
analyzed to identify functional communities. However, commonly used algorithms
can yield unreliable results due to experimental noise, algorithmic
stochasticity, and the influence of arbitrarily chosen parameter values.
Furthermore, the results obtained typically provide only a simplistic view of
the network partitioned into disjoint communities and provide no information on
the relationship between communities. Here, we present methods to robustly
detect coregulated and functionally enriched gene communities and demonstrate
their application and validity for Escherichia coli gene expression data.
Applying a recently developed community detection algorithm to the network of
interactions identified with the context likelihood of relatedness (CLR)
method, we show that a hierarchy of network communities can be identified.
These communities significantly enrich for gene ontology (GO) terms, consistent
with them representing biologically meaningful groups. Further, analysis of the
most significantly enriched communities identified several candidate new
regulatory interactions. The robustness of our methods is demonstrated by
showing that a core set of functional communities is reliably found when
artificial noise, modeling experimental noise, is added to the data. We find
that noise mainly acts conservatively, increasing the relatedness required for
a network link to be reliably assigned and decreasing the size of the core
communities, rather than causing association of genes into new communities.
Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1
was not uploaded but is available by contacting the author. 27 pages, 5
figures, 15 supplementary file
Transcriptional landscape of neuronal and cancer stem cells
Tumor mass is composed of a heterogeneous cell population, including a subset of “cancer stem cells” (CSC).
Oncogenic signals foster CSC by transforming tissue stem cells or by reprogramming progenitor/differentiated
cells towards stemness. Thus, CSC share features with cancer and stem cells (e.g. self-renewal, hierarchical
developmental program leading to differentiated cells, epithelial/mesenchymal transition) and these latter are
maintained by the constitutive activation of stemness-promoting signals. CSC could trigger tumor formation,
drive resistance to conventional therapeutics and underlie patients’ relapse. Indeed, stem cell signatures
have been associated with poor prognosis in various.
This background makes the identification of CSC molecular features mandatory to highlight the inner
workings of their survival and to design novel CSC-specific therapeutic strategies.
Medulloblastoma (MB) is the most common childhood malignant brain tumor and a leading cause of cancer-related
morbidity and mortality. Current multimodal therapies are effective in about 50% of patients but often
cause long-term side effects, i.e. developmental, neurological, neuroendocrine and psychosocial deficits
(Northcott PA Nature Rev cancer 2012). For many years, MB was treated as a single tumor entity despite the
divergent tumor histology, patients’ outcome and drug sensitivity, and the diversity of the stem cell of
origin. Very recently the scenario of human MB has dramatically changed since its heterogeneous biology has
been addressed by high-throughput gene expression analysis (oligonucleotide microarrays) or by the powerful
genomic next-generation sequencing. These led to the identification of four tumor subgroups (WNT, SHH,
Group 3 and Group 4), uncovering the existence of highly diverse mutational spectra and gene expression.
However, a quantitative approach has not yet been applied to the transcriptional landscape of Medulloblastoma
stem cells (MbSC) through RNA Next Generation Sequencing (RNA-Seq) technology. This is a relevant issue,
since RNA-Seq is able to interrogate the genome-wide global transcriptome including new transcripts,
alternative spliced isoforms and non-coding RNAs.
Lower rhombic lip progenitors of the dorsal brainstem are considered the trigger cells in WNT tumors; in SHH
subgroup initiation cells are Prominin1+ CD15+ stem cells from the subventricular zone requiring the
commitment to Math1+ granule cell progenitors [GCP] of the external granule cell layer [EGL]; while Math1+ or
Math1- EGL-GCP or Prominin1+/lineage-negative stem cells sustain the MYC driven Group 3.
MbSC derived from SHH tumors and postnatal normal cerebellar stem cells (NcSC) have been reported to
share several features. A key signal for both of them is Hedgehog. Furthermore, both NcSC and MbSC display
up-regulation of stemness genes (e.g. Sox2, Nestin, Nanog, Prom1). Finally, constitutive activation of the Shh
pathway by conditional deletion of the Ptch1 inhibitory receptor in NcSC promotes medulloblastoma in vivo,
producing a mouse model of the human SHH tumor. Acquisition of stemness features may therefore represent
the first step of oncogenic conversion. Cooperation with additional oncogenic signals is however needed to
enhance MbSC tumorigenicity.
To understand the MbSC transcriptional programs, we use RNA-Seq to analyze MbSC derived from
Ptch1+/- tumors (Ptch1+/- MbSC). This choice of a genetically determined model of MB has allowed us to
work with Ptch1+/- MbSC together with the appropriate NcSC counterpart, and to analyze biological replicates
for statistical analysis.
We identify a number of transcripts, including annotated ones, novel isoforms, and long non-coding RNAs,
characterizing MbSC and/or NcSC. Some of these genes control stemness or are cancer related and
conserved in human medulloblastomas. Interestingly a subset of them, belonging to cell stress response, are
of prognostic relevance, being significantly related to clinical outcome. Correlation of gene expression
characterizing MbSC with survival information from our human medulloblastomas database further
demonstrates the significance of these findings. Our data suggest that the modulation of normal and cancer
stem cell functions observed in vitro is effective in dissecting the transcriptional programs underlying the in
vivo behavior of human medulloblastomas.
Dissecting interferon-induced transcriptional programs in human peripheral blood cells
Interferons are key modulators of the immune system, and are central to the control of many diseases. The response of immune cells to stimuli in complex populations is the product of direct and indirect effects, and of homotypic and heterotypic cell interactions. Dissecting the global transcriptional profiles of immune cell populations may provide insights into this regulatory interplay. The host transcriptional response may also be useful in discriminating between disease states, and in understanding pathophysiology. The transcriptional programs of cell populations in health therefore provide a paradigm for deconvoluting disease-associated gene expression profiles. We used human cDNA microarrays to (1) compare the gene expression programs in human peripheral blood mononuclear cells (PBMCs) elicited by 6 major mediators of the immune response: interferons alpha, beta, omega and gamma, IL12 and TNFalpha; and (2) characterize the transcriptional responses of purified immune cell populations (CD4+ and CD8+ T cells, B cells, NK cells and monocytes) to IFNgamma stimulation. We defined a highly stereotyped response to type I interferons, while responses to IFNgamma and IL12 were largely restricted to a subset of type I interferon-inducible genes. TNFalpha stimulation resulted in a distinct pattern of gene expression. Cell type-specific transcriptional programs were identified, highlighting the pronounced response of monocytes to IFNgamma, and emergent properties associated with IFN-mediated activation of mixed cell populations. This information provides a detailed view of cellular activation by immune mediators, and contributes an interpretive framework for the definition of host immune responses in a variety of disease settings.
Bayesian correlated clustering to integrate multiple datasets
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets.
Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods
Topological Analysis of Metabolic Networks Integrating Co-Segregating Transcriptomes and Metabolomes in Type 2 Diabetic Rat Congenic Series
Background: The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus is caused by complex organ-specific cellular mechanisms contributing to impaired insulin secretion and insulin resistance. Methods: We used systematic metabotyping by 1H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology approach to visualise shortest paths between metabolites and genes significantly associated with each genomic block. Results: Despite strong genomic similarities (95-99%) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting metabolic consequences of series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific metabotypes (mQTL) and genome-wide expression traits (eQTL). Variation in key metabolites like glucose, succinate, lactate or 3-hydroxybutyrate, and second messenger precursors like inositol was associated with several independent genomic intervals, indicating functional redundancy in these regions. To navigate through the complexity of these association networks we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing shortest path length drove prioritization of biological validations by gene silencing.
Conclusions: These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulations and to characterize novel functional roles for genes determining tissue-specific metabolism.
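The shortest path strategy described in this abstract can be sketched with a breadth-first search on an unweighted network. The toy network and its node names (`reaction_1`, `gene_X`, and so on) are invented for illustration; the abstract does not state which pathway database or path-finding algorithm the authors used.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search on an unweighted graph given as an adjacency dict.

    Returns the list of nodes on one shortest path, or None if unreachable.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical metabolite-reaction-gene network (names are made up).
toy_network = {
    "glucose": ["reaction_1", "reaction_2"],
    "reaction_1": ["glucose", "gene_X"],
    "reaction_2": ["glucose", "lactate"],
    "lactate": ["reaction_2", "reaction_3"],
    "reaction_3": ["lactate", "gene_X"],
    "gene_X": ["reaction_1", "reaction_3"],
    "gene_Y": [],
}
```

Here `shortest_path(toy_network, "glucose", "gene_X")` returns the two-step route through `reaction_1` rather than the longer route through `lactate`. Ranking metabolite-gene pairs by the length of such paths is one simple way to prioritise candidates.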