Search CORE

The KM-Algorithm Identifies Regulated Genes in Time Series Expression Data

Author: Bremer Martina
Doerge R. W.
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2009

We present a statistical method to rank observed genes in gene expression time series experiments according to their degree of regulation in a biological process. The ranking may be used to focus on specific genes or to select meaningful subsets of genes from which gene regulatory networks can be built. Our approach is based on a state space model that incorporates hidden regulators of gene expression. Kalman (K) smoothing and maximum (M) likelihood estimation techniques are used to derive optimal estimates of the model parameters upon which a proposed regulation criterion is based. The statistical power of the proposed algorithm is investigated, and a real data set is analyzed for the purpose of identifying regulated genes in time dependent gene expression data. This statistical approach supports the concept that meaningful biological conclusions can be drawn from gene expression time series experiments by focusing on strong regulation rather than large expression values.
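The KM idea pairs Kalman smoothing of hidden states with maximum-likelihood estimation of model parameters. The sketch below illustrates only that pairing for a single gene, under a deliberately simple scalar model assumed for illustration: one hidden state with transition coefficient `a`, state noise variance `q`, and observation noise variance `r`. The model, the grid-search estimator, and the names `kalman_smooth` and `fit_transition` are hypothetical; this is not the authors' model with hidden regulators, and it does not compute their regulation criterion.

```python
import math


def kalman_smooth(y, a, q, r, m0=0.0, p0=1.0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for the scalar model
    x_t = a * x_{t-1} + w_t, w_t ~ N(0, q);  y_t = x_t + v_t, v_t ~ N(0, r);
    with prior x_0 ~ N(m0, p0).
    Returns (smoothed means, smoothed variances, log-likelihood)."""
    n = len(y)
    m_pred, p_pred = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    m_filt, p_filt = [0.0] * n, [0.0] * n  # filtered estimates
    loglik = 0.0
    for t in range(n):
        # Predict step: the first time point uses the prior.
        if t == 0:
            m_pred[t], p_pred[t] = m0, p0
        else:
            m_pred[t] = a * m_filt[t - 1]
            p_pred[t] = a * a * p_filt[t - 1] + q
        # Update step using the innovation and its variance s.
        s = p_pred[t] + r
        k = p_pred[t] / s
        innov = y[t] - m_pred[t]
        m_filt[t] = m_pred[t] + k * innov
        p_filt[t] = (1.0 - k) * p_pred[t]
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
    # Backward (RTS) pass refines each estimate with later observations.
    m_s, p_s = m_filt[:], p_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] * a / p_pred[t + 1]
        m_s[t] = m_filt[t] + j * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + j * j * (p_s[t + 1] - p_pred[t + 1])
    return m_s, p_s, loglik


def fit_transition(y, grid, q, r):
    """Maximum-likelihood estimate of `a` by grid search over the Kalman log-likelihood."""
    return max(grid, key=lambda a: kalman_smooth(y, a, q, r)[2])
```

A grid search keeps the example dependency-free; a full treatment would more typically estimate all parameters jointly, for example with EM or a numerical optimiser.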

Crossref

Directory of Open Access Journals

PubMed Central

Unbiased Boolean analysis of public gene expression data for cell cycle gene identification.

Author: Dabydeen Sarah A
Desai Arshad
Sahoo Debashis
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Cell proliferation is essential for the development and maintenance of all organisms and is dysregulated in cancer. Using synchronized cells progressing through the cell cycle, pioneering microarray studies defined cell cycle genes based on cyclic variation in their expression. However, the concordance of the small number of synchronized cell studies has been limited, leading to discrepancies in definition of the transcriptionally regulated set of cell cycle genes within and between species. Here we present an informatics approach based on Boolean logic to identify cell cycle genes. This approach used the vast array of publicly available gene expression data sets to query similarity to CCNB1, which encodes the cyclin subunit of the Cdk1-cyclin B complex that triggers the G2-to-M transition. In addition to highlighting conservation of cell cycle genes across large evolutionary distances, this approach identified contexts where well-studied genes known to act during the cell cycle are expressed and potentially acting in nondivision contexts. An accessible web platform enables a detailed exploration of the cell cycle gene lists generated using the Boolean logic approach. The methods employed are straightforward to extend to processes other than the cell cycle.
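One common way to make "Boolean logic" concrete for expression data is to binarize each gene into high and low states and look for nearly empty quadrants in the scatterplot of two genes: if both off-diagonal quadrants are empty, the genes rise and fall together, which is one reading of "similar to CCNB1". The sketch below is an illustrative simplification under stated assumptions: the median split, the statistic, and the thresholds `min_stat` and `max_error` are hypothetical choices, not the authors' exact procedure.

```python
from math import sqrt
from statistics import median


def binarize(values):
    """Call each sample high (True) or low (False) relative to the median.
    Assumption: a median split stands in for a proper thresholding step."""
    cut = median(values)
    return [v > cut for v in values]


def sparse_quadrants(a_vals, b_vals, min_stat=3.0, max_error=0.1):
    """Return the (a_high, b_high) quadrants that are much emptier than
    expected if A and B were independent."""
    a, b = binarize(a_vals), binarize(b_vals)
    n = len(a)
    sparse = set()
    for qa in (False, True):
        for qb in (False, True):
            observed = sum(1 for x, y in zip(a, b) if x == qa and y == qb)
            expected = a.count(qa) * b.count(qb) / n
            if expected == 0:
                continue
            stat = (expected - observed) / sqrt(expected)
            if stat >= min_stat and observed / expected <= max_error:
                sparse.add((qa, qb))
    return sparse


def boolean_relation(a_vals, b_vals):
    """Classify a gene pair from its sparse quadrant(s)."""
    s = sparse_quadrants(a_vals, b_vals)
    if {(True, False), (False, True)} <= s:
        return "equivalent"  # A and B are high together and low together
    if {(True, True), (False, False)} <= s:
        return "opposite"    # A high goes with B low, and vice versa
    if len(s) == 1:
        qa, qb = next(iter(s))
        # An empty (A=qa, B=qb) quadrant means A=qa implies B != qb.
        return f"A {'high' if qa else 'low'} => B {'low' if qb else 'high'}"
    return "none"
```

Screening a query gene such as CCNB1 would then amount to calling `boolean_relation` against every other gene and keeping those labelled "equivalent".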

eScholarship - University of California

Network estimation in State Space Model with L1-regularization constraint

Author: Lotsi Anani
Wit Ernst
Publication date: 01/01/2013

Biological networks have arisen as an attractive paradigm of genomic science ever since the introduction of large-scale genomic technologies, which carried the promise of elucidating relationships in functional genomics. Microarray technologies coupled with appropriate mathematical or statistical models have made it possible to identify dynamic regulatory networks or to measure the time course of the expression levels of many genes simultaneously. However, one limitation lies in the high-dimensional nature of such data, coupled with the fact that gene expression data are known to include some hidden process. In that regard, we are concerned with deriving a method for inferring a sparse dynamic network in a high-dimensional data setting. We assume that the observations are noisy measurements of gene expression in the form of mRNAs, whose dynamics can be described by some unknown or hidden process. We build an input-dependent linear state space model from these hidden states and demonstrate how an L1 regularization constraint incorporated in an Expectation-Maximization (EM) algorithm can be used to reverse engineer transcriptional networks from gene expression profiling data. This corresponds to estimating the model interaction parameters. The proposed method is illustrated on time-course microarray data obtained from a well-established T-cell data set. At the optimum tuning parameters, we found genes TRAF5, JUND, CDK4, CASP4, CD69, and C3X1 to have a higher number of inward-directed connections and FYB, CCNA2, AKT1 and CASP8 to be genes with a higher number of outward-directed connections. We recommend these genes as targets for further investigation. Caspase 4 is also found to activate the expression of JunD, which in turn represses the cell cycle regulator CDC2.
Comment: arXiv admin note: substantial text overlap with arXiv:1308.359
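To show what an L1 constraint does to interaction parameters, the sketch below isolates that step: assuming the hidden states have already been estimated (for example by a smoother inside EM), each gene's state at time t is regressed on all states at t-1 with a lasso penalty, so weak interactions are shrunk exactly to zero. This is not the authors' EM algorithm; `lasso_cd` and `sparse_network` are hypothetical names, and cyclic coordinate descent with soft-thresholding is swapped in as a standard lasso solver.

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |x|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    X is a list of rows (samples x features), y a list of targets."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                row[j] * (yi - sum(row[k] * b[k] for k in range(p) if k != j))
                for row, yi in zip(X, y)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


def sparse_network(states, lam):
    """Hypothetical helper: regress each gene's state at t on all states at t-1.
    Row i of the result holds the estimated incoming interactions of gene i."""
    X = states[:-1]
    return [lasso_cd(X, [s[i] for s in states[1:]], lam)
            for i in range(len(states[0]))]
```

Counting non-zero entries per row or column of the returned matrix gives in-degree or out-degree, which is the kind of summary the abstract reports for genes such as TRAF5 or FYB; larger `lam` values yield sparser networks.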

arXiv.org e-Print Archive

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Author: Gordon SV
Hernández B
MacHugh DE
Magee DA
McGettigan PA
Nalpas NC
Parnell AC
Rue-Albrecht K
Publication venue: BioMed Central
Publication date: 25/02/2016

Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.

Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dubli
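GOexpress itself is an R/Bioconductor package; the Python sketch below only illustrates the general idea of turning gene-level importance scores (for example, from a random forest) into a ranking of gene ontology terms. It assumes the scores are already computed and, as an assumption, summarises each GO term by the simple mean score of its annotated genes; `rank_go_terms` and the toy identifiers in the usage note are hypothetical, not part of the GOexpress API.

```python
from statistics import mean


def rank_go_terms(gene_scores, go_annotations, min_genes=1):
    """Rank GO terms by the mean importance of their annotated genes.

    gene_scores: dict gene -> importance score.
    go_annotations: dict GO term -> iterable of genes annotated to it.
    Genes without a score are ignored, and terms with fewer than `min_genes`
    scored genes are dropped. Returns [(term, mean_score, n_genes)] sorted
    by decreasing mean score."""
    ranked = []
    for term, genes in go_annotations.items():
        scores = [gene_scores[g] for g in genes if g in gene_scores]
        if len(scores) >= min_genes:
            ranked.append((term, mean(scores), len(scores)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

For instance, with scores `{"g1": 0.9, "g2": 0.1, "g3": 0.5}` and annotations `{"GO:A": ["g1", "g3"], "GO:B": ["g2", "g3"]}`, the term GO:A (mean 0.7) ranks above GO:B (mean 0.3). Setting `min_genes` above 1 guards against terms whose rank rests on a single gene.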

Research Repository UCD

ZENODO

Springer - Publisher Connector

Irish Universities

PubMed Central

Spiral - Imperial College Digital Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Boolean analysis identifies CD38 as a biomarker of aggressive localized prostate cancer.

Author: Auman Heidi
Brooks James D
Carroll Peter R
Fazli Ladan
Feng Ziding
Gleave Martin E
Hurtado-Coll Antonio
Leach Robin J
Lin Daniel W
McKenney Jesse K
Nelson Peter S
Sahoo Debashis
Simko Jeff
Thompson Ian M
Troyer Dean A
True Lawrence D
Wei Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The introduction of serum Prostate Specific Antigen (PSA) testing nearly 30 years ago has been associated with a significant shift towards localized disease and decreased deaths due to prostate cancer. Recognition that PSA testing has caused overdiagnosis and overtreatment of prostate cancer has generated considerable controversy over its value and has spurred efforts to identify prognostic biomarkers to distinguish patients who need treatment from those who can be observed. Recent studies show that cancer is heterogeneous and forms a hierarchy of tumor cell populations. We developed a method of identifying prostate cancer differentiation states related to androgen signaling using Boolean logic. Using gene expression data, we identified two markers, CD38 and ARG2, that group prostate cancer into three differentiation states. Cancers with CD38-ARG2- expression patterns, corresponding to an undifferentiated state, had significantly lower 10-year recurrence-free survival than the most differentiated group (CD38+ARG2+). We carried out immunohistochemical (IHC) staining for these two markers in a single-institution cohort (Stanford; n = 234) and a multi-institution cohort (Canary; n = 1326). IHC staining for CD38 and ARG2 in the Stanford cohort demonstrated that combined expression of CD38 and ARG2 was prognostic. In the Canary cohort, low CD38 protein expression by IHC was significantly associated with recurrence-free survival (RFS), seminal vesicle invasion (SVI), and extra-capsular extension (ECE) in univariable analysis. In multivariable analysis, ARG2 and CD38 IHC staining results were not independently associated with RFS, overall survival, or disease-specific survival after adjusting for other factors including SVI, ECE, Gleason score, pre-operative PSA, and surgical margins.

eScholarship - University of California

Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data

Author: A Beyer
AL Barabási
BH Good
BW Kernighan
CO Daub
D Duewer
D Marbach
DFT Veiga
E Bonnet
E Ravasz
E Segal
EH Davidson
F Luo
G Balázsi
G Getz
G Palla
G Palla
H Zare
HW Ma
J Chen
J Duch
J Hubble
J Lemke
J Reichardt
JJ Faith
JJ Faith
JN Weinstein
K Baggerly
Kevin E. Bassler
KY Yeung
M Blatt
M Riley
MB Eisen
MEJ Newman
MEJ Newman
MF Traxler
MM Barker
N Friedman
N Friedman
O Alter
PD Karp
Q Lu
R Guimerà
RA Irizarry
S Fortunato
S Fortunato
S Gama-Castro
S Raychaudhuri
S Tavazoie
Santiago Treviño
Satoru Miyano
SB Seidman
SB Seidman
SP Borgatii
SP Borgatii
TF Cooper
Tim F. Cooper
TS Gardner
U Brandes
UN Raghavan
X Wen
Y Benjamini
Y Sun
Yudong Sun
Z Shi
Publication venue: Public Library of Science (PLoS)
Publication date: 11/01/2012

Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results typically provide only a simplistic view of the network, partitioned into disjoint communities, and give no information about the relationships between communities. Here, we present methods to robustly detect coregulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities significantly enrich for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities.
Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1 was not uploaded but is available by contacting the author.
27 pages, 5 figures, 15 supplementary file
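The CLR step referred to above rescores each gene pair against the background of each gene's own relatedness values, so a link counts as strong only if it stands out for both genes. The sketch below follows the formulation as it is commonly described (per-gene z-scores, negatives set to zero, combined as the square root of the sum of squares) and assumes a symmetric mutual-information matrix has already been estimated; mutual-information estimation, link thresholding, and the community detection step are not shown, and `clr_scores` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(mi):
    """Context likelihood of relatedness from a symmetric mutual-information matrix.
    Each MI value is z-scored against the background of each gene's other MI
    values; negative z-scores are set to 0, and the two per-gene scores are
    combined as sqrt(z_i^2 + z_j^2)."""
    n = len(mi)
    stats = []
    for i in range(n):
        background = [mi[i][j] for j in range(n) if j != i]
        stats.append((mean(background), pstdev(background)))

    def z(i, j):
        mu, sd = stats[i]
        return max(0.0, (mi[i][j] - mu) / sd) if sd > 0 else 0.0

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scores[i][j] = scores[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores
```

Thresholding the returned scores yields the interaction network on which a community detection algorithm can then operate.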

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Transcriptional landscape of neuronal and cancer stem cells

Author: Miele Evelina
Publication date: 22/02/2013

Tumor mass is composed of a heterogeneous cell population that includes a subset of “cancer stem cells” (CSC). Oncogenic signals foster CSC by transforming tissue stem cells or by reprogramming progenitor/differentiated cells towards stemness. Thus, CSC share features with cancer and stem cells (e.g., self-renewal, a hierarchical developmental program leading to differentiated cells, epithelial-mesenchymal transition), and the latter are maintained by the constitutive activation of stemness-promoting signals. CSC could trigger tumor formation, drive resistance to conventional therapeutics, and underlie patients’ relapse. Indeed, stem cell signatures have been associated with poor prognosis in various tumors. This background makes the identification of CSC molecular features essential to elucidate the inner workings of their survival and to design novel CSC-specific therapeutic strategies. Medulloblastoma (MB) is the most common malignant childhood brain tumor and a leading cause of cancer-related morbidity and mortality. Current multimodal therapies are effective in about 50% of patients but often cause long-term side effects, i.e., developmental, neurological, neuroendocrine and psychosocial deficits (Northcott PA, Nature Rev Cancer 2012). For many years, MB was treated as a single tumor entity despite its divergent histology, patient outcomes, and drug sensitivity, and despite the diversity of its stem cells of origin. Very recently, the picture of human MB has changed dramatically, since its heterogeneous biology has been addressed by high-throughput gene expression analysis (oligonucleotide microarrays) or by powerful next-generation genomic sequencing. These studies led to the identification of four tumor subgroups (WNT, SHH, Group 3 and Group 4), uncovering highly diverse mutational spectra and gene expression profiles.
However, a quantitative approach has not yet been applied to the transcriptional landscape of medulloblastoma stem cells (MbSC) through RNA next-generation sequencing (RNA-Seq) technology. This is a relevant issue, since RNA-Seq can interrogate the genome-wide global transcriptome, including new transcripts, alternatively spliced isoforms and non-coding RNAs. Lower rhombic lip progenitors of the dorsal brainstem are considered the trigger cells in WNT tumors; in the SHH subgroup, the initiating cells are Prominin1+ CD15+ stem cells from the subventricular zone that require commitment to Math1+ granule cell progenitors [GCP] of the external granule cell layer [EGL]; while Math1+ or Math1- EGL-GCP or Prominin1+/lineage-negative stem cells sustain the MYC-driven Group 3. MbSC derived from SHH tumors and postnatal normal cerebellar stem cells (NcSC) have been reported to share several features. A key signal for both of them is Hedgehog. Furthermore, both NcSC and MbSC display up-regulation of stemness genes (e.g., Sox2, Nestin, Nanog, Prom1). Finally, constitutive activation of the Shh pathway by conditional deletion of the Ptch1 inhibitory receptor in NcSC promotes medulloblastoma in vivo, producing a mouse model of the human SHH tumor. Acquisition of stemness features may therefore represent the first step of oncogenic conversion. Cooperation with additional oncogenic signals is, however, needed to enhance MbSC tumorigenicity. To understand MbSC transcriptional programs, we analyze by RNA-Seq MbSC derived from Ptch1+/- tumors (Ptch1+/- MbSC). This choice of a genetically determined model of MB has allowed us to work with Ptch1+/- MbSC together with the appropriate NcSC counterpart, and to perform statistical analysis on biological replicates. We identify a number of transcripts characterizing MbSC and/or NcSC, including annotated transcripts, novel isoforms, and long non-coding RNAs.
Some of these genes control stemness or are cancer-related and conserved in human medulloblastomas. Interestingly, a subset of them, belonging to the cell stress response, are of prognostic relevance, being significantly related to clinical outcome. Correlating the expression of genes characterizing MbSC with survival information from our human medulloblastoma database further demonstrates the significance of these findings. Our data suggest that the modulation of normal and cancer stem cell functions observed in vitro is effective in dissecting the transcriptional programs underlying the in vivo behavior of human medulloblastomas.

Pubblicazioni Aperte Digitali Interateneo Sapienza

Archivio della ricerca- Università di Roma La Sapienza

Dissecting interferon-induced transcriptional programs in human peripheral blood cells

Author: David A. Relman
Derya Unutmaz
Kathleen H. Rubins
Michael J. Griffiths
Michael Levin
Patrick O. Brown
Simon J. Waddell
Stephen J. Popper
Publication venue: Public Library of Science (PLoS)
Publication date: 01/03/2010

Interferons are key modulators of the immune system, and are central to the control of many diseases. The response of immune cells to stimuli in complex populations is the product of direct and indirect effects, and of homotypic and heterotypic cell interactions. Dissecting the global transcriptional profiles of immune cell populations may provide insights into this regulatory interplay. The host transcriptional response may also be useful in discriminating between disease states, and in understanding pathophysiology. The transcriptional programs of cell populations in health therefore provide a paradigm for deconvoluting disease-associated gene expression profiles. We used human cDNA microarrays to (1) compare the gene expression programs in human peripheral blood mononuclear cells (PBMCs) elicited by 6 major mediators of the immune response: interferons alpha, beta, omega and gamma, IL12 and TNFalpha; and (2) characterize the transcriptional responses of purified immune cell populations (CD4+ and CD8+ T cells, B cells, NK cells and monocytes) to IFNgamma stimulation. We defined a highly stereotyped response to type I interferons, while responses to IFNgamma and IL12 were largely restricted to a subset of type I interferon-inducible genes. TNFalpha stimulation resulted in a distinct pattern of gene expression. Cell type-specific transcriptional programs were identified, highlighting the pronounced response of monocytes to IFNgamma, and emergent properties associated with IFN-mediated activation of mixed cell populations. This information provides a detailed view of cellular activation by immune mediators, and contributes an interpretive framework for the definition of host immune responses in a variety of disease settings.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Sussex Research Online

Bayesian correlated clustering to integrate multiple datasets

Author: Balasubramanian
Barash
Brock
Carlson
Cheng
Cherry
Cho
Cooke
Datta
David L. Wild
Dempster
Friedman
Fritsch
Granovskaia
Green
Harbison
Hubert
Huttenhower
Ideker
Ishwaran
Jackson
Jackson
Jansen
Jim E. Griffin
Kirk
Lee
Liu
Liu
Lockhart
Mistry
Myers
Myers
Neal
Neal
Nieto-Barajas
Paul Kirk
Puig
Rand
Rasmussen
Rasmussen
Reiss
Rhodes
Richard S. Savage
Rigaut
Rogers
Rogers
Rousseau
Santisteban
Savage
Schena
Shen
Solomon
Stark
Suchard
Troyanskaya
Wei
Wong
Yeung
Yuan
Zoubin Ghahramani
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods.
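The agreement parameters mentioned above can be illustrated with a small calculation. As we understand the MDI model, when a gene is reallocated in dataset k, the prior weight of each component c is its mixture weight multiplied by a factor (1 + phi_kl) for every other dataset l in which the same gene currently sits in the same-labelled component. The sketch below computes only those normalised prior weights; the data likelihood terms and the sampling of phi are deliberately omitted, and `allocation_weights` is a hypothetical name, so this is an assumption-laden illustration rather than the authors' sampler.

```python
def allocation_weights(pi_k, other_allocs, phis):
    """Normalised prior weights for allocating one gene to each component of
    dataset k under an MDI-style coupling between datasets.

    pi_k: mixture weights of dataset k (one per component).
    other_allocs: component index of the same gene in every other dataset.
    phis: agreement parameters phi_kl >= 0, aligned with other_allocs.
    Likelihood terms for the observed data are not included."""
    weights = []
    for c, w in enumerate(pi_k):
        for alloc, phi in zip(other_allocs, phis):
            if alloc == c:
                w *= 1.0 + phi  # reward agreeing with dataset l
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]
```

With `phis` all zero the datasets are independent and the weights reduce to `pi_k`; as phi grows, a gene is increasingly pulled toward the component label it already has in the other datasets, which is how the model shares clustering structure.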

CiteSeerX

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

Topological Analysis of Metabolic Networks Integrating Co-Segregating Transcriptomes and Metabolomes in Type 2 Diabetic Rat Congenic Series

Author: Argoud K
Ayala R
Calderari S
Cazier JB
Collins S
Domange C
Dumas ME
Gauguier D
Gu Q
Holmes E
Hue C
Lindon JC
Mitchell S
Navratil V
Nicholson JK
Otto GW
Rodriguez Martinez A
Suárez-Zamorano N
Wallis R
Wang Y
Wilder S
Publication venue: BioMed Central
Publication date: 01/01/2016

Background: The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus is caused by complex organ-specific cellular mechanisms contributing to impaired insulin secretion and insulin resistance. Methods: We used systematic metabotyping by 1H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology approach to visualise shortest paths between metabolites and genes significantly associated with each genomic block. Results: Despite strong genomic similarities (95-99%) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting metabolic consequences of a series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific metabotypes (mQTL) and genome-wide expression traits (eQTL). Variation in key metabolites like glucose, succinate, lactate or 3-hydroxybutyrate, and second messenger precursors like inositol was associated with several independent genomic intervals, indicating functional redundancy in these regions. To navigate through the complexity of these association networks we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing shortest path length drove prioritization of biological validations by gene silencing.
Conclusions: These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulations and to characterize novel functional roles for genes determining tissue-specific metabolism.
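The shortest path strategy described above can be sketched with a breadth-first search over an unweighted network in which metabolites, reactions, and genes are nodes. This assumes every edge counts equally and the network is given as an adjacency dictionary; the function name and the toy node labels in the usage note are hypothetical and do not come from the study's pathway data.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for an unweighted shortest path.
    graph maps each node to an iterable of neighbours; returns the list of
    nodes from start to goal, or None if goal is unreachable."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

For example, in a toy graph `{"M": ["R1", "R2"], "R1": ["G"], "R2": ["R3"], "R3": ["G"]}`, the path from metabolite "M" to gene "G" is `["M", "R1", "G"]`; ranking candidate genes by this path length mirrors the prioritisation idea in the abstract. Weighted edges would call for Dijkstra's algorithm instead.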

ZENODO

HAL Descartes

Spiral - Imperial College Digital Repository

Springer - Publisher Connector

HAL-Inserm

UCL Discovery

PubMed Central