Search CORE

275,484 research outputs found

Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge

Author: Fleck Christian
Geier Florian
Timmer Jens
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

BACKGROUND: Cellular processes are controlled by gene-regulatory networks. Several computational methods are currently used to learn the structure of gene-regulatory networks from data. This study focusses on time series gene expression and gene knock-out data in order to identify the underlying network structure. We compare the performance of different network reconstruction methods using synthetic data generated from an ensemble of reference networks. Data requirements as well as optimal experiments for the reconstruction of gene-regulatory networks are investigated. Additionally, the impact of prior knowledge on network reconstruction as well as the effect of unobserved cellular processes is studied. RESULTS: We identify linear Gaussian dynamic Bayesian networks and variable selection based on F-statistics as suitable methods for the reconstruction of gene-regulatory networks from time series data. Commonly used discrete dynamic Bayesian networks perform inferior and this result can be attributed to the inevitable information loss by discretization of expression data. It is shown that short time series generated under transcription factor knock-out are optimal experiments in order to reveal the structure of gene regulatory networks. Relative to the level of observational noise, we give estimates for the required amount of gene expression data in order to accurately reconstruct gene-regulatory networks. The benefit of using of prior knowledge within a Bayesian learning framework is found to be limited to conditions of small gene expression data size. Unobserved processes, like protein-protein interactions, induce dependencies between gene expression levels similar to direct transcriptional regulation. We show that these dependencies cannot be distinguished from transcription factor mediated gene regulation on the basis of gene expression data alone. CONCLUSION: Currently available data size and data quality make the reconstruction of gene networks from gene expression data a challenge. In this study, we identify an optimal type of experiment, requirements on the gene expression data quality and size as well as appropriate reconstruction methods in order to reverse engineer gene regulatory networks from time series data

Springer - Publisher Connector

PubMed Central

GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Author: Gordon SV
Hernández B
MacHugh DE
Magee DA
McGettigan PA
Nalpas NC
Parnell AC
Rue-Albrecht K
Publication venue: BioMed Central
Publication date: 25/02/2016
Field of study

Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.Department of Agriculture, Food and the MarineEuropean Commission - Seventh Framework Programme (FP7)Science Foundation IrelandUniversity College Dubli

Research Repository UCD

ZENODO

Springer - Publisher Connector

Irish Universities

PubMed Central

Spiral - Imperial College Digital Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Feature selection and modelling methods for microarray data from acute coronary syndrome

Author: Alecu Adrian
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 19/01/2015
Field of study

Acute coronary syndrome (ACS) represents a leading cause of mortality and morbidity worldwide. Providing better diagnostic solutions and developing therapeutic strategies customized to the individual patient represent societal and economical urgencies. Progressive improvement in diagnosis and treatment procedures require a thorough understanding of the underlying genetic mechanisms of the disease. Recent advances in microarray technologies together with the decreasing costs of the specialized equipment enabled affordable harvesting of time-course gene expression data. The high-dimensional data generated demands for computational tools able to extract the underlying biological knowledge. This thesis is concerned with developing new methods for analysing time-course gene expression data, focused on identifying differentially expressed genes, deconvolving heterogeneous gene expression measurements and inferring dynamic gene regulatory interactions. The main contributions include: a novel multi-stage feature selection method, a new deconvolution approach for estimating cell-type specific signatures and quantifying the contribution of each cell type to the variance of the gene expression patters, a novel approach to identify the cellular sources of differential gene expression, a new approach to model gene expression dynamics using sums of exponentials and a novel method to estimate stable linear dynamical systems from noisy and unequally spaced time series data. The performance of the proposed methods was demonstrated on a time-course dataset consisting of microarray gene expression levels collected from the blood samples of patients with ACS and associated blood count measurements. The results of the feature selection study are of significant biological relevance. For the first time is was reported high diagnostic performance of the ACS subtypes up to three months after hospital admission. The deconvolution study exposed features of within and between groups variation in expression measurements and identified potential cell type markers and cellular sources of differential gene expression. It was shown that the dynamics of post-admission gene expression data can be accurately modelled using sums of exponentials, suggesting that gene expression levels undergo a transient response to the ACS events before returning to equilibrium. The linear dynamical models capturing the gene regulatory interactions exhibit high predictive performance and can serve as platforms for system-level analysis, numerical simulations and intervention studies

White Rose E-theses Online

Embedding mRNA Stability in Correlation Analysis of Time-Series Gene Expression Data

Author: A Arkin
A Brazma
Alberto De Santis
AP Gerber
BC Foat
D Sankoff
E Wit
F He
F Hermans
G Franklin
Giorgio Morelli
H Ge
H Yu
HJ McBride
I Simon
Ida Ruberti
J Aach
J Bahler
J Gertien
J Grigull
J Qian
J Ross
JJ Smith
JL Hargrove
K Basso
KD MacIsaac
L Desmyter
L Farina
L Kuai
LL Breeden
LL Breeden
Lorenzo Farina
M Kato
Manuel Ares
MB Eisen
N Banerjee
N Simonis
NL Garneau
P Brazhnik
P Tamayo
PT Spellman
Q Guan
R Jansen
R Milo
RJ Cho
S Balaji
S Piatti
S Tavazoie
Samanta Salvucci
T Ideker
T Pramila
TG Wolfsberg
TJ Wade
TS Gardner
U de Lichtenberg
W Wang
WA Schmitt Jr
Y Pilpel
Y Wang
Z Zhu
Publication venue: Public Library of Science
Publication date: 01/01/2008
Field of study

Current methods for the identification of putatively co-regulated genes directly from gene expression time profiles are based on the similarity of the time profile. Such association metrics, despite their central role in gene network inference and machine learning, have largely ignored the impact of dynamics or variation in mRNA stability. Here we introduce a simple, but powerful, new similarity metric called lead-lag R2 that successfully accounts for the properties of gene dynamics, including varying mRNA degradation and delays. Using yeast cell-cycle time-series gene expression data, we demonstrate that the predictive power of lead-lag R2 for the identification of co-regulated genes is significantly higher than that of standard similarity measures, thus allowing the selection of a large number of entirely new putatively co-regulated genes. Furthermore, the lead-lag metric can also be used to uncover the relationship between gene expression time-series and the dynamics of formation of multiple protein complexes. Remarkably, we found a high lead-lag R2 value among genes coding for a transient complex

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza

Time-course analysis of genome-wide gene expression data from hormone-responsive human breast cancer cells

Author: Angelini Claudia
Cicatiello Luigi
Facchiano Angelo M
Ferraro Lorenzo
Grober Olì MV
Mutarelli Margherita
Ravo Maria
Weisz Alessandro
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Microarray experiments enable simultaneous measurement of the expression levels of virtually all transcripts present in cells, thereby providing a ‘molecular picture’ of the cell state. On the other hand, the genomic responses to a pharmacological or hormonal stimulus are dynamic molecular processes, where time influences gene activity and expression. The potential use of the statistical analysis of microarray data in time series has not been fully exploited so far, due to the fact that only few methods are available which take into proper account temporal relationships between samples. Results We compared here four different methods to analyze data derived from a time course mRNA expression profiling experiment which consisted in the study of the effects of estrogen on hormone-responsive human breast cancer cells. Gene expression was monitored with the innovative Illumina BeadArray platform, which includes an average of 30-40 replicates for each probe sequence randomly distributed on the chip surface. We present and discuss the results obtained by applying to these datasets different statistical methods for serial gene expression analysis. The influence of the normalization algorithm applied on data and of different parameter or threshold choices for the selection of differentially expressed transcripts has also been evaluated. In most cases, the selection was found fairly robust with respect to changes in parameters and type of normalization. We then identified which genes showed an expression profile significantly affected by the hormonal treatment over time. The final list of differentially expressed genes underwent cluster analysis of functional type, to identify groups of genes with similar regulation dynamics. Conclusions Several methods for processing time series gene expression data are presented, including evaluation of benefits and drawbacks of the different methods applied. The resulting protocol for data analysis was applied to characterization of the gene expression changes induced by estrogen in human breast cancer ZR-75.1 cells over an entire cell cycle.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Cagliari

Archivio della Ricerca - Università di Salerno

Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

Author: Tian Suyan
Wang Chi
Publication venue: UKnowledge
Publication date: 19/03/2019
Field of study

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby degrading a longitudinal feature selection process into a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) were then optimized by either the coordinate descent method or the threshold gradient descent regularization method. By applying the proposed methods to simulated data and a traumatic injury dataset, we have demonstrated that the proposed methods, especially for the combination of sign average and threshold gradient descent regularization, outperform other competitive algorithms. To conclude, the proposed methods are highly recommended for studies with the objective of carrying out feature selection for longitudinal gene expression data

University of Kentucky

Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields

Author: Majoros W.H.
Ohler U.
Pruteanu-Malinici I.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 19/06/2013
Field of study

Motivation: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously. Methods: We describe a discriminative undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy incomplete samples, i.e. it can tolerate data missing from individual time points. Results: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared with previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages

PubMed Central

MDC Repository

Functional Clustering of Periodic Transcriptional Profiles through ARMA(p,q)

Author: Berceli Scott A.
Berg Arthur
Li Ning
McMurry Timothy
Wang Zhong
Wu Rongling
Publication venue: Public Library of Science
Publication date: 01/01/2009
Field of study

Background: Gene clustering of periodic transcriptional profiles provides an opportunity to shed light on a variety of biological processes, but this technique relies critically upon the robust modeling of longitudinal covariance structure over time. Methodology: We propose a statistical method for functional clustering of periodic gene expression by modeling the covariance matrix of serial measurements through a general autoregressive moving-average process of order (p,q), the socalled ARMA(p,q). We derive a sophisticated EM algorithm to estimate the proportions of each gene cluster, the Fourier series parameters that define gene-specific differences in periodic expression trajectories, and the ARMA parameters that model the covariance structure within a mixture model framework. The orders p and q of the ARMA process that provide the best fit are identified by model selection criteria. Conclusions: Through simulated data we show that whenever it is necessary, employment of sophisticated covariance structures such as ARMA is crucial in order to obtain unbiased estimates of the mean structure parameters and increased precision of estimation. The methods were implemented on recently published time-course gene expression data in yeast and the procedure was shown to effectively identify interesting periodic clusters in the dataset. The new approach wil

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Predicting symptom severity and contagiousness of respiratory viral infections

Author: Aguiar-Pulido Vanessa
Mathee Kalai
Mehta Arpit
Narasimhan Giri
Narasimhan Medhini
Rajabli Farid
Vietri Giuseppe
Publication venue: FIU Digital Commons
Publication date: 11/07/2016
Field of study

This work aims at predicting the symptom severity and contagiousness of a person infected with respiratory virus, using time series gene expression data. Four different respiratory viruses were studied – RSV, H1N1, H3N2 and Rhinovirus. Predictive models were built for each virus for each time point. Partial least squares discriminant analysis was used for feature selection and random forest was used for classification. Certain genes were identified as biomarkers in distinguishing the subjects. Gene enrichment analysis was performed on the differentially expressed genes. Prediction accuracy values were high even when expression data from early time points were analyzed. Significant genes were detected as early as 5 and 10 hours post infection, as compared to prior work that did so at 29 hours post infection. The potential biomarkers obtained with the proposed approach need to be investigated further

DigitalCommons@Florida International University

Microarray Data Mining and Gene Regulatory Network Analysis

Author: Li Ying
Publication venue: The Aquila Digital Community
Publication date: 01/05/2011
Field of study

The novel molecular biological technology, microarray, makes it feasible to obtain quantitative measurements of expression of thousands of genes present in a biological sample simultaneously. Genome-wide expression data generated from this technology are promising to uncover the implicit, previously unknown biological knowledge. In this study, several problems about microarray data mining techniques were investigated, including feature(gene) selection, classifier genes identification, generation of reference genetic interaction network for non-model organisms and gene regulatory network reconstruction using time-series gene expression data. The limitations of most of the existing computational models employed to infer gene regulatory network lie in that they either suffer from low accuracy or computational complexity. To overcome such limitations, the following strategies were proposed to integrate bioinformatics data mining techniques with existing GRN inference algorithms, which enables the discovery of novel biological knowledge. An integrated statistical and machine learning (ISML) pipeline was developed for feature selection and classifier genes identification to solve the challenges of the curse of dimensionality problem as well as the huge search space. Using the selected classifier genes as seeds, a scale-up technique is applied to search through major databases of genetic interaction networks, metabolic pathways, etc. By curating relevant genes and blasting genomic sequences of non-model organisms against well-studied genetic model organisms, a reference gene regulatory network for less-studied organisms was built and used both as prior knowledge and model validation for GRN reconstructions. Networks of gene interactions were inferred using a Dynamic Bayesian Network (DBN) approach and were analyzed for elucidating the dynamics caused by perturbations. Our proposed pipelines were applied to investigate molecular mechanisms for chemical-induced reversible neurotoxicity

Aquila Digital Community