
    Recursive regularization for inferring gene networks from time-course gene expression profiles

    Background: Inferring gene networks from time-course microarray experiments with a vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net increases the number of true positives while also increasing the number of false positives. Results: By incorporating the relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called the recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method drastically reduces the number of false positives while keeping a high number of true positives in the network inference, and achieves a true discovery rate (the proportion of true positives among the selected edges) two or more times higher than the competing methods, even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion: The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles.
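    To make the setup in this abstract concrete, the following is a minimal sketch of plain elastic-net edge selection for a first-order VAR model, together with the true discovery rate as defined above. It is not the recursive elastic net itself: the importance-based reweighting is not reproduced because the abstract does not specify it. The function names, the coordinate-descent solver, the parameters `lam` and `alpha`, and the absence of centering or an intercept are all assumptions made for illustration.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).

    X is a list of n rows with p features each; y has n entries.
    No intercept and no standardization (an illustrative simplification).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the starting point b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[t][j] for t in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(col[t] * (resid[t] + col[t] * b[j]) for t in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            if new != b[j]:
                for t in range(n):
                    resid[t] += col[t] * (b[j] - new)
                b[j] = new
    return b


def var1_edges(series, lam, alpha):
    """Select edges (regulator j, target i) from a VAR(1) fit.

    series is a list of time points, each a list of expression values.
    Each gene at time t is regressed on all genes at time t-1.
    """
    X = series[:-1]
    p = len(series[0])
    edges = set()
    for i in range(p):
        y = [row[i] for row in series[1:]]
        b = elastic_net(X, y, lam, alpha)
        edges |= {(j, i) for j in range(p) if b[j] != 0.0}
    return edges


def true_discovery_rate(selected, truth):
    """Proportion of true positives among the selected edges."""
    return len(selected & truth) / len(selected) if selected else 0.0
```

    Larger values of `lam` select fewer edges; a recursive scheme in the spirit of the abstract would refit with per-coefficient penalties updated from the previous estimate.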

    Weighted-Lasso for Structured Network Inference from Time Course Data

    We present a weighted-Lasso method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to have a prior internal connectivity structure, which drives the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks: first, the yeast cell cycle regulation network, by analyzing Spellman et al.'s dataset, and second, the E. coli S.O.S. DNA repair network, by analysing data from U. Alon's lab.
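    As a rough illustration of what a weighted-Lasso estimate for a first-order VAR model looks like, one common way to write the objective is given below. The notation is an assumption, not taken from the paper: $x_t \in \mathbb{R}^p$ is the expression vector at time $t$, $A = (a_{ij})$ is the coefficient matrix, $\lambda \ge 0$ is the overall penalty level, and $w_{ij} \ge 0$ are weights that encode the prior structure (a smaller weight penalizes an edge less).

```latex
\hat{A} \;=\; \arg\min_{A \in \mathbb{R}^{p \times p}}
  \sum_{t=2}^{T} \bigl\lVert x_t - A\, x_{t-1} \bigr\rVert_2^2
  \;+\; \lambda \sum_{i=1}^{p} \sum_{j=1}^{p} w_{ij}\, \lvert a_{ij} \rvert
```

    Setting all $w_{ij} = 1$ recovers the ordinary Lasso; the paper's exact weighting scheme may differ from this sketch.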

    Modeling and identification of gene regulatory networks: A Granger causality approach

    It is of increasing interest in systems biology to discover gene regulatory networks (GRNs) from time-series genomic data, i.e., to explore the interactions among a large number of genes and gene products over time. Currently, one common approach is based on Granger causality, which models the time-series genomic data as a vector autoregressive (VAR) process and estimates the GRNs from the VAR coefficient matrix. The main challenge in identifying VAR models is the large number of genes combined with the limited number of time points, which results in statistically inefficient solutions and high computational complexity. Therefore, fast and efficient variable selection techniques are highly desirable. In this paper, an introductory review of identification methods and variable selection techniques for VAR models in learning the GRNs is presented. Furthermore, a dynamic VAR (DVAR) model, which accounts for dynamic GRNs changing with time during the experimental cycle, and its identification methods are introduced. © 2010 IEEE. The 9th International Conference on Machine Learning and Cybernetics (ICMLC 2010), Qingdao, China, 11-14 July 2010. In Proceedings of the 9th ICMLC, 2010, v. 6, p. 3073-307

    Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks

    Gene and protein networks are very important for modeling complex large-scale systems in molecular biology. Inferring or reverse-engineering such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the enormous number of unknowns relative to a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we make use of Bayesian Graphical Models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian Network from breast cancer data.

    Feature selection and modelling methods for microarray data from acute coronary syndrome

    Acute coronary syndrome (ACS) represents a leading cause of mortality and morbidity worldwide. Providing better diagnostic solutions and developing therapeutic strategies customized to the individual patient are pressing societal and economic needs. Progressive improvement in diagnosis and treatment procedures requires a thorough understanding of the underlying genetic mechanisms of the disease. Recent advances in microarray technologies, together with the decreasing costs of the specialized equipment, have enabled affordable harvesting of time-course gene expression data. The high-dimensional data generated demands computational tools able to extract the underlying biological knowledge. This thesis is concerned with developing new methods for analysing time-course gene expression data, focused on identifying differentially expressed genes, deconvolving heterogeneous gene expression measurements and inferring dynamic gene regulatory interactions. The main contributions include: a novel multi-stage feature selection method, a new deconvolution approach for estimating cell-type specific signatures and quantifying the contribution of each cell type to the variance of the gene expression patterns, a novel approach to identify the cellular sources of differential gene expression, a new approach to model gene expression dynamics using sums of exponentials and a novel method to estimate stable linear dynamical systems from noisy and unequally spaced time series data. The performance of the proposed methods was demonstrated on a time-course dataset consisting of microarray gene expression levels collected from the blood samples of patients with ACS and associated blood count measurements. The results of the feature selection study are of significant biological relevance. For the first time, high diagnostic performance for the ACS subtypes was reported up to three months after hospital admission.
The deconvolution study exposed features of within and between groups variation in expression measurements and identified potential cell type markers and cellular sources of differential gene expression. It was shown that the dynamics of post-admission gene expression data can be accurately modelled using sums of exponentials, suggesting that gene expression levels undergo a transient response to the ACS events before returning to equilibrium. The linear dynamical models capturing the gene regulatory interactions exhibit high predictive performance and can serve as platforms for system-level analysis, numerical simulations and intervention studies

    Dynamic gene network reconstruction from gene expression data in mice after influenza A (H1N1) infection

    Background: The immune response to viral infection is a temporal process, represented by a dynamic and complex network of gene and protein interactions. Here, we present a reverse engineering strategy aimed at capturing the temporal evolution of the underlying Gene Regulatory Networks (GRN). The proposed approach will be an enabling step towards comprehending the dynamic behavior of gene regulation circuitry and mapping the network structure transitions in response to pathogen stimuli. Results: We applied the Time Varying Dynamic Bayesian Network (TV-DBN) method for reconstructing the gene regulatory interactions based on time series gene expression data for the mouse C57BL/6J inbred strain after infection with influenza A H1N1 (PR8) virus. Initially, 3500 differentially expressed genes were clustered using the k-means algorithm. Next, the time-successive GRNs were built over the expression profiles of the cluster centroids. Finally, the identified GRNs were examined with several topological metrics and available protein-protein and protein-DNA interaction data, transcription factor data and KEGG pathway data. Conclusions: Our results demonstrate the potential of the TV-DBN approach to provide valuable insights into the temporal rewiring of the lung transcriptome in response to H1N1 virus.
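    The clustering step in this abstract (reducing thousands of gene profiles to a handful of centroid profiles before network inference) can be sketched with a basic k-means loop. This is only an illustration under stated assumptions: the function names are hypothetical, the first k profiles are used as seeds for determinism, and the authors' actual initialization, distance measure and number of clusters are not given in the abstract.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(profiles, k, n_iter=100):
    """Basic Lloyd's algorithm over time-course profiles.

    profiles is a list of equal-length lists (one per gene).
    Returns (labels, centroids). Seeds are the first k profiles,
    an arbitrary choice made here only to keep the sketch deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for it in range(n_iter):
        # Assignment step: nearest centroid for every profile.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in profiles
        ]
        if it > 0 and new_labels == labels:
            break  # assignments are stable, so centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids
```

    The returned centroids play the role of the "cluster centroid" profiles over which a network model would then be fitted.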

    A Novel Network Profiling Analysis Reveals System Changes in Epithelial-Mesenchymal Transition

    Patient-specific analysis of molecular networks is a promising strategy for making individual risk predictions and treatment decisions in cancer therapy. Although systems biology allows the gene network of a cell to be reconstructed from clinical gene expression data, traditional methods, such as Bayesian networks, only provide an averaged network for all samples. Therefore, these methods cannot reveal patient-specific differences in molecular networks during cancer progression. In this study, we developed a novel statistical method called NetworkProfiler, which infers patient-specific gene regulatory networks for a specific clinical characteristic, such as cancer progression, from gene expression data of cancer patients. We applied NetworkProfiler to microarray gene expression data from 762 cancer cell lines and extracted the system changes that were related to the epithelial-mesenchymal transition (EMT). Out of 1732 possible regulators of E-cadherin, a cell adhesion molecule that modulates the EMT, NetworkProfiler identified 25 candidate regulators, of which about half have been experimentally verified in the literature. In addition, we used NetworkProfiler to predict EMT-dependent master regulators that enhanced cell adhesion, migration, invasion, and metastasis. To further evaluate the performance of NetworkProfiler, we selected Krueppel-like factor 5 (KLF5) from a list of the remaining candidate regulators of E-cadherin and conducted in vitro validation experiments. We found that knockdown of KLF5 by siRNA significantly decreased E-cadherin expression and induced morphological changes characteristic of EMT. In addition, in vitro experiments on a novel candidate EMT-related microRNA, miR-100, confirmed the involvement of miR-100 in several EMT-related aspects, which was consistent with the predictions obtained by NetworkProfiler.

    Predictive network modeling of the high-resolution dynamic plant transcriptome in response to nitrate

    BACKGROUND: Nitrate, acting as both a nitrogen source and a signaling molecule, controls many aspects of plant development. However, gene networks involved in plant adaptation to fluctuating nitrate environments have not yet been identified. RESULTS: Here we use time-series transcriptome data to decipher gene relationships and consequently to build core regulatory networks involved in Arabidopsis root adaptation to nitrate provision. The experimental approach has been to monitor genome-wide responses to nitrate at 3, 6, 9, 12, 15 and 20 minutes, using Affymetrix ATH1 gene chips. This high-resolution time course analysis demonstrated that the previously known primary nitrate response is actually preceded by a very fast gene expression modulation, involving genes and functions needed to prepare plants to use or reduce nitrate. A state-space model inferred from this microarray time-series data successfully predicts gene behavior in unlearnt conditions. CONCLUSIONS: The experiments and methods allow us to propose a temporal working model for nitrate-driven gene networks. This network model is tested both in silico and experimentally. For example, the over-expression of a predicted gene hub encoding a transcription factor induced early in the cascade indeed leads to the modification of the kinetic nitrate response of sentinel genes such as NIR, NIA2, and NRT1.1, and several other transcription factors. The potential nitrate/hormone connections implicated by these time-series data are also evaluated.

    Restricting Supervised Learning: Feature Selection and Feature Space Partition

    Many supervised learning problems are considered difficult to solve either because of the redundant features or because of the structural complexity of the generative function. Redundant features increase the learning noise and therefore decrease the prediction performance. Additionally, a number of problems in various applications such as bioinformatics or image processing, whose data are sampled in a high dimensional space, suffer the curse of dimensionality, and there are not enough observations to obtain good estimates. Therefore, it is necessary to reduce such features under consideration. Another issue of supervised learning is caused by the complexity of an unknown generative model. To obtain a low variance predictor, linear or other simple functions are normally suggested, but they usually result in high bias. Hence, a possible solution is to partition the feature space into multiple non-overlapping regions such that each region is simple enough to be classified easily. In this dissertation, we proposed several novel techniques for restricting supervised learning problems with respect to either feature selection or feature space partition. Among different feature selection methods, 1-norm regularization is advocated by many researchers because it incorporates feature selection as part of the learning process. We give special focus here on ranking problems because very little work has been done for ranking using L1 penalty. We present here a 1-norm support vector machine method to simultaneously find a linear ranking function and to perform feature subset selection in ranking problems. Additionally, because ranking is formulated as a classification task when pair-wise data are considered, it increases the computational complexity from linear to quadratic in terms of sample size. We also propose a convex hull reduction method to reduce this impact. 
The method was tested on one artificial data set and two benchmark real data sets, the concrete compressive strength set and the Abalone data set. Theoretically, by tuning the trade-off parameter between the 1-norm penalty and the empirical error, any desired size of feature subset could be achieved, but computing the whole solution path in terms of the trade-off parameter is extremely difficult. Therefore, using 1-norm regularization alone may not end up with a feature subset of small size. We propose a recursive feature selection method based on 1-norm regularization which can handle the multi-class setting effectively and efficiently. The selection is performed iteratively. In each iteration, a linear multi-class classifier is trained using 1-norm regularization, which leads to sparse weight vectors, i.e., many feature weights are exactly zero. Those zero-weight features are eliminated in the next iteration. The selection process has a fast rate of convergence. We tested our method on an earthworm microarray data set and the empirical results demonstrate that the selected features (genes) have very competitive discriminative power. Feature space partition separates a complex learning problem into multiple non-overlapping simple sub-problems. It is normally implemented in a hierarchical fashion. Unlike in a decision tree, a leaf node of this hierarchical structure does not represent a single decision, but represents a region (sub-problem) that is solvable with respect to linear functions or other simple functions. In our work, we incorporate domain knowledge in the feature space partition process. We consider domain information encoded by discrete or categorical attributes. A discrete or categorical attribute provides a natural partition of the problem domain, and hence divides the original problem into several non-overlapping sub-problems. In this sense, the domain information is useful if the partition simplifies the learning task.
However, it is not trivial to select the discrete or categorical attribute that maximally simplifies the learning task. A naive approach exhaustively searches all the possible restructured problems, which is computationally prohibitive when the number of discrete or categorical attributes is large. We describe a metric to rank attributes according to their potential to reduce the uncertainty of a classification task. It is quantified as a conditional entropy achieved using a set of optimal classifiers, each of which is built for a sub-problem defined by the attribute under consideration. To avoid high computational cost, we approximate the solution by the expected minimum conditional entropy with respect to random projections. This approach was tested on three artificial data sets, three cheminformatics data sets, and two leukemia gene expression data sets. Empirical results demonstrate that our method is capable of selecting a proper discrete or categorical attribute to simplify the problem, i.e., the performance of the classifier built for the restructured problem always beats that of the original problem. Restricting supervised learning is always about building simple learning functions using a limited number of features. The Top Selected Pair (TSP) method builds simple classifiers based on very few (for example, two) features with simple arithmetic calculation. However, the traditional TSP method only deals with static data. In this dissertation, we propose classification methods for time series data that only depend on a few pairs of features. Based on the different comparison strategies, we developed the following approaches: TSP based on average, TSP based on trend, and TSP based on trend and absolute difference amount. In addition, inspired by the idea of using two features, we propose a time series classification method based on few feature pairs using dynamic time warping and nearest neighbor.
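    To illustrate the idea of a classifier built on a single pair of features, as in the static TSP method mentioned above, here is a minimal two-class sketch. It assumes the pair is scored by the difference between classes in how often feature i is smaller than feature j, which is a common choice for pair-based rules but is not stated in the abstract. The function names are hypothetical, and the time-series variants (average, trend, dynamic time warping) are not reproduced.

```python
from itertools import combinations


def pair_score(X, y, i, j):
    """Return (|f0 - f1|, [f0, f1]) where fc is the frequency of x[i] < x[j] in class c.

    Assumes labels are 0 or 1 and that both classes have at least one sample.
    """
    freq = []
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        freq.append(sum(x[i] < x[j] for x in rows) / len(rows))
    return abs(freq[0] - freq[1]), freq


def fit_top_pair(X, y):
    """Pick the feature pair whose ordering differs most between the two classes."""
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        score, freq = pair_score(X, y, i, j)
        if best is None or score > best[0]:
            best = (score, i, j, freq)
    _, i, j, freq = best
    # The class predicted when x[i] < x[j] is the one where that ordering is more frequent.
    cls_if_less = 0 if freq[0] > freq[1] else 1
    return i, j, cls_if_less


def predict(model, x):
    """Classify one sample by comparing the two selected features."""
    i, j, cls_if_less = model
    return cls_if_less if x[i] < x[j] else 1 - cls_if_less
```

    Because the rule depends only on the relative order of two measurements, it is unaffected by any monotone rescaling applied to a whole sample.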