
    Construction of Gene Regulatory Networks Using Recurrent Neural Networks and Swarm Intelligence


    Discovering gene association networks by multi-objective evolutionary quantitative association rules

    In the last decade, interest in microarray technology has increased exponentially because it can monitor the expression of thousands of genes simultaneously. Reconstructing gene association networks from gene expression profiles is a relevant task, and several statistical techniques have been proposed to build them. The difficulty lies in discovering which genes are most relevant and in identifying the direct regulatory relationships among them. We developed a multi-objective evolutionary algorithm for mining quantitative association rules to deal with this problem. We applied our methodology, named GarNet, to a well-known microarray dataset of the yeast cell cycle. The performance analysis of GarNet was organized in three steps, similar to the study performed by Gallo et al. GarNet outperformed the benchmark methods in most cases in terms of network quality metrics, such as accuracy and precision, which were measured using the YeastNet database as the true network. Furthermore, the results were consistent with previous biological knowledge. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752.

    MICRAT: A Novel Algorithm for Inferring Gene Regulatory Networks Using Time Series Gene Expression Data

    Background: Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulation relationships between genes. With the development of biotechnology, such as gene chip microarray and RNA-sequencing, the high-throughput data generated provide us with more opportunities to infer the gene-gene interaction relationships using gene expression data and hence understand the underlying mechanism of biological processes. Gene regulatory networks are known to exhibit a multiplicity of interaction mechanisms which include functional and non-functional, and linear and non-linear relationships. Meanwhile, the regulatory interactions between genes and gene products are not spontaneous since various processes involved in producing fully functional and measurable concentrations of transcriptional factors/proteins lead to a delay in gene regulation. Many different approaches for reconstructing GRNs have been proposed, but the existing GRN inference approaches such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray data or RNA-sequencing data remains a very challenging inverse problem due to its nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates us to develop more effective inference methods. Results: We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time-series mutual information), for inferring GRNs from time series gene expression data. Maximal information coefficient (MIC) is an effective measure of dependence for two-variable relationships. It captures a wide range of associations, both functional and non-functional, and thus has good performance on measuring the dependence between two genes. Our approach mainly includes two procedures. 
Firstly, it employs the maximal information coefficient to construct an undirected graph representing the underlying relationships between genes. Secondly, it directs the edges in the undirected graph to infer regulators and their targets. In this procedure, the conditional relative average entropies of each pair of nodes (or genes) are employed to indicate the directions of edges. Since a time delay might exist between the expression of regulators and that of their target genes, time series mutual information is combined with these entropies to direct the edges and infer the potential regulators and their targets. We evaluated the performance of MICRAT by applying it to synthetic datasets as well as real gene expression data and compared it with other GRN inference methods. We inferred five 10-gene and five 100-gene networks from the DREAM4 challenge that were generated using the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs from real gene expression data, including part of the DNA damage response pathway (SOS DNA repair network) and an experimental dataset in E. coli. The results showed that MICRAT significantly improved inference accuracy compared with other inference methods, such as TDBN. Conclusion: In this work, a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data was proposed by taking into account the dependence and time delay between the expression of a regulator and that of its target genes. This approach employed maximal information coefficients to reconstruct an undirected graph representing the underlying relationships between genes. The edges were directed by combining conditional relative average entropy with time course mutual information of pairs of genes. The proposed algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and part of the real SOS DNA repair network in E. coli.
The experimental study showed that our approach was comparable to other methods on 10-gene datasets and outperformed other methods on 100-gene datasets in GRN inference from time series datasets.
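
The idea of using time-delayed mutual information to suggest an edge direction can be illustrated with a minimal sketch. This is not the MICRAT implementation: the MIC step and the conditional relative average entropy are not shown, the histogram-based estimator and the bin count are assumptions, and the function names (`mutual_information`, `lagged_mi`, `suggest_direction`) are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=4):
    """Histogram-based estimate of mutual information (in bits) between two 1-D series."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint probability table
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nonzero = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero])))

def lagged_mi(source, target, lag=1, bins=4):
    """Mutual information between source at time t and target at time t + lag."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if lag <= 0 or lag >= len(source):
        raise ValueError("lag must be between 1 and len(source) - 1")
    return mutual_information(source[:-lag], target[lag:], bins=bins)

def suggest_direction(a, b, lag=1, bins=4):
    """Compare both lagged directions; an assumed heuristic, not the MICRAT rule."""
    forward = lagged_mi(a, b, lag, bins)     # a leads b
    backward = lagged_mi(b, a, lag, bins)    # b leads a
    if forward > backward:
        return "a->b"
    if backward > forward:
        return "b->a"
    return "undecided"
```

In this sketch, a series `b` that is a one-step delayed copy of `a` yields a higher lagged score in the `a->b` direction, which is the intuition behind using time-series mutual information to help orient edges.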

    Machine learning methods for omics data integration

    High-throughput technologies produce genome-scale transcriptomic and metabolomic (omics) datasets that allow for system-level studies of complex biological processes. The limitation lies in the small number of samples relative to the much larger number of features represented in these datasets. Machine learning methods can help integrate these large-scale omics datasets and identify key features from each dataset. A novel class-dependent feature selection method integrates the F statistic, maximum relevance binary particle swarm optimization (MRBPSO), and a class-dependent multi-category classification (CDMC) system. A set of highly differentially expressed genes is pre-selected using the F statistic as a filter for each dataset. MRBPSO and CDMC function as a wrapper to select desirable feature subsets for each class and classify the samples using those chosen class-dependent feature subsets. The results indicate that the class-dependent approaches can effectively identify unique biomarkers for each cancer type and improve classification accuracy compared to class-independent feature selection methods. The integration of transcriptomics and metabolomics data is based on a classification framework. Compared to principal component analysis and non-negative matrix factorization based integration approaches, our proposed method achieves 20-30% higher prediction accuracies on Arabidopsis tissue development data. Metabolite-predictive genes and gene-predictive metabolites are selected from transcriptomic and metabolomic data, respectively. The constructed gene-metabolite correlation network can infer the functions of unknown genes and metabolites. Tissue-specific genes and metabolites are identified by the class-dependent feature selection method. Evidence from subcellular locations, gene ontology, and biochemical pathways supports the involvement of these entities in different developmental stages and tissues in Arabidopsis.
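
The filter step described in this abstract, pre-selecting features by their F statistic, can be sketched as follows. This is a minimal illustration assuming the standard one-way ANOVA F statistic; the MRBPSO and CDMC wrapper stages are not shown, and the function names `f_statistic` and `top_features` are hypothetical.

```python
import numpy as np

def f_statistic(X, y):
    """One-way ANOVA F statistic per feature.

    X: array of shape (n_samples, n_features); y: class label per sample.
    Assumes every feature has non-zero within-class variance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])   # between-class sum of squares
    within = np.zeros(X.shape[1])    # within-class sum of squares
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        between += len(Xc) * (mean_c - grand_mean) ** 2
        within += ((Xc - mean_c) ** 2).sum(axis=0)
    return (between / (k - 1)) / (within / (n - k))

def top_features(X, y, m):
    """Indices of the m features with the largest F statistic (the filter step)."""
    scores = f_statistic(X, y)
    return np.argsort(scores)[::-1][:m]
```

A wrapper method would then search only within the indices returned by `top_features`, which keeps the search space manageable when features far outnumber samples.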

    Natural Computing and Beyond

    This book contains the joint proceedings of the Winter School of Hakodate (WSH) 2011, held in Hakodate, Japan, March 15–16, 2011, and the 6th International Workshop on Natural Computing (6th IWNC), held in Tokyo, Japan, March 28–30, 2012, organized by the Special Interest Group of Natural Computing (SIG-NAC) of the Japanese Society for Artificial Intelligence (JSAI). This volume compiles refereed contributions on various aspects of natural computing, ranging from computing with slime mold, artificial chemistry, eco-physics, and synthetic biology to computational aesthetics.

    Learning Bayesian network equivalence classes using ant colony optimisation

    Bayesian networks have become an indispensable tool in the modelling of uncertain knowledge. Conceptually, they consist of two parts: a directed acyclic graph called the structure, and conditional probability distributions attached to each node, known as the parameters. As a result of their expressiveness, understandability and rigorous mathematical basis, Bayesian networks have become one of the first methods investigated when faced with an uncertain problem domain. However, a recurring problem persists in specifying a Bayesian network. Both the structure and parameters can be difficult for experts to conceive, especially if their knowledge is tacit. To counteract these problems, research has been ongoing on learning both the structure and parameters of Bayesian networks from data. Whilst there are simple methods for learning the parameters, learning the structure has proved harder. Part of this stems from the NP-hardness of the problem and the super-exponential space of possible structures. To help solve this task, this thesis seeks to employ a relatively new technique that has had much success in tackling NP-hard problems. This technique is called ant colony optimisation. Ant colony optimisation is a metaheuristic based on the behaviour of ants acting together in a colony. It uses the stochastic activity of artificial ants to find good solutions to combinatorial optimisation problems. In the current work, this method is applied to the problem of searching through the space of equivalence classes of Bayesian networks, in order to find a good match against a set of data. The system uses operators that evaluate potential modifications to a current state. Each of the modifications is scored and the results are used to inform the search. In order to facilitate these steps, other techniques are also devised to speed up the learning process.
The techniques are tested by sampling data from gold standard networks and learning structures from this sampled data. These structures are analysed using various goodness-of-fit measures to see how well the algorithms perform. The measures include structural similarity metrics and Bayesian scoring metrics. The results are compared in depth against systems that also use ant colony optimisation and other methods, including evolutionary programming and greedy heuristics. Also, comparisons are made to well-known state-of-the-art algorithms, and a study is performed on a real-life data set. The results show favourable performance compared to the other methods and on modelling the real-life data.

    Gene regulatory network modelling with evolutionary algorithms - an integrative approach

    Building models for gene regulation has been an important aim of Systems Biology over the past years, driven by the large amount of gene expression data that has become available. Models represent regulatory interactions between genes and transcription factors and can provide better understanding of biological processes, and means of simulating both natural and perturbed systems (e.g. those associated with disease). Gene regulatory network (GRN) quantitative modelling is still limited, however, due to data issues such as noise and the restricted length of the time series typically used for GRN reverse engineering. These issues create an under-determination problem, with many models possibly fitting the data. However, large amounts of other types of biological data and knowledge are available, such as cross-platform measurements, knockout experiments, annotations, binding site affinities for transcription factors and so on. It has been postulated that integrating these can improve the quality of the models obtained, by facilitating further filtering of possible models. However, integration is not straightforward, as the different types of data can provide contradictory information and are intrinsically noisy; hence large-scale integration has not been fully explored to date. Here, we present an integrative parallel framework for GRN modelling, which employs evolutionary computation and different types of data to enhance model inference. Integration is performed at different levels. (i) An analysis of cross-platform integration of time series microarray data, discussing the effects on the resulting models and exploring cross-platform normalisation techniques, is presented. This shows that time-course data integration is possible, and results in models more robust to noise and parameter perturbation, as well as reduced noise over-fitting.
(ii) Other types of measurements and knowledge, such as knockout experiments, annotated transcription factors, binding site affinities and promoter sequences, are integrated within the evolutionary framework to obtain more plausible GRN models. This is performed by customising initialisation, mutation and evaluation of candidate model solutions. The different data types are investigated and both qualitative and quantitative improvements are obtained. Results suggest that caution is needed in order to obtain improved models from combined data, and the case study presented here provides an example of how this can be achieved. Furthermore, (iii) RNA-seq data is studied in comparison to microarray experiments, to identify overlapping features and possibilities of integration within the framework. The extension of the framework to this data type is straightforward, and qualitative improvements are obtained when combining predicted interactions from single-channel and RNA-seq datasets.

    Reverse engineering of genetic networks with time delayed recurrent neural networks and clustering techniques

    In the iterative process of experimentally probing biological networks and computationally inferring models for the networks, fast, accurate and flexible computational frameworks are needed for modeling and reverse engineering biological networks. In this dissertation, I propose a novel model to simulate gene regulatory networks using a specific type of time delayed recurrent neural network. I also introduce a parameter clustering method to select groups of parameter sets from the simulations that represent biologically reasonable networks. Additionally, a general purpose adaptive function is used here to decrease and study the connectivity of small gene regulatory network modules. The novel model is shown to simulate the dynamics and to infer the topology of gene regulatory networks derived from synthetic and experimental time series gene expression data. I assess the quality of the inferred networks using graph edit distance measurements in comparison to the synthetic and experimental benchmarks. Additionally, I compare the edit costs of the inferred networks obtained with the time delayed recurrent neural networks against those of previously described reverse engineering methods based on continuous time recurrent neural networks and dynamic Bayesian networks. Furthermore, I address questions of network connectivity and of the correlation between data fitting and inference power by simulating common experimental limitations of the reverse engineering process, such as incomplete and highly noisy data. The novel time delayed recurrent neural network model, in combination with parameter clustering, substantially improves the inference power of reverse engineered networks. Finally, some suggestions for future improvements are discussed, particularly from the data driven perspective as the solution for modeling complex biological systems.

    Reverse Engineering of Biological Systems

    A gene regulatory network (GRN) consists of a set of genes and the regulatory relationships between them. As outputs of the GRN, gene expression data contain important information that can be used to reconstruct the GRN to a certain degree. However, the reverse engineering of GRNs from gene expression data is a challenging problem in systems biology. Conventional methods fail to infer GRNs from gene expression data because the number of observations is relatively small compared with the large number of genes. The inherent noise in the data makes the inference accuracy relatively low, and the combinatorial explosion inherent in the problem makes the inference task extremely difficult. This study aims at reconstructing GRNs from time-course gene expression data based on GRN models using system identification and parameter estimation methods. The main content consists of three parts: (1) a review of the methods for reverse engineering of GRNs, (2) reverse engineering of GRNs based on linear models and (3) reverse engineering of GRNs based on a nonlinear model, specifically S-systems. In the first part, after the necessary background and the challenges of the problem are introduced, various methods for the inference of GRNs are comprehensively reviewed from two aspects: models and inference algorithms. The advantages and disadvantages of each method are discussed. The second part focuses on inferring GRNs from time-course gene expression data based on linear models. First, the statistical properties of two sparse penalties, adaptive LASSO and SCAD, with an autoregressive model are studied. The analysis shows that the proposed methods using these two penalties can asymptotically reconstruct the underlying networks. This provides a solid foundation for these methods and their extensions. Second, the integration of multiple datasets should be able to improve the accuracy of GRN inference.
A novel method, Huber group LASSO, is developed to infer GRNs from multiple time-course datasets; it is also robust to the large noise and outliers that the data may contain. An efficient algorithm is also developed and its convergence analysis is provided. The third part can be further divided into two phases: estimating the parameters of S-systems when the system structure is known, and inferring the S-systems without knowing the system structure. Two methods, alternating weighted least squares (AWLS) and auxiliary function guided coordinate descent (AFGCD), have been developed to estimate the parameters of S-systems from time-course data. AWLS takes advantage of the special structure of S-systems and significantly outperforms one existing method, alternating regression (AR). AFGCD uses the auxiliary function and coordinate descent techniques to obtain an efficient iteration formula, and its convergence is theoretically guaranteed. For the case where the system structure is unknown, a novel method, the pruning separable parameter estimation algorithm (PSPEA), is developed to locally infer the S-systems by taking advantage of the special structure of the S-system model. PSPEA is then combined with a continuous genetic algorithm (CGA) to form a hybrid algorithm which can globally reconstruct the S-systems.
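
For readers unfamiliar with the model, the standard S-system form is given below. The notation is the conventional one from the S-system literature and is an assumption here, since the abstract does not state its symbols: $X_i$ is the expression level of gene $i$, $\alpha_i, \beta_i \ge 0$ are rate constants, and $g_{ij}$, $h_{ij}$ are kinetic orders.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}},
\qquad i = 1, \dots, n
```

Estimating parameters with the structure known means fitting $\alpha_i$, $\beta_i$ and the nonzero $g_{ij}$, $h_{ij}$; inferring the structure additionally means deciding which kinetic orders are zero.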