378 research outputs found

    CONSTRUCTION AND ANALYSIS OF GENETIC REGULATORY NETWORKS WITH RNA-SEQ DATA FROM ARABIDOPSIS THALIANA

    Reconstruction of gene regulatory networks (GRNs) is a fundamental aspect of genetic engineering and provides a deeper understanding of the biological processes of an organism. Two methods were implemented to reconstruct the gene regulatory networks of Arabidopsis thaliana under two treatments: methyl jasmonate (MeJa) and salicylic acid (SA). The Joint Reconstruction of multiple Gene Regulatory Networks (JRmGRN) method was used to construct a joint network for identifying hub genes common to both conditions, in addition to networks specific to each condition. The Differential Network Analysis with False Discovery Rate Control method constructed a network of connections unique to only one of the two conditions. Both methods produced biologically significant results and complemented each other. The two methods were tested and found to be effective at identifying notable hub genes involved in MeJa/SA signaling pathways and downstream responses.
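
As a rough illustration of the two kinds of output described above (hub genes shared by both conditions, and connections unique to one condition), the following sketch uses plain degree counting on toy edge lists. It is not JRmGRN or the differential network method: the gene names, the edges, and the `min_degree` threshold are all hypothetical, and a simple degree cutoff stands in for the statistical models used in the thesis.

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each gene in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def hubs(edges, min_degree):
    """Genes whose degree reaches the (hypothetical) hub threshold."""
    return {gene for gene, d in degrees(edges).items() if d >= min_degree}

def edge_set(edges):
    """Normalise undirected edges so (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}

# Toy networks for the two treatments (made-up genes and edges).
meja_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G5", "G6")]
sa_edges = [("G1", "G2"), ("G1", "G5"), ("G1", "G6"),
            ("G5", "G6"), ("G5", "G7"), ("G5", "G8")]

# Hubs present in both condition-specific networks.
common_hubs = hubs(meja_edges, 3) & hubs(sa_edges, 3)

# Edges found under MeJa only (a crude stand-in for a differential network).
meja_only = edge_set(meja_edges) - edge_set(sa_edges)
```

Here `common_hubs` plays the role of the hub genes shared across conditions, and `meja_only` the role of condition-specific connections; a real analysis would estimate both from expression data with error control.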

    Data Driven Techniques for Modeling Coupled Dynamics in Transient Processes

    We study the problem of modeling coupled dynamics in transient processes that occur in a network. The problem is considered at two levels. At the node level, we consider the coupling between the underlying sub-processes of a node in a network. At the network level, we consider the direct influence among the nodes. After the model is constructed, we develop a network-based approach for change detection in high-dimensional transient processes. The overall contribution of our work is a more accurate model of the underlying transient dynamics, either for each individual node or for the whole network, and a new statistic for change detection in multi-dimensional time series. Specifically, at the node level, we developed a model to represent the coupled dynamics between the two processes. We provide closed-form formulas for the conditions under which periodic trajectories exist and solutions are stable. Numerical studies suggest that our model can capture the nonlinear characteristics of empirical data while reducing computation time by about 25% on average, compared to a benchmark modeling approach. In the last two problems, we provide a closed-form formula for the bound in the sparse regression formulation, which reduces the trial and error needed to find an appropriate bound. Compared to other benchmark methods for inferring network structure from time series, our method reduces inference error by up to five orders of magnitude and maintains better sparsity. We also develop a new method to infer dynamic network structure from a single time series. This method is the basis for a new spectral graph statistic for change detection. This statistic can detect changes in a simulation scenario with a modified area under the curve (mAUC) of 0.96. When applied to the problem of detecting seizures from EEG signals, our statistic can capture the physiology of the process while maintaining a detection rate of 40% by itself. Therefore, it can serve as an effective feature for change detection and can be added to the current set of features for detecting seizures from EEG signals.

    Reverse Engineering of Biological Systems

    A gene regulatory network (GRN) consists of a set of genes and the regulatory relationships between them. As outputs of the GRN, gene expression data contain important information that can be used to reconstruct the GRN to a certain degree. However, reverse engineering GRNs from gene expression data is a challenging problem in systems biology. Conventional methods fail to infer GRNs from gene expression data because the number of observations is small relative to the large number of genes. The inherent noise in the data makes the inference accuracy relatively low, and the combinatorial explosion of the problem makes the inference task extremely difficult. This study aims to reconstruct GRNs from time-course gene expression data based on GRN models, using system identification and parameter estimation methods. The main content consists of three parts: (1) a review of the methods for reverse engineering of GRNs, (2) reverse engineering of GRNs based on linear models and (3) reverse engineering of GRNs based on a nonlinear model, specifically S-systems. In the first part, after the necessary background and the challenges of the problem are introduced, various methods for the inference of GRNs are comprehensively reviewed from two aspects: models and inference algorithms. The advantages and disadvantages of each method are discussed. The second part focuses on inferring GRNs from time-course gene expression data based on linear models. First, the statistical properties of two sparse penalties, adaptive LASSO and SCAD, with an autoregressive model are studied. It is shown that the proposed methods using these two penalties can asymptotically reconstruct the underlying networks. This provides a solid foundation for these methods and their extensions. Second, the integration of multiple datasets should improve the accuracy of GRN inference. A novel method, Huber group LASSO, is developed to infer GRNs from multiple time-course datasets; it is also robust to the large noise and outliers that the data may contain. An efficient algorithm is also developed and its convergence analysis is provided. The third part can be further divided into two phases: estimating the parameters of S-systems when the system structure is known, and inferring the S-systems without knowing the system structure. Two methods, alternating weighted least squares (AWLS) and auxiliary function guided coordinate descent (AFGCD), have been developed to estimate the parameters of S-systems from time-course data. AWLS takes advantage of the special structure of S-systems and significantly outperforms one existing method, alternating regression (AR). AFGCD uses the auxiliary function and coordinate descent techniques to obtain an efficient iteration formula, and its convergence is theoretically guaranteed. For the case where the system structure is unknown, a novel method, the pruning separable parameter estimation algorithm (PSPEA), is developed to locally infer the S-systems by taking advantage of the special structure of the S-system model. PSPEA is then combined with a continuous genetic algorithm (CGA) to form a hybrid algorithm that can globally reconstruct the S-systems.
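
For readers unfamiliar with the nonlinear model named above, the S-system is usually written in the following standard power-law form. The abstract does not give its notation, so the symbols here ($X_i$, $\alpha_i$, $\beta_i$, $g_{ij}$, $h_{ij}$, $n$) follow the common textbook convention and are an assumption.

```latex
\frac{dX_i}{dt}
  = \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}}
  - \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
```

Here $X_i$ is the expression level of gene $i$, $\alpha_i, \beta_i \ge 0$ are the rate constants of the production and degradation terms, and $g_{ij}$, $h_{ij}$ are the kinetic orders. In this convention, a nonzero $g_{ij}$ or $h_{ij}$ indicates that gene $j$ influences gene $i$, so inferring the network structure amounts to deciding which kinetic orders are nonzero.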

    Inferring Gene Regulatory Networks from Time Series Microarray Data

    Innovations and improvements in high-throughput genomic technologies, such as DNA microarrays, make it possible for biologists to simultaneously measure dependencies and regulations among genes on a genome-wide scale and provide us with genetic information. An important objective of functional genomics is to understand the mechanisms controlling the expression of these genes and to encode that knowledge in a gene regulatory network (GRN). To achieve this, computational and statistical algorithms are especially needed. Inference of a GRN is a very challenging task for computational biologists because the degree of freedom of the parameters is redundant. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean networks, differential equations and Bayesian networks. There is no so-called golden method that gives the best performance on every data set. The research goal is to improve inference accuracy and reduce computational complexity. One of the problems in reconstructing a GRN is how to deal with high-dimensional, short time-course gene expression data. In this work, some existing inference algorithms are compared; their limitation is that they suffer from either low inference accuracy or high computational complexity. To overcome these difficulties, a new approach based on a state space model and the Expectation-Maximization (EM) algorithm is proposed to model the dynamic system of gene regulation and infer gene regulatory networks. In our model, the GRN is represented by a state space model that incorporates noise and can capture a wider variety of biological aspects, such as hidden or missing variables. An EM algorithm is used to estimate the parameters based on the given state space functions, and the gene interaction matrix is derived by decomposing the observation matrix using singular value decomposition; this matrix is then used to infer the GRN. The new model is validated using synthetic data sets before being applied to real biological data sets. The results show that the developed model can infer gene regulatory networks from large-scale gene expression data and significantly reduces computational time complexity without losing much inference accuracy compared to a dynamic Bayesian network.
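
To make the state space idea concrete, here is a minimal sketch of the generic linear-Gaussian form, x[t+1] = A x[t] + w[t] and y[t] = C x[t] + v[t], in which hidden states x drive the observed expression values y. The abstract does not specify the model's equations, so this form, the matrices `A` and `C`, the noise level, and the helper names are all assumptions for illustration. The sketch only simulates the forward model; the EM estimation and the SVD step described above are not shown.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simulate(A, C, x0, steps, noise_sd=0.05, seed=0):
    """Simulate x[t+1] = A x[t] + w[t] and y[t] = C x[t] + v[t].

    w and v are independent Gaussian noise terms with standard
    deviation noise_sd. Returns the hidden states and observations.
    """
    rng = random.Random(seed)
    x = list(x0)
    states, observations = [], []
    for _ in range(steps):
        states.append(x)
        # Observed expression: noisy linear read-out of the hidden state.
        observations.append([c + rng.gauss(0.0, noise_sd) for c in matvec(C, x)])
        # Hidden state transition with process noise.
        x = [a + rng.gauss(0.0, noise_sd) for a in matvec(A, x)]
    return states, observations

# Hypothetical example: 2 hidden regulators driving 3 observed genes.
A = [[0.9, 0.1],
     [0.0, 0.8]]
C = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
states, observations = simulate(A, C, x0=[1.0, 0.5], steps=10)
```

In an inference setting only `observations` would be available, and an EM procedure would alternate between estimating the hidden `states` and updating `A`, `C`, and the noise parameters.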

    Systems biology approaches to the computational modelling of trypanothione metabolism in Trypanosoma brucei

    This work presents an advanced modelling procedure that applies both structural and kinetic modelling approaches to the trypanothione metabolic network in the bloodstream form of Trypanosoma brucei, the parasite responsible for African sleeping sickness. Trypanothione has previously been identified as an essential compound for parasitic protozoa; however, the underlying metabolic processes are poorly understood. Structural modelling allows the study of network metabolism in the absence of sufficient quantitative information about target enzymes. Using this approach, we examine the essential features associated with the control and regulation of the intracellular trypanothione level. The first detailed kinetic model of the trypanothione metabolic network is developed, based on a critical review of the relevant scientific papers. Kinetic modelling of the network focuses on understanding the effect of the anti-trypanosomal drug DFMO and on examining other enzymes as potential targets for anti-trypanosomal chemotherapy. We also consider the inverse problem of parameter estimation when the system is defined by non-linear differential equations. The performance of the recently developed population-based PSwarm algorithm, which has not yet been widely applied to biological problems, is investigated, and the problem of parameter estimation under conditions such as experimental noise and lack of information content is illustrated using the ERK signalling pathway. We propose a novel multi-objective optimization algorithm (MoPSwarm) for the validation of perturbation-based models of biological systems, and perform a comparative study to determine the factors crucial to the performance of the algorithm. By simultaneously taking several, possibly conflicting, aspects into account, the problem of parameter estimation arising from non-informative experimental measurements can be successfully overcome. The reliability and efficiency of MoPSwarm are also tested using the ERK signalling pathway and demonstrated in the model validation of the polyamine biosynthetic pathway of the trypanothione network. Models of biological systems are frequently based on a relatively small amount of experimental information, and extensive in vivo observations are rarely available. To address this problem, we propose a new and generic methodological framework guided by the principles of Systems Biology. The proposed methodology integrates concepts from mathematical modelling and system identification so that physical insights about the system can be accounted for in the modelling procedure. The framework takes advantage of module-based representation and employs PSwarm and our proposed multi-objective optimization algorithm at its core. The methodological framework is employed in the study of the trypanothione metabolic network, specifically in the validation of the model of the polyamine biosynthetic pathway. Good agreement with several existing data sets is obtained, and new predictions about enzyme kinetics and regulatory mechanisms are generated, which could be tested by in vivo approaches.

    Bioinformatic approaches to study the metabolic effect on Gene Regulation

    Cellular adaptation to changing environments constitutes a critical mechanism for cell survival. Cells primarily respond to external conditions by modulating the molecular mechanisms that regulate gene expression or protein activity, granting a rapid response to external metabolic changes. Therefore, metabolic sensing constitutes an important step in cell adaptation, and epigenetics is now considered the mechanism that connects metabolic shifts with gene regulation. Epigenetic marks give cells the capability of shaping chromatin conformation, which in turn regulates gene expression. Consequently, the correct functioning of a cell's epigenetic program is critical for cellular adaptation to changing conditions. Different epigenetic modifiers rely on metabolite availability to modify the cell's epigenetic landscape. Recent studies point towards the accumulation of key metabolites as the critical mechanism by which epigenetic modifiers modulate the chromatin marks. This can be appreciated in circadian rhythms, where epigenetic changes mediate the cross-talk between metabolic oscillations and gene expression. Deficiencies that disconnect this molecular regulation lead to diseases, such as metabolic syndrome.
The study of the metabolic control of the epigenome and transcriptome is an emerging field of research. Multiple studies have generated large, high-throughput datasets that measure gene expression, metabolites and histone modifications, among others, to study these interconnections; although a wealth of literature is accumulating, the precise mechanisms of these multi-layered regulations are still to be fully elucidated. Also, a consensus pathway describing these processes cannot yet be found in any of the common biological pathway databases. One critical need in the field is the integrative analysis of existing molecular data to propose detailed regulatory models for the interplay between metabolism, chromatin state and transcription. This thesis addresses the statistical integration of metabolomics and epigenetics measurements with gene expression. We approached this data analysis challenge using the Yeast Metabolic Cycle (YMC) as a model system. Gene expression in the YMC can be divided into three well-defined phases in which transcription is coordinated with histone modifications and metabolic oscillations. First, we analyzed the impact of histone modifications on gene expression, which led to the identification of the histone marks that have the greatest impact on gene expression changes. Next, we created a comprehensive, multi-layered, multi-omics dataset for this system by obtaining metabolomics and ATAC-seq data of the YMC and incorporating an existing nascent transcription (NET-seq) dataset. Moreover, we modeled the impact of chromatin conformation and metabolic changes on gene expression, and created a regulatory model for gene expression, epigenetics and metabolomics by applying PLS Path Modeling (PLS-PM), a multivariate strategy suitable for finding relationships across multiple high-dimensional datasets. To our knowledge, this is the first time that PLS-PM has been used to model molecular regulatory layers.
We found that gene expression in the OX phase was mainly controlled by the H3K9ac histone mark and by ATP accumulation at this phase, suggesting INO80 ATP-dependent chromatin remodeling activity. We also found an enrichment of H3K18ac during the RC phase, together with accumulation of nicotinamide and its derivatives, suggesting that sirtuins may regulate H3K18ac levels at RC to activate the fatty acid oxidation response. Aspartate was also associated with RC-phase epigenetic regulation, but the mechanisms by which this amino acid may control the epigenome are still unknown. Finally, we have also created Padhoc, a computational pipeline that integrates the existing published knowledge in emerging research fields, such as those studied in this thesis, to propose pathway models that can complement current pathway databases. Altogether, this thesis involves the generation of a multi-omics dataset that covers metabolic, epigenetic and gene expression information, and its integrative analysis using novel multivariate strategies that model their mechanistic coordination. Moreover, it includes a framework for the reconstruction of biological pathways. All in all, we have presented different strategies for studying the impact of metabolic changes on chromatin using computational biology approaches.

    Regulatory network discovery using heuristics

    This thesis improves the GRN discovery process by integrating heuristic information via a co-regulation function, a post-processing procedure, and a Hub Network algorithm to build the backbone of the network. Doctor of Philosophy

    Vascular Endothelial Growth Factor (VEGF) and Semaphorin 3A (Sema3A) signaling for vascularized bone grafts

    One of the major challenges in the treatment of critical-size bone defects is to ensure rapid and efficient vascularization of tissue-engineered bone grafts upon implantation in vivo. The biological processes of osteogenesis and angiogenesis are intimately coupled, and many factors play important roles in this cross-talk. Among them, Vascular Endothelial Growth Factor (VEGF), the master regulator of vascular growth, is crucial during bone development, homeostasis and repair, and it is a key molecular target for the generation of vascularized bone grafts. However, in order to exploit its therapeutic potential, VEGF dose and spatio-temporal distribution have to be precisely controlled. Semaphorin 3A (Sema3A) regulates osteoblasts and osteoclasts to promote bone synthesis through the Neuropilin-1 receptor (NP-1), and it has important roles in angiogenesis. We previously found that VEGF dose-dependently inhibits endothelial Sema3A expression in skeletal muscle. Here we investigated the role of VEGF and Sema3A in the coupling of angiogenesis and osteogenesis in engineered bone grafts, in order to provide a rational basis for novel, safe and effective therapeutic strategies for the repair of bone tissue. To this end, osteogenic constructs were prepared with human bone marrow mesenchymal cells (BMSCs) in combination with fibrin matrices decorated with recombinant VEGF or Sema3A engineered with a transglutaminase substrate sequence (TG-VEGF and TG-Sema3A) to allow cross-linking into fibrin hydrogels and controlled release. We found that VEGF dose-dependently regulates both angiogenesis and osteogenesis. Low VEGF doses accelerated vascular invasion and ensured efficient bone deposition. High VEGF doses delayed vascular ingrowth, increased osteoclast recruitment and decreased bone formation by impairing the differentiation of the implanted human osteogenic progenitor cells. Moreover, we showed that VEGF dose-dependently downregulates Sema3A expression and that Sema3A is critical for both vascularization and intramembranous bone formation in osteogenic grafts. These results confirm the importance of both VEGF and Sema3A in bone biology and provide the basis for the design of novel rational strategies to generate vascularized bone grafts, with the aim of improving the healing of critical-size bone defects.