784 research outputs found

    Inference of gene regulation from expression datasets

    The development of high-throughput techniques and the accumulation of large-scale gene expression data give researchers great opportunities to solve important but complex biological problems more efficiently, such as the reconstruction of gene regulatory networks and the identification of miRNA-target interactions. In the past decade, many algorithms have been developed to address these problems. However, the prediction and simulation of gene expression data have not yet received as much attention. In this study, we present a model based on stepwise multiple linear regression (SMLR) that can be applied to the prediction and simulation of gene expression, as well as to the reconstruction of gene regulatory networks by analysis of time-series gene expression data, and we present its application to the analysis of paired miRNA-mRNA expression data. Ph.D., Biomedical Engineering -- Drexel University, 201

    Integrative Modeling of Transcriptional Regulation in Response to Autoimmune Disease Therapies

    Rheumatoid arthritis (RA) and multiple sclerosis (MS) are generally classified as autoimmune diseases. They are treated with immunomodulatory drugs, such as TNF-alpha blockers (e.g. Etanercept) for RA and IFN-beta preparations (e.g. Betaferon and Avonex) for MS. To date, the molecular mechanisms of these therapies remain largely unknown. Moreover, their efficacy and tolerability are insufficient in some patients. In this work, the transcriptional response in the blood of patients to each of these three therapies was studied to better understand how these drugs work. Network inference methods were used with the aim of reconstructing the gene regulatory networks (GRNs) of the genes whose expression is altered. The starting point of each analysis was a gene expression dataset. From it, genes that are up- or downregulated after the start of therapy were first filtered. The gene regulatory regions of these genes were then analysed for transcription factor binding sites (TFBS). Finally, to derive GRN models, a new network inference algorithm (TILAR) was used. TILAR distinguishes between genes and TFs and describes the regulatory effects between them by a system of linear equations. TILAR also allows prior knowledge about gene-TF and TF-gene interactions to be incorporated. As a result, complex network structures were reconstructed that describe the regulatory relationships between the genes differentially expressed over the course of the therapies. For the Etanercept therapy, a subnetwork was found containing genes that show lower expression levels in RA patients who respond very well to the drug. The analysis of GRNs can thus contribute to a better understanding of therapy-associated processes and reveal transcriptional differences between patients.

    Analysis of High-dimensional and Left-censored Data with Applications in Lipidomics and Genomics

    Recently, new kinds of high-throughput measurement techniques have emerged, enabling biological research to focus on fundamental building blocks of living organisms such as genes, proteins, and lipids. Modern data analysis techniques have emerged alongside this new type of data, referred to as omics data. Much of this research focuses on finding biomarkers for detecting abnormalities in a person's health status, as well as on learning unobservable network structures representing functional associations of biological regulatory systems. Omics data have specific qualities such as left-censored observations due to the limitations of the measurement instruments, missing data, non-normal observations and very large dimensionality, and the interest often lies in the connections between the large number of variables. This thesis has two major aims. The first is to provide efficient methodology for dealing with various types of missing or censored omics data that can be used for visualisation and biomarker discovery based on, for example, regularised regression techniques. A maximum likelihood based covariance estimation method for data with censored values is developed, and the algorithms are described in detail. The second major aim is to develop novel approaches for detecting interactions displaying functional associations from large-scale observations. For more complicated data connections, a technique based on partial least squares regression is investigated. The technique is applied to network construction as well as to differential network analyses, both on multiply imputed censored data and on next-generation sequencing count data.

    New measurement technologies have made it possible to build a more comprehensive understanding of the molecular-level processes of living organisms. So-called omics technologies, such as genomics, proteomics and lipidomics, can produce vast amounts of measurement data on the expression or concentration levels of individual genes, proteins and lipids with unprecedented precision. At the same time, the need to develop new analysis methods has grown. Of particular interest have been the identification of biomarkers that predict the risk or prognosis of certain diseases, and the reconstruction of biological networks. Omics datasets have several special characteristics that limit the direct and efficient application of conventional methods. The most important of these are left-censored and missing observations and the large number of observed variables. The first aim of this dissertation is to provide tailored analysis methods for visualising incomplete omics datasets and for model selection, using for example regularised regression models. We also describe a maximum likelihood estimator of the covariance matrix suited to censored data. The second aim is to develop new methods for examining the association structures of omics datasets. For examining, visualising and comparing more complex structures, we present variations of an algorithm based on partial least squares, with which association networks can be reconstructed for both multiply imputed censored data and count data. Transferred from Doria
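The idea of a likelihood for left-censored measurements, mentioned in the abstract above, can be illustrated with a minimal univariate sketch. This is not the thesis's covariance estimator: it assumes a single normally distributed variable, a known detection limit, and a hypothetical convention in which censored observations are stored as `None`. Observed values contribute the normal density, while censored values contribute the probability of falling below the limit.

```python
import math

def norm_logpdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def norm_logcdf(x, mu, sigma):
    """Log of P(X <= x) for X ~ N(mu, sigma^2), via the complementary error function."""
    z = (x - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def censored_loglik(values, limit, mu, sigma):
    """Log-likelihood of a sample with left-censoring at `limit`.

    `values` holds floats for observed measurements and None for
    measurements that fell below the detection limit (illustrative
    convention, not taken from the source).
    """
    total = 0.0
    for v in values:
        if v is None:
            # Censored: we only know the true value lies below the limit.
            total += norm_logcdf(limit, mu, sigma)
        else:
            total += norm_logpdf(v, mu, sigma)
    return total

# Example: three observed values and two below a detection limit of 0.5.
sample = [1.2, 0.9, 1.7, None, None]
print(censored_loglik(sample, limit=0.5, mu=1.0, sigma=0.6))
```

Maximising this function over `mu` and `sigma` (for instance with a numerical optimiser) would give censoring-aware estimates. The multivariate covariance case described in the abstract is considerably more involved.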

    Trimming of mammalian transcriptional networks using network component analysis

    Background: Network Component Analysis (NCA) has been used to deduce the activities of transcription factors (TFs) from gene expression data and the TF-gene binding relationship. However, the TF-gene interaction varies in different environmental conditions and tissues, but such information is rarely available and cannot be predicted simply by motif analysis. Thus, it is beneficial to identify key TF-gene interactions under the experimental condition based on transcriptome data. Such information would be useful in identifying key regulatory pathways and gene markers of TFs in further studies. Results: We developed an algorithm to trim network connectivity such that the important regulatory interactions between the TFs and the genes were retained and the regulatory signals were deduced. Theoretical studies demonstrated that the regulatory signals were accurately reconstructed even in the case where only three independent transcriptome datasets were available. At least 80% of the main target genes were correctly predicted in the extreme condition of high noise level and small number of datasets. Our algorithm was tested with transcriptome data taken from mice under rapamycin treatment. The initial network topology from the literature contains 70 TFs, 778 genes, and 1423 edges between the TFs and genes. Our method retained 1074 edges (i.e. 75% of the original edge number) and identified 17 TFs as being significantly perturbed under the experimental condition. Twelve of these TFs are involved in MAPK signaling or myeloid leukemia pathways defined in the KEGG database, or are known to physically interact with each other. Additionally, four of these TFs, which are Hif1a, Cebpb, Nfkb1, and Atf1, are known targets of rapamycin. Furthermore, the trimmed network was able to predict Eno1 as an important target of Hif1a; this key interaction could not be detected without trimming the regulatory network. Conclusions: The advantage of our new algorithm, relative to the original NCA, is that our algorithm can identify the important TF-gene interactions. Identifying the important TF-gene interactions is crucial for understanding the roles of pleiotropic global regulators, such as p53. Also, our algorithm has been developed to overcome NCA's inability to analyze large networks where multiple TFs regulate a single gene. Thus, our algorithm extends the applicability of NCA to the realm of mammalian regulatory network analysis.

    Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks

    BACKGROUND. Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models. RESULTS. We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time. CONCLUSION. The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. 
The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables. Funding: Italian Ministry of University and Scientific Research; National Institutes of Health & National Human Genome Research Institute (HG003354-01A2); Collegio Ghislieri, Pavia Italy fellowshi

    Using genetic markers to orient the edges in quantitative trait networks: The NEO software

    Background: Systems genetic studies have been used to identify genetic loci that affect transcript abundances and clinical traits such as body weight. The pairwise correlations between gene expression traits and/or clinical traits can be used to define undirected trait networks. Several authors have argued that genetic markers (e.g., expression quantitative trait loci, eQTLs) can serve as causal anchors for orienting the edges of a trait network. The availability of hundreds of thousands of genetic markers poses new challenges: how to relate (anchor) traits to multiple genetic markers, how to score the genetic evidence in favor of an edge orientation, and how to weigh the information from multiple markers. Results: We develop and implement Network Edge Orienting (NEO) methods and software that address the challenges of inferring unconfounded and directed gene networks from microarray-derived gene expression data by integrating mRNA levels with genetic marker data and Structural Equation Model (SEM) comparisons. The NEO software implements several manual and automatic methods for incorporating genetic information to anchor traits. The networks are oriented by considering each edge separately, thus reducing error propagation. To summarize the genetic evidence in favor of a given edge orientation, we propose Local SEM-based Edge Orienting (LEO) scores that compare the fit of several competing causal graphs. SEM fitting indices allow the user to assess local and overall model fit. The NEO software allows the user to carry out a robustness analysis with regard to genetic marker selection. We demonstrate the utility of NEO by recovering known causal relationships in the sterol homeostasis pathway using liver gene expression data from an F2 mouse cross. Further, we use NEO to study the relationship between a disease gene and a biologically important gene co-expression module in liver tissue. Conclusion: The NEO software can be used to orient the edges of gene co-expression networks or quantitative trait networks if the edges can be anchored to genetic marker data. R software tutorials, data, and supplementary material can be downloaded from: http://www.genetics.ucla.edu/labs/horvath/aten/NEO.

    Integrated cellular network of transcription regulations and protein-protein interactions

    Background: With the accumulation of omics data, a key goal of systems biology is to construct networks at different cellular levels to investigate the machinery of the cell. However, there is currently no satisfactory method to construct an integrated cellular network that combines the gene regulatory network and the signaling regulatory pathway. Results: In this study, we integrated different kinds of omics data and developed a systematic method to construct the integrated cellular network based on coupling dynamic models and statistical assessments. The proposed method was applied to S. cerevisiae stress responses, elucidating the stress response mechanism of the yeast. From the resulting integrated cellular network under hyperosmotic stress, the highly connected hubs that are functionally relevant to the stress response were identified. Beyond hyperosmotic stress, the integrated networks under heat shock and oxidative stress were also constructed and the crosstalk between these networks was analyzed, showing that some transcription factors serve as decision-making devices at the center of the bow-tie structure and play a crucial role in rapid adaptation to stress. In addition, the predictive power of the proposed method was demonstrated. Conclusions: We successfully constructed the integrated cellular network, which is validated by evidence from the literature. The integration of transcription regulations and protein-protein interactions gives more insight into the actual biological network and is more predictive than networks without integration. The method is shown to be powerful and flexible and can be used under different conditions and for different species. The coupling dynamic models of the whole integrated cellular network are very useful for theoretical analyses and for further experiments in the fields of network biology and synthetic biology. Department: Electrical Engineering

    Joint Bayesian variable and graph selection for regression models with network-structured predictors

    In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.

    Integrative methods for reconstruction of dynamic networks in chondrogenesis

    The application of human mesenchymal stem cells is a promising approach in the field of regenerative medicine. Specific stimulation can give rise to chondrocytes, osteocytes or adipocytes. Investigating the underlying biological processes that induce the observed cellular differentiation is essential for efficiently generating specific tissues for therapeutic purposes. Upon treatment with diverse stimuli, gene expression levels of cultivated human mesenchymal stem cells were monitored using time series microarray experiments for the three lineages. Gene network inference is a common approach to identifying the regulatory dependencies among a set of investigated genes. This thesis applies the NetGenerator V2.0 tool, which can handle multiple time series datasets that capture the effects of multiple external stimuli. The applied model is based on a system of linear ordinary differential equations, whose parameters are optimised to reproduce the given time series datasets. Several procedures in the inference process were adapted in this new version to allow for the integration of multiple datasets. Network inference was applied to in silico network examples as well as to multi-experiment microarray data of mesenchymal stem cells. The resulting chondrogenesis model was evaluated on the basis of several features, including the model adaptation to the data, the total number of connections, the proportion of connections associated with prior knowledge and the model stability in a resampling procedure. Altogether, NetGenerator V2.0 has provided an automatic and efficient way to integrate experimental datasets and to enhance the interpretability and reliability of the resulting network. In a second chondrogenesis model, the miRNA and mRNA time series data were integrated for the purpose of network inference. One hypothesis of the model was verified by experiments, which demonstrated the negative effect of miR-524-5p on downstream genes.
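The abstract above describes a linear ordinary differential equation model whose parameters are tuned to reproduce time series data. The sketch below shows that general idea only. It is not NetGenerator's implementation: the function names, the forward-Euler integrator, the constant scalar stimulus `u`, and the sum-of-squares objective are all assumptions made for illustration.

```python
def simulate(W, b, u, x0, dt, steps):
    """Forward-Euler integration of dx/dt = W x + b u.

    W  : gene-gene interaction weights (n x n list of lists)
    b  : per-gene response to the external stimulus (length n)
    u  : constant stimulus level (scalar; an assumption of this sketch)
    x0 : initial expression levels (length n)
    """
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = [sum(W[i][j] * x[j] for j in range(n)) + b[i] * u for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory

def squared_error(trajectory, observations):
    """Sum of squared differences between simulated and measured states.

    observations maps a step index to the measured state vector at that step.
    """
    return sum(
        (trajectory[k][i] - obs[i]) ** 2
        for k, obs in observations.items()
        for i in range(len(obs))
    )

# Two-gene toy network: gene 0 responds to the stimulus and activates gene 1.
W = [[-1.0, 0.0],
     [0.8, -0.5]]
b = [1.0, 0.0]
traj = simulate(W, b, u=1.0, x0=[0.0, 0.0], dt=0.05, steps=100)
print(squared_error(traj, {20: [0.6, 0.2], 100: [1.0, 1.4]}))
```

An inference procedure would search over `W` and `b` (typically with sparsity constraints and prior knowledge) to minimise this error across all available experiments.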