1,129 research outputs found

    Inferring functional modules of protein families with probabilistic topic models

    Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. Conclusions: We show that our method identifies protein modules, some of which correspond to well-known biological processes, that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.

    Network Analysis of Microarray Data

    DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. Peer reviewed.

    Inferring cellular networks – a review

    In this review we give an overview of computational and statistical methods to reconstruct cellular networks. Although this area of research is vast and fast-developing, we show that most currently used methods can be organized by a few key concepts. The first part of the review deals with conditional independence models, including Gaussian graphical models and Bayesian networks. The second part discusses probabilistic and graph-based methods for data from experimental interventions and perturbations.

    Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

    Gene expression is a readily observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that includes both observed gene variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which possible regulators exist, regulator activity, or where transcription factors physically bind.

    Modular combinatorial binding among human trans-acting factors reveals direct and indirect factor binding

    Background: The combinatorial binding of trans-acting factors (TFs) to the DNA is critical to the spatial and temporal specificity of gene regulation. For certain regulatory regions, multiple regulatory modules (sets of TFs that bind together) are combined to achieve context-specific gene regulation. However, previous approaches are limited either to pairwise TF co-association analysis or to the assumption that only one module is used in each regulatory region. Results: We present a new computational approach that models the modular organization of TF combinatorial binding. Our method learns compact and coherent regulatory modules from in vivo binding data using a topic model. We found that the binding of 115 TFs in K562 cells can be organized into 49 interpretable modules. Furthermore, we found that tens of thousands of regulatory regions use multiple modules, a structure that cannot be observed with previous methods based on hard clustering. The modules discovered recapitulate many published protein-protein physical interactions, have consistent functional annotations of chromatin states, and uncover context-specific co-binding such as gene-proximal binding of NFY + FOS + SP and distal binding of NFY + FOS + USF. For certain TFs, the co-binding partners of direct binding (motif present) differ from those of indirect binding (motif absent); the distinct set of co-binding partners can predict whether the TF binds directly or indirectly with up to 95% accuracy. Joint analysis across two cell types reveals both cell-type-specific and shared regulatory modules. Conclusions: Our results provide comprehensive cell-type-specific combinatorial binding maps and suggest a modular organization of combinatorial binding. Keywords: Computational genomics; Transcription factor; Combinatorial binding; Direct and indirect binding; Topic model. National Institutes of Health (U.S.) (grant 1U01HG007037-01)

    Computational Approaches to Biological Network Inference and Modeling in Systems Biology

    Living systems, which are composed of biological components such as molecules, cells, organisms or entire species, are dynamic and complex. Their behaviors are difficult to study with respect to the properties of individual elements. To study their behaviors, we use quantitative techniques in the "omic" fields such as genomics, bioinformatics and proteomics to measure the behavior of groups of interacting components, and we use mathematical and computational modeling to describe and predict their dynamical behavior. The first step in understanding a biological system is to investigate how its individual elements interact with each other. This step consists of drawing a static wiring diagram that connects the individual parts. Experimental techniques are designed to observe interactions among the biological components in the laboratory, while computational approaches are designed to predict interactions among the individual elements based on their properties. In the first part of this thesis, we present techniques for network inference that are particularly targeted at protein-protein interaction networks. These techniques include comparative genomics, structure-based methods, biological-context methods and integrated frameworks. We evaluate and compare the prediction methods that have been most often used for domain-domain interactions, and we discuss the limitations of the methods and data resources. We introduce the concept of the Enhanced Phylogenetic Tree, which is a new graphical presentation of the evolutionary history of protein families; then, we propose a novel method for assigning functional linkages to proteins. This method was applied to predicting both human and yeast protein functional linkages. The next step is to obtain insights into the dynamical aspects of the biological systems.
One of the overarching goals of systems biology is to understand the emergent properties of living systems, i.e., to understand how the individual components of a system come together to form distinct, collective and interactive properties and functions. The emergent properties of a system are neither found in nor directly deducible from the lower-level properties of that system. An example of an emergent property is synchronization, a dynamical state of complex network systems in which the individual components of the systems behave coherently, almost in unison. In the second part of the thesis, we apply computational modeling to mimic and simplify real-life complex systems. We focus on clarifying how the network topology determines the initiation and propagation of synchronization. A simple but efficient method is proposed to reconstruct network structures from functional behaviors for oscillatory systems such as the brain. We study the feasibility of network reconstruction systematically for different regimes of coupling and for different network topologies. We utilize the Kuramoto model, an interacting system of oscillators, which is simple but relevant enough to address our questions.
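As a rough illustration of the Kuramoto model named in the abstract above, the sketch below integrates coupled phase oscillators on a network and measures how synchronized they become. It is not the thesis's reconstruction method. The all-to-all network, the Gaussian natural frequencies, the coupling strength, the explicit Euler integrator, and every function and parameter name here are assumptions chosen for illustration only.

```python
import math
import random


def kuramoto_step(theta, omega, adj, coupling, dt):
    """Advance all phases by one explicit Euler step of the Kuramoto model.

    d(theta_i)/dt = omega_i + (coupling / n) * sum_j adj[i][j] * sin(theta_j - theta_i)
    """
    n = len(theta)
    new_theta = []
    for i in range(n):
        interaction = sum(
            adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n)
        )
        new_theta.append(theta[i] + dt * (omega[i] + coupling / n * interaction))
    return new_theta


def order_parameter(theta):
    """Return r in [0, 1]; r close to 1 means the phases are nearly in unison."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


def simulate(n=10, coupling=2.0, dt=0.02, steps=2000, seed=0):
    """Simulate an assumed all-to-all network and return the final order parameter."""
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]  # random initial phases
    omega = [rng.gauss(0.0, 0.1) for _ in range(n)]  # assumed natural frequencies
    adj = [[0 if i == j else 1 for j in range(n)] for i in range(n)]  # complete graph
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, adj, coupling, dt)
    return order_parameter(theta)


if __name__ == "__main__":
    print("final order parameter:", simulate())
```

Replacing `adj` with a different adjacency matrix, or varying `coupling`, is one way to explore how topology and coupling regime affect synchronization in such a toy setting.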