21 research outputs found

    Fast Discovery of Reliable Subnetworks

    Peer reviewed

    Simulation and graph mining tools for improving gene mapping efficiency

    Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetical data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider a prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well. 
We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
    Gene mapping is the systematic search of the genome for genes that affect an organism's observable traits. The thesis presents new methods for making the mapping of disease-predisposing genes more efficient. The first part of the thesis examines genome simulation in (typically geographically) isolated populations and presents new simulator software suited to this purpose. Simulated data sets are useful in study design, where they can be used to assess the statistical properties of planned data sets and the behaviour of the analysis methods to be used. As an example of such a study, the thesis goes through a fairly extensive simulation study carried out with the presented software. Based on the results, a population-based case-control study design appears to be a statistically powerful alternative to more expensive family- and pedigree-based designs. The second part of the thesis deals with scoring so-called candidate genes, which may predispose to disease, according to how strongly they are connected to the disease under study. Scoring is important because preliminary analyses of the data typically produce a large number of candidate genes, and going through all of them would be too laborious. Scoring allows follow-up studies to be targeted at the most promising candidates. The thesis shows how biological knowledge currently held in separate databases can be represented in a unified graph form. It also shows how connections between candidate genes and the disease under study can be found in such data and scored using graph mining algorithms. Finally, the thesis presents the reliable subgraph extraction problem and algorithms for solving it. The goal is to extract from a large graph a relatively small subgraph that contains strong and mutually independent connections between two given vertices of the graph. Reliable subgraphs are therefore particularly well suited to visualising the connections found. Besides genetics, reliable subgraphs can also be applied in other fields, such as social network analysis.

    Finding reliable subgraphs from large probabilistic graphs

    Reliable subgraphs can be used, for example, to find and rank nontrivial links between given vertices, to concisely visualize large graphs, or to reduce the size of input for computationally demanding graph algorithms. We propose two new heuristics for solving the most reliable subgraph extraction problem on large, undirected probabilistic graphs. Such a problem is specified by a probabilistic graph G subject to random edge failures, a set of terminal vertices, and an integer K. The objective is to remove K edges from G such that the probability of connecting the terminals in the remaining subgraph is maximized. We provide some technical details and a rough analysis of the proposed algorithms. The practical performance of the methods is evaluated on real probabilistic graphs from the biological domain. The results indicate that the methods scale much better to large input graphs, both computationally and in terms of the quality of the result.
    Peer reviewed
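    The quantity being maximized above, the probability that the terminals are connected when edges fail at random, can be estimated by sampling. The following is a minimal sketch and not the heuristics proposed in the paper: it assumes edges fail independently, and the function names (`estimate_reliability`, `_find`) and the `(u, v, p)` edge format are hypothetical choices made for illustration.

```python
import random


def _find(parent, x):
    """Union-find lookup with path halving; unseen vertices become their own root."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def estimate_reliability(edges, terminals, trials=10000, seed=0):
    """Monte Carlo estimate of the probability that all terminals are connected.

    edges: list of (u, v, p) tuples; each undirected edge is assumed to be
           present independently with probability p (an assumption of this sketch).
    terminals: non-empty collection of vertices that must end up connected.
    """
    rng = random.Random(seed)
    terminals = list(terminals)
    hits = 0
    for _ in range(trials):
        # Sample one realisation of the random graph and merge the
        # endpoints of every edge that survives.
        parent = {}
        for u, v, p in edges:
            if rng.random() < p:
                ru, rv = _find(parent, u), _find(parent, v)
                if ru != rv:
                    parent[ru] = rv
        # The trial counts as a success if all terminals share one component.
        root = _find(parent, terminals[0])
        if all(_find(parent, t) == root for t in terminals[1:]):
            hits += 1
    return hits / trials
```

    A candidate subgraph could then be scored by calling `estimate_reliability` on its edge list; for a two-edge path whose edges each survive with probability 0.5, the estimate should be close to 0.25.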

    Label-free quantitative phosphoproteomics with novel pairwise abundance normalization reveals synergistic RAS and CIP2A signaling

    Hyperactivated RAS drives progression of many human malignancies. However, oncogenic activity of RAS is dependent on simultaneous inactivation of protein phosphatase 2A (PP2A) activity. Although PP2A is known to regulate some of the RAS effector pathways, it has not been systematically assessed how these proteins functionally interact. Here we have analyzed phosphoproteomes regulated by either RAS or PP2A, by phosphopeptide enrichment followed by mass-spectrometry-based label-free quantification. To allow data normalization in situations where depletion of RAS or the PP2A inhibitor CIP2A causes a large unidirectional change in phosphopeptide abundance, we developed a novel normalization strategy, named pairwise normalization. This normalization is based on adjusting phosphopeptide abundances measured before and after the enrichment. The superior performance of the pairwise normalization was verified by various independent methods. Additionally, we demonstrate how the selected normalization method influences the downstream analyses and interpretation of pathway activities. Consequently, bioinformatics analysis of RAS- and CIP2A-regulated phosphoproteomes revealed a significant overlap in their functional pathways. This is most likely biologically meaningful, as we observed a synergistic survival effect between CIP2A and RAS expression as well as KRAS activating mutations in the TCGA pan-cancer data set, and a synergistic relationship between CIP2A and KRAS depletion in colony growth assays.
    Peer reviewed