
    Construction of a Pig Physical Interactome Using Sequence Homology and a Comprehensive Reference Human Interactome

    The analysis of interaction networks is crucial for understanding molecular function and has an essential impact on genome-wide studies. However, the interactomes of most species are largely incomplete, and computational strategies that take sequence homology into account can help compensate for this lack of information through cross-species analysis. In this work we report the construction of a porcine interactome resource. We applied sequence homology matching and carried out bi-directional BLASTp searches on the currently available protein sequence collections of human and pig. Using this homology we were able to recover pig homologs for, on average, 71% of the proteins annotated in human pathways. Porcine protein-protein interactions were deduced from homologous proteins with known interactions in human. The result of this work is a resource comprising 204,699 predicted porcine interactions that can be used in genome analyses to enhance the functional interpretation of data. The data can be visualized and downloaded from http://cpdb.molgen.mpg.de/pig
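
The homology-based transfer described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "bi-directional BLASTp" is reduced to reciprocal best hits by bit score, and the function names (`best_hits`, `reciprocal_best_hits`, `transfer_interactions`) and the tuple-based input format are hypothetical.

```python
def best_hits(hits):
    """Map each query protein to its best-scoring subject protein.

    hits: iterable of (query_id, subject_id, bit_score) tuples, for example
    parsed from tabular BLASTp output (the input format is an assumption).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(human_vs_pig, pig_vs_human):
    """Keep human->pig pairs that are each other's best hit in both searches."""
    forward = best_hits(human_vs_pig)
    reverse = best_hits(pig_vs_human)
    return {h: p for h, p in forward.items() if reverse.get(p) == h}


def transfer_interactions(human_ppis, human_to_pig):
    """Predict pig interactions from human ones when both partners map to pig."""
    predicted = set()
    for a, b in human_ppis:
        if a in human_to_pig and b in human_to_pig:
            # Store each undirected pair once, in sorted order.
            predicted.add(tuple(sorted((human_to_pig[a], human_to_pig[b]))))
    return predicted
```

A human interaction is carried over only when both partners have a pig counterpart, so the size of the predicted network depends directly on how strict the homology criterion is.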

    Evidence mining and novelty assessment of protein–protein interactions with the ConsensusPathDB plugin for Cytoscape

    Summary: Protein–protein interaction detection methods are applied on a daily basis by molecular biologists worldwide. After generating a set of potential interactions, biologists face the problem of highlighting the ones that are novel and collecting evidence with respect to literature and annotation. This task can be as tedious as searching for every predicted interaction in several interaction data repositories, or manually screening the scientific literature. To facilitate the task of evidence mining and novelty assessment of protein–protein interactions, we have developed a Cytoscape plugin that automatically mines publication references, database references, interaction detection method descriptions and pathway annotation for a user-supplied network of interactions. The basis for the annotation is ConsensusPathDB—a meta-database that integrates numerous protein–protein, signaling, metabolic and gene regulatory interaction repositories for currently three species: Homo sapiens, Saccharomyces cerevisiae and Mus musculus

    Denoising inferred functional association networks obtained by gene fusion analysis.

    BACKGROUND: Gene fusion detection - also known as the 'Rosetta Stone' method - involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between its un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes. RESULTS: In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions. CONCLUSION: We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function

    Human Embryonic Stem Cell Derived Hepatocyte-Like Cells as a Tool for In Vitro Hazard Assessment of Chemical Carcinogenicity

    Hepatocyte-like cells derived from the differentiation of human embryonic stem cells (hES-Hep) have potential to provide a human relevant in vitro test system in which to evaluate the carcinogenic hazard of chemicals. In this study, we have investigated this potential using a panel of 15 chemicals classified as noncarcinogens, genotoxic carcinogens, and nongenotoxic carcinogens and measured whole-genome transcriptome responses with gene expression microarrays. We applied an ANOVA model that identified 592 genes highly discriminative for the panel of chemicals. Supervised classification with these genes achieved a cross-validation accuracy of > 95%. Moreover, the expression of the response genes in hES-Hep was strongly correlated with that in human primary hepatocytes cultured in vitro. In order to infer mechanistic information on the consequences of chemical exposure in hES-Hep, we developed a computational method that measures the responses of biochemical pathways to the panel of treatments and showed that these responses were discriminative for the three toxicity classes and linked to carcinogenesis through p53, mitogen-activated protein kinases, and apoptosis pathway modules. It could further be shown that the discrimination of toxicity classes was improved when analyzing the microarray data at the pathway level. In summary, our results demonstrate, for the first time, the potential of human embryonic stem cell-derived hepatic cells as an in vitro model for hazard assessment of chemical carcinogenesis, although it should be noted that more compounds are needed to test the robustness of the assay

    ConsensusPathDB—a database for integrating human functional interaction networks

    ConsensusPathDB is a database system for the integration of human functional interactions. Current knowledge of these interactions is dispersed in more than 200 databases, each having a specific focus and data format. ConsensusPathDB currently integrates the content of 12 different interaction databases with heterogeneous foci comprising a total of 26 133 distinct physical entities and 74 289 distinct functional interactions (protein–protein interactions, biochemical reactions, gene regulatory interactions), and covering 1738 pathways. We describe the database schema and the methods used for data integration. Furthermore, we describe the functionality of the ConsensusPathDB web interface, where users can search and visualize interaction networks, upload, modify and expand networks in BioPAX, SBML or PSI-MI format, or carry out over-representation analysis with uploaded identifier lists with respect to substructures derived from the integrated interaction network. The ConsensusPathDB database is available at: http://cpdb.molgen.mpg.d
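
The over-representation analysis mentioned in this abstract can be illustrated with a one-sided hypergeometric test, which is one common choice for this kind of analysis. The abstract does not state which test ConsensusPathDB applies, so treat the following as a sketch under that assumption; the function name `hypergeom_pvalue` and its parameters are hypothetical.

```python
from math import comb


def hypergeom_pvalue(background_size, pathway_size, list_size, overlap):
    """Probability of seeing at least `overlap` pathway members in the list.

    Models the uploaded identifier list as `list_size` draws without
    replacement from a background of `background_size` identifiers, of which
    `pathway_size` belong to the pathway (or other network substructure).
    """
    max_overlap = min(pathway_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

A small p-value suggests that the list contains more members of the pathway than random sampling from the background would explain. When many pathways are tested at once, the resulting p-values would normally be corrected for multiple testing.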

    Consensus-Phenotype Integration of Transcriptomic and Metabolomic Data Implies a Role for Metabolism in the Chemosensitivity of Tumour Cells

    Using transcriptomic and metabolomic measurements from the NCI60 cell line panel, together with a novel approach to integration of molecular profile data, we show that the biochemical pathways associated with tumour cell chemosensitivity to platinum-based drugs are highly coincident, i.e. they describe a consensus phenotype. Direct integration of metabolome and transcriptome data at the point of pathway analysis improved the detection of consensus pathways by 76%, and revealed associations between platinum sensitivity and several metabolic pathways that were not visible from transcriptome analysis alone. These pathways included the TCA cycle and pyruvate metabolism, lipoprotein uptake and nucleotide synthesis by both salvage and de novo pathways. Extending the approach across a wide panel of chemotherapeutics, we confirmed the specificity of the metabolic pathway associations to platinum sensitivity. We conclude that metabolic phenotyping could play a role in predicting response to platinum chemotherapy and that consensus-phenotype integration of molecular profiling data is a powerful and versatile tool for both biomarker discovery and for exploring the complex relationships between biological pathways and drug response

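    Vollständigere und akkuratere Interaktionsnetzwerke zur Aufklärung der molekularen Mechanismen von komplexen Krankheiten (More complete and more accurate interaction networks for elucidating the molecular mechanisms of complex diseases)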

    The human cell comprises a large number of different biomolecules such as nucleic acids, proteins and metabolites. These molecules fulfill their functions not in isolation but rather through a complex interplay between each other. Knowledge about all molecular interactions that take place in the cell is key for understanding cellular processes at the systems level. For example, molecular interaction data can shed light on how a functional impairment of certain genes (caused e.g. by mutations) can lead to a certain disease. Because of this explanatory potential of molecular interactions, different techniques for their detection and prediction have been developed. 
Many human interactions have been detected and published, even though they probably represent only a small fraction of all interactions that take place in the living cell. Various databases have been developed to systematically store interactions that are e.g. mined from the scientific literature. Available interaction networks are already utilized in various mathematical methods aiming to gain insight into disease-related genes and pathways. A better systems-level understanding of biological processes in health and disease is made difficult mainly by two properties of current interaction data (besides their incompleteness). First, such data are known to be noisy, that is, they often contain false positive interactions. These result mainly from experimental or curation errors. Second, current interaction knowledge is dispersed among hundreds of interaction and pathway databases, each of which is focused only on one or very few types of interactions. For example, some databases contain only protein-protein interactions, while others focus either on gene regulatory interactions, metabolic reactions, or signaling reactions. At the same time, all types of interactions are deeply interconnected in the living cell to drive biological processes. Interaction databases must be integrated in order to obtain a more complete model of the molecular biology of the cell. Such an integration is particularly difficult because most databases have their own data model and file format. This dissertation addresses the problems that current interaction data are often contaminated with false positives on the one hand, and are dispersed in many, barely overlapping databases on the other hand. Firstly, a new interaction meta-database called ConsensusPathDB is introduced. It integrates different types of interactions from numerous public databases in order to create a more complete and unbiased picture of cellular biology on the molecular level. 
Currently, the database comprises interactions and pathways from a total of twenty-six resources, resulting in the most complete map of human interactions available. The added value of the meta-database is demonstrated with several examples. The ConsensusPathDB web interface (http://cpdb.molgen.mpg.de) features numerous tools for searching, analyzing and visualizing the underlying interaction network. Notably, it also provides tools for the analysis of gene expression data in the context of interactions and pathways that aim to facilitate research in the field of molecular medicine. Secondly, a novel method for confidence assessment of molecular interactions is presented. The confidence scores calculated by this method can serve to filter out false positive interactions, or can be used as interaction weights in methods that operate on probabilistic interaction data. In contrast to most other interaction confidence assessment methods, the proposed method requires no reference interaction sets or additional data about the separate genes/proteins or interactions. Such reference sets and additional information are not always available, which is a limiting factor for confidence assessment methods depending on them. The proposed method exploits solely the structure of the given interaction network, and more specifically its modularity, to calculate confidence scores. Thirdly, a more complete and at the same time more accurate model of the molecular biology of the cell is created by applying the proposed confidence scoring method to the integrated interaction content of ConsensusPathDB. The resulting model is utilized in a new integrative approach that aims to identify disease-related genes and sub-networks given phenotype-specific gene expression data. The method is applied to expression data from prostate cancer patients to demonstrate its potential in identifying cancer-causative genes.