Construction of a Pig Physical Interactome Using Sequence Homology and a Comprehensive Reference Human Interactome
The analysis of interaction networks is crucial for understanding molecular function and has an essential impact on genome-wide studies. However, the interactomes of most species are largely incomplete, and computational strategies that take sequence homology into account can help compensate for this lack of information through cross-species analysis. In this work we report the construction of a porcine interactome resource. We applied sequence homology matching and carried out bi-directional BLASTp searches on the currently available protein sequence collections of human and pig. Using this homology, we were able to recover for the pig, on average, 71% of the proteins annotated in human pathways. Porcine protein-protein interactions were deduced from homologous proteins with known interactions in human. The result of this work is a resource comprising 204,699 predicted porcine interactions that can be used in genome analyses to enhance the functional interpretation of data. The data can be visualized and downloaded from http://cpdb.molgen.mpg.de/pig
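Bi-directional BLASTp searches are often reduced to a one-to-one mapping by keeping reciprocal best hits, after which each human interaction whose two partners both have a pig counterpart is transferred to the corresponding pig pair. The sketch below illustrates that idea under stated assumptions: best hits have already been parsed from the BLASTp output into plain dictionaries, the function names and identifiers are hypothetical, and the abstract does not say whether exactly this reciprocal-best-hit rule was used.

```python
from typing import Dict, Iterable, Set, Tuple

def reciprocal_best_hits(human_to_pig: Dict[str, str],
                         pig_to_human: Dict[str, str]) -> Dict[str, str]:
    """Keep human->pig pairs whose best BLASTp hits agree in both directions."""
    return {h: p for h, p in human_to_pig.items() if pig_to_human.get(p) == h}

def transfer_interactions(human_ppis: Iterable[Tuple[str, str]],
                          orthologs: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Predict pig interactions from human interactions whose partners are both mapped."""
    predicted = set()
    for a, b in human_ppis:
        if a in orthologs and b in orthologs:
            # Sort the pair so that A-B and B-A count as one undirected interaction.
            predicted.add(tuple(sorted((orthologs[a], orthologs[b]))))
    return predicted

# Hypothetical toy identifiers: H3's best pig hit P3 points back to H9, so H3 stays unmapped.
rbh = reciprocal_best_hits({"H1": "P1", "H2": "P2", "H3": "P3"},
                           {"P1": "H1", "P2": "H2", "P3": "H9"})
print(transfer_interactions([("H1", "H2"), ("H2", "H3")], rbh))  # {('P1', 'P2')}
```

The reciprocal-best-hit filter trades recall for precision: proteins without a mutual best hit are simply left out of the transferred network.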
Evidence mining and novelty assessment of protein-protein interactions with the ConsensusPathDB plugin for Cytoscape
Summary: Protein-protein interaction detection methods are applied daily by molecular biologists worldwide. After generating a set of potential interactions, biologists face the problem of highlighting the ones that are novel and collecting supporting evidence from the literature and annotation resources. This task can be as tedious as searching for every predicted interaction in several interaction data repositories or manually screening the scientific literature. To facilitate evidence mining and novelty assessment of protein-protein interactions, we have developed a Cytoscape plugin that automatically mines publication references, database references, interaction detection method descriptions and pathway annotations for a user-supplied network of interactions. The annotation is based on ConsensusPathDB, a meta-database that currently integrates numerous protein-protein, signaling, metabolic and gene regulatory interaction repositories for three species: Homo sapiens, Saccharomyces cerevisiae and Mus musculus.
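The core lookup behind novelty assessment can be sketched as matching each user-supplied interaction against a reference collection in an orientation-independent way: interactions with no recorded evidence are flagged as novel. This is a hypothetical illustration, not the plugin's code; the in-memory `reference` dictionary stands in for ConsensusPathDB queries, and the evidence strings are placeholders.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]

def assess_novelty(user_edges: Iterable[Edge],
                   reference: Dict[FrozenSet[str], List[str]]) -> Dict[Edge, List[str]]:
    """Attach known evidence to each user interaction; an empty list means 'novel'."""
    # frozenset keys make the lookup independent of the order of the two partners.
    return {(a, b): reference.get(frozenset((a, b)), []) for a, b in user_edges}

def novel_edges(annotated: Dict[Edge, List[str]]) -> List[Edge]:
    """Return the user interactions for which no evidence was found."""
    return [edge for edge, evidence in annotated.items() if not evidence]

# Placeholder reference data; real evidence would come from the meta-database.
reference = {frozenset(("TP53", "MDM2")): ["evidence-A", "evidence-B"]}
annotated = assess_novelty([("MDM2", "TP53"), ("TP53", "GENE_X")], reference)
print(novel_edges(annotated))  # [('TP53', 'GENE_X')]
```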
Denoising inferred functional association networks obtained by gene fusion analysis.
BACKGROUND: Gene fusion detection, also known as the 'Rosetta Stone' method, involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between their unfused counterpart genes in query genomes. The precision of this method typically improves as the number of reference genomes increases. RESULTS: To explore the usefulness and scope of this approach for protein interaction prediction, and to generate a high-quality, non-redundant set of interacting protein pairs across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions caused by the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly for the majority of the species. We also examine the percentage of component (and composite) proteins in relation to the number of genes, and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions. CONCLUSION: We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function.
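The Rosetta Stone principle can be sketched as follows: two query proteins are linked when they align to different, non-overlapping regions of one composite protein in a reference genome. The second function is a simplified stand-in for the denoising step, assuming that limiting the number of interactions within the largest components can be approximated by discarding connected components above an edge-count threshold. The actual protocol, alignment criteria and threshold value are not given in the abstract, and all names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Hit = Tuple[str, int, int]  # (query protein, start, end) of its alignment on a composite protein
Link = FrozenSet[str]

def fusion_links(hits_by_composite: Dict[str, List[Hit]]) -> Set[Link]:
    """Link query proteins that align to non-overlapping parts of the same composite protein."""
    links: Set[Link] = set()
    for hits in hits_by_composite.values():
        for (qa, sa, ea), (qb, sb, eb) in combinations(hits, 2):
            # Inclusive intervals [sa, ea] and [sb, eb] must not overlap.
            if qa != qb and (ea < sb or eb < sa):
                links.add(frozenset((qa, qb)))
    return links

def drop_large_components(links: Set[Link], max_edges: int) -> Set[Link]:
    """Discard connected components with more than max_edges links (a crude promiscuity filter)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union-find over the endpoints of every link.
    for link in links:
        a, b = tuple(link)
        parent[find(a)] = find(b)
    by_root: Dict[str, Set[Link]] = defaultdict(set)
    for link in links:
        by_root[find(next(iter(link)))].add(link)
    return {link for comp in by_root.values() if len(comp) <= max_edges for link in comp}
```

For example, hits of A on residues 1-100 and B on residues 120-200 of one composite protein yield the link A-B, whereas overlapping alignments yield nothing.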
Human Embryonic Stem Cell Derived Hepatocyte-Like Cells as a Tool for In Vitro Hazard Assessment of Chemical Carcinogenicity
Hepatocyte-like cells derived from the differentiation of human embryonic stem cells (hES-Hep) have the potential to provide a human-relevant in vitro test system in which to evaluate the carcinogenic hazard of chemicals. In this study, we investigated this potential using a panel of 15 chemicals classified as noncarcinogens, genotoxic carcinogens and nongenotoxic carcinogens, and measured whole-genome transcriptome responses with gene expression microarrays. We applied an ANOVA model that identified 592 genes highly discriminative for the panel of chemicals. Supervised classification with these genes achieved a cross-validation accuracy of > 95%. Moreover, the expression of the response genes in hES-Hep was strongly correlated with that in human primary hepatocytes cultured in vitro. To infer mechanistic information on the consequences of chemical exposure in hES-Hep, we developed a computational method that measures the responses of biochemical pathways to the panel of treatments, and showed that these responses were discriminative for the three toxicity classes and linked to carcinogenesis through p53, mitogen-activated protein kinase and apoptosis pathway modules. We further showed that the discrimination of toxicity classes improved when the microarray data were analyzed at the pathway level. In summary, our results demonstrate, for the first time, the potential of human embryonic stem cell-derived hepatic cells as an in vitro model for hazard assessment of chemical carcinogenesis, although more compounds are needed to test the robustness of the assay.
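As a simplified illustration of ANOVA-based gene selection, the sketch computes a one-way F statistic per gene across treatment groups and keeps the highest-scoring genes. The study's actual ANOVA model (its factors, and the p-value cut-offs that led to 592 genes) is not described in the abstract; grouping replicates by treatment and ranking by raw F are assumptions, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for one gene; each group holds replicate values for one treatment."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # Perfectly reproducible replicates: infinite F if group means differ at all.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_genes(expr: Dict[str, Sequence[Sequence[float]]], n_top: int) -> List[str]:
    """Rank genes by F statistic, largest (most treatment-discriminative) first."""
    return sorted(expr, key=lambda gene: anova_f(expr[gene]), reverse=True)[:n_top]

# Toy data: g1 separates the two treatments, g2 does not.
print(top_genes({"g1": [[1, 2, 3], [4, 5, 6]], "g2": [[1, 2, 3], [1, 2, 3]]}, 1))  # ['g1']
```

The selected genes would then feed a supervised classifier evaluated by cross-validation; to avoid optimistic accuracy estimates, gene selection should be repeated inside each training fold.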
ConsensusPathDB: a database for integrating human functional interaction networks
ConsensusPathDB is a database system for the integration of human functional interactions. Current knowledge of these interactions is dispersed in more than 200 databases, each having a specific focus and data format. ConsensusPathDB currently integrates the content of 12 different interaction databases with heterogeneous foci, comprising a total of 26,133 distinct physical entities and 74,289 distinct functional interactions (protein-protein interactions, biochemical reactions, gene regulatory interactions) and covering 1738 pathways. We describe the database schema and the methods used for data integration. Furthermore, we describe the functionality of the ConsensusPathDB web interface, where users can search and visualize interaction networks, upload, modify and expand networks in BioPAX, SBML or PSI-MI format, or carry out over-representation analysis with uploaded identifier lists with respect to substructures derived from the integrated interaction network. The ConsensusPathDB database is available at: http://cpdb.molgen.mpg.d
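Over-representation analysis of an uploaded identifier list is commonly framed as a one-sided hypergeometric test per pathway: given N background genes, K of them in the pathway, and n uploaded genes of which k fall in the pathway, the p-value is P(X >= k). The sketch assumes that framing; the abstract does not state which test or multiple-testing correction ConsensusPathDB applies, and the function names and toy data are hypothetical.

```python
from math import comb
from typing import Dict, Set

def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n items from N, of which K belong to the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)

def over_representation(query: Set[str], pathways: Dict[str, Set[str]],
                        universe: Set[str]) -> Dict[str, float]:
    """One-sided enrichment p-value for each pathway (no multiple-testing correction)."""
    query = query & universe
    N, n = len(universe), len(query)
    result = {}
    for name, members in pathways.items():
        members = members & universe
        result[name] = hypergeom_sf(len(query & members), n, len(members), N)
    return result

# Toy example: 10 background genes, a 5-gene pathway, and both uploaded genes inside it.
universe = {f"g{i}" for i in range(10)}
print(over_representation({"g0", "g1"}, {"P": {f"g{i}" for i in range(5)}}, universe))
```

Here the p-value is C(5,2)/C(10,2) = 10/45, about 0.22; with many pathways tested, a correction such as Benjamini-Hochberg would normally be applied on top.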
Consensus-Phenotype Integration of Transcriptomic and Metabolomic Data Implies a Role for Metabolism in the Chemosensitivity of Tumour Cells
Using transcriptomic and metabolomic measurements from the NCI60 cell line panel,
together with a novel approach to integration of molecular profile data, we show
that the biochemical pathways associated with tumour cell chemosensitivity to
platinum-based drugs are highly coincident, i.e. they describe a consensus
phenotype. Direct integration of metabolome and transcriptome data at the point
of pathway analysis improved the detection of consensus pathways by 76%,
and revealed associations between platinum sensitivity and several metabolic
pathways that were not visible from transcriptome analysis alone. These pathways
included the TCA cycle and pyruvate metabolism, lipoprotein uptake and
nucleotide synthesis by both salvage and de novo pathways. Extending the
approach across a wide panel of chemotherapeutics, we confirmed the specificity
of the metabolic pathway associations to platinum sensitivity. We conclude that
metabolic phenotyping could play a role in predicting response to platinum
chemotherapy and that consensus-phenotype integration of molecular profiling
data is a powerful and versatile tool for both biomarker discovery and for
exploring the complex relationships between biological pathways and drug
response
More complete and more accurate interaction networks for elucidating the molecular mechanisms of complex diseases
The human cell comprises a large number of different biomolecules such as
nucleic acids, proteins and metabolites. These molecules fulfill their
functions not in isolation but rather through a complex interplay with each
other. Knowledge about all molecular interactions that take place in the cell
is key for understanding cellular processes at the systems level. For example,
molecular interaction data can shed light on how a functional impairment of
certain genes (caused e.g. by mutations) can lead to a certain disease.
Because of this explanatory potential of molecular interactions, different
techniques for their detection and prediction have been developed. Many human
interactions have been detected and published, even though they probably
represent only a small fraction of all interactions that take place in the
living cell. Various databases have been developed to systematically store
interactions that are e.g. mined from the scientific literature. Available
interaction networks are already utilized in various mathematical methods
aiming to gain insight into disease-related genes and pathways. A better
systems-level understanding of biological processes in health and disease is
made difficult mainly by two properties of current interaction data (besides
their incompleteness). First, such data are known to be noisy, that is, they
often contain false positive interactions. These result mainly from
experimental or curation errors. Second, current interaction knowledge is
dispersed among hundreds of interaction and pathway databases, each of which
is focused only on one or very few types of interactions. For example, some
databases contain only protein-protein interactions, while others focus either
on gene regulatory interactions, metabolic reactions, or signaling reactions.
At the same time, all types of interactions are deeply interconnected in the
living cell to drive biological processes. Interaction databases must be
integrated in order to obtain a more complete model of the molecular biology
of the cell. Such an integration is particularly difficult because most
databases have their own data model and file format. This dissertation
addresses the problems that current interaction data are often contaminated
with false positives on the one hand, and are dispersed in many, barely
overlapping databases on the other hand. Firstly, a new interaction meta-
database called ConsensusPathDB is introduced. It integrates different types
of interactions from numerous public databases in order to create a more
complete and unbiased picture of cellular biology on the molecular level.
Currently, the database comprises interactions and pathways from a total of
twenty-six resources, resulting in the most complete map of human interactions
available. The added value of the meta-database is demonstrated with several
examples. The ConsensusPathDB web interface (http://cpdb.molgen.mpg.de)
features numerous tools for searching, analyzing and visualizing the
underlying interaction network. Notably, it also provides tools for the
analysis of gene expression data in the context of interactions and pathways
that aim to facilitate research in the field of molecular medicine. Secondly,
a novel method for confidence assessment of molecular interactions is
presented. The confidence scores calculated by this method can serve to filter
out false positive interactions, or can be used as interaction weights in
methods that operate on probabilistic interaction data. In contrast to most
other interaction confidence assessment methods, the proposed method requires
no reference interaction sets or additional data about the separate
genes/proteins or interactions. Such reference sets and additional information
are not always available, which is a limiting factor for confidence assessment
methods depending on them. The proposed method exploits solely the structure
of the given interaction network, and more specifically its modularity, to
calculate confidence scores. Thirdly, a more complete and at the same time
more accurate model of molecular biology of the cell is created by applying
the proposed confidence scoring method to the integrated interaction content
of ConsensusPathDB. The resulting model is utilized in a new integrative
approach that aims to identify disease-related genes and sub-networks given
phenotype-specific gene expression data. The method is applied to expression
data from prostate cancer patients to demonstrate its potential in identifying
cancer-causative genes.
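The abstract does not detail the modularity-based confidence scoring, so the sketch below substitutes a simpler, well-known topology-only score, the Jaccard overlap of the two endpoints' neighborhoods, to illustrate the general idea that edge confidence can be computed from network structure alone, without reference sets, and then used as an edge weight or for filtering. It is not the dissertation's method; the function names, the score and the filtering threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

def edge_confidence(edges: Iterable[Edge]) -> Dict[Edge, float]:
    """Score each edge by the Jaccard overlap of its endpoints' other neighbors."""
    edges = list(edges)
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {}
    for a, b in edges:
        # Exclude the edge itself so that only shared context counts as support.
        na, nb = neighbors[a] - {b}, neighbors[b] - {a}
        union = na | nb
        scores[(a, b)] = len(na & nb) / len(union) if union else 0.0
    return scores

def filter_edges(scores: Dict[Edge, float], threshold: float) -> List[Edge]:
    """Keep edges whose topological support reaches the (hypothetical) threshold."""
    return [edge for edge, s in scores.items() if s >= threshold]

# Toy network: a triangle A-B-C plus a pendant edge C-D.
scores = edge_confidence([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(filter_edges(scores, 0.5))  # [('A', 'B'), ('B', 'C'), ('A', 'C')]
```

Edges embedded in densely connected groups score high, while the pendant edge C-D receives 0.0; scores like these could serve either as a false-positive filter or as weights in downstream network methods.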