The Francis Crick Institute

MPact: the MIPS protein interaction resource on yeast

Author: Güldener Ulrich
Mewes Hans-Werner
Münsterkötter Martin
Oesterheld Matthias
Pagel Philipp
Ruepp Andreas
Stümpflen Volker
Publication venue: Oxford University Press
Publication date: 28/12/2005

In recent years, the Munich Information Center for Protein Sequences (MIPS) yeast protein–protein interaction (PPI) dataset has been used in numerous analyses of protein networks and has been called a gold standard because of its quality and comprehensiveness [H. Yu, N. M. Luscombe, H. X. Lu, X. Zhu, Y. Xia, J. D. Han, N. Bertin, S. Chung, M. Vidal and M. Gerstein (2004) Genome Res., 14, 1107–1118]. MPact and the yeast protein localization catalog provide information related to the proximity of proteins in yeast. Besides the integration of high-throughput data, information about experimental evidence for PPIs in the literature was compiled by experts, adding up to 4300 distinct PPIs connecting 1500 proteins in yeast. As the interaction data are a complementary part of the Comprehensive Yeast Genome Database (CYGD), they can be mapped interactively onto other integrated data types such as the functional classification catalog [A. Ruepp, A. Zollner, D. Maier, K. Albermann, J. Hani, M. Mokrejs, I. Tetko, U. Güldener, G. Mannhaupt, M. Münsterkötter and H. W. Mewes (2004) Nucleic Acids Res., 32, 5539–5545]. A survey of signaling proteins and a comparison with pathway data from KEGG demonstrate that an extensive overview of the complexity of this functional network in yeast can be obtained from these manually annotated data alone. The implementation of a web-based PPI-analysis tool allows analysis and visualization of protein interaction networks and facilitates integration of our curated data with high-throughput datasets. The complete dataset as well as user-defined sub-networks can be retrieved easily in the standardized PSI-MI format. The resource can be accessed through

SIMAP: the similarity matrix of proteins

Author: Arnold Roland
Lindner Dominik
Mewes H. Werner
Rattei Thomas
Stümpflen Volker
Tischler Patrick
Publication venue: Oxford University Press
Publication date: 01/01/2005

Similarity Matrix of Proteins (SIMAP) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith–Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
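
The abstract names the Smith–Waterman algorithm as one basis of the pairwise alignments. As a rough illustration of that technique only, the sketch below computes a local alignment score with a linear gap penalty. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for readability; they are not SIMAP's settings, and a protein search would normally score residues with a substitution matrix instead.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b.

    Illustrative scoring only (assumed values, linear gap penalty).
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            cell = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Two sequences sharing the local segment "ACGT".
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Only two rows of the table are kept in memory, which is enough when the score alone is needed; recovering the alignment itself would require the full table or a traceback structure.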

University of Birmingham Research Portal

FGDB: a comprehensive fungal genome resource on the plant pathogen Fusarium graminearum

Author: Adam Gerhard
Güldener Ulrich
Haase Dirk
Mannhaupt Gertrud
Mewes Hans-Werner
Münsterkötter Martin
Oesterheld Matthias
Stümpflen Volker
Publication venue: Oxford University Press
Publication date: 28/12/2005

The MIPS Fusarium graminearum Genome Database (FGDB) is a comprehensive genome database on one of the most devastating fungal plant pathogens of wheat and barley. FGDB provides information on two gene sets independently derived by automated annotation of the F. graminearum genome sequence. A complete, manually revised gene set is expected in the near future. The initial results of systematic manual correction of gene calls are already part of the current gene set. The database can be accessed to retrieve information from bioinformatics analyses and functional classifications of the proteins. The data are also organized in the well-established MIPS catalogs, and novel query techniques are available to search the data. The comprehensive set of gene calls was also used for the design of an Affymetrix GeneChip. The resource is accessible on

The Mouse Functional Genome Database (MfunGD): functional annotation of proteins in the light of their cellular context

Author: Brauner Barbara
Doudieu Octave Noubibou
Dunger-Kaltenbach Irmtraud
Fobo Gisela
Frishman Dmitrij
Frishman Goar
Mewes H. Werner
Montrone Corinna
Oesterheld Matthias
Pagel Philipp
Rattei Thomas
Riley Louise
Ruepp Andreas
Skornia Christine
Stümpflen Volker
Surmeli Dimitrij
Tetko Igor V.
van den Oever Jos
Wanka Steffi
Publication venue: Oxford University Press
Publication date: 01/01/2005

MfunGD provides a resource for annotated mouse proteins and their occurrence in protein networks. Manual annotation concentrates on proteins which are found to interact physically with other proteins. Accordingly, manually curated information from a protein–protein interaction database (MPPI) and a database of mammalian protein complexes is interconnected with MfunGD. Protein function annotation is performed using the Functional Catalogue (FunCat) annotation scheme, which is widely used for the analysis of protein networks. The dataset is also supplemented with information about the literature that was used in the annotation process as well as links to the SIMAP Fasta database, the Pedant protein analysis system and cross-references to external resources. Proteins that have not yet been manually inspected are annotated automatically by a graphical probabilistic model and/or superparamagnetic clustering. The database is continuously expanding to include the rapidly growing amount of functional information about gene products from mouse. MfunGD is implemented in GenRE, a J2EE-based component-oriented multi-tier architecture following the separation-of-concerns principle.

Electrically controlled light scattering from thermoreversible liquid-crystal gels

Author: Cees W. M. Bastiaansen
Dirk J. Broer
Ivashchenko H. V.
Paul Smith
Rob H. C. Janssen
Schlotmann R.
Theo A. Tervoort
Thierry A.
Volker Stümpflen
Publication venue: AIP Publishing

Public Library of Science (PLOS)

HiNO: An Approach for Inferring Hierarchical Organization from Regulatory Networks

Author: A Clauset
AJ Levine
AL Barabási
B Vogelstein
E Ravasz
Francesco Falciani
G Balázsi
H Yu
HW Ma
J Häsler
L He
M Magnusson
Mara L. Hartsperger
O Hobert
R Jothi
Robert Strache
SS Shen-Orr
T Ravasi
TH Cormen
U Alon
Volker Stümpflen
ZN Oltvai
Publication venue: Public Library of Science
Publication date: 01/01/2010

BACKGROUND: Gene expression as governed by the interplay of the components of regulatory networks is one of the most complex fundamental processes in biological systems. Although several methods have been published to unravel the hierarchical structure of regulatory networks, weaknesses such as the incorrect or inconsistent assignment of elements to their hierarchical levels, the inability to cope with cyclic dependencies within the networks or the need for manual curation to retrieve non-overlapping levels remain unsolved. METHODOLOGY/RESULTS: We developed HiNO as a significant improvement of the so-called breadth-first-search (BFS) method. While BFS is capable of determining the overall hierarchical structures from gene regulatory networks, it has particular problems with feed-forward loops, which lead to conflicts within the level assignments. We resolved these problems by adding a recursive correction approach consisting of two steps. First, each vertex is placed on the lowest level that this vertex and its regulating vertices are assigned to (downgrade procedure). Second, vertices are assigned to the next higher level (upgrade procedure) if they have successors with the same level assignment and have themselves no regulators. We evaluated HiNO by comparing it with the BFS method, applying both to the regulatory networks from Saccharomyces cerevisiae and Escherichia coli. The comparison shows clearly how conflicts in level assignment are resolved in HiNO in order to produce correct hierarchical structures even on the local levels in an automated fashion. CONCLUSIONS: We showed that the resolution of conflicting assignments clearly improves the BFS method. While we restricted our analysis to gene regulatory networks, our approach is suitable to deal with any directed hierarchical network structure such as the interaction of microRNAs or the action of non-coding RNAs in general. Furthermore, we provide a user-friendly web-interface for HiNO that enables the extraction of the hierarchical structure of any directed regulatory network. AVAILABILITY: HiNO is freely accessible at http://mips.helmholtz-muenchen.de/hino/
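
The kind of level conflict the abstract attributes to feed-forward loops can be reproduced with a small sketch. The code below is not HiNO and not the published BFS method; it is a generic top-down breadth-first layering, assumed here purely for illustration, in which each vertex receives its BFS distance from the nearest vertex without regulators. The function names and the three-gene example network are hypothetical.

```python
from collections import deque


def bfs_levels(edges):
    """Assign each vertex its BFS distance from the nearest regulator-free vertex.

    edges: list of (regulator, target) pairs.
    Assumption: level 0 holds the vertices that have no regulators.
    Vertices that lie only on cycles with no such root are never reached.
    """
    successors, has_regulator, vertices = {}, set(), set()
    for regulator, target in edges:
        successors.setdefault(regulator, []).append(target)
        has_regulator.add(target)
        vertices.update((regulator, target))

    roots = sorted(vertices - has_regulator)
    level = {v: 0 for v in roots}
    queue = deque(roots)
    while queue:
        v = queue.popleft()
        for w in successors.get(v, []):
            if w not in level:  # first visit fixes the level
                level[w] = level[v] + 1
                queue.append(w)
    return level


def same_level_conflicts(edges, level):
    """Return edges whose regulator and target sit on the same level."""
    return [(u, v) for u, v in edges
            if u in level and v in level and level[u] == level[v]]


if __name__ == "__main__":
    # Feed-forward loop: A regulates B and C, and B also regulates C.
    ffl = [("A", "B"), ("A", "C"), ("B", "C")]
    levels = bfs_levels(ffl)
    print(levels)
    print(same_level_conflicts(ffl, levels))
```

In the feed-forward loop, B and C are both one step away from A, so plain BFS puts a regulator and its target on the same level. A correction step such as the downgrade and upgrade procedures described in the abstract is what would separate them.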

Public Library of Science (PLOS)

Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

Author: AB Clegg
C Nedellec
D Klein
D Rebholz-Schuhmann
E Charniak
H Jose
Hans-Werner Mewes
I Donaldson
J Tsujii
J-H Eom
Jason Weston
K Fundel
L Hirschman
M Lease
M Palmer
Mark Isalan
R Collobert
R Hoffmann
Ronan Collobert
RT-H Tsai
S Bethard
S Pradhan
TH Tsai
Thorsten Barnickel
Volker Stümpflen
Y Kogan
Y Miyao
Publication venue: Public Library of Science
Publication date: 01/01/2009

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. 
The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches.
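
The figures quoted in the abstract can be turned into two derived numbers with simple arithmetic. The sketch below computes the F1 score implied by the reported precision and recall, and a per-node throughput. Neither derived value is stated in the abstract. The throughput assumes that all 100 nodes were busy for the full three days, so it should be read as a rough lower bound.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def sentences_per_node_second(sentences: int, nodes: int, days: float) -> float:
    """Average throughput per node, assuming every node ran the whole time."""
    return sentences / (nodes * days * 86_400)  # 86,400 seconds per day


if __name__ == "__main__":
    # Values reported in the abstract: precision 0.71, recall 0.43,
    # 89 million sentences, 100 nodes, three days.
    print(round(f1_score(0.71, 0.43), 3))
    print(round(sentences_per_node_second(89_000_000, 100, 3), 2))
```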

Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs

Author: A Banerjee
A Clauset
A Misbahuddin
A Ruepp
AK Jain
AL Barabási
AN Langville
AP Erdös
B Long
CJ Sylvester
D Lee
D Zhou
E Hüllermeier
E Ravasz
Fabian J Theis
Florian Blöchl
G Karypis
G Palla
H Cho
I Dhillon
J Bezdek
J Dunn
JB MacQueen
JB Pereira-Leal
K Devarajan
KI Goh
KV Mardia
M Barber
M Campos
M Fiorio
MA Yildirim
Mara L Hartsperger
N Gulbahce
P Paatero
P Wong
R Montanez
RC Samaco
RJ Shprintzen
RR Lebel
S Bauer
S Klamt
S Maslov
T Barnickel
Volker Stümpflen
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general but common situation of k-partite graphs. These graphs contain k different node types and links are only allowed between nodes of different types. In order to reveal their structural organization and describe the contained information in a more coarse-grained fashion, we ask how to detect clusters within each node type. Results: Since entities in biological networks regularly have more than one function and hence participate in more than one cluster, we developed a k-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a weighted k-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures on various resolution levels. Applying our algorithm to a tripartite disease-gene-protein complex network, we were able to structure this graph on a large scale into clusters that are functionally correlated and biologically meaningful. Locally, smaller clusters enabled reclassification or annotation of the clusters' elements. We exemplified this for the transcription factor MECP2. Conclusions: In order to cope with the overwhelming amount of information available from biomedical literature, we need to tackle the challenge of finding structures in large networks with nodes of multiple types. To this end, we presented a novel fuzzy k-partite graph partitioning algorithm that allows the decomposition of these objects in a comprehensive fashion. We validated our approach both on artificial and real-world data. It is readily applicable to any further problem.
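
The abstract says the method mimics the multiplicative update rules used for non-negative matrix factorization (NMF) but does not give the updates themselves. For orientation, the sketch below shows the standard multiplicative updates for plain NMF with a squared-error objective, approximating a non-negative matrix V by W·H. It is not the paper's k-partite algorithm. The function names, the small constant eps, the random initialization and the toy matrix are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius distance between v and the product w*h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix v into w (n x rank) and h (rank x m).

    Standard multiplicative updates; eps (an assumed constant) avoids
    division by zero. Entries stay non-negative because every update
    multiplies by a non-negative ratio.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy bipartite adjacency matrix with two obvious blocks.
    v = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    w, h = nmf(v, rank=2)
    print(frobenius_error(v, w, h))
```

Each row of w can be read as non-negative weights of one node across the clusters, which is the sense in which such factorizations give overlapping rather than hard assignments. How the published method extends this idea to k node types is not specified in the abstract.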

Springer - Publisher Connector