A functional hierarchical organization of the protein sequence space

Author: Friedlich Moriah
Fromer Menachem
Kaplan Noam
Linial Michal
Publication venue: BioMed Central
Publication date: 14/12/2004

BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust. CONCLUSIONS: We present an automatically generated, unbiased method that provides a hierarchical classification of all currently known proteins.
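
The abstract above describes building a hierarchical tree from pairwise sequence similarity alone. As a rough illustration of that general idea, the following is a minimal sketch of bottom-up (agglomerative) clustering with average linkage. It is not ProtoNet's actual procedure: the merging rule, the function name `average_linkage_tree`, and the toy similarity scores are all assumptions made for this example, and the tree-compression step is not shown.

```python
from itertools import combinations


def average_linkage_tree(names, sim):
    """Build a binary merge tree by repeatedly joining the two most similar clusters.

    names: list of item labels (hypothetical protein identifiers).
    sim:   dict mapping frozenset({a, b}) -> similarity score (higher = more similar).
    Returns a list of merges (left_members, right_members, score) in merge order.
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def linkage(c1, c2):
        # Arithmetic mean of all pairwise similarities between the two clusters;
        # pairs without a reported score count as 0 (an assumption of this sketch).
        total = sum(sim.get(frozenset([a, b]), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the pair of current clusters with the highest average similarity.
        best = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((best[0], best[1], linkage(*best)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return merges


# Toy similarity scores, invented for illustration only.
toy_sim = {
    frozenset(["A", "B"]): 0.9,
    frozenset(["C", "D"]): 0.8,
    frozenset(["A", "C"]): 0.2,
    frozenset(["B", "D"]): 0.1,
}

for left, right, score in average_linkage_tree(["A", "B", "C", "D"], toy_sim):
    print(sorted(left), "+", sorted(right), "at similarity", round(score, 3))
```

Each merge corresponds to one internal node of the tree, and the merge score gives a natural notion of granularity: high-scoring merges group close relatives, low-scoring merges group distant ones. This exhaustive pairwise search is only practical for small inputs; clustering on the scale of a million proteins would need a far more efficient scheme.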

Springer - Publisher Connector

PubMed Central

Hierarchical coexistence of universality and diversity controls robustness and multi-functionality in intermediate filament protein networks

Author: Markus J. Buehler
Theodor Ackbarow
Publication date: 25/08/2007

Proteins constitute the elementary building blocks of a vast variety of biological materials such as cellular protein networks, spider silk or bone, where they create extremely robust, multi-functional materials by self-organization of structures over many length and time scales, from nano to macro. Some of the structural features are commonly found in many different tissues, that is, they are highly conserved. Examples of such universal building blocks include alpha-helices, beta-sheets or tropocollagen molecules. In contrast, other features are highly specific to tissue types, such as particular filament assemblies, beta-sheet nanocrystals in spider silk or tendon fascicles. These examples illustrate that the coexistence of universality and diversity – in the following referred to as the universality-diversity paradigm (UDP) – is an overarching feature in protein materials. This paradigm is a paradox: How can a structure be universal and diverse at the same time? In protein materials, the coexistence of universality and diversity is enabled by utilizing hierarchies, which serve as an additional dimension beyond the 3D or 4D physical space. This may be crucial to understanding how their structure and properties are linked, and how these materials are capable of combining seemingly disparate properties such as strength and robustness. Here we illustrate how the UDP makes it possible to unify universal building blocks and highly diversified patterns through the formation of hierarchical structures that lead to multi-functional, robust yet highly adapted structures. We illustrate these concepts in an analysis of three types of intermediate filament proteins: vimentin, lamin and keratin.

Nature Precedings

A Hierarchical Approach to Protein Molecular Evolution

Author: Crameri
Cwirla
Davidson
Devlin
Fisch
Gilbert
Gilbert
Gram
Griffiths
Hawkins
Kamtekar
Kauffman
Kauffman
Kepler
L. D. Bogarad
Lawrence
M. W. Deem
Maeshiro
Mandecki
Moore
Netzer
Patten
Pennisi
Perelson
Riddle
Scott
Shapiro
Shapiro
Stemmer
Stemmer
Zhang
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 16/03/1999

Biological diversity has evolved despite the essentially infinite complexity of protein sequence space. We present a hierarchical approach to the efficient searching of this space and quantify the evolutionary potential of our approach with Monte Carlo simulations. These simulations demonstrate that non-homologous juxtaposition of encoded structure is the rate-limiting step in the production of new tertiary protein folds. Non-homologous "swapping" of low energy secondary structures increased the binding constant of a simulated protein by $\approx 10^7$ relative to base substitution alone. Applications of our approach include the generation of new protein folds and modeling the molecular evolution of disease. Comment: 15 pages. 2 figures. LaTeX styl

arXiv.org e-Print Archive

Crossref

Caltech Authors

Safe Functional Inference for Uncharacterized Viral Proteins

Author: Michal Linial
Yaniv Loewenstein
Publication date: 14/08/2008

The explosive growth in the number of sequenced genomes has created a flood of protein sequences with unknown structure and function. A routine protocol for functional inference on an input query sequence is based on a database search for homologues. Searching a query against a non-redundant database using BLAST (or more advanced methods, e.g. PSI-BLAST) suffers from several drawbacks: (i) a local alignment often dominates the results; (ii) the reported statistical score (i.e. E-value) is often misleading; (iii) incorrect annotations may be falsely propagated. 
Several systematic methods are commonly used to assign functions to sequences on a genomic scale. In Pfam (1) and similar resources, statistical profiles (HMMs) are built from semi-manual multiple alignments of seed homologous sequences. The profiles are then used to scan genomic sequences for additional family members. The drawbacks of this scheme are: (i) only families with a predetermined seed are considered; (ii) the query must have a detectable sequence similarity to seed sequences; (iii) attention to internal relationships among the family members or the relations to other families is lacking; (iv) family membership is often set by predetermined thresholds.
An alternative to profile- or model-based methods for functional inference relies on a hierarchical clustering of the protein space, as implemented in the ProtoNet approach (2). The fundamental principle is the creation of a tree that captures evolutionary relatedness among protein families. The tree construction is fully automatic, and is based only on reported BLAST similarities among clustered sequences. The tree provides protein groupings in continuous evolutionary granularities, from closely related families to distant superfamilies. Clusters in the ProtoNet tree show high correspondence with homologous sequence (e.g. Pfam and InterPro), functional (e.g. E.C. classification) and structural (e.g. SCOP) families (3). A new clustering scheme (4) has provided an extensive update to the ProtoNet process, which is now based on direct clustering of all detectable sequence similarities.
Herein, we use the ProtoNet resource to develop a methodology for a consistent and safe functional inference for remote families. We illustrate the success of our approach towards clusters of poorly characterized viral proteins. Viral sequences are characterized by a rapid evolutionary rate which drives viral families to be even more remote (sequence-similarity-wise). Thus, functional inference for viral families is apparently an unsolved task. Despite this inherent difficulty, the new ProtoNet tree scaffold reliably captures weak evolutionary connections for viral families, which were previously overlooked. We take advantage of this, and propose new functional assignments for viral protein families.

Crossref

Nature Precedings

Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

Author: Bastien Olivier
Birkholtz Lyn-Marie
Breton Vincent
Grando Delphine
Hofmann-Apitius Martin
Jacq Nicolas
Joubert Fourie
Kasam Vinod
Louw Abraham I
Maréchal Eric
Ortet Philippe
Roy Sylvaine
Saïdani Nadia
Wells Gordon
Zimmermann Marc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa

Hal - Université Grenoble Alpes

HAL AMU

Fraunhofer-ePrints

HAL Clermont Université

HAL Descartes

HAL-CEA

ProdInra

arXiv.org e-Print Archive

HAL-IN2P3

Springer - Publisher Connector

PubMed Central

UPSpace at the University of Pretoria

Degree Landscapes in Scale-Free Networks

Author: Ala Trusina
Jacob Bock Axelsen
Kim Sneppen
L. Gao
Martin Rosvall
R. Albert
Sebastian Bernhardsson
V. Batagelj
Publication venue: 'American Physical Society (APS)'
Publication date: 08/12/2005

We generalize the degree-organizational view of real-world networks with broad degree distributions in a landscape analogue with mountains (high-degree nodes) and valleys (low-degree nodes). For example, correlated degrees between adjacent nodes correspond to smooth landscapes (social networks), hierarchical networks to one-mountain landscapes (the Internet), and degree-disassortative networks without hierarchical features to rough landscapes with several mountains. We also generate ridge landscapes to model networks organized under constraints imposed by the space the networks are embedded in, associated with spatial or, in molecular networks, functional localization. To quantify the topology, we here measure the widths of the mountains and the separation between different mountains. Comment: 4 pages, 5 figure
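
The abstract distinguishes networks whose adjacent nodes have correlated degrees ("smooth landscapes") from degree-disassortative ones ("rough landscapes"). One standard way to put a number on that distinction is the Pearson correlation between the degrees at the two ends of each edge. The sketch below computes it for an undirected edge list. It is not the mountain-width or mountain-separation measure the authors describe; the function name `degree_assortativity` and the example graphs are assumptions made for illustration.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    edges: list of (u, v) pairs describing an undirected graph without self-loops.
    Returns a value in [-1, 1]; positive means high-degree nodes tend to link
    to other high-degree nodes, negative means hubs tend to link to low-degree nodes.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so they share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all edge endpoints have the same degree; correlation undefined")
    return cov / var


# A star graph: one hub connected to four leaves (strongly disassortative).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_assortativity(star))
```

In the landscape picture, a strongly negative value like the one a star graph produces corresponds to a hub surrounded only by low-degree neighbours, while values near zero or above indicate that neighbouring nodes sit at similar "altitudes".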

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

CERN Document Server