1,797 research outputs found

PROSITE, a protein domain database for functional characterization and annotation

Author: Amos Bairoch
Christian J. A. Sigrist
Edouard de Castro
Lorenzo Cerutti
Nicolas Hulo
Petra S. Langendijk-Genevaux
Virginie Bulliard
Publication venue: Oxford University Press
Publication date: 01/01/2010

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (∼70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/
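To illustrate how a PROSITE pattern can identify a site in a sequence, here is a minimal sketch that translates the PROSITE pattern syntax into a Python regular expression. The function name `prosite_to_regex` is hypothetical and this is not how ScanProsite is implemented. It covers only the common syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`) and does not handle profiles or ProRule conditions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (illustrative only)."""
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":  # 'x' accepts any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # {P}: anything but P
            core = "[^" + core[1:-1] + "]"
        # [ST] and single residue letters are already valid regex fragments.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# The N-glycosylation site pattern, written in PROSITE syntax.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "MNGSA")  # toy sequence, not real data
```

Here `regex` is `N[^P][ST][^P]`, and the toy sequence matches starting at the second residue. Overlapping matches and the rarer syntax forms (for example `>` inside square brackets) would need extra handling.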

Crossref

PubMed Central

Archive ouverte UNIGE

ProRule: a new database containing functional and structural information on PROSITE profiles

Author: Bairoch Amos
De Castro Edouard
Hulo Nicolas
Langendijk-Genevaux Petra S.
Le Saux Virginie
Sigrist Christian J. A.
Publication venue
Publication date: 02/08/2017

Motivation: Increase the discriminatory power of PROSITE profiles to facilitate function determination and provide biologically relevant information about domains detected by profiles for the annotation of proteins. Summary: We have created a new database, ProRule, which contains additional information about PROSITE profiles. ProRule contains notably the position of structurally and/or functionally critical amino acids, as well as the condition they must fulfill to play their biological role. These supplementary data should help function determination and annotation of the UniProt Swiss-Prot knowledgebase. ProRule also contains information about the domain detected by the profile in the Swiss-Prot line format. Hence, ProRule can be used to make Swiss-Prot annotation more homogeneous and consistent. The format of ProRule can be extended to provide information about combination of domains. Availability: ProRule can be accessed through ScanProsite at http://www.expasy.org/tools/scanprosite. A file containing the rules will be made available under the PROSITE copyright conditions on our ftp site (ftp://www.expasy.org/databases/prosite/) by the next PROSITE release. Contact: [email protected]

RERO DOC Digital Library

Predicting active site residue annotations in the Pfam database

Author: Alex Bateman
Jaina Mistry
Robert D Finn
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. Description: We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS, we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. Conclusion: We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200-fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.
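The general idea of transferring annotated residues through an alignment can be sketched as follows. This is a simplified illustration, not the authors' tool. The abstract does not spell out the rule set, so the single rule used here (every annotated column must hold the same residue as the source sequence) is an assumption, as are the function name `transfer_active_sites` and the toy alignment.

```python
def transfer_active_sites(alignment, source_id, source_sites):
    """Transfer annotated residue positions from one aligned sequence to others.

    alignment: dict mapping sequence id -> gapped sequence (all equal length)
    source_sites: 1-based positions in the UNGAPPED source sequence
    Returns a dict mapping sequence id -> 1-based ungapped positions.
    """
    source = alignment[source_id]

    # Map each annotated ungapped position in the source to its alignment column.
    columns = {}
    position = 0
    for column, residue in enumerate(source):
        if residue != "-":
            position += 1
            if position in source_sites:
                columns[position] = column
    site_columns = [columns[p] for p in sorted(source_sites)]

    predictions = {}
    for seq_id, sequence in alignment.items():
        if seq_id == source_id:
            continue
        # Assumed rule: transfer only if every annotated column is identical.
        if all(sequence[c] == source[c] for c in site_columns):
            # Convert alignment columns back to ungapped positions in the target.
            predictions[seq_id] = [
                len(sequence[: c + 1].replace("-", "")) for c in site_columns
            ]
    return predictions


# Toy alignment; "ref" has annotated residues H (position 3) and D (position 4).
toy_alignment = {
    "ref": "MK-HDS",
    "a": "MRAHDS",
    "b": "M--HES",
}
predicted = transfer_active_sites(toy_alignment, "ref", {3, 4})
```

In this toy case, sequence `a` receives positions 4 and 5, while `b` is skipped because it has E where the source has D. A stricter scheme would add further checks, which is the kind of filtering the abstract credits with keeping false positives low.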

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bioinformatics as a Tool for Assessing the Quality of Sub-Cellular Proteomic Strategies and Inferring Functions of Proteins: Plant Cell Wall Proteomics as a Test Case

Author: Clemente Hélène San
Jamet Elisabeth
Pont-Lezica Rafael
Publication venue: Libertas Academica
Publication date: 01/01/2009

Bioinformatics is used at three different steps of proteomic studies of sub-cellular compartments. The first is protein identification from mass spectrometry data. The second is prediction of sub-cellular localization, and the third is the search for functional domains to predict the function of identified proteins in order to answer biological questions. The aim of this work was to build a new tool for improving the quality of proteomics of sub-cellular compartments. Starting from the analysis of problems found in databases, we designed a new Arabidopsis database named ProtAnnDB (http://www.polebio.scsv.ups-tlse.fr/ProtAnnDB/). It collects on one page the predictions of sub-cellular localization and of functional domains made by available software. Using this database improves not only the interpretation of proteomic data (top-down analysis), but also the procedures used to isolate sub-cellular compartments (bottom-up quality control).

Directory of Open Access Journals

PubMed Central

NRProF: Neural response based protein function prediction algorithm

Author: Wang J
Xiao QW
Yalamanchili HK
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

A large amount of proteomic data is being generated due to the advancements in high-throughput genome sequencing, but the rate of functional annotation of these sequences falls far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOfigure, and Gotcha, are based on BLAST searches. Unfortunately, the sequence coverage of these methods is low because they cannot detect remote homologues. This lack of annotation coverage calls for novel methods to improve protein function prediction. Here we present an automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. The main idea of this algorithm is to define a distance metric that corresponds to the similarity of the subsequences and reflects how the human brain can distinguish different sequences. Given a query protein, we predict the most similar target protein using a two-layered neural response algorithm and thereby assign the GO term of the target protein to the query. Our method predicted and ranked the actual leaf GO term among the top 5 probable GO terms with 87.66% accuracy. Results of the 5-fold cross validation and the comparison with the PFP and FFPred servers indicate the strong performance of our method. The NRProF program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/. © 2011 IEEE. The 2011 IEEE International Conference on Systems Biology (ISB), Zhuhai, China, 2-4 September 2011. In Conference Proceedings, 2011, p. 33-4

HKU Scholars Hub

MACSIMS: multiple alignment of complete sequences information management system

BACKGROUND: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. RESULTS: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. CONCLUSION: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Dundee Online Publications

Molecular mechanisms of the non-coenzyme action of thiamin in brain. Biochemical, structural and pathway analysis

Author: Andrey Vovk
Contestabile Roberto
DI SALVO Martino Luigi
Garik Mkrtchyan
Lucien Bettendorff
Parroni Alessia
Thilo Kaehne
Vasily Aleshin
Victoria Bunik
Yulia Parkhomenko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Thiamin (vitamin B1) is a pharmacological agent boosting central metabolism through the action of the coenzyme thiamin diphosphate (ThDP). However, positive effects, including improved cognition, of high thiamin doses in neurodegeneration may be observed without increased ThDP or ThDP-dependent enzymes in brain. Here, we determine protein partners and metabolic pathways where thiamin acts beyond its coenzyme role. Malate dehydrogenase, glutamate dehydrogenase and pyridoxal kinase were identified as abundant proteins binding to thiamin- or thiazolium-modified sorbents. Kinetic studies, supported by structural analysis, revealed allosteric regulation of these proteins by thiamin and/or its derivatives. Thiamin triphosphate and adenylated thiamin triphosphate activate glutamate dehydrogenase. Thiamin and ThDP regulate malate dehydrogenase isoforms and pyridoxal kinase. Thiamin regulation of enzymes related to malate-aspartate shuttle may impact on malate/citrate exchange, responsible for exporting acetyl residues from mitochondria. Indeed, bioinformatic analyses found an association between thiamin- and thiazolium-binding proteins and the term acetylation. Our interdisciplinary study shows that thiamin is not only a coenzyme for acetyl-CoA production, but also an allosteric regulator of acetyl-CoA metabolism including regulatory acetylation of proteins and acetylcholine biosynthesis. Moreover, thiamin action in neurodegeneration may also involve neurodegeneration-related 14-3-3, DJ-1 and β-amyloid precursor proteins identified among the thiamin- and/or thiazolium-binding proteins.

Archivio della ricerca- Università di Roma La Sapienza

Protein sequences insight into heavy metal tolerance in Cronobacter sakazakii BAA-894 encoded by plasmid pESA3

Author: Chaturvedi N
Forsythe S
Kajsik M
Pandey PN
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/09/2015

The recently annotated genome of the bacterium Cronobacter sakazakii BAA-894 suggests the organism has the ability to bind heavy metals. This study demonstrates heavy metal tolerance in Cronobacter sakazakii, in which proteins interacting with heavy metals were identified by computational and experimental study. As a result, approximately one fourth of the proteins encoded on the plasmid pESA3 are proposed to have potential interaction with heavy metals. Interaction between heavy metals and predicted proteins was further corroborated using protein crystal structures from the Protein Data Bank and comparison of metal-binding ligands. In addition, a phylogenetic study was undertaken for the most toxic heavy metals, such as arsenic, cadmium, lead and mercury, and related tree patterns were obtained for lead, cadmium and arsenic. Laboratory studies confirmed the organism's tolerance to tellurite, copper and silver. These experimental and computational data extend our understanding of the genes encoding proteins of this important neonatal pathogen and provide further insights into the genotypes associated with features that can contribute to its persistence in the environment. The information will be of value for future environmental protection from toxic heavy metals.

Nottingham Trent Institutional Repository (IRep)

Plant protein-coding gene families: emerging bioinformatics approaches

Author: Manuel Martinez
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, with similar biological functions. In plants, the size and role of gene families has been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted and how useful the extracted information is

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

FragKB: Structural and Literature Annotation Resource of Conserved Peptide Fragments and Residues

Author: Alfonso Valencia
Ashish V. Tendulkar
Fabio Rapallo
Gonzalo López
Martin Krallinger
Pramod P. Wangikar
Victor de la Torre
Publication venue: Public Library of Science
Publication date: 18/03/2010

BACKGROUND: FragKB (Fragment Knowledgebase) is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central