    PROSITE, a protein domain database for functional characterization and annotation

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (∼70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/

    ProRule: a new database containing functional and structural information on PROSITE profiles

    Motivation: Increase the discriminatory power of PROSITE profiles to facilitate function determination and provide biologically relevant information about domains detected by profiles for the annotation of proteins. Summary: We have created a new database, ProRule, which contains additional information about PROSITE profiles. Notably, ProRule contains the positions of structurally and/or functionally critical amino acids, as well as the conditions they must fulfill to play their biological role. These supplementary data should help function determination and annotation of the UniProt Swiss-Prot knowledgebase. ProRule also contains information about the domain detected by the profile in the Swiss-Prot line format. Hence, ProRule can be used to make Swiss-Prot annotation more homogeneous and consistent. The format of ProRule can be extended to provide information about combinations of domains. Availability: ProRule can be accessed through ScanProsite at http://www.expasy.org/tools/scanprosite. A file containing the rules will be made available under the PROSITE copyright conditions on our ftp site (ftp://www.expasy.org/databases/prosite/) by the next PROSITE release.

    Predicting active site residue annotations in the Pfam database

    Background: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. Description: We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. Conclusion: We have predicted 606110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam by more than 200 fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are re-calculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.

    Bioinformatics as a Tool for Assessing the Quality of Sub-Cellular Proteomic Strategies and Inferring Functions of Proteins: Plant Cell Wall Proteomics as a Test Case

    Bioinformatics is used at three different steps of proteomic studies of sub-cellular compartments. The first is protein identification from mass spectrometry data, the second is prediction of sub-cellular localization, and the third is the search for functional domains to predict the function of identified proteins in order to answer biological questions. The aim of this work was to develop a new tool for improving the quality of proteomics of sub-cellular compartments. Starting from an analysis of problems found in databases, we designed a new Arabidopsis database named ProtAnnDB (http://www.polebio.scsv.ups-tlse.fr/ProtAnnDB/). It collects on one page the predictions of sub-cellular localization and of functional domains made by available software. Using this database improves not only the interpretation of proteomic data (top-down analysis), but also the procedures used to isolate sub-cellular compartments (bottom-up quality control).

    NRProF: Neural response based protein function prediction algorithm

    A large amount of proteomic data is being generated due to advances in high-throughput genome sequencing, but the rate of functional annotation of these sequences lags far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOfigure, and Gotcha, are based on BLAST search. Unfortunately, the sequence coverage of these methods is low because they cannot detect remote homologues. This lack of annotation coverage calls for novel methods to improve protein function prediction. Here we present an automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. The main idea of this algorithm is to define a distance metric that corresponds to the similarity of the subsequences and reflects how the human brain can distinguish different sequences. Given a query protein, we predict the most similar target protein using a two-layered neural response algorithm and thereby assign the GO term of the target protein to the query. Our method predicted and ranked the actual leaf GO term among the top 5 probable GO terms with 87.66% accuracy. Results of the 5-fold cross validation and the comparison with the PFP and FFPred servers indicate that our method performs well. The NRProF program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/. © 2011 IEEE. The 2011 IEEE International Conference on Systems Biology (ISB), Zhuhai, China, 2-4 September 2011. In Conference Proceedings, 2011, p. 33-4

    MACSIMS : multiple alignment of complete sequences information management system

    BACKGROUND: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. RESULTS: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. CONCLUSION: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at

    Molecular mechanisms of the non-coenzyme action of thiamin in brain. Biochemical, structural and pathway analysis

    Thiamin (vitamin B1) is a pharmacological agent boosting central metabolism through the action of the coenzyme thiamin diphosphate (ThDP). However, positive effects, including improved cognition, of high thiamin doses in neurodegeneration may be observed without increased ThDP or ThDP-dependent enzymes in brain. Here, we determine protein partners and metabolic pathways where thiamin acts beyond its coenzyme role. Malate dehydrogenase, glutamate dehydrogenase and pyridoxal kinase were identified as abundant proteins binding to thiamin- or thiazolium-modified sorbents. Kinetic studies, supported by structural analysis, revealed allosteric regulation of these proteins by thiamin and/or its derivatives. Thiamin triphosphate and adenylated thiamin triphosphate activate glutamate dehydrogenase. Thiamin and ThDP regulate malate dehydrogenase isoforms and pyridoxal kinase. Thiamin regulation of enzymes related to malate-aspartate shuttle may impact on malate/citrate exchange, responsible for exporting acetyl residues from mitochondria. Indeed, bioinformatic analyses found an association between thiamin- and thiazolium-binding proteins and the term acetylation. Our interdisciplinary study shows that thiamin is not only a coenzyme for acetyl-CoA production, but also an allosteric regulator of acetyl-CoA metabolism including regulatory acetylation of proteins and acetylcholine biosynthesis. Moreover, thiamin action in neurodegeneration may also involve neurodegeneration-related 14-3-3, DJ-1 and β-amyloid precursor proteins identified among the thiamin- and/or thiazolium-binding proteins

    Plant protein-coding gene families: emerging bioinformatics approaches

    Protein-coding gene families are sets of similar genes with a shared evolutionary origin and, generally, with similar biological functions. In plants, the size and role of gene families have been only partially addressed. However, suitable bioinformatics tools are being developed to cluster the enormous number of sequences currently available in databases. Specifically, comparative genomic databases promise to become powerful tools for gene family annotation in plant clades. In this review, I evaluate the data retrieved from various gene family databases, the ease with which they can be extracted and how useful the extracted information is.

    FragKB: Structural and Literature Annotation Resource of Conserved Peptide Fragments and Residues

    BACKGROUND: FragKB (Fragment Knowledgebase) is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining. METHODOLOGY: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/. SIGNIFICANCE: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments