57 research outputs found

    Improved mutation tagging with gene identifiers applied to membrane protein stability prediction

    Background: The automated retrieval and integration of information about protein point mutations, in combination with structure, domain and interaction data from literature and databases, promises to be a valuable approach to studying structure-function relationships in biomedical data sets. Results: We developed a rule- and regular expression-based protein point mutation retrieval pipeline for PubMed abstracts, which achieves an F-measure of 87% for the mutation retrieval task on a benchmark dataset. To link mutations to their proteins, we use a named entity recognition algorithm to identify gene names co-occurring in the abstract, and establish links based on sequence checks. Conversely, we show that gene recognition improved from 77% to 91% F-measure when mutation information given in the text was taken into account. To demonstrate practical relevance, we use mutation information from text to evaluate a novel solvation energy-based model for the prediction of stabilizing regions in membrane proteins. For five G protein-coupled receptors we identified 35 relevant single mutations and associated phenotypes, none of which had been annotated in the UniProt or PDB databases. In 71% of cases, the reported phenotypes agreed with the model predictions, supporting a relation between mutations and stability issues in membrane proteins. Conclusion: We present a reliable approach for the retrieval of protein mutations from PubMed abstracts for any set of genes or proteins of interest. We further demonstrate how amino acid substitution information from text can be used for protein structure stability studies on the basis of a novel energy model.
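    The abstract above mentions a rule- and regular expression-based pipeline and sequence checks but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' method: the pattern, the function names `find_mutations` and `consistent_with_sequence`, and the example sequence are all hypothetical. It assumes mutations are written either as `A5T` (one-letter codes) or `Gly3Ser` (three-letter codes).

```python
import re

# Standard amino acid codes (three-letter -> one-letter).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "".join(sorted(THREE_TO_ONE.values()))
THREE = "|".join(THREE_TO_ONE)

# Hypothetical pattern: wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(
    rf"\b(?:(?P<w1>[{ONE}])(?P<p1>[1-9]\d*)(?P<m1>[{ONE}])"
    rf"|(?P<w3>{THREE})(?P<p3>[1-9]\d*)(?P<m3>{THREE}))\b"
)


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    hits = []
    for m in MUTATION_RE.finditer(text):
        if m.group("w1"):
            hits.append((m.group("w1"), int(m.group("p1")), m.group("m1")))
        else:
            hits.append((THREE_TO_ONE[m.group("w3")],
                         int(m.group("p3")),
                         THREE_TO_ONE[m.group("m3")]))
    return hits


def consistent_with_sequence(mutation, sequence):
    """Assumed form of a sequence check: the wild-type residue must
    occur at the stated 1-based position of the candidate protein."""
    wild_type, position, _ = mutation
    return 0 < position <= len(sequence) and sequence[position - 1] == wild_type


if __name__ == "__main__":
    sentence = "The A5T and Gly3Ser variants were studied."
    candidate = "MKLVA"  # made-up sequence for illustration only
    for mutation in find_mutations(sentence):
        print(mutation, consistent_with_sequence(mutation, candidate))
```

    In this sketch a mutation is linked to a candidate protein only when the wild-type residue matches the sequence at the stated position. A bare pattern like this will also match non-mutation tokens such as cell-line names, so a real pipeline would need additional rules.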

    PHI-base: a new database for pathogen host interactions

    To utilize effectively the growing number of verified genes that mediate an organism's ability to cause disease and/or to trigger host responses, we have developed PHI-base. This is a web-accessible database that currently catalogs 405 experimentally verified pathogenicity, virulence and effector genes from 54 fungal and Oomycete pathogens, of which 176 are from animal pathogens, 227 from plant pathogens and 3 from pathogens with a fungal host. PHI-base is the first on-line resource devoted to the identification and presentation of information on fungal and Oomycete pathogenicity genes and their host interactions. As such, PHI-base is a valuable resource for the discovery of candidate targets in medically and agronomically important fungal and Oomycete pathogens for intervention with synthetic chemistries and natural products. Each entry in PHI-base is curated by domain experts and supported by strong experimental evidence (gene/transcript disruption experiments) as well as literature references in which the experiments are described. Each gene in PHI-base is presented with its nucleotide and deduced amino acid sequence as well as a detailed description of the predicted protein's function during the host infection process. To facilitate data interoperability, we have annotated genes using controlled vocabularies (Gene Ontology terms, Enzyme Commission Numbers and so on), and provide links to other external data sources (e.g. NCBI taxonomy and EMBL). We welcome new data for inclusion in PHI-base, which is freely accessed at

    MeMotif: a database of linear motifs in α-helical transmembrane proteins

    Membrane proteins are important for many processes in the cell and are major drug targets. The increasing number of available high-resolution structures makes it possible, for the first time, to characterize local structural and functional motifs in α-helical transmembrane proteins. MeMotif (http://projects.biotec.tu-dresden.de/memotif) is a database and wiki that collects more than 2000 known and novel computationally predicted linear motifs in α-helical transmembrane proteins. Motifs are fully described in terms of several structural and functional features and are editable. Motifs contained in MeMotif can be used in different biological applications, from the identification of biochemically important functional residues that are candidates for mutagenesis experiments to the improvement of tools for transmembrane protein modeling.

    SuperCYP: a comprehensive database on Cytochrome P450 enzymes including a tool for analysis of CYP-drug interactions

    Much of the information on the Cytochrome P450 enzymes (CYPs) is spread across the literature and the internet. Aggregating knowledge about CYPs into one database makes searching more efficient. Text mining on 57 CYPs and drugs yielded a large number of papers, which were screened manually for facts about metabolism, SNPs and their effects on drug degradation. The information was put into a database, which enables the user not only to look up a particular CYP and all drugs it metabolizes, but also to check the tolerability of drug cocktails and to find alternative combinations that use metabolic pathways more efficiently. The SuperCYP database contains 1170 drugs with more than 3800 interactions, including references. Approximately 2000 SNPs and mutations are listed and ordered according to their effect on expression and/or activity. SuperCYP (http://bioinformatics.charite.de/supercyp) is a comprehensive resource focused on CYPs and drug metabolism. Homology-modeled structures of the CYPs can be downloaded in PDB format, and related drugs are available as MOL files. Within the resource, CYPs can be aligned with each other, drug cocktails can be ‘mixed’, SNPs, protein point mutations and their effects can be viewed, and corresponding PubMed IDs are given. SuperCYP is meant to be a platform and a starting point for scientists and health professionals for furthering their research.

    A framework for assessing the consistency of drug classes across source

    Data from: U-Index, a dataset and an impact metric for informatics tools and databases

    Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources’ impact over time. Our dataset differentiates the context in which citations occur to distinguish between ‘awareness’ and ‘usage’, and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calculate the u-Index are publicly available.