171 research outputs found

    Last rolls of the yoyo: Assessing the human canonical protein count

    In 2004, when the protein estimate from the finished human genome was only 24,000, the surprise was compounded as reviewed estimates fell to 19,000 by 2014. However, variability persists in the total canonical protein counts (i.e. excluding alternative splice forms) of open reading frames (ORFs) across different annotation portals. This work assesses these differences and their possible causes. A 16-year analysis of Ensembl and UniProtKB/Swiss-Prot shows convergence to a protein number of ~20,000. The former had shown some yo-yoing, but both have now plateaued. Nine major annotation portals, reviewed at the beginning of 2017, gave counts ranging from 21,819 down to 18,891. The 4-way cross-reference concordance (within UniProt) between Ensembl, Swiss-Prot, Entrez Gene and the HUGO Gene Nomenclature Committee (HGNC) drops to 18,690, indicating methodological differences between sources in protein definitions and in support for experimental existence. The Swiss-Prot and neXtProt evidence criteria include mass spectrometry peptide verification as well as cross-references to antibody detection from the Human Protein Atlas. Nevertheless, hundreds of Swiss-Prot entries are classified as non-coding biotypes by HGNC. The only indication that protein numbers might still rise comes from numerous reports of small ORF (smORF) discovery. However, while there have been recent cases of proteins verified from previously mis-annotated non-coding RNA, very few have passed the Swiss-Prot curation and genome annotation thresholds. The post-genomic era has seen both advances in data generation and improvements in the human reference assembly. Nevertheless, current numbers, while persistently discordant, show that the earlier yo-yoing has largely ceased.
Given the importance of defining the canonical human proteome to biology and biomedicine, the task will need more collaborative inter-source curation combined with broader and deeper experimental confirmation, in vivo and in vitro, of proteins predicted in silico. The eventual total could well fall below ~19,000.
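The concordance figure is essentially a set intersection: an entry counts only if all four sources agree on it, so the shared total can never exceed the smallest single-source count. A toy Python sketch, assuming each source has already been reduced to a set of entry keys; the keys are invented placeholders, and the identifier mapping that real cross-referencing through UniProt requires is not shown.

```python
# Toy sketch of an N-way cross-reference concordance count.
# Assumption: each source is modelled as a set of keys for entries it treats
# as protein-coding. The data below are invented placeholders, not real
# Ensembl, Swiss-Prot, Entrez Gene or HGNC records.


def concordance(sources: dict[str, set[str]]) -> set[str]:
    """Return the keys present in every source (the N-way intersection).

    Assumes at least one source is supplied.
    """
    return set.intersection(*sources.values())


# Hypothetical toy data: each letter stands for one protein-coding entry.
sources = {
    "Ensembl": {"A", "B", "C", "D", "E"},
    "Swiss-Prot": {"A", "B", "C", "D"},
    "Entrez Gene": {"A", "B", "C", "E"},
    "HGNC": {"A", "B", "C", "D", "F"},
}

shared = concordance(sources)
for name, keys in sources.items():
    print(f"{name}: {len(keys)} entries")
print(f"4-way concordance: {len(shared)} entries -> {sorted(shared)}")
```

Here every source lists four or five entries, yet only three are shared by all four, mirroring how the real 4-way concordance falls below each portal's own count.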

    Retrieving GPCR data from public databases

    Expanding opportunities for mining bioactive chemistry from patents

    Bioactive structures published in medicinal chemistry patents typically exceed those in papers by at least twofold and may precede them by several years. The "big bang" of open automated extraction since 2012 has contributed to over 15 million patent-derived compounds in PubChem. While mapping between chemical structures, assay results and protein targets in patent documents is challenging, these relationships can be harvested using open tools and are beginning to be curated into databases.

    Challenges of connecting chemistry to pharmacology: perspectives from curating the IUPHAR/BPS Guide to PHARMACOLOGY

    Connecting chemistry to pharmacology (c2p) has been an objective of GtoPdb and its precursor IUPHAR-DB since 2003. This has been achieved by populating our database with expert-curated relationships between documents, assays, quantitative results, chemical structures, their locations within the documents and the protein targets in the assays (D-A-R-C-P). In this perspective we describe a wide range of associated challenges, using illustrative examples from GtoPdb entries. Our selection process begins with judgements of pharmacological relevance and scientific quality. Even though our small-data extraction has a stringent focus, we note that assessing the quality of papers has become more difficult over the last 15 years. We discuss ambiguity issues in resolving authors' descriptions of A-R-C-P entities to standardised identifiers. We also describe developments that have made this somewhat easier over the same period, both in the publication ecosystem and through enhancements to our internal processes in recent years. This perspective concludes with a look at challenges for the future, including wider capture of mechanistic nuances and the possible impacts of text mining on automated entity extraction.
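The D-A-R-C-P chain can be pictured as one linked record per curated data point. A minimal Python sketch with hypothetical class and field names (this is not the GtoPdb schema); all identifiers and values are placeholders.

```python
# Hypothetical sketch of a D-A-R-C-P (document-assay-result-chemical-protein)
# record. Class and field names are illustrative assumptions, not GtoPdb's schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    identifier: str  # e.g. a PubMed ID or patent number (placeholder here)


@dataclass(frozen=True)
class Chemical:
    name: str
    location_in_document: str  # where the structure appears, e.g. "Table 2"


@dataclass(frozen=True)
class Protein:
    name: str
    accession: str  # a standardised identifier, e.g. a UniProt accession


@dataclass(frozen=True)
class Assay:
    description: str
    target: Protein  # the protein target of the assay


@dataclass(frozen=True)
class Result:
    parameter: str  # e.g. "pKi" or "pIC50"
    value: float


@dataclass(frozen=True)
class DARCP:
    """One curated link: a result for a chemical, in an assay, from a document."""

    document: Document
    assay: Assay
    result: Result
    chemical: Chemical

    @property
    def protein(self) -> Protein:
        # The protein is reached through the assay's target.
        return self.assay.target


record = DARCP(
    document=Document("DOC-0001"),
    assay=Assay("binding assay (placeholder)", Protein("target X", "ACC-PLACEHOLDER")),
    result=Result("pKi", 8.5),
    chemical=Chemical("compound 12", "Table 2"),
)
print(record.protein.accession)
```

Routing the protein through the assay, rather than storing it on the record directly, reflects the abstract's framing of "the protein targets in the assays"; a real curation schema may model this differently.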

    Hydrolases in GtoPdb v.2023.1

    Listed in this section are hydrolases not covered in other parts of the Concise Guide, such as monoacylglycerol lipase and acetylcholinesterase. Pancreatic lipase is the predominant enzyme of fat digestion in the alimentary system; its inhibition is associated with decreased fat absorption. CES1 (P23141) is present at lower levels in the gut than CES2, but predominates in the liver, where it is responsible for the hydrolysis of many aliphatic, aromatic and steroid esters. Hormone-sensitive lipase is also a relatively non-selective esterase associated with steroid ester hydrolysis and triglyceride metabolism, particularly in adipose tissue. Endothelial lipase is secreted from endothelial cells and regulates circulating cholesterol in high-density lipoproteins.

    Hydrolases (version 2019.4) in the IUPHAR/BPS Guide to Pharmacology Database

    Listed in this section are hydrolases not covered in other parts of the Concise Guide, such as monoacylglycerol lipase and acetylcholinesterase. Pancreatic lipase is the predominant enzyme of fat digestion in the alimentary system; its inhibition is associated with decreased fat absorption. CES1 (P23141) is present at lower levels in the gut than CES2, but predominates in the liver, where it is responsible for the hydrolysis of many aliphatic, aromatic and steroid esters. Hormone-sensitive lipase is also a relatively non-selective esterase associated with steroid ester hydrolysis and triglyceride metabolism, particularly in adipose tissue. Endothelial lipase is secreted from endothelial cells and regulates circulating cholesterol in high-density lipoproteins.

    Small-molecule Bioactivity Databases

    Amino acid sequence of β-galactoside-binding bovine heart lectin: Member of a novel class of vertebrate proteins

    A variety of animal tissues contain β-galactoside-binding lectins with molecular masses in the range 13–17 kDa. There is evidence that these lectins may constitute a new protein family, although their function in vivo is not yet clear. In this work the major part of the amino acid sequence of the 13 kDa lectin from bovine heart muscle has been determined. Comparison of this sequence with the cDNA-deduced sequence published for the chick embryo skin lectin showed 58% homology. Comparison of the bovine lectin sequence with partial sequences from two cDNA clones from a human hepatoma library and with partial amino acid sequences of human lung lectin showed 70, 40 and 85% homology, respectively. The sequences of these vertebrate lectins are thus clearly related, supporting earlier results of immunological cross-reactivity within this group of proteins. Computer searching of protein sequence databases did not detect significant homologies between the bovine lectin sequence and other known proteins.
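Percentages like these are commonly computed as sequence identity over aligned positions, although the abstract does not state how its homology figures were derived. A minimal Python sketch under that assumption, using invented toy sequences rather than the lectin sequences. It also assumes the alignment has already been done and counts only positions where neither sequence has a gap; other conventions divide by the alignment length or by the length of the shorter sequence.

```python
# Percent identity between two pre-aligned sequences.
# Assumptions: identity over ungapped aligned positions is used as the
# definition; "-" marks a gap; the sequences are invented toy fragments.


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues divided by positions where neither has a gap, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 9 of 10 positions identical
print(percent_identity("AC-GT", "ACTGA"))  # gap column skipped: 3 of 4 identical
```

Because the denominator depends on the chosen convention, the same alignment can yield different percentages, which is worth bearing in mind when comparing figures across papers.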