44 research outputs found

    Annotated chemical patent corpus: A gold standard for text mining

    Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, t

    Mucopolysaccharidosis VI

    Mucopolysaccharidosis VI (MPS VI) is a lysosomal storage disease with progressive multisystem involvement, associated with a deficiency of arylsulfatase B leading to the accumulation of dermatan sulfate. Birth prevalence is between 1 in 43,261 and 1 in 1,505,160 live births. The disorder shows a wide spectrum of symptoms from slowly to rapidly progressing forms. The characteristic skeletal dysplasia includes short stature, dysostosis multiplex and degenerative joint disease. Rapidly progressing forms may have onset from birth, elevated urinary glycosaminoglycans (generally >100 μg/mg creatinine), severe dysostosis multiplex, short stature, and death before the 2nd or 3rd decades. A more slowly progressing form has been described as having later onset, mildly elevated glycosaminoglycans (generally <100 μg/mg creatinine), mild dysostosis multiplex, with death in the 4th or 5th decades. Other clinical findings may include cardiac valve disease, reduced pulmonary function, hepatosplenomegaly, sinusitis, otitis media, hearing loss, sleep apnea, corneal clouding, carpal tunnel disease, and inguinal or umbilical hernia. Although intellectual deficit is generally absent in MPS VI, central nervous system findings may include cervical cord compression caused by cervical spinal instability, meningeal thickening and/or bony stenosis, communicating hydrocephalus, optic nerve atrophy and blindness. The disorder is transmitted in an autosomal recessive manner and is caused by mutations in the ARSB gene, located on chromosome 5 (5q13-5q14). Over 130 ARSB mutations have been reported, causing absent or reduced arylsulfatase B (N-acetylgalactosamine 4-sulfatase) activity and interrupted dermatan sulfate and chondroitin sulfate degradation.
    Diagnosis generally requires evidence of clinical phenotype, arylsulfatase B enzyme activity <10% of the lower limit of normal in cultured fibroblasts or isolated leukocytes, and demonstration of a normal activity of a different sulfatase enzyme (to exclude multiple sulfatase deficiency). The finding of elevated urinary dermatan sulfate with the absence of heparan sulfate is supportive. In addition to multiple sulfatase deficiency, the differential diagnosis should also include other forms of MPS (MPS I, II, IVA, VII), sialidosis and mucolipidosis. Before enzyme replacement therapy (ERT) with galsulfase (Naglazyme®), clinical management was limited to supportive care and hematopoietic stem cell transplantation. Galsulfase is now widely available and is a specific therapy providing improved endurance with an acceptable safety profile. Prognosis is variable depending on the age of onset, rate of disease progression, age at initiation of ERT and on the quality of the medical care provided

    Combined PI3K and CDK2 inhibition induces cell death and enhances in vivo antitumour activity in colorectal cancer

    Background: The phosphatidylinositol-3-kinase/mammalian target of rapamycin (PI3K/mTOR) pathway is commonly deregulated in human cancer, hence many PI3K and mTOR inhibitors have been developed and have now reached clinical trials. Similarly, CDKs have been investigated as cancer drug targets. Methods: We have synthesised and characterised a series of 6-aminopyrimidines identified from a kinase screen that inhibit PI3K and/or mTOR and/or CDK2. Kinase inhibition, tumour cell growth, cell cycle distribution, cytotoxicity and signalling experiments were undertaken in HCT116 and HT29 colorectal cancer cell lines, and in vivo HT29 efficacy studies. Results: 2,6-Diaminopyrimidines with an O4-cyclohexylmethyl substituent and a C-5-nitroso or cyano group (1,2,5) induced cell cycle phase alterations and were growth inhibitory (GI50<20 μM). Compound 1, but not 2 or 5, potently inhibits CDK2 (IC50=0.1 nM) as well as PI3K, and was cytotoxic at growth inhibitory concentrations. Consistent with kinase inhibition data, compound 1 reduced phospho-Rb and phospho-rS6 at GI50 concentrations. Combination of NU6102 (CDK2 inhibitor) and pictilisib (GDC-0941; pan-PI3K inhibitor) resulted in synergistic growth inhibition, and enhanced cytotoxicity in HT29 cells in vitro and HT29 tumour growth inhibition in vivo. Conclusions: These studies identified a novel series of mixed CDK2/PI3K inhibitors and demonstrate that dual targeting of CDK2 and PI3K can result in enhanced antitumour activity

    The CHEMDNER corpus of chemicals and drugs and its annotation principles

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus
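The abstract reports a percentage agreement between annotators but does not say how it was computed. As a minimal sketch, assuming agreement is measured as the share of exactly matching mention spans among all spans marked by either annotator, it could look like the following. The function name, the span representation `(doc_id, start, end)`, and the toy data are all hypothetical and not taken from the CHEMDNER paper.

```python
def percentage_agreement(mentions_a, mentions_b):
    """Percentage of mention spans on which two annotators agree exactly.

    Each argument is a set of (doc_id, start_offset, end_offset) tuples.
    Assumption: agreement = |A intersect B| / |A union B| * 100; the corpus
    authors may have used a different definition.
    """
    union = mentions_a | mentions_b
    if not union:
        # Convention chosen here: two empty annotation sets agree fully.
        return 100.0
    return 100.0 * len(mentions_a & mentions_b) / len(union)


# Toy example: two annotators, two abstracts (hypothetical offsets).
annotator_a = {("doc1", 0, 7), ("doc1", 12, 20), ("doc2", 5, 9)}
annotator_b = {("doc1", 0, 7), ("doc2", 5, 9), ("doc2", 15, 18)}

agreement = percentage_agreement(annotator_a, annotator_b)
print(f"{agreement:.1f}% agreement")
```

Exact-span matching is the strictest choice; a relaxed variant could count overlapping spans as agreeing.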

    Get Your Atoms in Order: An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm

    Finding a canonical ordering of the atoms in a molecule is a prerequisite for generating a unique representation of the molecule. The canonicalization of a molecule is usually accomplished by applying some sort of graph relaxation algorithm, the most common of which is the Morgan algorithm. There are known issues with that algorithm that lead to noncanonical atom orderings as well as problems when it is applied to large molecules like proteins. Furthermore, each cheminformatics toolkit or software provides its own version of a canonical ordering, most based on unpublished algorithms, which also complicates the generation of a universal unique identifier for molecules. We present an alternative canonicalization approach that uses a standard stable-sorting algorithm instead of a Morgan-like index. Two new invariants that allow canonical ordering of molecules with dependent chirality as well as those with highly symmetrical cyclic graphs have been developed. The new approach proved to be robust and fast when tested on the 1.45 million compounds of the ChEMBL 20 data set in different scenarios like random renumbering of input atoms or SMILES round tripping. Our new algorithm is able to generate a canonical order of the atoms of protein molecules within a few milliseconds. The novel algorithm is implemented in the open-source cheminformatics toolkit RDKit. With this paper, we provide a reference Python implementation of the algorithm that could easily be integrated in any cheminformatics toolkit. This provides a first step toward a common standard for canonical atom ordering to generate a universal unique identifier for molecules other than InChI
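The general idea of ranking atoms by sorting invariants, rather than by a Morgan-like index, can be illustrated with a toy refinement loop. This is a simplified sketch and not the algorithm from the paper or the RDKit implementation: it has no tie-breaking step and none of the chirality or ring invariants the abstract mentions, so symmetric atoms simply keep the same rank. The function name, the `(atomic_number, degree)` invariant, and the example molecule encoding are assumptions made for illustration.

```python
def refine_ranks(invariants, neighbors):
    """Rank atoms by sorting invariants, then refine using neighbor ranks.

    invariants: one sortable tuple per atom, e.g. (atomic_number, degree).
    neighbors:  one list of neighbor indices per atom.
    Returns a rank per atom; atoms the sketch cannot distinguish share a rank.
    """
    n = len(invariants)

    def ranks_from_keys(keys):
        # sorted() is stable; tied keys receive the position of the first
        # atom in their class, so ranks stay comparable between rounds.
        order = sorted(range(n), key=lambda i: keys[i])
        ranks = [0] * n
        current = 0
        for pos, atom in enumerate(order):
            if pos > 0 and keys[atom] != keys[order[pos - 1]]:
                current = pos
            ranks[atom] = current
        return ranks

    ranks = ranks_from_keys(invariants)
    while True:
        # New key: own rank first, then the sorted ranks of the neighbors.
        keys = [
            (ranks[i], tuple(sorted(ranks[j] for j in neighbors[i])))
            for i in range(n)
        ]
        new_ranks = ranks_from_keys(keys)
        if new_ranks == ranks:  # no class was split, so we are done
            return ranks
        ranks = new_ranks


# Heavy-atom chain O-C-C-C-C (1-butanol), numbered from the oxygen.
invariants = [(8, 1), (6, 2), (6, 2), (6, 2), (6, 1)]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(refine_ranks(invariants, neighbors))  # [4, 3, 2, 1, 0]
```

Numbering the same chain from the other end yields the same rank for each chemically equivalent atom, which is the property a canonical ordering needs under random renumbering of input atoms.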