30 research outputs found

    Del gen a la patofisiología: Nuevas enfermedades asociadas al catabolismo de los aminoácidos ramificados [From gene to pathophysiology: new diseases associated with branched-chain amino acid catabolism]

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 14-06-2016. Branched-chain amino acids (BCAAs) are key elements in cell metabolism and signaling. 
Their catabolism is mainly a mitochondrial process that starts with a transamination followed by an irreversible oxidative decarboxylation, catalyzed by the mitochondrial multienzyme complex BCKDH. Defects in the early catabolism of BCAAs result in the rare, fatal neurological disorder Maple Syrup Urine Disease (MSUD), a clear illustration of the importance of correct BCAA metabolism for health. Throughout this work we describe a set of patients with novel diseases related to BCAA metabolism. The patients were referred to our lab on the basis of their blood metabolite profile and clinical diagnosis. Based on this stratification, we sought the genetic cause underlying the pathology, relying on different technologies for genetic diagnosis. In the first case reported, we found the disease-causing mutation after describing a region of loss of heterozygosity that spanned the whole of chromosome 4. These results directed our attention towards the PPM1K gene, which codes for the phosphatase PP2Cm, thought to positively regulate BCKDH complex activity. We confirmed a homozygous deletion of two base pairs in this gene, causing a frameshift that resulted in an upstream premature stop codon. Further analysis of the mutation's effect confirmed the relationship between loss of PP2Cm function and decreased BCKDH complex (BCKDHc) activity, responsible for the MSUD-like metabolic pattern found in the patient. In the second set of patients, characterized by autistic-like behavior with growth and mental retardation and a decrease in blood BCAA concentration, we diagnosed a defect in the BCKDHc-regulating kinase BCKDK. The diagnosis involved establishing this decrease as the metabolic signature of the disease, which was achieved by characterizing the rate of [U-13C]-leucine metabolism in cultured fibroblasts. We found two loss-of-function mutations in BCKDK, c.520C>G and c.1166T>C, which proved to cause the described molecular phenotype. 
BCKDK is responsible for the phosphorylation and inactivation of the BCKDH complex, so a defect in the kinase results in constitutive activation of the complex. For the last patient, a Next Generation Sequencing study was performed, evaluating the coding exons of over 4,800 genes. By applying a series of logical filters we found a mutation in BCAT2, which codes for the mitochondrial transaminase responsible for the first step of BCAA metabolism. In the search for actionable targets for disease treatment, we directed our research towards mitochondrial performance upon disruption of BCAA metabolism regulation (through absence of either PP2Cm or BCKDK), for several reasons. First, branched-chain amino acid metabolism is itself a mitochondrial process. In addition, PP2Cm had been described as a critical mitochondrial protein in animal models, and mitochondrial alterations have been described in the etiopathogenesis of some types of autism. We found an increase in superoxide (O2·−) production in both states, especially marked in the BCKDK patients' cells, in which the recorded increase was over two-fold. Appraisal of oxygen consumption rates revealed a decrease in basal and ATP-linked respiration, which was echoed in a large reduction of cellular ATP concentration. On further analysis, we could not confirm the involvement of PP2Cm in the mitochondrial permeability transition pore (mPTP), as had been described in other models, but we did confirm an effect on cell survival sensitivity, registered as an increase in MAPK activation and overexpression of certain pro-apoptotic genes. It therefore remains unclear which parts of the described phenotype are due to the increase in BCAAs and which to PP2Cm deficiency itself. When analyzing the mitochondrial response to the hyper-catabolic state caused by BCKDK deficiency, we observed more elongated and tubular mitochondria and increased levels of the fusion-related proteins Mfn2 and OPA1, results that were reproduced in BCKDK-interfered control fibroblasts. 
These results point towards diverse mitochondrial mechanisms that mitigate the damage caused by the decrease in amino acids through the relationship between shape and function, placing the mitochondria in a relevant position in the development of the disease. Our results highlight the importance of equilibrium in the regulation of BCAA metabolism. The description of new genes involved in a metabolic pathway allows the definition or validation of a therapeutic approach to the disorder, and supports the search for novel actionable pathways that address the different facets of the disease.

    Evaluation of chemical and gene/protein entity recognition systems at BioCreative V.5: the CEMP and GPRO patents tracks

    This paper presents the results of the BioCreative V.5 offline tasks, which evaluated the performance of, and assessed the progress made by, strategies for the automatic recognition of mentions of chemical names and genes in the running text of medicinal chemistry patent abstracts. A total of 21 teams submitted results for at least one of these tasks. The CEMP (chemical entity mention in patents) task entailed the detection of chemical named entity mentions. A total of 14 teams submitted 56 runs. The top performing team reached an F-score of 0.90 with a precision of 0.88 and a recall of 0.93. The GPRO (gene and protein related object) task focused on the detection of mentions of gene and protein related objects. The 7 participating teams (30 runs) had to detect gene/protein mentions that could be linked to at least one biological database, such as SwissProt or EntrezGene. The best F-score, recall and precision in this task were 0.79, 0.83 and 0.77, respectively. The CEMP and GPRO gold standard corpora included training sets of 21,000 records and test sets of 9,000 records. Similar to the previous BioCreative CHEMDNER tasks, evaluation was based on micro-averaged F-score. The BeCalm platform supported prediction submission and evaluation (http://www.becalm.eu). We acknowledge the OpenMinted (654021) and the ELIXIR-EXCELERATE (676559) H2020 projects, and the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology for funding. The Spanish National Bioinformatics Institute (INB) unit at the Spanish National Cancer Research Centre (CNIO) is a member of the INB, PRB2-ISCIII and is supported by grant PT13/0001/0030, of the PE I+D+i 2013-2016, funded by ISCIII and ERDF.
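
    The micro-averaged F-score named as the evaluation metric for the CEMP and GPRO tracks can be illustrated with a minimal sketch. It assumes that a mention is represented as a (start, end) character-offset pair and that only exact matches count as correct; the function name `micro_prf` and the dictionary layout are hypothetical and are not taken from the BeCalm platform.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a mention is a (start, end) character-offset pair.
Mention = Tuple[int, int]


def micro_prf(
    gold: Dict[str, Set[Mention]],
    pred: Dict[str, Set[Mention]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over all documents.

    Counts are pooled across documents before the ratios are computed,
    so documents with many mentions weigh more than documents with few.
    """
    tp = fp = fn = 0
    for doc_id in set(gold) | set(pred):
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # predicted mentions that exactly match gold
        fp += len(p - g)   # predicted mentions absent from gold
        fn += len(g - p)   # gold mentions that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

    As a consistency check on the figures quoted above, a precision of 0.88 and a recall of 0.93 give a harmonic mean of roughly 0.90, which agrees with the top CEMP F-score.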

    The Markyt visualisation, prediction and benchmark platform for chemical and gene entity recognition at BioCreative/CHEMDNER challenge

    Biomedical text mining methods and technologies have improved significantly in the last decade. Considerable efforts have been invested in understanding the main challenges of biomedical literature retrieval and extraction and proposing solutions to problems of practical interest. Most notably, community-oriented initiatives such as the BioCreative challenge have enabled controlled environments for the comparison of automatic systems while pursuing practical biomedical tasks. Under this scenario, the present work describes the Markyt Web-based document curation platform, which has been implemented to support the visualisation, prediction and benchmarking of chemical and gene mention annotations at the BioCreative/CHEMDNER challenge. Creating this platform is an important step for the systematic and public evaluation of automatic prediction systems and the reusability of the knowledge compiled for the challenge. Markyt was not only critical to support the manual annotation and annotation revision process but also facilitated the comparative visualisation of automated results against the manually generated Gold Standard annotations and the comparative assessment of generated results. We expect that future biomedical text mining challenges and the text mining community may benefit from the Markyt platform to better explore and interpret annotations and improve automatic system predictions. Database URL: http://www.markyt.org, https://github.com/sing-group/Markyt This work was partially funded by the [14VI05] Contract-Programme from the University of Vigo and the Agrupamento INBIOMED from DXPCTSUG-FEDER unha maneira de facer Europa (2012/273) as well as by the Foundation for Applied Medical Research, University of Navarra (Pamplona, Spain). The research leading to these results has received funding from the European Union's Seventh Framework Programme FP7/REGPOT-2012-2013.1 under grant agreement no 316265, BIOCAPS.

    Comprehensive analysis of GABAA-A1R developmental alterations in Rett Syndrome: setting the focus for therapeutic targets in the time frame of the disease

    Rett syndrome, a serious neurodevelopmental disorder, has been associated with altered expression of different synaptic-related proteins and aberrant glutamatergic and γ-aminobutyric acid (GABA)ergic neurotransmission. Despite its severity, it lacks a therapeutic option. In this work we aimed to define the relationship between MeCP2 and GABAA-A1 receptor expression, emphasizing the time dependence of this relationship. To this end, we analyzed the expression of the ionotropic receptor subunit under different MeCP2 gene-dosage and developmental conditions, in cell lines and in primary cultured neurons, as well as at different developmental stages of a Rett mouse model. Further, RNAseq and systems biology analyses were performed on post-mortem brain biopsies of Rett patients. Modulation of MeCP2 expression in cellular models (both Neuro2a (N2A) cells and primary neuronal cultures) revealed a positive effect of MeCP2 on GABAA-A1 receptor subunit expression, which did not occur for other proteins such as KCC2 (Potassium-chloride channel, member 5). In the Mecp2+/- mouse brain, the expression of both KCC2 and the GABA subunits was developmentally regulated, with decreased expression during the pre-symptomatic stage, while expression was variable in adult symptomatic mice. Finally, the expression of GABA receptor-related synaptic proteins in post-mortem brain biopsies of two Rett patients was evaluated, specifically revealing overexpression of the GABA A1R subunit. The identification of the molecular changes along the prodromal stages of Rett syndrome strongly endorses the importance of the time frame when addressing this disease, supporting the need for an early, neurotransmission-targeted therapeutic intervention.

    CHEMDNER: The drugs and chemical names extraction challenge

    Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improving the access and integration of information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the CHEMDNER (chemical compound and drug name recognition) community challenge, which promoted the development of novel, competitive and accessible chemical text mining systems. This task allowed a comparative assessment of the performance of various methodologies using a carefully prepared collection of text manually labeled by specially trained chemists as Gold Standard data. We evaluated two important aspects: one covered the indexing of documents with chemicals (chemical document indexing - CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition - CEM task). 27 teams (23 academic and 4 commercial, a total of 87 researchers) returned results for the CHEMDNER tasks: 26 teams for CEM and 23 for the CDI task. Top scoring teams obtained an F-score of 87.39% for the CEM task and 88.20% for the CDI task, a very promising result when compared to the agreement between human annotators (91%). The strategies used to detect chemicals included machine learning methods (e.g. conditional random fields) using a variety of features, chemistry and drug lexica, and domain-specific rules. We expect that the tools and resources resulting from this effort will have an impact on future developments of chemical text mining applications and will form the basis for finding related chemical information for the detected entities, such as toxicological or pharmacogenomic properties.
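
    The difference between the two subtasks described above (exact mention offsets for CEM, a per-document list of chemicals for CDI) can be sketched as follows. This is only an illustration under stated assumptions: mentions are taken to be (start, end) character offsets, the document index is taken to be the list of distinct mention strings in order of first appearance, and the function name `mentions_to_index` is hypothetical rather than part of any CHEMDNER tooling.

```python
from typing import List, Tuple


def mentions_to_index(text: str, mentions: List[Tuple[int, int]]) -> List[str]:
    """Collapse exact mention offsets (CEM-style) into a list of the
    distinct chemical strings found in the document (CDI-style).

    Strings are compared case-sensitively and kept in order of first
    appearance; both choices are assumptions made for this sketch.
    """
    index: List[str] = []
    for start, end in sorted(mentions):
        name = text[start:end]
        if name not in index:
            index.append(name)
    return index
```

    For instance, a document that mentions the same compound at two different offsets contributes two CEM mentions but only one CDI entry.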

    Thiamine transporter-2 deficiency: outcome and treatment monitoring

    Background: The clinical characteristics distinguishing treatable thiamine transporter-2 (ThTR2) deficiency due to SLC19A3 genetic defects from the other devastating causes of Leigh syndrome are sparse. Methods: We report the clinical follow-up after thiamine and biotin supplementation in four children with ThTR2 deficiency presenting with Leigh and biotin-thiamine-responsive basal ganglia disease phenotypes. We established whole-blood thiamine reference values in 106 children without neurological disease and monitored thiamine levels in SLC19A3 patients after the initiation of treatment. We compared our results with those of 69 patients with ThTR2 deficiency after a review of the literature. Results: At diagnosis, the patients were aged 1 month to 17 years, and all of them showed signs of acute encephalopathy, generalized dystonia, and brain lesions affecting the dorsal striatum and medial thalami. One patient died of septicemia, while the remaining patients showed clinical and radiological improvements shortly after the initiation of thiamine. Upon follow-up, the patients received a combination of thiamine (10-40 mg/kg/day) and biotin (1-2 mg/kg/day) and remained stable with residual dystonia and speech difficulties. After establishing reference values for the different age groups, whole-blood thiamine quantification was a useful method for treatment monitoring. Conclusions: ThTR2 deficiency is a reversible cause of acute dystonia and Leigh encephalopathy in the pediatric years. Brain lesions affecting the dorsal striatum and medial thalami may be useful in the differential diagnosis of other causes of Leigh syndrome. Further studies are needed to validate the therapeutic doses of thiamine and how to monitor them in these patients. Supported by Fondo de Investigación Sanitaria Grant PI12/02010 and PI12/02078; Centre for Biomedical Research on Rare Diseases, an initiative of the Instituto de Salud Carlos III, Barcelona, Spain; Agència de Gestió d'Ajuts Universitaris i de Recerca-Agaur FI-DGR 2014 (JD Ortigoza-Escobar

    Impact of Diabetes on 10‐Year Outcomes Following ST‐Segment–Elevation Myocardial Infarction: Insights From the EXAMINATION‐EXTEND Trial

    BACKGROUND: Long-term outcomes of ST-segment-elevation myocardial infarction in patients with diabetes have barely been investigated. The objective of this analysis from the EXAMINATION-EXTEND (10-Years Follow-Up of the EXAMINATION trial) trial was to compare 10-year outcomes of patients with ST-segment-elevation myocardial infarction with and without diabetes. METHODS AND RESULTS: Of the study population, 258 patients had diabetes and 1240 did not. The primary end point was the patient-oriented composite end point of all-cause death, any myocardial infarction, or any revascularization. Secondary end points were the individual components of the primary combined end point, cardiac death, target vessel myocardial infarction, target lesion revascularization, and stent thrombosis. All end points were adjusted for potential confounders. At 10 years, patients with diabetes showed a higher incidence of the patient-oriented composite end point compared with those without (46.5% versus 33.0%; adjusted hazard ratio [HR], 1.31 [95% CI, 1.05-1.61]; P=0.016), mainly driven by a higher incidence of any revascularization (24.4% versus 16.6%; adjusted HR, 1.61 [95% CI, 1.19-2.17]; P=0.002). Specifically, patients with diabetes had a higher incidence of any revascularization during the first 5 years of follow-up (20.2% versus 12.8%; adjusted HR, 1.57 [95% CI, 1.13-2.19]; P=0.007) compared with those without diabetes. No statistically significant differences were found with respect to the other end points. CONCLUSIONS: Patients with ST-segment-elevation myocardial infarction who had diabetes had a worse clinical outcome at 10 years compared with those without diabetes, mainly driven by a higher incidence of any revascularization in the first 5 years.

    Overview of recent TJ-II stellarator results

    The main results obtained in the TJ-II stellarator in the last two years are reported. The most important topics investigated have been modelling and validation of impurity transport, validation of gyrokinetic simulations, turbulence characterisation, effect of magnetic configuration on transport, fuelling with pellet injection, fast particles and liquid metal plasma facing components. As regards impurity transport research, a number of working lines exploring several recently discovered effects have been developed: the effect of tangential drifts on stellarator neoclassical transport, the impurity flux driven by electric fields tangent to magnetic surfaces, and attempts at experimental validation, with Doppler reflectometry, of the variation of the radial electric field on the flux surface. Concerning gyrokinetic simulations, two validation activities have been performed: the comparison with measurements of zonal flow relaxation in pellet-induced fast transients and the comparison with the experimental poloidal variation of fluctuation amplitude. The impact of radial electric fields on turbulence spreading in the edge and scrape-off layer has also been experimentally characterized using a 2D Langmuir probe array. Another remarkable piece of work has been the investigation of the radial propagation of small temperature perturbations using transfer entropy. Research on the physics and modelling of plasma core fuelling with pellet and tracer-encapsulated solid-pellet injection has also produced relevant results. Neutral beam injection driven Alfvénic activity and its possible control by electron cyclotron current drive have been examined as well in TJ-II. Finally, recent results on alternative plasma facing components based on liquid metals are also presented. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014–2018 under Grant Agreement No. 633053. 
It has been partially funded by the Ministerio de Ciencia, Innovación y Universidades of Spain under projects ENE2013-48109-P, ENE2015-70142-P and FIS2017-88892-P. It has also received funds from the Spanish Government via mobility grant PRX17/00425. The authors thankfully acknowledge the computer resources at MareNostrum and the technical support provided by the Barcelona S.C. It has been supported as well by The Science and Technology Center in Ukraine (STCU), Project P-507F.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). 
This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    The CHEMDNER corpus of chemicals and drugs and its annotation principles

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus