21 research outputs found

    Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents

    One way to initiate the reconstruction of molecular circuits is to use automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and bioinformaticians typically include these methods in pipelines used to mine and curate large literature datasets. Nevertheless, only a limited number of user-friendly tools are available to experimental biologists that use text mining for network reconstruction and require no programming skills. One of these tools is Biblio-MetReS. Originally, this tool performed an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrences of proteins/genes. This approach ensured that results were always up to date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of long execution times. Here we report an evolution of Biblio-MetReS that can construct co-occurrence networks for genes, GO processes, pathways, or any combination of the three entity types, and represent those entities graphically. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). We also show that its identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents encountered in previous analyses. This combination simultaneously decreases program run time and maintains the 'up-to-dateness' of the results.

    RA was partially supported by the Ministerio de Ciencia e Innovación (MICINN, Spain) through grant BFU2010-17704. FS was partially funded by the MICINN through grant TIN2011-28689-C02-02. The authors are members of the research groups 2009SGR809 and 2009SGR145, funded by the Generalitat de Catalunya. AU is funded by a Generalitat de Catalunya (AGAUR) PhD fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
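    The abstract describes two ideas: building co-occurrence networks from the entities found in each document, and reusing preprocessed results for documents already seen while analysing only new documents on the fly. The sketch below illustrates both under stated assumptions; it is not the Biblio-MetReS implementation. The naive token-based dictionary lookup (`extract_entities`), the in-memory `cache` dictionary, and the toy sentences are all hypothetical stand-ins, and multi-word names such as GO process labels would need a real tagger.

```python
from collections import Counter
from itertools import combinations


def extract_entities(text, dictionary):
    """Hypothetical stand-in for a real entity tagger: return every
    dictionary name that appears as a whole token in the text."""
    tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
    return {name for name in dictionary if name.lower() in tokens}


def cooccurrence_network(documents, dictionary, cache):
    """Count, for each pair of entities, how many documents mention both.

    documents: mapping doc_id -> full text
    dictionary: entity names to look for (single tokens in this toy version)
    cache: mapping doc_id -> frozenset of entities from earlier runs;
           updated in place so later runs can skip re-analysis.
    """
    edges = Counter()
    for doc_id, text in documents.items():
        if doc_id not in cache:
            # Only documents never seen before are analysed on the fly.
            cache[doc_id] = frozenset(extract_entities(text, dictionary))
        # Each unordered pair of entities in a document adds 1 to that edge.
        for a, b in combinations(sorted(cache[doc_id]), 2):
            edges[(a, b)] += 1
    return edges


# Toy usage with made-up sentences.
genes = {"TP53", "MDM2", "CDKN1A"}
docs = {
    "d1": "TP53 is degraded via MDM2.",
    "d2": "TP53 induces CDKN1A and MDM2.",
}
cache = {}
net = cooccurrence_network(docs, genes, cache)
print(dict(net))
```

    On a second run, any document whose ID is already in `cache` is not re-read, so only newly published documents cost tagging time while the network still reflects the full, current document set.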

    Biblio-MetReS: A bibliometric network reconstruction application and server

    Background: Reconstruction of gene and/or protein networks from automated analysis of the literature is one of the current targets of text mining in biomedical research. Some user-friendly tools already perform this analysis on precompiled databases of abstracts of scientific papers. Other tools allow expert users to process and analyze the full content of a corpus of scientific documents. However, to our knowledge, no user-friendly tool is available that simultaneously analyzes the latest set of scientific documents available online and reconstructs the set of genes referenced in those documents.
    Results: This article presents such a tool, Biblio-MetReS, and compares its functioning and results with those of other widely used user-friendly applications (iHOP, STRING). Under similar conditions, Biblio-MetReS creates networks that are comparable to those of other user-friendly tools. Furthermore, analysis of full-text documents provides more complete reconstructions than analysis of document abstracts alone.
    Conclusions: Literature-based automated network reconstruction is still far from providing complete reconstructions of molecular networks. However, its value as an auxiliary tool is high, and it will increase as standards for reporting biological entities and relationships become more widely accepted and enforced. Biblio-MetReS can be downloaded from http://metres.udl.cat/. It provides an easy-to-use environment for researchers to reconstruct their networks of interest from an always up-to-date set of scientific documents.
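    The abstract compares reconstructed networks across tools and between abstract-only and full-text analysis, but the listing does not say which metric was used. One common choice, assumed here, is edge-level precision and recall against a reference network, where "more complete" corresponds to higher recall. The function names and the toy networks below are hypothetical and invented for illustration only.

```python
def normalize(edges):
    """Treat edges as undirected pairs and drop self-loops."""
    return {frozenset(e) for e in edges if len(set(e)) == 2}


def compare_networks(predicted, reference):
    """Edge-level overlap between a reconstructed network and a reference one."""
    pred, ref = normalize(predicted), normalize(reference)
    shared = pred & ref
    precision = len(shared) / len(pred) if pred else 0.0
    recall = len(shared) / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"shared": len(shared), "precision": precision,
            "recall": recall, "f1": f1}


# Toy networks (invented): abstract-only vs full-text runs,
# each compared against the same hypothetical reference network.
reference = [("TP53", "MDM2"), ("TP53", "CDKN1A"), ("MDM2", "CDKN1A")]
abstract_net = [("MDM2", "TP53")]
fulltext_net = [("MDM2", "TP53"), ("TP53", "CDKN1A"), ("TP53", "BAX")]

for name, edges in [("abstract", abstract_net), ("full text", fulltext_net)]:
    print(name, compare_networks(edges, reference))
```

    Treating edges as unordered pairs matches co-occurrence networks, which carry no direction; a directed comparison would keep tuples instead of frozensets.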

    Ovine footrot in Southern Portugal: detection of Dichelobacter nodosus and Fusobacterium necrophorum in sheep with different lesion scores.

    The Mediterranean-climate region of Alentejo in southern Portugal is an important sheep production centre, but little is known about the presence and characteristics of Dichelobacter nodosus, in association with Fusobacterium necrophorum, across the different footrot lesion scores. DNA from 261 interdigital biopsy samples, taken from 14 footrot-affected flocks and from three non-affected flocks, was analysed for the presence of D. nodosus and F. necrophorum by real-time PCR. Virulence and serogroup were determined for 132 and 53 D. nodosus-positive biopsy samples, respectively. Co-infection with both bacteria was the most common epidemiological finding associated with greater disease severity. There was a statistically significant association (p = 0.002) between footrot-affected flocks and the presence of D. nodosus. Most D. nodosus-positive samples were virulent (96.2%) and belonged to serogroup B (90%).

    The CHEMDNER corpus of chemicals and drugs and its annotation principles

    The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, which obtained a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also the mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for the minimum information required about entity annotations when constructing domain-specific corpora of chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus
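    The abstract reports an inter-annotator percentage agreement but not the exact formula. The sketch below assumes one simple definition: the share of all distinct mentions, identified by document ID and character offsets, that both annotators marked identically, optionally also requiring the same SACEM class. The mention tuples, offsets, and document IDs are hypothetical toy data, not CHEMDNER annotations.

```python
SACEM_CLASSES = {"abbreviation", "family", "formula", "identifier",
                 "multiple", "systematic", "trivial"}


def percentage_agreement(a, b, match_class=True):
    """Percentage of distinct mentions annotated identically by both annotators.

    a, b: iterables of (doc_id, start, end, sacem_class) tuples.
    match_class: if True, a mention counts as shared only when both the span
                 and the SACEM class match; otherwise the span suffices.
    """
    def key(mention):
        doc_id, start, end, cls = mention
        if cls not in SACEM_CLASSES:
            raise ValueError(f"unknown SACEM class: {cls}")
        return (doc_id, start, end, cls) if match_class else (doc_id, start, end)

    set_a = {key(m) for m in a}
    set_b = {key(m) for m in b}
    union = set_a | set_b
    if not union:
        return 100.0  # two empty annotation sets agree trivially
    return 100.0 * len(set_a & set_b) / len(union)


# Hypothetical annotations from two annotators.
ann1 = [("d1", 0, 7, "trivial"), ("d1", 15, 19, "formula"),
        ("d2", 3, 12, "systematic")]
ann2 = [("d1", 0, 7, "trivial"), ("d1", 15, 19, "abbreviation"),
        ("d2", 3, 12, "systematic"), ("d2", 20, 25, "family")]

print(percentage_agreement(ann1, ann2))
print(percentage_agreement(ann1, ann2, match_class=False))
```

    Requiring the class to match makes the score stricter: here the annotators agree on every span they both marked, but one of those spans received different SACEM classes.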

    Development of computational tools to assist in the reconstruction of molecular networks

    The aim of this thesis is the development and implementation of a set of data-mining tools to aid in the reconstruction of biological circuits through the analysis and integration of large biological datasets. These circuits are important because they regulate and maintain life and health in organisms. The main part of the thesis focuses on analyzing bibliomic data, for which I developed two tools: Biblio-MetReS, for reconstructing protein-protein interaction (PPI) networks and identifying the processes in which those networks are involved, and CheNER, for identifying the names of chemical compounds. The final tool developed integrates methods for the structural analysis and modeling of proteins with docking methods to predict native protein-protein physical complexes.