38 research outputs found

    A System for Identifying Named Entities in Biomedical Text: how Results From two Evaluations Reflect on Both the System and the Evaluations

    We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and report its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations, which caused the final performance to be lower than optimal.

    Exploring the boundaries: gene and protein identification in biomedical text

    Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. Methods: We present a maximum-entropy-based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the “open” evaluation and a precision of 0.78 and recall of 0.85 in the “closed” evaluation. Conclusions: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
    Background: The explosion of information in the biomedical domain and particularly in genetics has highlighted the need for automated text information extraction techniques. MEDLINE, the primary research database serving the biomedical community, currently contains over 14 million abstracts, with 60,000 new abstracts appearing each month. There is also an impressive number of molecular biological databases covering a

    An annotation scheme for information status in dialogue

    We present an annotation scheme for information status (IS) in dialogue, and validate it on three Switchboard dialogues. We show that our scheme has good reproducibility, and compare it with previous attempts to code IS and related features. We eventually apply the scheme to 147 dialogues, thus producing a corpus that contains nearly 70,000 NPs annotated for IS and over 15,000 coreference links.

    Biomechanical signaling within the developing zebrafish heart attunes endocardial growth to myocardial chamber dimensions

    Intra-organ communication guides morphogenetic processes that are essential for an organ to carry out complex physiological functions. In the heart, the growth of the myocardium is tightly coupled to that of the endocardium, a specialized endothelial tissue that lines its interior. Several molecular pathways have been implicated in the communication between these tissues, including secreted factors, components of the extracellular matrix, and proteins involved in cell-cell communication. Yet, it is unknown how the growth of the endocardium is coordinated with that of the myocardium. Here, we show that an increased expansion of the myocardial atrial chamber volume generates higher junctional forces within endocardial cells. This leads to biomechanical signaling involving VE-cadherin, triggering nuclear localization of the Hippo pathway transcriptional regulator Yap1 and endocardial proliferation. Our work suggests that the growth of the endocardium results from myocardial chamber volume expansion and ends when the tension on the tissue is relaxed.

    Information Retrieval Systems Adapted to the Biomedical Domain

    The terminology used in biomedicine shows lexical peculiarities that have required the elaboration of terminological resources and information retrieval systems with specific functionalities. The main characteristics are the high rates of synonymy and homonymy, due to phenomena such as the proliferation of polysemic acronyms and their interaction with common language. Information retrieval systems in the biomedical domain use techniques oriented to the treatment of these lexical peculiarities. In this paper we review some of the techniques used in this domain, such as the application of Natural Language Processing (BioNLP), the incorporation of lexical-semantic resources, and the application of Named Entity Recognition (BioNER). Finally, we present the evaluation methods adopted to assess the suitability of these techniques for retrieving biomedical resources. Comment: 6 pages, 4 tables

    Towards a Protein-Protein Interaction information extraction system: recognizing named entities

    The majority of biological functions of any living being are related to protein-protein interactions (PPI). PPI discoveries are reported in the form of research publications whose volume grows day after day. Consequently, automatic PPI information extraction systems are a pressing need for biologists. In this paper we are mainly concerned with the named entity detection module of PPIES (the PPI information extraction system we are implementing), which recognizes twelve entity types relevant in the PPI context. It is composed of two sub-modules: a dictionary look-up with extensive normalization and acronym detection, and a Conditional Random Field classifier. The dictionary look-up module has been tested with the Interaction Method Task (IMT), and it improves by approximately 10% on the current solutions that do not use Machine Learning (ML). The second module has been used to create a classifier using the Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA 04) data set. It does not use any external resources, or complex or ad hoc post-processing, and obtains 77.25%, 75.04% and 76.13% for precision, recall, and F1-measure, respectively, improving all previous results obtained for this data set. This work has been funded by MICINN, Spain, as part of the "Juan de la Cierva" Program and the Project DIANA-Applications (TIN2012-38603-C02-01), as well as by the European Commission as part of the WIQ-EI IRSES Project (Grant No. 269180) within the FP 7 Marie Curie People Framework. Danger Mercaderes, RM.; Pla Santamaría, F.; Molina Marco, A.; Rosso, P. (2014). Towards a Protein-Protein Interaction information extraction system: recognizing named entities. Knowledge-Based Systems. 57:104-118. https://doi.org/10.1016/j.knosys.2013.12.010

    The Effect of Feature Hierarchies on Frequencies of Passivization in English

    It is my pleasure to thank Joan Bresnan, my principal advisor, for making this thesis seem manageable, and for pointing me in directions for research, and for consistently being enthusiastic and suggesting solutions to problem after problem, and equally, for suggesting problems and keeping me aware of the issues involved. And particularly for always making me feel more comfortable, in classes and meetings and a number of other situations, to express myself freely, and for handling it gracefully when I was incoherent. I was lucky to be able to work with her and it was a unique experience watching her work. Thanks to Chris Manning, my second advisor, for being helpful in a number of ways despite the incredible demands on his time this year, particularly in providing advice with statistics, and helping me to search corpora, and setting up software for searching them, and for reading what I wrote critically, and in general for being down-to-earth. Thanks to both Chris and Joan for giving me the opportunity to work on such an interesting project. I have really enjoyed our meetings and watching the ideas in the project take shape. Thanks especially for a wonderful trip to Hong Kong. And thanks to Tom Wasow for helping me get involved with the project and for teaching, with Ivan Sag, an introductory syntax class which I very much liked. A very special thanks to Ivan Sag, who is almost entirely responsible, along with Tom Wasow, fo