Search CORE

3,735 research outputs found

Open-domain Anatomical Entity Mention Detection

Author: Ananiadou S
Ohta T
Pyysalo S
Tsujii J
Publication venue
Publication date: 01/01/2012
Field of study

Cell line name recognition in support of the identification of synthetic lethality in cancer from text

Author: Ginter Filip
Kaewphan Suwisa
Ohta Tomoko
Pyysalo Sampo
Van de Peer Yves
Van Landeghem Sofie
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/10/2015
Field of study

Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers

UPSpace at the University of Pretoria

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication venue
Publication date: 01/01/2007
Field of study

Spanish named entity recognition in the biomedical domain

Author: Cotik Viviana
Rodríguez Hontoria Horacio
Vivaldi Palatresi Jorge
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

Named Entity Recognition in the clinical domain and in languages different from English has the difficulty of the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of accordance in the boundaries of an entity, the scarcity of corpora and of other resources available. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system.Peer ReviewedPostprint (author's final draft

From POS tagging to dependency parsing for biomedical event extraction

Author: Nguyen Dat Quoc
Verspoor Karin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/01/2019
Field of study

Background: Given the importance of relation or event extraction from biomedical research publications to support knowledge capture and synthesis, and the strong dependency of approaches to this information extraction task on syntactic information, it is valuable to understand which approaches to syntactic processing of biomedical text have the highest performance. Results: We perform an empirical study comparing state-of-the-art traditional feature-based and neural network-based models for two core natural language processing tasks of part-of-speech (POS) tagging and dependency parsing on two benchmark biomedical corpora, GENIA and CRAFT. To the best of our knowledge, there is no recent work making such comparisons in the biomedical context; specifically no detailed analysis of neural models on this data is available. Experimental results show that in general, the neural models outperform the feature-based models on two benchmark biomedical corpora GENIA and CRAFT. We also perform a task-oriented evaluation to investigate the influences of these models in a downstream application on biomedical event extraction, and show that better intrinsic parsing performance does not always imply better extrinsic event extraction performance. Conclusion: We have presented a detailed empirical study comparing traditional feature-based and neural network-based models for POS tagging and dependency parsing in the biomedical context, and also investigated the influence of parser selection for a biomedical event extraction downstream task. Availability of data and material: We make the retrained models available at https://github.com/datquocnguyen/BioPosDepComment: Accepted for publication in BMC Bioinformatic

arXiv.org e-Print Archive

Directory of Open Access Journals