7 research outputs found

Coreference Resolution in Biomedical Texts: a Machine Learning Approach

Authors: Hong Huaqing, Su Jian, Tateisi Yuka, Tsujii Jun'ichi, Yang Xiaofeng
Publication venue: Dagstuhl Seminar Proceedings. 08131 - Ontologies and Text Mining for Life Sciences: Current Status and Future Perspectives
Publication date: 01/01/2008

Motivation: Coreference resolution, the process of identifying different mentions of the same entity, is an important component of a text-mining system. Compared with work on news articles, existing studies of coreference resolution in biomedical texts are still preliminary: they focus only on specific types of anaphors, such as pronouns or definite noun phrases, rely on heuristic methods, and run on small data sets. An in-depth exploration of this task in the biomedical domain is therefore needed.

Results: In this article, we present a learning-based approach to coreference resolution in the biomedical domain. We make three contributions. First, we annotated a large-scale coreference corpus, MedCo, which consists of 1,999 MEDLINE abstracts from the GENIA data set. Second, we propose a detailed framework for the coreference resolution task, in which we augment the traditional learning model by incorporating non-anaphors into training. Third, we explore various sources of knowledge for coreference resolution, particularly those that can deal with the complexity of biomedical texts. Evaluation on the MedCo corpus shows promising results: our coreference resolution system achieves a high precision of 85.2% with a reasonable recall of 65.3%, for an F-measure of 73.9%. The results also suggest that the augmented learning model significantly boosts precision (by up to 24.0%) with little loss in recall (less than 5%), yielding a gain of over 8% in F-measure.

Dagstuhl Research Online Publication Server
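As a quick consistency check on the MedCo figures above, the sketch below recomputes the F-measure from the reported precision and recall. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (beta = 1).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent
p, r = 85.2, 65.3
print(f"F-measure: {f_measure(p, r):.1f}%")  # prints "F-measure: 73.9%"
```

The recomputed value matches the reported 73.9%, which is consistent with the balanced F1 assumption.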

Coreference based event-argument relation extraction on biomedical text

Authors: Asahara Masayuki, Hirao Tsutomu, Matsumoto Yuji, Riedel Sebastian, Yoshikawa Katsumasa
Publication venue: BioMed Central
Publication date: 01/01/2011

This paper presents a new approach that exploits coreference information to extract event-argument (E-A) relations from biomedical documents. The approach has two advantages: (1) it can extract a large number of valuable E-A relations based on the concept of salience in discourse; (2) it can identify E-A relations across sentence boundaries (cross-links) using the transitivity of coreference relations. We propose two coreference-based models: a pipeline based on Support Vector Machine (SVM) classifiers, and a joint Markov Logic Network (MLN). We show the effectiveness of these models on a biomedical event corpus: both models outperform systems that do not use coreference information. Comparing the two proposed models, the joint MLN outperforms the pipeline SVM when gold coreference information is used.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery
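The cross-link idea in the second abstract rests on the transitivity of coreference: if an event's argument is a mention such as a pronoun, and that mention corefers with a mention in another sentence, the event can be linked to the distant mention too. The sketch below shows only that propagation step, using union-find to build coreference chains. It is not the authors' SVM pipeline or MLN model; the function `cross_links`, its inputs, and the example mentions are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to group coreferent mentions into chains."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cross_links(coref_pairs, ea_relations, sentence_of):
    """Hypothetical helper: propagate E-A relations along coreference chains.

    coref_pairs: iterable of (mention, mention) coreference links
    ea_relations: iterable of (event, argument_mention) pairs
    sentence_of: dict mapping each event and mention to its sentence index
    Returns (event, mention) pairs where the mention is coreferent with the
    event's argument but lies in a different sentence from the event.
    """
    uf = UnionFind()
    for a, b in coref_pairs:
        uf.union(a, b)
    # Collect each coreference chain (transitive closure of the links)
    chains = defaultdict(set)
    for m in list(uf.parent):
        chains[uf.find(m)].add(m)
    links = set()
    for event, arg in ea_relations:
        for m in chains.get(uf.find(arg), {arg}):
            if sentence_of[m] != sentence_of[event]:
                links.add((event, m))
    return links


# Hypothetical example: "it" in sentence 1 refers back to "IL-2 gene" in sentence 0
sentence_of = {"IL-2 gene": 0, "it": 1, "expression": 1}
print(cross_links([("it", "IL-2 gene")], [("expression", "it")], sentence_of))
# prints {('expression', 'IL-2 gene')}
```

In the paper the coreference links themselves come from learned models; here they are given as input, so the example only illustrates why transitivity yields relations that span sentences.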