Cross-lingual Coreference Resolution of Pronouns
This work is, to our knowledge, the first machine learning approach to cross-lingual
coreference resolution, i.e. coreference resolution (CR) performed on a bitext. Focusing on CR of English pronouns, we leverage language differences and enrich the feature set of a standard monolingual English CR system with features extracted from the Czech side of the bitext. Our work also includes a supervised pronoun aligner that outperforms a GIZA++ baseline in both intrinsic evaluation and evaluation on CR. The final cross-lingual CR system outperforms both a monolingual CR system and a cross-lingual projection system.
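A minimal sketch of the feature-enrichment idea in this abstract: an English mention-pair feature vector is extended with features read off the aligned Czech token. All function and feature names here are hypothetical illustrations, not the authors' implementation.

```python
def enrich_features(en_features, cz_token=None):
    """Combine English mention-pair features with Czech-side features.

    `en_features` is a dict of monolingual features; `cz_token` is a
    dict of (hypothetical) morphological attributes of the aligned
    Czech token, or None if the pronoun aligner found no counterpart.
    """
    features = dict(en_features)
    if cz_token is not None:
        # Czech morphology marks gender and number overtly, which can
        # disambiguate English pronouns such as "it" vs. "they".
        features["cz_gender"] = cz_token.get("gender", "unknown")
        features["cz_number"] = cz_token.get("number", "unknown")
        features["cz_is_dropped"] = cz_token.get("dropped", False)
    return features


en = {"same_sentence": True, "pronoun": "it"}
cz = {"gender": "fem", "number": "sg", "dropped": False}
print(enrich_features(en, cz))
```

The monolingual system stays unchanged; the bitext only widens its feature space.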
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Much effort has been devoted to evaluating whether multi-task learning can be
leveraged to learn rich representations that can be used in various Natural
Language Processing (NLP) down-stream applications. However, there is still a
lack of understanding of the settings in which multi-task learning has a
significant effect. In this work, we introduce a hierarchical model trained in
a multi-task learning setup on a set of carefully selected semantic tasks. The
model is trained in a hierarchical fashion to introduce an inductive bias by
supervising a set of low level tasks at the bottom layers of the model and more
complex tasks at the top layers of the model. This model achieves
state-of-the-art results on a number of tasks, namely Named Entity Recognition,
Entity Mention Detection and Relation Extraction without hand-engineered
features or external NLP tools like syntactic parsers. The hierarchical
training supervision induces a set of shared semantic representations at lower
layers of the model. We show that as we move from the bottom to the top layers
of the model, the hidden states of the layers tend to represent more complex
semantic information.
Comment: 8 pages, 1 figure; to appear in Proceedings of AAAI 2019.
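A toy sketch of the hierarchical supervision scheme described above: each task head reads the hidden states of a different encoder layer, so simpler tasks supervise the lower layers and more complex tasks the upper ones. All dimensions, weights and tag counts are illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                       # hidden size (illustrative)
x = rng.standard_normal(d)  # a single token embedding

def layer(h, w):
    return np.tanh(h @ w)   # one shared encoder layer

w1, w2, w3 = (rng.standard_normal((d, d)) for _ in range(3))
h1 = layer(x, w1)   # bottom layer -> supervised by NER
h2 = layer(h1, w2)  # middle layer -> supervised by mention detection
h3 = layer(h2, w3)  # top layer    -> supervised by relation extraction

# Task-specific heads attached at increasing depth; output sizes are
# hypothetical label-set sizes.
heads = {
    "ner": rng.standard_normal((d, 5)),
    "emd": rng.standard_normal((d, 3)),
    "re":  rng.standard_normal((d, 4)),
}
logits = {
    "ner": h1 @ heads["ner"],  # low-level task reads low-level states
    "emd": h2 @ heads["emd"],
    "re":  h3 @ heads["re"],   # complex task reads top-level states
}
for task, z in logits.items():
    print(task, z.shape)
```

Because each loss is attached at a different depth, gradients from the simpler tasks shape the shared lower layers, which is the inductive bias the abstract describes.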
Joint Learning for Coreference Resolution with Markov Logic
Pairwise coreference resolution models must merge pairwise coreference decisions to generate final outputs. Traditional merging methods adopt different strategies, such as the best-first method and enforcing the transitivity constraint, but most of them are applied independently of the pairwise learning method, as an isolated inference procedure at the end. We propose a joint learning model that combines pairwise classification and mention clustering with Markov logic. Experimental results show that our joint learning system outperforms independent learning systems. Our system also outperforms all the learning-based systems from the CoNLL-2011 shared task on the same dataset, and shows competitive performance compared with the best CoNLL-2011 system, which employs a rule-based method.
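A minimal sketch of the transitivity constraint that such merging strategies enforce: if mentions (a, b) and (b, c) are linked, (a, c) must land in the same cluster. Union-find is used here purely for illustration; it stands in for the Markov logic inference, which this sketch does not implement.

```python
class UnionFind:
    """Disjoint-set structure tracking mention clusters."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster(n_mentions, pair_scores, threshold=0.5):
    """Merge pairwise decisions into transitively closed clusters."""
    uf = UnionFind(n_mentions)
    for (i, j), score in pair_scores.items():
        if score >= threshold:
            uf.union(i, j)
    clusters = {}
    for m in range(n_mentions):
        clusters.setdefault(uf.find(m), []).append(m)
    return sorted(clusters.values())

# (0,1) and (1,2) are linked, so 0 and 2 join transitively even though
# no pairwise decision connects them directly.
scores = {(0, 1): 0.9, (1, 2): 0.7, (3, 4): 0.2}
print(cluster(5, scores))  # [[0, 1, 2], [3], [4]]
```

The paper's contribution is doing this clustering jointly with the pairwise classifier rather than as a post-hoc step like the one above.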
Fine-grained Dutch named entity recognition
This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic main-type and subtype named entity recognition. We give an overview of existing named entity annotation schemes and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. The scheme was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main-type named entities achieves a micro-averaged F-score of 84.91%, and is publicly available, along with the corpus and annotations.
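For reference, a micro-averaged F-score such as the 84.91% above sums true positives, false positives and false negatives over all entity types before computing precision and recall. The counts below are invented for illustration.

```python
from collections import Counter

# Hypothetical per-type error counts (tp/fp/fn), not the paper's data.
counts = {
    "PER": Counter(tp=90, fp=10, fn=5),
    "LOC": Counter(tp=40, fp=5, fn=10),
}

# Micro-averaging: pool the counts first, then compute P, R, F1 once.
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8966
```

Macro-averaging would instead compute an F-score per type and average those, weighting rare types equally with frequent ones.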
Coreference Resolution in Slovene on Annotated Texts from coref149
Coreference resolution is one of the three key information extraction tasks, alongside named entity recognition and relation extraction. Its goal is to group all entity mentions across a text into clusters, where each cluster represents a single entity. Methods for this task have long been developed for some widely spoken languages, but none existed for Slovene. In this paper we present a new, manually annotated corpus for coreference resolution in Slovene: the coref149 corpus. For automatic coreference resolution we adapted the SkipCor system, which we originally built for English. On the Slovene data, SkipCor achieved a CoNLL 2012 score of 76%. We also analysed the influence of individual feature types and examined the most common errors. During the analysis we developed a software library with a web interface through which all the described analyses can be run and their performance compared directly. The results are promising and comparable to those for other, more widely spoken languages. This shows that automatic coreference resolution in Slovene can succeed; future work should produce a larger, higher-quality corpus covering all coreference-related peculiarities of Slovene, enabling the construction of effective methods for automatic coreference resolution.
Integrating knowledge graph embeddings to improve mention representation for bridging anaphora resolution
Lexical semantics and world knowledge are crucial for interpreting bridging anaphora. Yet existing computational methods for acquiring and injecting this type of information into bridging resolution systems suffer from important limitations. Based on explicit querying of external knowledge bases, earlier approaches are computationally expensive (hence hardly scalable) and map the data to be processed into high-dimensional spaces, requiring careful handling of the curse of dimensionality and overfitting. In this work, we take a different and principled approach that naturally addresses these issues. Specifically, we convert the external knowledge source (in this case, WordNet) into a graph and learn low-dimensional embeddings of the graph nodes that capture the crucial features of the graph topology and, at the same time, rich semantic information. Once properly identified from the mention text spans, these low-dimensional graph node embeddings are combined with distributional text-based embeddings to provide enhanced mention representations. We illustrate the effectiveness of our approach by evaluating it on commonly used datasets, namely ISNotes (Markert et al., 2012) and BASHI (Rösiger, 2018). Our enhanced mention representations yield significant accuracy improvements on both datasets compared to different standalone text-based mention representations.
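An illustrative sketch of the combination step this abstract describes: a low-dimensional graph-node embedding is concatenated with a distributional text embedding to form the enhanced mention representation. The lookup tables and vector values below are toy stand-ins, not actual WordNet or distributional embeddings.

```python
import numpy as np

# Toy embedding tables (hypothetical keys and values).
graph_emb = {"synset:meal.n.01": np.array([0.1, -0.2, 0.4])}   # 3-dim graph-node embedding
text_emb = {"dinner": np.array([0.3, 0.0, -0.1, 0.2])}         # 4-dim text embedding

def mention_representation(token, synset):
    """Concatenate text-based and graph-based features for a mention."""
    g = graph_emb.get(synset, np.zeros(3))  # graph-topology features
    t = text_emb[token]                     # distributional features
    return np.concatenate([t, g])           # enhanced representation

vec = mention_representation("dinner", "synset:meal.n.01")
print(vec.shape)  # (7,)
```

Because the graph embeddings are low-dimensional, the combined vector stays small, which is how the approach sidesteps the dimensionality issues of explicit knowledge-base querying.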
Review of coreference resolution in English and Persian
Coreference resolution (CR) is one of the most challenging areas of natural
language processing. This task seeks to identify all textual references to the
same real-world entity. Research in this field is divided into coreference
resolution and anaphora resolution. Due to its application in textual
comprehension and its utility in other tasks such as information extraction
systems, document summarization, and machine translation, this field has
attracted considerable interest. Consequently, it has a significant effect on
the quality of these systems. This article reviews the existing corpora and
evaluation metrics in this field. Then, an overview of the coreference
algorithms, from rule-based methods to the latest deep learning techniques, is
provided. Finally, coreference resolution and pronoun resolution systems in
Persian are investigated.
Comment: 44 pages, 11 figures, 5 tables.