39 research outputs found

    A Listwise Approach to Coreference Resolution in Multiple Languages


    Cross-lingual Coreference Resolution of Pronouns

    This work is, to our knowledge, a first attempt at a machine learning approach to cross-lingual coreference resolution, i.e. coreference resolution (CR) performed on a bitext. Focusing on CR of English pronouns, we leverage language differences and enrich the feature set of a standard monolingual CR system for English with features extracted from the Czech side of the bitext. Our work also includes a supervised pronoun aligner that outperforms a GIZA++ baseline in terms of both intrinsic evaluation and evaluation on CR. The final cross-lingual CR system successfully outperformed both a monolingual CR system and a cross-lingual projection system.
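The feature-enrichment idea above can be sketched as follows. This is a minimal illustration, not the authors' actual feature set: the feature names, the alignment structure, and the pro-drop fallback are all assumptions for the example.

```python
# Hedged sketch: enriching a monolingual pronoun-CR feature vector with
# features read off the aligned Czech side of a bitext. Czech pronouns
# carry gender/number morphology that English lacks, and pro-drop means
# some English pronouns have no Czech counterpart at all.

def monolingual_features(pronoun, antecedent):
    """Illustrative English-side features for a pronoun-antecedent pair."""
    return {
        "same_sentence": pronoun["sent"] == antecedent["sent"],
        "distance": abs(pronoun["sent"] - antecedent["sent"]),
        "antecedent_is_subject": antecedent.get("dep") == "subj",
    }

def cross_lingual_features(pronoun, alignment):
    """Features from the Czech token aligned to the English pronoun."""
    czech = alignment.get(pronoun["id"])
    if czech is None:  # unaligned: plausibly a dropped Czech pronoun
        return {"cs_aligned": False, "cs_gender": None, "cs_number": None}
    return {
        "cs_aligned": True,
        "cs_gender": czech["gender"],
        "cs_number": czech["number"],
    }

def enriched_features(pronoun, antecedent, alignment):
    """Monolingual features augmented with the cross-lingual ones."""
    feats = monolingual_features(pronoun, antecedent)
    feats.update(cross_lingual_features(pronoun, alignment))
    return feats
```

In a real system these dictionaries would feed a pairwise classifier; here they only show how the two feature sources combine.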

    A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

    Much effort has been devoted to evaluating whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) down-stream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low-level tasks at the bottom layers of the model and more complex tasks at its top layers. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction, without hand-engineered features or external NLP tools such as syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at the lower layers of the model. We show that as we move from the bottom to the top layers of the model, the hidden states of the layers tend to represent more complex semantic information.
    Comment: 8 pages, 1 figure, to appear in Proceedings of AAAI 201
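The hierarchical supervision scheme can be sketched in a few lines. The layer assignments and task names below are taken from the abstract, but the "encoder" is a placeholder transform standing in for the paper's trained layers; the wiring, not the model, is the point.

```python
# Hedged sketch: each task reads the hidden states of a designated layer,
# with simpler tasks attached lower in the stack. The encoder here is a
# toy stand-in (it just tags tokens with the layer depth) so the routing
# logic stays visible and self-contained.

TASK_LAYER = {"ner": 1, "entity_mention_detection": 2, "relation_extraction": 3}

def encode(tokens, n_layers=3):
    """Return the hidden state of every layer, bottom to top."""
    states, h = [], list(tokens)
    for depth in range(1, n_layers + 1):
        h = [f"{tok}@L{depth}" for tok in h]  # stand-in for a real layer
        states.append(h)
    return states

def supervise(tokens):
    """Route each task's prediction/loss to its assigned layer's states."""
    states = encode(tokens)
    return {task: states[layer - 1] for task, layer in TASK_LAYER.items()}
```

The inductive bias is exactly this routing: low-level tasks only ever see shallow representations, so deeper layers are free to specialize toward the complex tasks.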

    Joint Learning for Coreference Resolution with Markov Logic

    Pairwise coreference resolution models must merge pairwise coreference decisions to generate final outputs. Traditional merging methods adopt different strategies such as the best-first method and enforcing the transitivity constraint, but most of these methods are used independently of the pairwise learning methods as an isolated inference procedure at the end. We propose a joint learning model which combines pairwise classification and mention clustering with Markov logic. Experimental results show that our joint learning system outperforms independent learning systems. Our system gives a better performance than all the learning-based systems from the CoNLL-2011 shared task on the same dataset. Compared with the best system from CoNLL-2011, which employs a rule-based method, our system shows competitive performance.
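The two traditional merging strategies named above can be sketched concretely. This is an illustration of best-first linking and of transitivity enforced via union-find, not the paper's Markov logic formulation; the 0.5 threshold and toy scores are assumptions.

```python
# Hedged sketch of two ways to merge pairwise coreference decisions.

def best_first(mentions, score):
    """Link each mention to its single highest-scoring earlier mention,
    provided the score exceeds a 0.5 threshold (assumed here).
    Returns a mention-index -> antecedent-index map."""
    links = {}
    for i in range(len(mentions)):
        best, best_s = None, 0.5
        for j in range(i):
            s = score(mentions[j], mentions[i])
            if s > best_s:
                best, best_s = j, s
        if best is not None:
            links[i] = best
    return links

def transitive_clusters(n, positive_pairs):
    """Enforce transitivity: take the closure of all positive pairwise
    decisions with union-find, yielding final mention clusters."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in positive_pairs:
        parent[find(a)] = find(b)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Both run as post-hoc inference over a frozen classifier's scores, which is precisely the decoupling the joint model is meant to remove.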

    Fine-grained Dutch named entity recognition

    This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic main type and subtype named entity recognition. We give an overview of existing named entity annotation schemes, and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. This was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main type named entities achieves a micro-averaged F-score of 84.91%, and is publicly available, along with the corpus and annotations.
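The micro-averaged F-score reported above pools counts over all entity types before computing precision and recall, so frequent types dominate the score. A minimal sketch (with toy labels, not the paper's data):

```python
# Hedged sketch: micro-averaged F1 over entity labels. True positives,
# false positives and false negatives are pooled across all types first,
# then a single precision/recall pair is computed.

def micro_f1(gold, pred):
    """gold/pred: collections of (span_or_index, entity_type) labels."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A macro average would instead compute F1 per type and average, weighting the rare subtypes equally with persons and locations.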

    Coreference resolution for Slovene on the annotated texts of coref149

    Coreference resolution is one of the three key tasks in information extraction from text, alongside named entity recognition and relation extraction. Its goal is to group, across an entire text, all mentions of entities into clusters such that each cluster represents a single entity. Methods for this task have long been under development for some widely spoken languages, while none had yet been built for Slovene. In this paper we present a new, manually annotated corpus for coreference resolution in Slovene: the coref149 corpus. For automatic coreference resolution we adapted the SkipCor system, which we had originally built for English. On the Slovene material, SkipCor achieved a CoNLL 2012 score of 76%. We also analysed the influence of individual feature types and examined the most common errors. In the course of this work we developed a software library with a web interface through which all the described analyses can be run and their performance compared directly. The results are promising and comparable to those for other, more widely used languages. This demonstrates that automatic coreference resolution in Slovene can be successful; in the future, a larger and higher-quality corpus should be built, with coreference annotation covering all the particularities of Slovene, enabling the development of effective methods for automatic coreference resolution.

    Integrating knowledge graph embeddings to improve mention representation for bridging anaphora resolution

    Lexical semantics and world knowledge are crucial for interpreting bridging anaphora. Yet, existing computational methods for acquiring and injecting this type of information into bridging resolution systems suffer important limitations. Based on explicit querying of external knowledge bases, earlier approaches are computationally expensive (hence, hardly scalable) and they map the data to be processed into high-dimensional spaces (so careful handling of the curse of dimensionality and of overfitting is required). In this work, we take a different and principled approach which naturally addresses these issues. Specifically, we convert the external knowledge source (in this case, WordNet) into a graph, and learn low-dimensional embeddings of the graph nodes to capture the crucial features of the graph topology and, at the same time, rich semantic information. Once properly identified from the mention text spans, these low-dimensional graph node embeddings are combined with distributional text-based embeddings to provide enhanced mention representations. We illustrate the effectiveness of our approach by evaluating it on commonly used datasets, namely ISNotes (Markert et al., 2012) and BASHI (Rösiger, 2018). Our enhanced mention representations yield significant accuracy improvements on both datasets when compared to different standalone text-based mention representations.
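The enhanced mention representation described above can be sketched as a simple concatenation. The embedding tables, their dimensions, and the zero-vector fallback are illustrative assumptions, not the paper's trained WordNet node embeddings.

```python
# Hedged sketch: a mention's distributional text embedding concatenated
# with a low-dimensional embedding of its WordNet graph node. Toy tables
# stand in for trained embedding matrices.

GRAPH_EMB = {"wn:mansion.n.01": [0.1, -0.2], "wn:house.n.01": [0.3, 0.0]}
TEXT_EMB = {"mansion": [0.5, 0.5, 0.1], "house": [0.4, 0.6, 0.2]}

TEXT_DIM, GRAPH_DIM = 3, 2

def mention_representation(head_word, synset_id):
    """Concatenate text-based and graph-based embeddings; fall back to
    zeros when the mention has no identified graph node or head entry."""
    text = TEXT_EMB.get(head_word, [0.0] * TEXT_DIM)
    graph = GRAPH_EMB.get(synset_id, [0.0] * GRAPH_DIM)
    return text + graph
```

Because the graph embedding is low-dimensional, the combined vector stays small, which is the scalability argument made against the earlier high-dimensional querying approaches.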

    Review of coreference resolution in English and Persian

    Coreference resolution (CR) is one of the most challenging areas of natural language processing. This task seeks to identify all textual references to the same real-world entity. Research in this field is divided into coreference resolution and anaphora resolution. Due to its application in textual comprehension and its utility in other tasks such as information extraction systems, document summarization, and machine translation, this field has attracted considerable interest; consequently, it has a significant effect on the quality of these systems. This article reviews the existing corpora and evaluation metrics in this field. Then, an overview of the coreference algorithms, from rule-based methods to the latest deep learning techniques, is provided. Finally, coreference resolution and pronoun resolution systems in Persian are investigated.
    Comment: 44 pages, 11 figures, 5 tables