500 research outputs found
Antecedent selection techniques for high-recall coreference resolution
We investigate methods to improve the recall of coreference resolution by also trying to resolve those definite descriptions where no earlier mention of the referent shares the same lexical head (coreferent bridging). The problem, which is notably harder than identifying coreference relations among mentions that share the same lexical head, has been tackled with several rather different approaches, and we attempt to provide a meaningful classification along with a quantitative comparison. Based on the different merits of the methods, we discuss possibilities to improve them and show how they can be effectively combined.
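The head-match baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the paper's system: the `head` heuristic (last token as lexical head) and the example mentions are assumptions; real systems extract heads from parse trees. The cases where this baseline returns no antecedent are exactly the coreferent-bridging cases the paper targets.

```python
def head(mention):
    # Crude lexical-head heuristic: take the last token (an assumption;
    # real systems read the head off a syntactic parse).
    return mention.split()[-1].lower()

def resolve_head_match(prior_mentions, anaphor):
    """Return the earliest prior mention sharing the anaphor's lexical head.

    Returns None when no head match exists -- the 'coreferent bridging'
    cases that require other resolution strategies."""
    for m in prior_mentions:
        if head(m) == head(anaphor):
            return m
    return None
```

For example, "the car" resolves to "a red car" by head match, while "the vehicle" (same referent, different head) is left unresolved by this baseline.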
PP Attachment Ambiguity Resolution with Corpus-Based Pattern Distributions and Lexical Signatures
Invited Paper. In this paper, we propose a method combining unsupervised learning of lexical frequencies with semantic information, aiming at improving PP attachment ambiguity resolution. Using the output of a robust parser, i.e. the set of all possible attachments for a given sentence, we query the Web and obtain statistical information about the frequencies of the attachment distributions as well as lexical signatures of the terms in the patterns. All this information is used to weight the dependencies yielded by the parser.
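The core comparison behind frequency-based PP attachment can be sketched as below. This is a toy illustration under stated assumptions: the counts table stands in for Web hit counts (the numbers are hypothetical), and the `attach` function and its signature are inventions for this sketch, not the paper's API.

```python
# Hypothetical pattern counts, standing in for Web query frequencies.
# (verb, prep, noun2) approximates the verb-attachment pattern "V P N2";
# (noun1, prep, noun2) approximates the noun-attachment pattern "N1 P N2".
counts = {
    ("eat", "with", "fork"): 120,
    ("pizza", "with", "fork"): 3,
    ("eat", "with", "anchovies"): 8,
    ("pizza", "with", "anchovies"): 95,
}

def attach(verb, noun1, prep, noun2):
    """Decide verb vs. noun attachment by comparing pattern frequencies."""
    verb_count = counts.get((verb, prep, noun2), 0)
    noun_count = counts.get((noun1, prep, noun2), 0)
    return "verb" if verb_count >= noun_count else "noun"
```

On the classic example, "eat pizza with a fork" attaches the PP to the verb, while "eat pizza with anchovies" attaches it to the noun, because the respective patterns are more frequent.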
Apport d'un corpus comparable déséquilibré à l'extraction de lexiques bilingues (Contribution of an Unbalanced Comparable Corpus to Bilingual Lexicon Extraction)
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced. However, the related approaches are relatively insensitive to the sizes of each part of the comparable corpus. Within this context, we study the influence of unbalanced comparable corpora on the quality of bilingual terminology extraction through different experiments. Our results show the conditions under which the use of an unbalanced comparable corpus can induce a significant gain in the quality of extracted lexicons.
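The standard context-vector method underlying this line of work can be sketched as follows. This is a minimal sketch, not the paper's implementation: the toy vectors, the seed dictionary, and all function names are assumptions for illustration. A source word's context vector is mapped into the target language through a seed bilingual dictionary, then compared to target-word context vectors by cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts of counts)."""
    dot = sum(c * v.get(k, 0) for k, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def translate_vector(src_vec, seed_dict):
    """Map a source-language context vector into target space via a seed dictionary."""
    out = {}
    for word, count in src_vec.items():
        target = seed_dict.get(word)
        if target:
            out[target] = out.get(target, 0) + count
    return out

def rank_candidates(src_vec, seed_dict, target_vecs):
    """Rank target words by similarity of their context vectors to the mapped source vector."""
    mapped = translate_vector(src_vec, seed_dict)
    return sorted(target_vecs, key=lambda t: cosine(mapped, target_vecs[t]), reverse=True)
```

With a toy French vector for "voiture" and seed entries route→road, conduire→drive, the English candidate "car" (whose contexts include road and drive) ranks first.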
Corpus Wide Argument Mining -- a Working Solution
One of the main tasks in argument mining is the retrieval of argumentative
content pertaining to a given topic. Most previous work addressed this task by
retrieving a relatively small number of relevant documents as the initial
source for such content. This line of research yielded moderate success, which
is of limited use in a real-world system. Furthermore, for such a system to
yield a comprehensive set of relevant arguments, over a wide range of topics,
it requires leveraging a large and diverse corpus in an appropriate manner.
Here we present a first end-to-end high-precision, corpus-wide argument mining
system. This is made possible by combining sentence-level queries over an
appropriate indexing of a very large corpus of newspaper articles, with an
iterative annotation scheme. This scheme addresses the inherent label bias in
the data and pinpoints the regions of the sample space whose manual labeling is
required to obtain high precision among top-ranked candidates.
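The sentence-level querying over an index that the abstract describes can be illustrated with a minimal inverted-index sketch. This is an assumption-laden toy, not the authors' system: the index structure, the conjunctive query, and the example sentences are all inventions for illustration.

```python
from collections import defaultdict

def build_index(sentences):
    """Inverted index mapping each token to the set of sentence ids containing it."""
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            index[token].add(i)
    return index

def query(index, terms):
    """Sentence-level conjunctive query: ids of sentences containing all terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

A real system would retrieve argument candidates this way from millions of newspaper sentences, then rank them; here the query simply intersects posting sets.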
Bootstrapping Lexical Choice via Multiple-Sequence Alignment
An important component of any generation system is the mapping dictionary, a
lexicon of elementary semantic expressions and corresponding natural language
realizations. Typically, labor-intensive knowledge-based methods are used to
construct the dictionary. We instead propose to acquire it automatically via a
novel multiple-pass algorithm employing multiple-sequence alignment, a
technique commonly used in bioinformatics. Crucially, our method leverages
latent information contained in multi-parallel corpora -- datasets that supply
several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of
computer-generated mathematical proofs, with good results on both a
per-component and overall-output basis. For example, in evaluations involving a
dozen human judges, our system produced output whose readability and
faithfulness to the semantic input rivaled that of a traditional generation
system.
Comment: 8 pages; to appear in the proceedings of EMNLP-200
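The sequence-alignment machinery the abstract borrows from bioinformatics can be sketched in its pairwise form. This is a minimal Needleman–Wunsch-style alignment over token sequences, under assumed scoring parameters; the paper's multiple-sequence, multiple-pass algorithm builds on alignments of this kind but is not reproduced here.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two token lists; returns aligned (token_a, token_b)
    pairs, with None marking a gap. Toy scoring parameters are assumptions."""
    n, m = len(a), len(b)
    # Dynamic-programming score table.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner.
    i, j, pairs = n, m, []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None)); i -= 1
        else:
            pairs.append((None, b[j - 1])); j -= 1
    return pairs[::-1]
```

Aligning two verbalizations of the same content (e.g. "the sum equals two" and "the total is two") pairs the shared tokens and exposes the slots where wording varies, which is the latent information the method mines from multi-parallel corpora.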
Introduction to the CoNLL-2000 Shared Task: Chunking
We describe the CoNLL-2000 shared task: dividing text into syntactically
related non-overlapping groups of words, so-called text chunking. We give
background information on the data sets, present a general overview of the
systems that have taken part in the shared task and briefly discuss their
performance.
Comment: 6 page
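Chunking output in the CoNLL-2000 task is conventionally represented with BIO tags, and decoding them into spans can be sketched as below. The function name and the example tag sequence are assumptions for illustration; the BIO convention itself (B- begins a chunk, I- continues it, O is outside any chunk) is the task's standard encoding.

```python
def bio_to_chunks(tags):
    """Decode a BIO tag sequence into (label, start, end) spans, end exclusive."""
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A B- tag, or an I- tag with a mismatched label, starts a new chunk.
            if label is not None:
                chunks.append((label, start, i))
            start, label = i, tag[2:]
        elif tag == "O":
            if label is not None:
                chunks.append((label, start, i))
            start, label = None, None
        # An I- tag matching the open chunk's label simply extends it.
    if label is not None:
        chunks.append((label, start, len(tags)))
    return chunks
```

For the tagged sentence fragment [B-NP, I-NP, O, B-VP, B-NP, I-NP] this yields an NP over tokens 0-2, a VP over token 3, and an NP over tokens 4-6, which is the non-overlapping grouping the shared task evaluates.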