72 research outputs found

A constraint-based approach to noun phrase coreference resolution in German newspaper text

Author: Versley Yannick
Publication date: 01/01/2006

In this paper, we investigate the usefulness of a wide range of features for the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of soft-constraint violations makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints, with weights estimated by a maximum entropy model, using lexical information to resolve cases of coreferent bridging.

Hochschulschriftenserver - Universität Frankfurt am Main

Decorrelation and shallow semantic patterns for distributional clustering of nouns and verbs

Author: Versley Yannick
Publication date: 01/01/2009

Distributional approximations to lexical semantics are very useful not only in helping the creation of lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when directly applied in tasks that can benefit from large-coverage semantic knowledge, such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004) or semantic role labeling (Gordon and Swanson, 2007). We present a model that is built from Web-based corpora using both shallow patterns for grammatical and semantic relations and a window-based approach, using singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

The many aspects of fine-grained sentiment analysis : an overview of the task and its main challenges

Author: De Clercq Orphée
Publication venue: IARIA
Publication date: 01/01/2016

Ghent University Academic Bibliography

SUC-CORE: A Balanced Corpus Annotated with Noun Phrase Coreference

Author: Kristina Nilsson Björkenstam
Publication venue: Linköping University Electronic Press

Crossref

Optimization issues in machine learning of coreference resolution

Author: Hoste Veronique
Publication venue: Universiteit Antwerpen. Faculteit Letteren en Wijsbegeerte.
Publication date: 01/01/2005

Ghent University Academic Bibliography

Un corpus pour optimiser l’identification automatique des chaînes de référence

Author: Longo Laurence
Publication venue: OpenEdition
Publication date: 16/10/2013

We present a multi-genre corpus study aimed at automatically identifying reference chains. Reference chains are linguistic markers that signal topic continuation or topic shift in discourse. The study is part of a project to develop a system for automatic topic detection in order to optimize document indexing in a search engine. The search engine uses topic indexing and also takes document genre into account to provide the user with relevant documents related to their query. From a Natural Language Processing perspective, we use a corpus of five textual genres (newspaper articles, editorials, novels, European laws, public reports) to study reference chains. We define five criteria to compare reference chains according to textual genre: the average length of the reference chains (number of mentions), the average distance between two mentions of a reference chain, the grammatical category preferred across all mentions of the reference chains, the grammatical class of the first mentions of the reference chains, and the correspondence between the first mention of a reference chain and the sentence topic (the preverbal element). The corpus analysis reveals differences across genres in the linguistic material found in reference chains. We use these properties in our computation of reference chains to configure our system according to the genre. We then discuss the results obtained.

OpenEdition

Un corpus pour optimiser l’identification automatique des chaînes de référence

Author: Longo Laurence
Publication venue: Cahiers de praxématique
Publication date: 16/10/2013

OpenEdition

Anaphora resolution for Arabic machine translation: a case study of nafs

Author: Hamouda Wafya
Publication venue: Newcastle University
Publication date: 01/01/2014

PhD Thesis

In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.

Egyptian Government

Newcastle University eTheses

Error propagation

Author: Lê Minh Ngoc
Publication venue: Independently published
Publication date: 28/05/2021

VU Research Portal