Search CORE

7 research outputs found

One Homonym per Translation

Author: Hauer Bradley
Kondrak Grzegorz
Publication date: 23/01/2020

The study of homonymy is vital to resolving fundamental problems in lexical semantics. In this paper, we propose four hypotheses that characterize the unique behavior of homonyms in the context of translations, discourses, collocations, and sense clusters. We present a new annotated homonym resource that allows us to test our hypotheses on existing WSD resources. The results of the experiments provide strong empirical evidence for the hypotheses. This study represents a step towards a computational method for distinguishing between homonymy and polysemy, and constructing a definitive inventory of coarse-grained senses. Comment: 8 pages, including references

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

The use of and-coordination in terms of its syntactic (a)symmetry in argumentative essays: a corpus-based study of three university learner groups in MICUSP and NUCLE

Author: Nguyen NhuQuynh Luu
Publication date: 01/01/2013

Studies have found that EL learners overuse "and" as an additive connector in sentence-initial position (Bolton, Hung, & Nelson, 2002) and underuse "and" as a coordinator (Leung, 2005). In general, the use of the and-coordinator has often been overlooked in corpus research and in English teaching because of its seeming simplicity. To test previous findings about the and-coordinator and to examine the influence of English proficiency on the use of "and" in academic writing, three learner corpora, MICUSP-NNS (advanced level), MICUSP-NS (advanced level), and NUCLE-NNS (upper-intermediate), were compared with regard to their use of (a)symmetric and-coordination structures. Each corpus contains 31 argumentative essays written by 31 university students.

SHAREOK repository

One Sense per Collocation and Genre/Topic Variations

Author: David Martinez
Eneko Agirre
Publication date: 01/01/2000

This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora.

CiteSeerX

Crossref
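One simple way to quantify the one sense per collocation hypothesis is to group sense-tagged occurrences of a target word by collocation and measure how often each group carries its majority sense. The sketch below is an illustration under stated assumptions, not necessarily the measure used in the paper: a "collocation" is simplified to a single neighbouring word, and the function name `collocation_purity` and the sense-tagged data are hypothetical.

```python
from collections import Counter, defaultdict


def collocation_purity(examples):
    """Measure how consistently each collocation predicts one sense.

    examples: iterable of (collocation, sense) pairs for a single target word.
    Returns (purity per collocation, overall purity weighted by frequency),
    where purity is the share of occurrences tagged with the majority sense.
    """
    by_colloc = defaultdict(Counter)
    for colloc, sense in examples:
        by_colloc[colloc][sense] += 1

    per_colloc = {}
    majority_total = 0
    total = 0
    for colloc, counts in by_colloc.items():
        n = sum(counts.values())
        majority = counts.most_common(1)[0][1]  # count of the most frequent sense
        per_colloc[colloc] = majority / n
        majority_total += majority
        total += n
    overall = majority_total / total if total else 0.0
    return per_colloc, overall


# Hypothetical sense-tagged occurrences of "bank", keyed by one neighbouring word.
examples = [
    ("river", "shore"), ("river", "shore"),
    ("account", "finance"), ("account", "finance"), ("account", "shore"),
]
per_colloc, overall = collocation_purity(examples)
print(per_colloc)  # river -> 1.0, account -> about 0.67
print(overall)     # 4 of 5 occurrences carry their collocation's majority sense
```

A purity close to 1.0 supports the hypothesis for that word; with fine-grained sense distinctions, more collocations would be expected to mix senses and lower the score.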

TR-2002011: Corpus-Based Ambiguity Resolution of Biomedical Terms Using Knowledge Bases and Machine Learning

Author: Liu Hongfang
Publication venue: CUNY Academic Works
Publication date: 01/01/2002

City University of New York

Recommended from our members

High-performance Word Sense Disambiguation with Less Manual Effort

Author: Dligach Dmitriy
Publication venue: CU Scholar
Publication date: 01/01/2010

Supervised learning is a widely used paradigm in Natural Language Processing. This paradigm involves learning a classifier from annotated examples and applying it to unseen data. We cast word sense disambiguation, our task of interest, as a supervised learning problem. We then formulate the end goal of this dissertation: to develop a series of methods aimed at achieving the highest possible word sense disambiguation performance with the least reliance on manual effort. We begin by implementing a word sense disambiguation system, which utilizes rich linguistic features to better represent the contexts of ambiguous words. Our state-of-the-art system captures three types of linguistic features: lexical, syntactic, and semantic. Traditionally, semantic features are extracted with the help of expensive hand-crafted lexical resources. We propose a novel unsupervised approach to extracting a similar type of semantic information from unlabeled corpora. We show that incorporating this information into a classification framework leads to performance improvements. The result is a system that outperforms traditional methods while eliminating the reliance on manual effort for extracting semantic data. We then proceed to attack the problem of reducing manual effort from a different direction. Supervised word sense disambiguation relies on annotated data for learning sense classifiers. However, annotation is expensive since it requires a large time investment from expert labelers. We examine various annotation practices and propose several approaches for making them more efficient. We evaluate the proposed approaches and compare them to the existing ones. We show that the annotation effort can often be reduced significantly without sacrificing the performance of the models trained on the annotated data.

CU Scholar Institutional Repository
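The framing of word sense disambiguation as supervised learning (learn a classifier from annotated examples, apply it to unseen contexts) can be sketched with a small multinomial Naive Bayes classifier. This is a common textbook baseline chosen for illustration, not the dissertation's system, which uses richer lexical, syntactic, and semantic features; the class name `NaiveBayesWSD` and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial Naive Bayes sense classifier over bag-of-words contexts.

    Uses add-one (Laplace) smoothing; context words never seen in training
    are ignored at prediction time.
    """

    def fit(self, contexts, senses):
        self.sense_counts = Counter(senses)      # prior counts per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()
        for words, sense in zip(contexts, senses):
            self.word_counts[sense].update(words)
            self.vocab.update(words)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, words):
        best_sense, best_score = None, -math.inf
        for sense, n_sense in self.sense_counts.items():
            counts = self.word_counts[sense]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_sense / self.total)  # log prior
            for w in words:
                if w in self.vocab:
                    # add-one smoothed log likelihood of the context word
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical annotated contexts for the ambiguous word "bank".
train = [
    (["deposit", "money", "loan"], "finance"),
    (["account", "money"], "finance"),
    (["river", "water", "fishing"], "shore"),
    (["river", "grass"], "shore"),
]
clf = NaiveBayesWSD().fit([c for c, _ in train], [s for _, s in train])
print(clf.predict(["money", "deposit"]))  # finance
print(clf.predict(["river"]))             # shore
```

Every training pair here is a manually annotated example, which is exactly the cost the dissertation sets out to reduce.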

Joint Discourse-aware Concept Disambiguation and Clustering

Author: Fahrni Angela Petra
Publication date: 01/01/2016

This thesis addresses the tasks of concept disambiguation and clustering. Concept disambiguation is the task of linking common nouns and proper names in a text (henceforth called mentions) to their corresponding concepts in a predefined inventory. Concept clustering is the task of clustering mentions so that all mentions in one cluster denote the same concept. In this thesis, we investigate concept disambiguation and clustering from a discourse perspective and propose a discourse-aware approach for joint concept disambiguation and clustering in the framework of Markov logic. The contributions of this thesis are fourfold:
Joint Concept Disambiguation and Clustering. In previous approaches, concept disambiguation and concept clustering have been treated as two separate tasks (Schütze, 1998; Ji & Grishman, 2011). We analyze the relationship between concept disambiguation and concept clustering and argue that the two tasks can mutually support each other. We propose what is, to our knowledge, the first joint approach for concept disambiguation and clustering.
Discourse-Aware Concept Disambiguation. One of the determining factors for concept disambiguation and clustering is the context definition. Most previous approaches use the same context definition for all mentions (Milne & Witten, 2008b; Kulkarni et al., 2009; Ratinov et al., 2011, inter alia). We approach the question of which context is relevant for disambiguating a mention from a discourse perspective and argue that different mentions require different notions of context. The context that is relevant for disambiguating a mention depends on how the mention is embedded in the discourse. However, how a mention is embedded in the discourse depends on its denoted concept. Hence, identifying the denoted concept and identifying the relevant context depend on each other. We propose a binwise approach with three different context definitions and model the selection of the context definition and the disambiguation jointly.
Modeling Interdependencies with Markov Logic. To model the interdependencies between concept disambiguation and concept clustering, as well as those between the context definition and the disambiguation, we use Markov logic (Domingos & Lowd, 2009). Markov logic combines first-order logic with probabilities and allows us to formalize these interdependencies concisely. We investigate how to balance linguistic appropriateness against time efficiency and propose a hybrid approach that combines joint inference with aggregation techniques.
Concept Disambiguation and Clustering beyond English: Multi- and Cross-linguality. Given the vast amount of text written in different languages, the ability to extend an approach to languages other than English is essential. We thus analyze how our approach copes with languages other than English and show that it largely scales across languages, even without retraining.
Our approach is evaluated on multiple data sets originating from different sources (e.g. news, web) and across multiple languages. As an inventory, we use Wikipedia. We compare our approach to other approaches and show that it achieves state-of-the-art results. Furthermore, we show that joint concept disambiguation and clustering, as well as joint context selection and disambiguation, lead to significant improvements ceteris paribus.

Heidelberger Dokumentenserver
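The abstract's remark that Markov logic combines first-order logic with probabilities can be made concrete with the standard log-linear form of a Markov logic network, as presented in the general literature (e.g. Domingos & Lowd, 2009). This shows the general formalism, not the specific model of the thesis: each first-order formula F_i carries a real-valued weight w_i, n_i(x) is the number of true groundings of F_i in a possible world x, and Z normalizes over all worlds x'.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

All else being equal, a world that violates one more grounding of a formula with positive weight w_i has its unnormalized probability reduced by a factor of e^{w_i}, so formulas act as soft constraints; as w_i grows without bound, F_i approaches a hard first-order constraint.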