Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness
We propose and study a novel supervised approach to learning statistical
semantic relatedness models from subjectively annotated training examples. The
proposed semantic model consists of parameterized co-occurrence statistics
associated with textual units of a large background knowledge corpus. We
present an efficient algorithm for learning such semantic models from a
training sample of relatedness preferences. Our method is corpus independent
and can essentially rely on any sufficiently large (unstructured) collection of
coherent texts. Moreover, the approach facilitates the fitting of semantic
models for specific users or groups of users. We present the results of an
extensive range of experiments, from small to large scale, indicating that the
proposed method is effective and competitive with the state of the art.
Comment: 37 pages, 8 figures. A short version of this paper was already
published at ECML/PKDD 201
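The abstract does not spell out the learning algorithm. As a minimal sketch only, assuming each word pair is represented by a sparse vector of co-occurrence statistics over hypothetical background-corpus units (`doc1`, `doc2`, ...), a relatedness model can be fit to pairwise preferences with a perceptron-style update. The names `score` and `train_preferences` are illustrative, and this is not the authors' algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: for one word pair, a sparse map from
# background-corpus unit ids to co-occurrence statistics.
Features = Dict[str, float]


def score(weights: Dict[str, float], feats: Features) -> float:
    """Relatedness score as a weighted sum of co-occurrence statistics."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_preferences(
    prefs: List[Tuple[Features, Features]],
    epochs: int = 10,
    lr: float = 1.0,
) -> Dict[str, float]:
    """Learn weights from preferences (better, worse): `better` should score higher.

    Perceptron-style rule (an assumption, not the paper's method): whenever a
    preference is violated, add the preferred pair's features to the weights
    and subtract the other pair's features.
    """
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for better, worse in prefs:
            if score(weights, better) <= score(weights, worse):
                for k, v in better.items():
                    weights[k] = weights.get(k, 0.0) + lr * v
                for k, v in worse.items():
                    weights[k] = weights.get(k, 0.0) - lr * v
    return weights
```

For example, `train_preferences([({"doc1": 3.0, "doc2": 1.0}, {"doc3": 2.0})])` learns weights that rank the first pair above the second. Training the same routine only on one user's preferences would give a user-specific weighting, which is one simple reading of the personalization claim.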
Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis
The availability of on-line corpora is rapidly changing the field of natural language processing (NLP) from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur in real-world text. Thus far, among the best-performing and most robust systems for reading and summarizing large amounts of real-world text are knowledge-based natural language systems. These systems rely heavily on domain-specific, handcrafted knowledge to handle the myriad syntactic, semantic, and pragmatic ambiguities that pervade virtually all aspects of sentence analysis. Not surprisingly, however, generating this knowledge for new domains is time-consuming, difficult, and error-prone, and requires the expertise of computational linguists familiar with the underlying NLP system. This thesis presents Kenmore, a general framework for domain-specific knowledge acquisition for conceptual sentence analysis. To ease the acquisition of knowledge in new domains, Kenmore exploits an on-line corpus using symbolic machine learning techniques and robust sentence analysis while requiring only minimal human intervention. Unlike most approaches to knowledge acquisition for natural language systems, the framework uniformly addresses a range of subproblems in sentence analysis, each of which had traditionally required a separate computational mechanism. The thesis presents the results of using Kenmore with corpora from two real-world domains: (1) to perform part-of-speech tagging, semantic feature tagging, and concept tagging of all open-class words in the corpus; (2) to acquire heuristics for part-of-speech disambiguation, semantic feature disambiguation, and concept activation; and (3) to find the antecedents of relative pronouns.
SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text
This paper describes SenseLearner, a minimally supervised word sense disambiguation system that attempts to disambiguate all content words in a text using WordNet senses.
Evaluating large-scale knowledge resources across languages
This paper presents an empirical evaluation, in a multilingual scenario, of the semantic knowledge present in publicly available large-scale knowledge resources. The study covers a wide range of manually and automatically derived large-scale knowledge resources for English and Spanish. In order to establish a fair and neutral comparison, the knowledge resources are evaluated using the same method on two Word Sense Disambiguation tasks (Senseval-3 English and Spanish Lexical Sample Tasks). First, this study empirically demonstrates that the combination of the knowledge contained in these resources surpasses the most frequent sense classifier for English. Second, we show that large-scale topical knowledge acquired from one language can be successfully ported to other languages. Peer Reviewed. Postprint (author’s final draft).
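The most-frequent-sense (MFS) classifier mentioned above is the standard WSD baseline: each word is assigned the sense it carries most often in sense-tagged training data. A minimal sketch, assuming training data given as (lemma, sense) pairs with hypothetical sense labels; this is not the evaluation code used in the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def train_mfs(tagged: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each lemma to the sense it is most often tagged with in training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lemma, sense in tagged:
        counts[lemma][sense] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


def accuracy(model: Dict[str, str], test: Iterable[Tuple[str, str]]) -> float:
    """Fraction of test instances whose gold sense equals the MFS prediction.

    Lemmas unseen in training get no prediction and count as errors.
    """
    pairs = list(test)
    if not pairs:
        return 0.0
    correct = sum(1 for lemma, gold in pairs if model.get(lemma) == gold)
    return correct / len(pairs)
```

A knowledge-based system "surpasses" this baseline when its accuracy on the same test instances is higher than the MFS accuracy.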
Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods
In this paper we concentrate on the resolution of the lexical ambiguity that
arises when a given word has several different meanings. This specific task is
commonly referred to as word sense disambiguation (WSD). The task of WSD
consists of assigning the correct sense to words using an electronic dictionary
as the source of word definitions. We present two WSD methods based on two main
methodological approaches in this research area: a knowledge-based method and a
corpus-based method. Our hypothesis is that word-sense disambiguation requires
several knowledge sources in order to solve the semantic ambiguity of the
words. These sources can be of different kinds, for example syntagmatic,
paradigmatic or statistical information. Our approach combines various sources
of knowledge through combinations of the two WSD methods mentioned above.
The paper concentrates mainly on how to combine these methods and sources of
information in order to achieve good results in the disambiguation. Finally,
this paper presents a comprehensive study and experimental work on evaluation
of the methods and their combinations.
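The abstract does not say which specific knowledge-based and corpus-based methods are combined. As a minimal illustration under that caveat, the sketch below pairs a simplified Lesk score (word overlap between the context and each sense's gloss, a common knowledge-based method) with a sense-frequency prior estimated from a sense-tagged corpus, and mixes the two with a hypothetical weight `alpha`. The glosses, sense labels, stop list, and counts are invented for the example.

```python
from typing import Dict, Set

# Small hypothetical stop list so that function words do not count as overlap.
STOP = {"a", "an", "the", "of", "and", "to", "in", "on", "at", "by", "with", "for"}


def tokens(text: str) -> Set[str]:
    return {w for w in text.lower().split() if w not in STOP}


def lesk_scores(context: str, glosses: Dict[str, str]) -> Dict[str, float]:
    """Knowledge-based score: context/gloss word overlap (simplified Lesk)."""
    ctx = tokens(context)
    return {sense: float(len(ctx & tokens(gloss))) for sense, gloss in glosses.items()}


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to sum to 1; fall back to uniform when there is no evidence."""
    total = sum(scores.values())
    if total == 0:
        return {s: 1.0 / len(scores) for s in scores}
    return {s: v / total for s, v in scores.items()}


def combine(context: str, glosses: Dict[str, str],
            sense_counts: Dict[str, int], alpha: float = 0.5) -> str:
    """Pick the sense maximizing alpha * knowledge + (1 - alpha) * corpus prior."""
    k = normalize(lesk_scores(context, glosses))
    c = normalize({s: float(sense_counts.get(s, 0)) for s in glosses})
    mixed = {s: alpha * k[s] + (1 - alpha) * c[s] for s in glosses}
    return max(mixed, key=mixed.get)
```

When the context shares no words with any gloss, the Lesk scores become uniform and the corpus prior decides. A linear mix is only one of several possible combination schemes; voting or using one method as a fallback for the other are alternatives.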
UNT-Yahoo: SuperSenseLearner: Combining SenseLearner with SuperSense and other Coarse Semantic Features
This paper discusses combining SenseLearner with SuperSense and other coarse semantic features.
Studying Individual Differences in Language Comprehension: The Challenges of Item-Level Variability and Well-Matched Control Conditions
Translating experimental tasks that were designed to investigate differences between conditions at the group level into valid and reliable instruments to measure individual differences in cognitive skills is challenging (Hedge et al., 2018; Rouder et al., 2019; Rouder & Haaf, 2019). For psycholinguists, the additional complexities associated with selecting or constructing language stimuli, and the need for appropriate well-matched baseline conditions, make this endeavour particularly complex. In a typical experiment, a process of interest (e.g. ambiguity resolution) is targeted by contrasting performance in an experimental condition with performance in a well-matched control condition. In many cases, careful between-condition matching precludes the same participant from encountering all stimulus items. Unfortunately, solutions that work for group-level research (e.g. constructing counterbalanced experiment versions) are inappropriate for individual-differences designs. As a case study, we report an ambiguity resolution experiment that illustrates the steps that researchers can take to address this issue and assess whether their measurement instrument is both valid and reliable. On the basis of our findings, we caution against the widespread approach of using datasets from group-level studies to also answer important questions about individual differences.
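The abstract does not name the reliability index used. As a minimal sketch, assuming trial-level reaction times for each participant in both conditions, one common check is the split-half reliability of the condition-difference score: split trials into odd and even halves, compute each participant's difference score in each half, correlate the halves across participants, and step the correlation up with the Spearman-Brown formula 2r / (1 + r). The data layout and function names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r: float) -> float:
    """Project a half-length reliability to full test length."""
    return 2 * r / (1 + r)


def split_half_reliability(data: Dict[str, Tuple[List[float], List[float]]]) -> float:
    """data: participant id -> (experimental RTs, control RTs), in trial order.

    Needs at least two trials per condition so both halves are non-empty.
    """
    half1: List[float] = []
    half2: List[float] = []
    for exp, ctl in data.values():
        # Odd/even split within each condition; difference score = exp - control.
        half1.append(mean(exp[0::2]) - mean(ctl[0::2]))
        half2.append(mean(exp[1::2]) - mean(ctl[1::2]))
    return spearman_brown(pearson(half1, half2))
```

This toy layout assumes every participant contributes trials in both conditions; it does not address the item-matching problem the abstract raises, where counterbalancing means a participant does not see every item in both conditions.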
SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks
In this paper, we describe a so-called screening approach for learning robust
processing of spontaneously spoken language. A screening approach is a flat
analysis which uses shallow sequences of category representations for analyzing
an utterance at various syntactic, semantic and dialog levels. Rather than
using a deeply structured symbolic analysis, we use a flat connectionist
analysis. This screening approach aims at supporting speech and language
processing by using (1) data-driven learning and (2) robustness of
connectionist networks. In order to test this approach, we have developed the
SCREEN system which is based on this new robust, learned and flat analysis.
In this paper, we focus on a detailed description of SCREEN's architecture,
the flat syntactic and semantic analysis, the interaction with a speech
recognizer, and a detailed evaluation analysis of the robustness under the
influence of noisy or incomplete input. The main result of this paper is that
flat representations allow more robust processing of spontaneous spoken
language than deeply structured representations. In particular, we show how the
fault-tolerance and learning capability of connectionist networks can support a
flat analysis for providing more robust spoken-language processing within an
overall hybrid symbolic/connectionist framework.
Comment: 51 pages, Postscript. To be published in Journal of Artificial
Intelligence Research 6(1), 199