62,544 research outputs found
Retrieving with good sense
Although always present in text, word sense ambiguity only recently became regarded as a problem to information
retrieval which was potentially solvable. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research. Most notably a notion of under what circumstance disambiguation may prove of use to retrieval
Recommended from our members
Beyond definition: Organising semantic information in bilingual dictionaries
This paper considers the process of organising semantic information in bilingual dictionaries with diachronic coverage, from selecting the textual source-material to designing the entries. The discussion centres on practical aspects of ancient Greek lexicography. First, the traditional semantic frameworks are described. Then, more recent approaches are noted, notably those of Adrados and of Chadwick, both of which aim to integrate contextual data within a semantic framework. Since the relevance of contextual information varies with lemma part of speech, different configurations are required for entries describing nouns, adjectives, and verbs. These are illustrated by three entries from a Greek-English dictionary currently being written at Cambridge. In order to organise data to this level of specificity, stylistic templates are indispensable, and digital software provides a means of providing them. However, systems designed for writing new dictionaries require different features from those designed for encoding pre-existing texts. A description is given of how the lexicographic requirements of the Cambridge dictionary were met by a user-designed system
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence
Researc
Hedonic and Transcendent Conceptions of Value
In this paper we introduce a conceptual distinction between a hedonic and transcendent conception of value. We posit three linguistic earmarks by which one can distinguish these conceptions of value. We seek validation for the conceptual distinctions by examining the language contained in reviews of cars and reviews of paintings. In undertaking the empirical examination, we draw on the work of M.A.K. Halliday to identify clauses as fundamental units of meaning and to specify process types that can be mapped onto theoretical distinctions between the two conceptions of value. Extensions of this research are discussed
The polysemy of the Spanish verb sentir: a behavioral profile analysis
This study investigates the intricate polysemy of the Spanish perception verb sentir (‘feel’) which, analogous to the more-studied visual perception verbs ver (‘see’) and mirar (‘look’), also displays an ample gamut of semantic uses in various syntactic environments. The investigation is based on a corpus-based behavioral profile (BP) analysis. Besides its methodological merits as a quantitative, systematic and verifiable approach to the study of meaning and to polysemy in particular, the BP analysis offers qualitative usage-based evidence for cognitive linguistic theorizing. With regard to the polysemy of sentir, the following questions were addressed: (1) What is the prototype of each cluster of senses? (2) How are the different senses structured: how many senses should be distinguished – i.e. which senses cluster together and which senses should be kept separately? (3) Which senses are more related to each other and which are highly distinguishable? (4) What morphosyntactic variables make them more or less distinguishable? The results show that two significant meaning clusters can be distinguished, which coincide with the division between the middle voice uses (sentirse) and the other uses (sentir). Within these clusters, a number of meaningful subclusters emerge, which seem to coincide largely with the more general semantic categories of physical, cognitive and emotional perception
Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell
This paper presents an approach for investigating the nature of semantic
information captured by word embeddings. We propose a method that extends an
existing human-elicited semantic property dataset with gold negative examples
using crowd judgments. Our experimental approach tests the ability of
supervised classifiers to identify semantic features in word embedding vectors
and com- pares this to a feature-identification method based on full vector
cosine similarity. The idea behind this method is that properties identified by
classifiers, but not through full vector comparison are captured by embeddings.
Properties that cannot be identified by either method are not. Our results
provide an initial indication that semantic properties relevant for the way
entities interact (e.g. dangerous) are captured, while perceptual information
(e.g. colors) is not represented. We conclude that, though preliminary, these
results show that our method is suitable for identifying which properties are
captured by embeddings.Comment: Accepted to the EMNLP workshop "Analyzing and interpreting neural
networks for NLP
- …