Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.Comment: 23 page
The polysemy of the Spanish verb sentir: a behavioral profile analysis
This study investigates the intricate polysemy of the Spanish perception verb sentir (‘feel’), which, analogous to the more-studied visual perception verbs ver (‘see’) and mirar (‘look’), also displays an ample gamut of semantic uses in various syntactic environments. The investigation is based on a corpus-based behavioral profile (BP) analysis. Besides its methodological merits as a quantitative, systematic and verifiable approach to the study of meaning and to polysemy in particular, the BP analysis offers qualitative usage-based evidence for cognitive linguistic theorizing. With regard to the polysemy of sentir, the following questions were addressed: (1) What is the prototype of each cluster of senses? (2) How are the different senses structured: how many senses should be distinguished – i.e. which senses cluster together and which senses should be kept separately? (3) Which senses are more related to each other and which are highly distinguishable? (4) What morphosyntactic variables make them more or less distinguishable? The results show that two significant meaning clusters can be distinguished, which coincide with the division between the middle voice uses (sentirse) and the other uses (sentir). Within these clusters, a number of meaningful subclusters emerge, which seem to coincide largely with the more general semantic categories of physical, cognitive and emotional perception.
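The core of a behavioral profile is simple to sketch: each attestation is annotated with morphosyntactic "ID tags", per-sense tag frequencies form the profile, and profile distances feed a clustering. The sense labels and tags below are invented for illustration, not taken from the study.

```python
from collections import Counter, defaultdict

# Hypothetical usage table: each attestation of the verb is annotated
# with a sense label and a set of morphosyntactic ID tags.
usages = [
    ("physical", {"voice:active", "subject:animate"}),
    ("physical", {"voice:active", "subject:animate"}),
    ("cognitive", {"voice:active", "subject:animate"}),
    ("emotional", {"voice:middle", "subject:animate"}),
    ("emotional", {"voice:middle", "subject:animate"}),
]

def behavioral_profiles(usages):
    """Relative frequency of each ID tag per sense: the behavioral profile."""
    tag_counts = defaultdict(Counter)
    totals = Counter()
    for sense, tags in usages:
        totals[sense] += 1
        tag_counts[sense].update(tags)
    return {s: {t: n / totals[s] for t, n in c.items()}
            for s, c in tag_counts.items()}

def profile_distance(p, q):
    """City-block distance over the union of tags; distances like this
    are the usual input to hierarchical clustering of senses."""
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))

profiles = behavioral_profiles(usages)
```

In this toy data the physical and cognitive senses share a profile while the emotional sense (the middle-voice uses) stands apart — the same kind of cluster structure the study reports for sentir vs. sentirse.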
Distributed representation of multi-sense words: A loss-driven approach
Word2Vec's Skip Gram model is the current state-of-the-art approach for
estimating the distributed representation of words. However, it assumes a
single vector per word, which is not well-suited for representing words that
have multiple senses. This work presents LDMI, a new model for estimating
distributional representations of words. LDMI relies on the idea that, if a
word carries multiple senses, then having a different representation for each
of its senses should lead to a lower loss associated with predicting its
co-occurring words, as opposed to the case when a single vector representation
is used for all the senses. After identifying the multi-sense words, LDMI
clusters the occurrences of these words to assign a sense to each occurrence.
Experiments on the contextual word similarity task show that LDMI leads to
better performance than competing approaches.

Comment: PAKDD 2018 Best paper award runner-up
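LDMI itself drives the sense split from the Skip-Gram prediction loss; as a generic stand-in for its "cluster the occurrences" step, the sketch below groups occurrences of an ambiguous word by their bag-of-words contexts. Everything here (the word, contexts, and clustering routine) is invented for illustration.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between sparse count vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def induce_senses(contexts, k=2, iters=5):
    """Group occurrences of an ambiguous word by their bag-of-words
    contexts; each resulting cluster stands in for one sense."""
    centroids = [Counter(c) for c in contexts[:k]]  # seed with first k occurrences
    labels = [0] * len(contexts)
    for _ in range(iters):
        # Assignment step: attach each occurrence to its nearest centroid.
        for i, ctx in enumerate(contexts):
            v = Counter(ctx)
            labels[i] = max(range(k), key=lambda j: cosine(v, centroids[j]))
        # Update step: merge the contexts assigned to each cluster.
        for j in range(k):
            merged = Counter()
            for lab, ctx in zip(labels, contexts):
                if lab == j:
                    merged.update(ctx)
            if merged:
                centroids[j] = merged
    return labels

# Invented occurrences of the ambiguous word "bank".
occurrences = [
    ["river", "water", "shore"],
    ["money", "loan", "deposit"],
    ["water", "river", "fish"],
    ["deposit", "money", "account"],
]
labels = induce_senses(occurrences)
```

The river occurrences land in one cluster and the finance occurrences in the other; a model like LDMI would then learn a separate vector per cluster, rather than one vector for all uses of "bank".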
A role for the developing lexicon in phonetic category acquisition
Infants segment words from fluent speech during the same period when they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior from artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning.
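The disambiguation idea can be sketched in a few lines of Bayes' rule: acoustic likelihoods from overlapping categories are combined with a lexical prior from segmented words. The category parameters, formant values, and word frames below are invented, and this is a one-dimensional caricature of the paper's model, not the model itself.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian likelihood of an acoustic value under a category."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

# Two invented, heavily overlapping vowel categories on one formant axis.
categories = {"i": (300.0, 80.0), "e": (400.0, 80.0)}

# Invented lexical feedback: how often each segmented word frame has
# been observed with each vowel.
lexicon = {"b_d": {"i": 9, "e": 1}, "p_t": {"i": 1, "e": 9}}

def posterior(formant, frame):
    """P(category | acoustics, word) ∝ P(acoustics | category) * P(category | word)."""
    counts = lexicon[frame]
    total = sum(counts.values())
    scores = {c: gauss(formant, *categories[c]) * counts[c] / total
              for c in categories}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}
```

A token at 350 Hz is acoustically ambiguous (equidistant from both category means, so the likelihoods cancel), and the word frame alone resolves it — the top-down constraint from the developing lexicon.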
Make Me Walk, Make Me Talk, Do Whatever You Please: Barbie and Exceptions
Barbie represents an aspiration to an ideal and also a never-ending mutability. Barbie is the perfect woman, and she is also grotesque, plasticized hyperreality, presenting a femininity exaggerated to the point of caricature. Barbie’s marketplace success, combined with (and likely related to) her overlapping and contradictory meanings, also allows her to embody some key exceptions to copyright and trademark law. Though Mattel’s lawsuits were not responsible for the initial recognition of those exceptions, they illuminate key principles and contrasts in American law. Mattel attempted to use both copyright and trademark to control the meaning of Barbie, reflecting a trend towards such overlapping claims. In order to ensure that their combined scope is no greater than the sum of their parts, both trademark and copyright defenses ought to be considered together. The Barbie cases highlight the problem that overlaps between the two regimes can challenge the very idea of IP boundaries, unless robust defenses exist against overclaiming.
Natural language understanding: instructions for (Present and Future) use
In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine to be able to understand, and what are the key dimensions that require the attention of researchers to make this dream come true?
In search of grammaticalization in synchronic dialect data: General extenders in north-east England
In this paper, we draw on a socially stratified corpus of dialect data collected in north-east England to test recent proposals that grammaticalization processes are implicated in the synchronic variability of general extenders (GEs), i.e., phrase- or clause-final constructions such as and that and or something. Combining theoretical insights from the framework of grammaticalization with the empirical methods of variationist sociolinguistics, we operationalize key diagnostics of grammaticalization (syntagmatic length, decategorialization, semantic-pragmatic change) as independent factor groups in the quantitative analysis of GE variability. While multivariate analyses reveal rapid changes in apparent time to the social conditioning of some GE variants in our data, they do not reveal any evidence of systematic changes in the linguistic conditioning of variants in apparent time that would confirm an interpretation of ongoing grammaticalization. These results lead us to question …
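The apparent-time logic can be sketched as a cross-tabulation: each GE token is coded for its variant, the speaker's age group, and the grammaticalization diagnostics as factor groups, then one factor's distribution is compared across age groups. The tokens and codings below are invented for illustration, and this stands in for only the descriptive step before the study's multivariate analysis.

```python
from collections import Counter, defaultdict

# Invented GE tokens coded as the paper describes: variant, speaker age
# group (apparent time), and grammaticalization diagnostics as factor groups.
tokens = [
    ("and that", "older", {"form": "long", "decategorialized": False}),
    ("and that", "younger", {"form": "short", "decategorialized": True}),
    ("or something", "older", {"form": "short", "decategorialized": True}),
    ("or something", "younger", {"form": "short", "decategorialized": True}),
]

def factor_by_age(tokens, variant, factor):
    """Cross-tabulate one factor group by age group for a variant; a shift
    across age groups in apparent time is the kind of change in linguistic
    conditioning the study tests for."""
    table = defaultdict(Counter)
    for v, age, factors in tokens:
        if v == variant:
            table[age][factors[factor]] += 1
    return {age: dict(c) for age, c in table.items()}
```

In this toy data the form of "and that" shifts from long to short between older and younger speakers, the sort of apparent-time pattern that would support (but, as the study finds, does not by itself confirm) ongoing grammaticalization.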