4,158 research outputs found
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Existing approaches to automatic VerbNet-style verb classification are
heavily dependent on feature engineering and therefore limited to languages
with mature NLP pipelines. In this work, we propose a novel cross-lingual
transfer method for inducing VerbNets for multiple languages. To the best of
our knowledge, this is the first study which demonstrates how the architectures
for learning word embeddings can be applied to this challenging
syntactic-semantic task. Our method uses cross-lingual translation pairs to tie
each of the six target languages into a bilingual vector space with English,
jointly specialising the representations to encode the relational information
from English VerbNet. A standard clustering algorithm is then run on top of the
VerbNet-specialised representations, using vector dimensions as features for
learning verb classes. Our results show that the proposed cross-lingual
transfer approach sets new state-of-the-art verb classification performance
across all six target languages explored in this work.Comment: EMNLP 2017 (long paper
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
Experimental Support for a Categorical Compositional Distributional Model of Meaning
Modelling compositional meaning for sentences using empirical distributional
methods has been a challenge for computational linguists. We implement the
abstract categorical model of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) using
data from the BNC and evaluate it. The implementation is based on unsupervised
learning of matrices for relational words and applying them to the vectors of
their arguments. The evaluation is based on the word disambiguation task
developed by Mitchell and Lapata (2008) for intransitive sentences, and on a
similar new experiment designed for transitive sentences. Our model matches the
results of its competitors in the first experiment, and betters them in the
second. The general improvement in results with increase in syntactic
complexity showcases the compositional power of our model.Comment: 11 pages, to be presented at EMNLP 2011, to be published in
Proceedings of the 2011 Conference on Empirical Methods in Natural Language
Processin
Distributional Measures of Semantic Distance: A Survey
The ability to mimic human notions of semantic distance has widespread
applications. Some measures rely only on raw text (distributional measures) and
some rely on knowledge sources such as WordNet. Although extensive studies have
been performed to compare WordNet-based measures with human judgment, the use
of distributional measures as proxies to estimate semantic distance has
received little attention. Even though they have traditionally performed poorly
when compared to WordNet-based measures, they lay claim to certain uniquely
attractive features, such as their applicability in resource-poor languages and
their ability to mimic both semantic similarity and semantic relatedness.
Therefore, this paper presents a detailed study of distributional measures.
Particular attention is paid to flesh out the strengths and limitations of both
WordNet-based and distributional measures, and how distributional measures of
distance can be brought more in line with human notions of semantic distance.
We conclude with a brief discussion of recent work on hybrid measures
A discriminative approach to grounded spoken language understanding in interactive robotics
Spoken Language Understanding in Interactive Robotics provides computational models of human-machine communication based on the vocal input. However, robots operate in specific environments and the correct interpretation of the spoken sentences depends on the physical, cognitive and linguistic aspects triggered by the operational environment. Grounded language processing should exploit both the physical constraints of the context as well as knowledge assumptions of the robot. These include the subjective perception of the environment that explicitly affects linguistic reasoning. In this work, a standard linguistic pipeline for semantic parsing is extended toward a form of perceptually informed natural language processing that combines discriminative learning and distributional semantics. Empirical results achieve up to a 40% of relative error reduction
- …