A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection
Compositionality in language refers to how much the meaning of some phrase
can be decomposed into the meaning of its constituents and the way these
constituents are combined. Based on the premise that substitution by synonyms
is meaning-preserving, compositionality can be approximated as the semantic
similarity between a phrase and a version of that phrase where words have been
replaced by their synonyms. Different ways of representing such phrases exist
(e.g., vectors [1] or language models [2]), and the choice of representation
affects the measurement of semantic similarity.
We propose a new compositionality detection method that represents phrases as
ranked lists of term weights. Our method approximates the semantic similarity
between two ranked list representations using a range of well-known distance
and correlation metrics. In contrast to most state-of-the-art approaches in
compositionality detection, our method is completely unsupervised. Experiments
with a publicly available dataset of 1048 human-annotated phrases show that,
compared to strong supervised baselines, our approach provides superior
measurement of compositionality using any of the distance and correlation
metrics considered.
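To make the idea of comparing ranked-list representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes each phrase is given as a dictionary of term weights (the terms and weights below are invented for illustration) and uses Spearman's rank correlation as one example of a well-known correlation metric; the abstract does not say which metrics were actually used.

```python
def rank_positions(weights):
    """Map each term to its rank, where rank 1 is the highest weight.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    ordered = sorted(weights, key=lambda term: (-weights[term], term))
    return {term: position + 1 for position, term in enumerate(ordered)}


def spearman_rho(weights_a, weights_b):
    """Spearman's rank correlation over the terms shared by both lists.

    Uses the classic formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    which is exact only when there are no tied weights.
    """
    shared = sorted(set(weights_a) & set(weights_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared terms")
    ranks_a = rank_positions({t: weights_a[t] for t in shared})
    ranks_b = rank_positions({t: weights_b[t] for t in shared})
    squared_diffs = sum((ranks_a[t] - ranks_b[t]) ** 2 for t in shared)
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


# Hypothetical term weights for a phrase and its synonym-substituted version.
phrase = {"rule": 0.9, "form": 0.7, "office": 0.4, "delay": 0.2}
substituted = {"rule": 0.8, "form": 0.3, "office": 0.5, "delay": 0.1}

score = spearman_rho(phrase, substituted)
```

Under the premise stated in the abstract, a higher correlation between the two rankings would be read as a sign that the phrase is more compositional.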
Evaluating Semantic Parsing against a Simple Web-based Question Answering Model
Semantic parsing shines at analyzing complex natural language that involves
composition and computation over multiple pieces of evidence. However, datasets
for semantic parsing contain many factoid questions that can be answered from a
single web document. In this paper, we propose to evaluate semantic
parsing-based question answering models by comparing them to a question
answering baseline that queries the web and extracts the answer only from web
snippets, without access to the target knowledge-base. We investigate this
approach on COMPLEXQUESTIONS, a dataset designed to focus on compositional
language, and find that our model obtains reasonable performance (35 F1
compared to 41 F1 of state-of-the-art). We find in our analysis that our model
performs well on complex questions involving conjunctions, but struggles on
questions that involve relation composition and superlatives.
Comment: *sem 201
The Mechanism of Additive Composition
Additive composition (Foltz et al., 1998; Landauer and Dumais, 1997; Mitchell
and Lapata, 2010) is a widely used method for computing meanings of phrases,
which takes the average of vector representations of the constituent words. In
this article, we prove an upper bound for the bias of additive composition,
which is the first theoretical analysis of compositional frameworks from a
machine learning point of view. The bound is written in terms of collocation
strength; we prove that the more exclusively two successive words tend to occur
together, the more accurately their additive composition can be guaranteed to
approximate the natural phrase vector. Our proof relies on properties of
natural language data that are empirically verified, and can be theoretically
derived from an assumption that the data is generated from a Hierarchical
Pitman-Yor Process. The theory endorses additive composition as a reasonable
operation for calculating meanings of phrases, and suggests ways to improve
additive compositionality, including: transforming entries of distributional
word vectors by a function that meets a specific condition, constructing a
novel type of vector representations to make additive composition sensitive to
word order, and utilizing singular value decomposition to train word vectors.Comment: More explanations on theory and additional experiments added.
Accepted by Machine Learning Journa
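As a small illustration of the operation this abstract analyzes, the sketch below computes a phrase vector as the average of its constituent word vectors. The function name and the toy two-dimensional vectors are assumptions made for this example; real distributional vectors would come from a trained model.

```python
def additive_composition(word_vectors):
    """Return the element-wise average of the given word vectors."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(vector) != dim for vector in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    count = len(word_vectors)
    return [sum(vector[i] for vector in word_vectors) / count for i in range(dim)]


# Toy vectors for two successive words (values invented for illustration).
first_word = [1.0, 0.0]
second_word = [0.0, 1.0]

phrase_vector = additive_composition([first_word, second_word])
```

Because averaging is symmetric, this plain form gives the same result for either word order, which is the limitation that the word-order-sensitive representations mentioned in the abstract are meant to address.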