From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past few years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.
Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence
Researc
Implanting Rational Knowledge into Distributed Representation at Morpheme Level
Previously, researchers paid no attention to the creation of unambiguous
morpheme embeddings independent from the corpus, while such information plays
an important role in expressing the exact meanings of words in paratactic
languages such as Chinese. In this paper, after constructing the Chinese lexical
and semantic ontology based on word-formation, we propose a novel approach to
implanting the structured rational knowledge into distributed representation at
morpheme level, naturally avoiding heavy disambiguation in the corpus. We
design a template to create the instances as pseudo-sentences merely from the
pieces of knowledge of morphemes built in the lexicon. To exploit hierarchical
information and tackle the data sparseness problem, the instance proliferation
technique is applied based on similarity to expand the collection of
pseudo-sentences. The distributed representation for morphemes can then be
trained on these pseudo-sentences using word2vec. For evaluation, we validate
the paradigmatic and syntagmatic relations of morpheme embeddings, and apply
the obtained embeddings to word similarity measurement, achieving significant
improvements over the classical models of more than 5 points in Spearman
correlation or 8 percentage points, which shows promising prospects for
adopting this new source of knowledge.
Comment: AAAI 201