A Step toward Compositional Semantics: E-HowNet a Lexical Semantic Representation System
PACLIC 23 / City University of Hong Kong / 3-5 December 200
Semantic Representation and Composition for Unknown Compounds in E-HowNet
PACLIC 20 / Wuhan, China / 1-3 November 200
Implanting Rational Knowledge into Distributed Representation at Morpheme Level
Previous work has paid no attention to creating unambiguous morpheme
embeddings independently of the corpus, although such information plays an
important role in expressing the exact meanings of words in parataxis
languages such as Chinese. In this paper, after constructing a Chinese lexical
and semantic ontology based on word formation, we propose a novel approach to
implanting structured rational knowledge into distributed representations at
the morpheme level, which naturally avoids heavy disambiguation in the corpus.
We design a template that creates instances as pseudo-sentences solely from
the pieces of morpheme knowledge built into the lexicon. To exploit
hierarchical information and tackle the data sparseness problem, a
similarity-based instance proliferation technique is applied to expand the
collection of pseudo-sentences. Morpheme embeddings can then be trained on
these pseudo-sentences using word2vec. For evaluation, we validate the
paradigmatic and syntagmatic relations of the morpheme embeddings and apply
the obtained embeddings to word similarity measurement, achieving significant
improvements over the classical models by more than 5 Spearman points, or 8
percentage points, which shows very promising prospects for adopting this new
source of knowledge.
Comment: AAAI 201
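The pipeline described in the abstract can be sketched in a few lines. The steps are: a template turns lexicon entries into pseudo-sentences, similarity-based proliferation adds more instances, word2vec is trained on the result, and the embeddings are evaluated with Spearman correlation. This is a minimal sketch under stated assumptions. The toy lexicon, the "morpheme followed by its attributes" template, and the Jaccard-threshold proliferation rule are hypothetical stand-ins chosen for illustration, not the paper's actual template or similarity measure. The word2vec training step is omitted so the example runs with the standard library alone.

```python
"""Sketch: pseudo-sentences from morpheme knowledge, plus Spearman evaluation.

HYPOTHETICAL: the lexicon, the template, and the Jaccard-based proliferation
rule below are illustrative assumptions, not the method from the paper.
"""
from itertools import combinations

# Hypothetical toy lexicon: morpheme sense -> list of semantic attributes.
LEXICON = {
    "木#tree": ["plant", "wood", "grow"],
    "林#forest": ["plant", "wood", "many"],
    "火#fire": ["heat", "burn", "light"],
}


def template(morpheme, attrs):
    """Hypothetical template: one pseudo-sentence = morpheme + its attributes."""
    return [morpheme] + list(attrs)


def jaccard(a, b):
    """Jaccard similarity of two attribute collections."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def build_corpus(lexicon, threshold=0.5):
    """One pseudo-sentence per entry, then proliferate instances: for every
    pair of sufficiently similar morphemes, pair each with the other's
    attributes (an assumed stand-in for similarity-based proliferation)."""
    corpus = [template(m, attrs) for m, attrs in lexicon.items()]
    for (m1, a1), (m2, a2) in combinations(lexicon.items(), 2):
        if jaccard(a1, a2) >= threshold:
            corpus.append(template(m1, a2))
            corpus.append(template(m2, a1))
    return corpus


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

The corpus is a list of token lists, which is the input format accepted by gensim's `Word2Vec(sentences=...)`, so it could be passed to a word2vec trainer directly. The `spearman` helper would then compare model similarity scores (for example, cosine similarities between trained embeddings) against human similarity ratings. Word-similarity results are commonly reported as rho multiplied by 100.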