A common semantic space for monolingual and cross-lingual meta-embeddings
This master’s thesis presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases, and languages. Existing word vectors are projected to a common semantic space using linear transformations and averaging. With our method, the resulting meta-embeddings maintain the dimensionality of the original embeddings without losing information, while also dealing with the out-of-vocabulary (OOV) problem. Furthermore, empirical evaluation demonstrates the effectiveness of our technique with respect to previous work on various intrinsic and extrinsic multilingual evaluations.
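As a rough illustration of "project to a common space, then average", the following minimal sketch combines two embedding sources of equal dimensionality. It is not the thesis's implementation. The orthogonal Procrustes solution used for the linear transformation, the function names `procrustes_map` and `meta_embed`, and the choice to keep the single available vector for words found in only one source are all assumptions made for this example.

```python
import numpy as np


def procrustes_map(X, Y):
    """Return the orthogonal matrix W that minimises ||X @ W - Y||_F.

    Assumption: an orthogonal map stands in for the abstract's
    unspecified "linear transformations".
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def meta_embed(src, tgt):
    """Build meta-embeddings from two dicts mapping word -> 1-D vector.

    `src` vectors are projected into the space of `tgt`, and vectors for
    words present in both sources are averaged. Words present in only one
    source keep their single (projected) vector, a simplification of how
    OOV words might be handled.
    """
    shared = sorted(set(src) & set(tgt))
    X = np.stack([src[w] for w in shared])
    Y = np.stack([tgt[w] for w in shared])
    W = procrustes_map(X, Y)

    meta = {}
    for word in set(src) | set(tgt):
        vectors = []
        if word in src:
            vectors.append(src[word] @ W)  # project into the common space
        if word in tgt:
            vectors.append(tgt[word])      # already in the common space
        meta[word] = np.mean(vectors, axis=0)
    return meta
```

Because the projection is a square matrix and the combination step is an average, the output vectors have the same dimensionality as the inputs, and the vocabulary of the result is the union of the two source vocabularies.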
Cross-Language Question Re-Ranking
We study how to find relevant questions in community forums when the language
of the new questions is different from that of the existing questions in the
forum. In particular, we explore the Arabic-English language pair. We compare a
kernel-based system with a feed-forward neural network in a scenario where a
large parallel corpus is available for training a machine translation system,
bilingual dictionaries, and cross-language word embeddings. We observe that
both approaches degrade the performance of the system when working on the
translated text, especially the kernel-based system, which depends heavily on a
syntactic kernel. We address this issue using a cross-language tree kernel,
which compares the original Arabic tree to the English trees of the related
questions. We show that this kernel almost closes the performance gap with
respect to the monolingual system. On the neural network side, we use the
parallel corpus to train cross-language embeddings, which we then use to
represent the Arabic input and the English related questions in the same space.
The results also improve to close to those of the monolingual neural network.
Overall, the kernel system performs better than the neural network in all
cases.
Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches;
Question Retrieval; Kernel-based Methods; Neural Networks; Distributed
Representation