3 research outputs found

Cross-lingual semantic specialization via lexical relation induction

Author: Glavaš G, Korhonen Anna-Leena, Ponti Edoardo, Reichart R, Vulić I
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 01/01/2020

Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: (1) inducing noisy constraints in the target language through automatic word translation; and (2) filtering the noisy constraints with a state-of-the-art relation prediction model trained on the source-language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We demonstrate the effectiveness of our method through intrinsic word similarity evaluation in 8 languages and through 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.

Apollo (Cambridge)
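
The abstract does not name the algorithm used to inject the filtered constraints into the target-language vectors. As a rough illustration of constraint-based specialization in general, the sketch below swaps in retrofitting (Faruqui et al., 2015), a simpler, well-known post-processing method that uses only attract-style (synonym-like) constraints. It is not the authors' method, and the names `retrofit` and `cosine`, the parameter defaults, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrofit(
    vectors: Dict[str, Vector],
    constraints: List[Tuple[str, str]],
    iterations: int = 10,
    alpha: float = 1.0,
) -> Dict[str, Vector]:
    """Retrofitting-style specialization (Faruqui et al., 2015), used here only
    as a stand-in for the paper's unnamed specialization step: pull words linked
    by a constraint towards each other while anchoring each vector to its
    original distributional value."""
    # Undirected graph of constraint pairs; words without vectors are ignored.
    neighbours: Dict[str, List[str]] = {w: [] for w in vectors}
    for a, b in constraints:
        if a in vectors and b in vectors and a != b:
            neighbours[a].append(b)
            neighbours[b].append(a)

    new = {w: list(v) for w, v in vectors.items()}  # copy; inputs stay untouched
    for _ in range(iterations):
        for word, nbrs in neighbours.items():
            if not nbrs:
                continue  # unconstrained words keep their original vector
            beta = 1.0 / len(nbrs)  # each neighbour gets equal weight
            total = [alpha * x for x in vectors[word]]
            for n in nbrs:
                total = [t + beta * y for t, y in zip(total, new[n])]
            # Weighted average of the original vector and its neighbours.
            new[word] = [t / (alpha + beta * len(nbrs)) for t in total]
    return new
```

For example, with toy vectors car = [1.0, 0.0] and automobile = [0.0, 1.0] and the single constraint ("car", "automobile"), their cosine similarity rises above its initial value of 0, while a word with no constraints, such as banana, keeps its original vector.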

Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

Author: Chen H, Liu F, Liu Q, Lu J, Porter A, Zhang G, Zhang Y
Publication venue: 'Elsevier BV'
Publication date: 01/11/2018

© 2018 All rights reserved. Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and on the field of application. This paper proposes a novel kernel k-means clustering method, combined with a word embedding model, that effectively extracts topics from bibliometric data. Experimental comparisons of this method with four baselines (k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness both across a relatively broad range of disciplines and within a single domain. An empirical study of topic extraction from articles published in three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides further evidence of the method's topic-extraction ability. This empirical analysis also reveals both overlapping and differing research interests among the three journals, insights that would benefit journal publishers, editorial boards, and research communities.

OPUS - University of Technology Sydney
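
The abstract does not say which kernel, embedding model, or document representation the method uses. As a hedged illustration of kernel k-means itself, the sketch below assumes each item (for instance, a document represented by averaged word embeddings) is already a vector. The Gaussian (RBF) kernel, the `gamma` value, the round-robin initialisation, and the names `rbf_kernel` and `kernel_kmeans` are illustrative choices, not the authors' implementation.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2). Assumed kernel choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(
    points: List[Sequence[float]], k: int, gamma: float = 1.0, max_iter: int = 100
) -> List[int]:
    """Assign each point to one of k clusters by minimising the feature-space
    distance to each cluster mean, computed only through kernel values."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    labels = [i % k for i in range(n)]  # assumption: simple deterministic init
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|C|^2) * sum_{j,l in C} K[j][l]: squared norm of each cluster mean.
        mean_sq = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that became empty
                # ||phi(x_i) - mu_c||^2 expanded with the kernel trick.
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + mean_sq[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
    return labels
```

On two well-separated toy groups of 2-D vectors, such as three points near (0, 0) and three near (5, 5), calling `kernel_kmeans(points, k=2)` places each group in its own cluster. Because the distances are expressed entirely through kernel values, the feature map never has to be computed explicitly, which is what distinguishes kernel k-means from plain k-means.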

Semantic structure-based word embedding by incorporating concept convergence and word divergence

Author: Gao Y, Huang H, Liu Q, Lu J, Xuan J, Zhang G
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 01/01/2018

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Representing the semantics of words is a fundamental task in text processing. Several studies have shown that text and knowledge bases (KBs) are complementary sources for learning word embeddings. However, most existing methods that use KBs consider only relationships within word pairs. We argue that the structure in which KBs organize words conveys more effective and stable knowledge for capturing word semantics. In this paper, we propose a semantic structure-based word embedding method that introduces concept convergence and word divergence to reveal semantic structures during embedding learning. To assess the effectiveness of our method, we use WordNet as the KB during training and conduct extensive experiments on word similarity, word analogy, text classification, and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including methods trained solely on the corpus and methods trained on both the corpus and the KBs.

OPUS - University of Technology Sydney

Association for the Advancement of Artificial Intelligence: AAAI Publications