2,363 research outputs found

Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation

Authors: Henderson James, Pappas Nikolaos, Popescu-Belis Andrei, Pu Xiao
Publication date: 05/10/2018

This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are above one BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words. Comment: To appear in TAC

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-$1$ Updates

Authors: Anandkumar Animashree, Ge Rong, Janzamin Majid
Publication date: 01/01/2014

In this paper, we provide local and global convergence guarantees for recovering CP (Candecomp/Parafac) tensor decomposition. The main step of the proposed algorithm is a simple alternating rank-$1$ update which is the alternating version of the tensor power iteration adapted for asymmetric tensors. Local convergence guarantees are established for third order tensors of rank $k$ in $d$ dimensions, when $k=o \bigl( d^{1.5} \bigr)$ and the tensor components are incoherent. Thus, we can recover overcomplete tensor decomposition. We also strengthen the results to global convergence guarantees under stricter rank condition $k \le \beta d$ (for arbitrary constant $\beta > 1$) through a simple initialization procedure where the algorithm is initialized by top singular vectors of random tensor slices. Furthermore, the approximate local convergence guarantees for $p$-th order tensors are also provided under rank condition $k=o \bigl( d^{p/2} \bigr)$. The guarantees also include tight perturbation analysis given noisy tensor. Comment: We have added an additional sub-algorithm to remove the (approximate) residual error left after the tensor power iteratio

arXiv.org e-Print Archive

eScholarship - University of California
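
The alternating rank-$1$ update mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' algorithm: the function name `alternating_rank1_update`, the sequential (Gauss-Seidel style) update order, and the fixed iteration count are all hypothetical choices, and the initialization from singular vectors of random tensor slices, the residual-removal sub-algorithm, and the extraction of multiple components are omitted.

```python
import numpy as np

def alternating_rank1_update(T, b, c, n_iters=50):
    """Estimate one rank-1 component w * (a x b x c) of a third-order tensor T.

    Illustrative sketch only: each factor is refreshed in turn by contracting
    T with the other two factors and normalizing the result.
    """
    w = 0.0
    a = None
    for _ in range(n_iters):
        # a <- T(I, b, c), normalized to unit length
        a = np.einsum("ijk,j,k->i", T, b, c)
        a = a / np.linalg.norm(a)
        # b <- T(a, I, c), normalized to unit length
        b = np.einsum("ijk,i,k->j", T, a, c)
        b = b / np.linalg.norm(b)
        # c <- T(a, b, I); its norm is the estimated component weight
        c = np.einsum("ijk,i,j->k", T, a, b)
        w = np.linalg.norm(c)
        c = c / w
    return w, a, b, c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical test tensor: a single rank-1 component with weight 2
    a0, b0, c0 = (v / np.linalg.norm(v) for v in rng.standard_normal((3, 4)))
    T = 2.0 * np.einsum("i,j,k->ijk", a0, b0, c0)
    w, a, b, c = alternating_rank1_update(
        T, rng.standard_normal(4), rng.standard_normal(4), n_iters=5
    )
    print("estimated weight:", w)
```

On an exactly rank-1 tensor the update recovers the component after a single sweep. For tensors with several non-orthogonal components, convergence depends on the conditions stated in the abstract (incoherence, the rank bound, and a suitable initialization).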

Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization

Author: Mirzal Andri
Publication date: 16/12/2011

This paper provides a theoretical support for clustering aspect of the nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker optimality conditions, we show that NMF objective is equivalent to graph clustering objective, so clustering aspect of the NMF has a solid justification. Different from previous approaches which usually discard the nonnegativity constraints, our approach guarantees the stationary point being used in deriving the equivalence is located on the feasible region in the nonnegative orthant. Additionally, since clustering capability of a matrix decomposition technique can sometimes imply its latent semantic indexing (LSI) aspect, we will also evaluate LSI aspect of the NMF by showing its capability in solving the synonymy and polysemy problems in synthetic datasets. And more extensive evaluation will be conducted by comparing LSI performances of the NMF and the singular value decomposition (SVD), the standard LSI method, using some standard datasets. Comment: 28 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX
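
To make the clustering use of NMF described in the abstract above concrete, here is a minimal sketch in NumPy. The abstract does not say which NMF solver is used, so the standard multiplicative update rules for the Frobenius objective are swapped in as an assumption. The function names `nmf_multiplicative` and `cluster_columns`, the small `eps` safeguard, and the argmax cluster assignment are hypothetical illustration choices.

```python
import numpy as np

def nmf_multiplicative(A, r, n_iters=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix A (m x n) as W @ H with W, H >= 0.

    Sketch using multiplicative updates; nonnegativity is preserved because
    each update multiplies by a ratio of nonnegative quantities.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cluster_columns(H):
    """Assign each column of A to the factor with the largest coefficient."""
    return np.argmax(H, axis=0)

if __name__ == "__main__":
    # Hypothetical term-document matrix with two obvious document groups
    A = np.array([
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
    ])
    W, H = nmf_multiplicative(A, r=2)
    print("cluster labels:", cluster_columns(H))
```

Reading cluster membership from the largest entry in each column of `H` is one common convention. The paper's own argument rests on the Karush-Kuhn-Tucker conditions and is not reproduced here.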