Shampoo: Preconditioned Stochastic Tensor Optimization
Preconditioned gradient methods are among the most general and powerful tools
in optimization. However, preconditioning requires storing and manipulating
prohibitively large matrices. We describe and analyze a new structure-aware
preconditioning algorithm, called Shampoo, for stochastic optimization over
tensor spaces. Shampoo maintains a set of preconditioning matrices, each of
which operates on a single dimension, contracting over the remaining
dimensions. We establish convergence guarantees in the stochastic convex
setting, the proof of which builds upon matrix trace inequalities. Our
experiments with state-of-the-art deep learning models show that Shampoo is
capable of converging considerably faster than commonly used optimizers.
Although it involves a more complex update rule, Shampoo's runtime per step is
comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
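To make the per-dimension preconditioning concrete, here is a minimal NumPy sketch of the update described above. It assumes the general-order form in which the preconditioner for dimension i accumulates the gradient contracted with itself over all other dimensions, starts at eps * I, and is applied with exponent -1/(2k) for an order-k tensor. For a vector (k = 1) this is full-matrix AdaGrad; for a matrix (k = 2) it gives W - lr * L^(-1/4) G R^(-1/4). The names `ShampooSketch` and `sym_matrix_power` and the `lr`/`eps` defaults are hypothetical. The sketch recomputes dense eigendecompositions at every step and omits the engineering a practical implementation would need, so it illustrates the rule rather than reproducing the authors' code.

```python
import numpy as np


def sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


class ShampooSketch:
    """Hypothetical minimal Shampoo-style optimizer for a single tensor parameter."""

    def __init__(self, shape, lr=0.1, eps=1e-4):
        self.lr = lr
        self.order = len(shape)
        # One preconditioning matrix per tensor dimension, initialized to eps * I.
        self.precond = [eps * np.eye(n) for n in shape]

    def step(self, weights, grad):
        k = self.order
        update = grad
        for i in range(k):
            # Mode-i unfolding: dimension i becomes the rows, all other
            # dimensions are flattened into the columns.
            unfolded = np.moveaxis(grad, i, 0).reshape(grad.shape[i], -1)
            # Contract the gradient with itself over the remaining dimensions.
            self.precond[i] += unfolded @ unfolded.T
            root = sym_matrix_power(self.precond[i], -1.0 / (2 * k))
            # Multiply the running update by the root along dimension i only.
            update = np.moveaxis(np.tensordot(root, update, axes=([1], [i])), 0, i)
        return weights - self.lr * update


# Usage: one step on a 3x4 weight matrix (k = 2, so each root is a -1/4 power).
rng = np.random.default_rng(0)
opt = ShampooSketch((3, 4))
w = rng.normal(size=(3, 4))
w = opt.step(w, grad=w)  # the gradient of 0.5 * ||w||^2 is w itself
```

Keeping one small matrix per dimension, instead of one matrix over all entries of the tensor, is what keeps the memory cost manageable: a 3x4 parameter needs a 3x3 and a 4x4 preconditioner rather than a 12x12 one.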
On Multi-Relational Link Prediction with Bilinear Models
We study bilinear embedding models for the task of multi-relational link
prediction and knowledge graph completion. Bilinear models are among the most
basic models for this task; they are comparatively efficient to train and use, and
they can provide good prediction performance. The main goal of this paper is to
explore the expressiveness of and the connections between various bilinear
models proposed in the literature. In particular, a substantial number of
models can be represented as bilinear models with certain additional
constraints enforced on the embeddings. We explore whether or not these
constraints lead to universal models, which can in principle represent every
set of relations, and whether or not there are subsumption relationships
between various models. We report results of an independent experimental study
that evaluates recent bilinear models in a common experimental setup. Finally,
we provide evidence that relation-level ensembles of multiple bilinear models
can achieve state-of-the-art prediction performance.
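The abstract does not spell out the scoring function, so the sketch below assumes the common bilinear form f(s, r, o) = e_s^T R_r e_o, with entity embeddings e_s, e_o and a relation matrix R_r. It illustrates what "additional constraints enforced on the embeddings" can mean: restricting R_r to be diagonal (a DistMult-style model) gives a special case of the unconstrained model, and that constraint makes every relation symmetric in subject and object. The function names are hypothetical, and no training is shown.

```python
import numpy as np


def bilinear_score(e_s, rel_matrix, e_o):
    """Generic bilinear score e_s^T R_r e_o with an unconstrained relation matrix."""
    return float(e_s @ rel_matrix @ e_o)


def diagonal_bilinear_score(e_s, rel_diag, e_o):
    """Same score with R_r constrained to be diagonal (DistMult-style)."""
    return float(np.sum(e_s * rel_diag * e_o))


rng = np.random.default_rng(0)
d = 4
e_s, e_o, r = rng.normal(size=(3, d))
full_R = rng.normal(size=(d, d))

# The constrained model is the unconstrained one evaluated at R_r = diag(r).
same = np.isclose(diagonal_bilinear_score(e_s, r, e_o),
                  bilinear_score(e_s, np.diag(r), e_o))
# A diagonal R_r is symmetric, so the score cannot distinguish (s, o) from (o, s).
symmetric = np.isclose(diagonal_bilinear_score(e_s, r, e_o),
                       diagonal_bilinear_score(e_o, r, e_s))
```

A constraint like this trades expressiveness for fewer parameters, which is exactly the kind of subsumption and universality question the paper studies.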
Complex Embeddings for Simple Link Prediction
In statistical relational learning, the link prediction problem is key to
automatically understanding the structure of large knowledge bases. As in previous
studies, we propose to solve this problem through latent factorization.
However, here we make use of complex-valued embeddings. The composition of
complex embeddings can handle a large variety of binary relations, among them
symmetric and antisymmetric relations. Compared to state-of-the-art models such
as Neural Tensor Network and Holographic Embeddings, our approach based on
complex embeddings is arguably simpler, as it only uses the Hermitian dot
product, the complex counterpart of the standard dot product between real
vectors. Our approach is scalable to large datasets as it remains linear in
both space and time, while consistently outperforming alternative approaches on
standard link prediction benchmarks.
Comment: 10+2 pages, accepted at ICML 201
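A minimal sketch of how a Hermitian-product score can capture both symmetric and antisymmetric relations. It assumes a per-relation complex vector w_r and scores a triple as Re(sum_k w_r[k] * e_s[k] * conj(e_o[k])), written here with NumPy's `np.vdot`, which conjugates its first argument. The function name is hypothetical and training is omitted. Swapping subject and object conjugates the inner term, so a purely real w_r yields the same score both ways, while a purely imaginary w_r flips its sign. Each score costs O(d) time and memory, in line with the linear scaling claimed above.

```python
import numpy as np


def complex_score(w_r, e_s, e_o):
    """Re(<w_r, e_s, conj(e_o)>) computed as Re(conj(e_o) . (w_r * e_s)) in O(d)."""
    return float(np.real(np.vdot(e_o, w_r * e_s)))


rng = np.random.default_rng(0)
d = 5
e_s = rng.normal(size=d) + 1j * rng.normal(size=d)
e_o = rng.normal(size=d) + 1j * rng.normal(size=d)
real_rel = rng.normal(size=d).astype(complex)  # purely real relation embedding
imag_rel = 1j * rng.normal(size=d)             # purely imaginary relation embedding

# Real relation embedding: the two directions score identically (symmetric).
print(complex_score(real_rel, e_s, e_o), complex_score(real_rel, e_o, e_s))
# Imaginary relation embedding: the two directions differ in sign (antisymmetric).
print(complex_score(imag_rel, e_s, e_o), complex_score(imag_rel, e_o, e_s))
```

A general w_r mixes both parts, which is how a single model can represent relations that are neither fully symmetric nor fully antisymmetric.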