258,822 research outputs found
Distributed Machine Learning via Sufficient Factor Broadcasting
Matrix-parametrized models, including multiclass logistic regression and
sparse coding, are used in machine learning (ML) applications ranging from
computer vision to computational biology. When these models are applied to
large-scale ML problems starting at millions of samples and tens of thousands
of classes, their parameter matrix can grow at an unexpected rate, resulting in
high parameter synchronization costs that greatly slow down distributed
learning. To address this issue, we propose a Sufficient Factor Broadcasting
(SFB) computation model for efficient distributed learning of a large family of
matrix-parameterized models, which share the following property: the parameter
update computed on each data sample is a rank-1 matrix, i.e., the outer product
of two "sufficient factors" (SFs). By broadcasting the SFs among worker
machines and reconstructing the update matrices locally at each worker, SFB
improves communication efficiency --- communication costs are linear in the
parameter matrix's dimensions, rather than quadratic --- without affecting
computational correctness. We present a theoretical convergence analysis of
SFB, and empirically corroborate its efficiency on four different
matrix-parametrized ML models
Neural-Network Quantum States, String-Bond States, and Chiral Topological States
Neural-Network Quantum States have been recently introduced as an Ansatz for
describing the wave function of quantum many-body systems. We show that there
are strong connections between Neural-Network Quantum States in the form of
Restricted Boltzmann Machines and some classes of Tensor-Network states in
arbitrary dimensions. In particular we demonstrate that short-range Restricted
Boltzmann Machines are Entangled Plaquette States, while fully connected
Restricted Boltzmann Machines are String-Bond States with a nonlocal geometry
and low bond dimension. These results shed light on the underlying architecture
of Restricted Boltzmann Machines and their efficiency at representing many-body
quantum states. String-Bond States also provide a generic way of enhancing the
power of Neural-Network Quantum States and a natural generalization to systems
with larger local Hilbert space. We compare the advantages and drawbacks of
these different classes of states and present a method to combine them
together. This allows us to benefit from both the entanglement structure of
Tensor Networks and the efficiency of Neural-Network Quantum States into a
single Ansatz capable of targeting the wave function of strongly correlated
systems. While it remains a challenge to describe states with chiral
topological order using traditional Tensor Networks, we show that
Neural-Network Quantum States and their String-Bond States extension can
describe a lattice Fractional Quantum Hall state exactly. In addition, we
provide numerical evidence that Neural-Network Quantum States can approximate a
chiral spin liquid with better accuracy than Entangled Plaquette States and
local String-Bond States. Our results demonstrate the efficiency of neural
networks to describe complex quantum wave functions and pave the way towards
the use of String-Bond States as a tool in more traditional machine-learning
applications.Comment: 15 pages, 7 figure
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Learning sophisticated feature interactions behind user behaviors is critical
in maximizing CTR for recommender systems. Despite great progress, existing
methods seem to have a strong bias towards low- or high-order interactions, or
require expertise feature engineering. In this paper, we show that it is
possible to derive an end-to-end learning model that emphasizes both low- and
high-order feature interactions. The proposed model, DeepFM, combines the power
of factorization machines for recommendation and deep learning for feature
learning in a new neural network architecture. Compared to the latest Wide \&
Deep model from Google, DeepFM has a shared input to its "wide" and "deep"
parts, with no need of feature engineering besides raw features. Comprehensive
experiments are conducted to demonstrate the effectiveness and efficiency of
DeepFM over the existing models for CTR prediction, on both benchmark data and
commercial data
Speeding up Context-based Sentence Representation Learning with Non-autoregressive Convolutional Decoding
Context plays an important role in human language understanding, thus it may
also be useful for machines learning vector representations of language. In
this paper, we explore an asymmetric encoder-decoder structure for unsupervised
context-based sentence representation learning. We carefully designed
experiments to show that neither an autoregressive decoder nor an RNN decoder
is required. After that, we designed a model which still keeps an RNN as the
encoder, while using a non-autoregressive convolutional decoder. We further
combine a suite of effective designs to significantly improve model efficiency
while also achieving better performance. Our model is trained on two different
large unlabelled corpora, and in both cases the transferability is evaluated on
a set of downstream NLP tasks. We empirically show that our model is simple and
fast while producing rich sentence representations that excel in downstream
tasks
Estimating conditional quantiles with the help of the pinball loss
The so-called pinball loss for estimating conditional quantiles is a
well-known tool in both statistics and machine learning. So far, however, only
little work has been done to quantify the efficiency of this tool for
nonparametric approaches. We fill this gap by establishing inequalities that
describe how close approximate pinball risk minimizers are to the corresponding
conditional quantile. These inequalities, which hold under mild assumptions on
the data-generating distribution, are then used to establish so-called variance
bounds, which recently turned out to play an important role in the statistical
analysis of (regularized) empirical risk minimization approaches. Finally, we
use both types of inequalities to establish an oracle inequality for support
vector machines that use the pinball loss. The resulting learning rates are
min--max optimal under some standard regularity assumptions on the conditional
quantile.Comment: Published in at http://dx.doi.org/10.3150/10-BEJ267 the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
- …