Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
result in the largest lower bound by interleaving this learning with
conventional coordinate descent updates except that the coordinate is selected
proportionately to the expected decrease. We show that our approach improves
the convergence of coordinate descent methods both theoretically and
experimentally. Comment: appearing at NeurIPS 2018
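
As a rough sketch of the idea (not the authors' exact algorithm), the snippet below runs coordinate descent on a least-squares objective and samples the coordinate to update in proportion to an estimated lower bound g_i^2 / (2 L_i) on the objective decrease, refreshing the estimate only for the coordinate that was actually played, in the spirit of bandit feedback. The function name, problem, and exploration constant are illustrative assumptions.

import numpy as np

def bandit_coordinate_descent(A, b, n_iters=2000, eps=0.1, seed=0):
    # Illustrative sketch: coordinate descent on f(x) = 0.5 * ||A x - b||^2,
    # with the coordinate sampled in proportion to an estimated lower bound
    # g_i^2 / (2 L_i) on the decrease of f (bandit-style partial feedback).
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    residual = A @ x - b                            # cached residual A x - b
    L = np.sum(A ** 2, axis=0) + 1e-12              # coordinate-wise Lipschitz constants
    est = (A.T @ residual) ** 2 / (2 * L) + 1e-12   # one full pass to initialize estimates
    for _ in range(n_iters):
        # sample mostly in proportion to the estimated decrease,
        # with a small uniform component for exploration
        probs = (1.0 - eps) * est / est.sum() + eps / d
        i = rng.choice(d, p=probs)
        g_i = A[:, i] @ residual                    # exact partial derivative for coordinate i
        step = g_i / L[i]
        x[i] -= step
        residual -= step * A[:, i]
        # bandit feedback: only the played coordinate's estimate is refreshed
        est[i] = (A[:, i] @ residual) ** 2 / (2 * L[i]) + 1e-12
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
b = A @ rng.standard_normal(50)
x_hat = bandit_coordinate_descent(A, b)
print("objective:", 0.5 * np.linalg.norm(A @ x_hat - b) ** 2)
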
Supervised Learning Under Distributed Features
This work studies the problem of learning in scenarios with both large datasets
and high-dimensional feature spaces. The feature information is assumed
to be spread across agents in a network, where each agent observes some of the
features. Through local cooperation, the agents are supposed to interact with
each other to solve an inference problem and converge towards the global
minimizer of an empirical risk. We study this problem exclusively in the primal
domain, and propose new and effective distributed solutions with guaranteed
convergence to the minimizer with linear rate under strong convexity. This is
achieved by combining a dynamic diffusion construction, a pipeline strategy,
and variance-reduced techniques. Simulation results illustrate the conclusions.
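
As a minimal, hedged illustration of the feature-partitioned (primal) setting described above, and not of the paper's dynamic diffusion, pipeline, or variance-reduced construction, the sketch below splits the columns of a least-squares problem across agents: each agent holds only its own feature block and weight sub-vector, and cooperation amounts to sharing partial predictions so every agent can form the global residual. All names and constants are assumptions for the example.

import numpy as np

def feature_distributed_gd(blocks, b, n_iters=500):
    # Gradient descent on 0.5 * ||A x - b||^2 where the columns (features) of A
    # are split across agents: agent k holds block A_k and sub-vector x_k.
    x = [np.zeros(Ak.shape[1]) for Ak in blocks]
    # centralized shortcut for a safe step size; a fully distributed method
    # would estimate or upper-bound this quantity locally
    lr = 1.0 / np.linalg.norm(np.hstack(blocks), 2) ** 2
    for _ in range(n_iters):
        prediction = sum(Ak @ xk for Ak, xk in zip(blocks, x))  # cooperation: sum of partial predictions
        residual = prediction - b
        for k, Ak in enumerate(blocks):
            x[k] -= lr * (Ak.T @ residual)   # each agent updates only its own features
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 40))
x_true = rng.standard_normal(40)
b = A @ x_true
blocks = [A[:, :10], A[:, 10:25], A[:, 25:]]   # three agents with disjoint feature sets
x_hat = np.concatenate(feature_distributed_gd(blocks, b))
print("estimation error:", np.linalg.norm(x_hat - x_true))
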
Dual-Free Stochastic Decentralized Optimization with Variance Reduction
We consider the problem of training machine learning models on distributed
data in a decentralized way. For finite-sum problems, fast single-machine
algorithms for large datasets rely on stochastic updates combined with variance
reduction. Yet, existing decentralized stochastic algorithms either do not
obtain the full speedup allowed by stochastic updates, or require oracles that
are more expensive than regular gradients. In this work, we introduce a
Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only
requires computing stochastic gradients of the local functions, and is
computationally as fast as a standard stochastic variance-reduced algorithm
run on a 1/n fraction of the dataset, where n is the number of nodes. To
derive DVR, we use Bregman coordinate descent on a well-chosen dual problem,
and obtain a dual-free algorithm using a specific Bregman divergence. We give
an accelerated version of DVR based on the Catalyst framework, and illustrate
its effectiveness with simulations on real data.
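
For context only, the sketch below shows the single-machine building block the abstract refers to, stochastic updates combined with variance reduction, in an SVRG-style form on a least-squares objective; it is not DVR itself, and the dual Bregman construction of the paper is not reproduced here. Names and constants are illustrative.

import numpy as np

def svrg_least_squares(A, b, n_epochs=30, seed=0):
    # SVRG on f(x) = (1/(2n)) * ||A x - b||^2: stochastic gradients are anchored
    # to a full gradient computed at a reference point once per epoch, which
    # removes their variance as the iterates approach the solution.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    lr = 1.0 / (4.0 * np.max(np.sum(A ** 2, axis=1)))   # conservative step size
    x = np.zeros(d)
    for _ in range(n_epochs):
        x_ref = x.copy()
        full_grad = A.T @ (A @ x_ref - b) / n            # full gradient at the reference point
        for _ in range(n):
            i = rng.integers(n)
            g_i = A[i] * (A[i] @ x - b[i])               # stochastic gradient at x
            g_i_ref = A[i] * (A[i] @ x_ref - b[i])       # same sample at the reference point
            x -= lr * (g_i - g_i_ref + full_grad)        # variance-reduced update
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 20))
b = A @ rng.standard_normal(20)
x_hat = svrg_least_squares(A, b)
print("objective:", np.linalg.norm(A @ x_hat - b) ** 2 / (2 * len(b)))
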
Distributed Dual Coordinate Ascent with Imbalanced Data on a General Tree Network
In this paper, we investigate the impact of imbalanced data on the
convergence of distributed dual coordinate ascent in a tree network for solving
an empirical loss minimization problem in distributed machine learning. To
address this issue, we propose a method called delayed generalized distributed
dual coordinate ascent that takes into account the information of the
imbalanced data, and provide the analysis of the proposed algorithm. Numerical
experiments confirm the effectiveness of our proposed method in improving the
convergence speed of distributed dual coordinate ascent in a tree network. Comment: To be published in the IEEE 2023 Workshop on Machine Learning for Signal Processing (MLSP).
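
To make the primitive concrete, the sketch below is plain single-machine stochastic dual coordinate ascent for ridge regression, the kind of dual coordinate ascent that distributed tree-network variants build on; it is not the delayed generalized method proposed in the paper, and the function name, regularization level, and data are assumptions for the example.

import numpy as np

def sdca_ridge(A, y, lam=0.1, n_epochs=50, seed=0):
    # Stochastic dual coordinate ascent for ridge regression:
    #   min_w (1/n) * sum_i 0.5 * (w . x_i - y_i)^2 + (lam/2) * ||w||^2
    # One dual variable alpha_i per sample; the primal iterate is kept in sync
    # through w = (1/(lam*n)) * sum_i alpha_i * x_i.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    sq_norms = np.sum(A ** 2, axis=1)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # closed-form maximization of the dual over the single coordinate alpha_i
            delta = (y[i] - A[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += delta * A[i] / (lam * n)
    return w

rng = np.random.default_rng(3)
A = rng.standard_normal((400, 30))
y = A @ rng.standard_normal(30) + 0.1 * rng.standard_normal(400)
w = sdca_ridge(A, y)
w_exact = np.linalg.solve(A.T @ A / 400 + 0.1 * np.eye(30), A.T @ y / 400)  # closed-form ridge solution
print("distance to exact ridge solution:", np.linalg.norm(w - w_exact))
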