Accelerating Random Kaczmarz Algorithm Based on Clustering Information
The Kaczmarz algorithm is an efficient iterative algorithm for solving
overdetermined consistent systems of linear equations. In each update
step, Kaczmarz chooses a hyperplane defined by an individual equation and
projects the current estimate of the exact solution onto that hyperplane to get a
new estimate. Many variants of the Kaczmarz algorithm have been proposed that
choose better hyperplanes. Using the properties of randomly sampled data in
high-dimensional space, we propose an accelerated algorithm based on clustering
information to improve block Kaczmarz and Kaczmarz via the Johnson-Lindenstrauss
lemma. Additionally, we theoretically demonstrate the convergence improvement of the
block Kaczmarz algorithm.
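To make the projection step concrete, here is a minimal Python sketch of the classical randomized Kaczmarz iteration, i.e., the baseline being accelerated, not the clustering-based variant proposed above; all names are illustrative:

    import numpy as np

    def randomized_kaczmarz(A, b, iters=5000, seed=0):
        # Solve a consistent system Ax = b by randomized Kaczmarz.
        # Each step samples row i with probability ||a_i||^2 / ||A||_F^2
        # and projects the iterate onto the hyperplane <a_i, x> = b_i.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        sq_norms = np.einsum('ij,ij->i', A, A)   # squared row norms
        probs = sq_norms / sq_norms.sum()
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            x += (b[i] - A[i] @ x) / sq_norms[i] * A[i]  # orthogonal projection
        return x

    # Usage on an overdetermined consistent system.
    A = np.random.randn(200, 20)
    x_star = np.random.randn(20)
    b = A @ x_star
    print(np.linalg.norm(randomized_kaczmarz(A, b) - x_star))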
Multi-consensus Decentralized Accelerated Gradient Descent
This paper considers the decentralized optimization problem, which has
applications in large-scale machine learning, sensor networks, and control
theory. We propose a novel algorithm that can achieve near optimal
communication complexity, matching the known lower bound up to a logarithmic
factor of the condition number of the problem. Our theoretical results give an
affirmative answer to the open problem of whether there exists an algorithm
that can achieve a communication complexity (nearly) matching the lower bound
depending on the global condition number instead of the local one. Moreover,
the proposed algorithm achieves the optimal computation complexity matching the
lower bound up to universal constants. Furthermore, to achieve a linear
convergence rate, our algorithm \emph{doesn't} require the individual functions
to be (strongly) convex. Our method relies on a novel combination of known
techniques including Nesterov's accelerated gradient descent, multi-consensus
and gradient-tracking. The analysis is new, and may be applied to other related
problems. Empirical studies demonstrate the effectiveness of our method for
machine learning applications.
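To illustrate how these ingredients fit together, the following Python sketch combines Nesterov-style extrapolation with gradient tracking and emulates multi-consensus by applying the mixing matrix $K$ times per iteration; the update rule, step sizes, and names are our own simplifications under these assumptions, not the paper's exact algorithm:

    import numpy as np

    def multi_consensus_agd(grads, W, x0, eta=0.1, theta=0.5, K=3, iters=200):
        # Schematic decentralized accelerated gradient descent.
        # grads: per-agent gradient oracles; W: doubly stochastic mixing
        # matrix; K: gossip rounds per iteration (multi-consensus ~ W^K).
        m = len(grads)
        X = np.tile(x0, (m, 1))             # one iterate per agent (rows)
        Y = X.copy()                        # extrapolation points
        G = np.array([g(y) for g, y in zip(grads, Y)])
        S = G.copy()                        # gradient trackers
        mix = np.linalg.matrix_power(W, K)  # K consensus rounds at once
        for _ in range(iters):
            X_new = mix @ (Y - eta * S)     # descent step plus consensus
            Y = X_new + theta * (X_new - X) # Nesterov momentum
            G_new = np.array([g(y) for g, y in zip(grads, Y)])
            S = mix @ S + G_new - G         # track the average gradient
            X, G = X_new, G_new
        return X.mean(axis=0)

    # Usage: agents hold quadratics f_i(x) = 0.5 * ||x - c_i||^2 on a ring;
    # the global minimizer is the mean of the c_i.
    m, d = 5, 3
    C = np.random.randn(m, d)
    grads = [lambda x, c=c: x - c for c in C]
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 0.5
        W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25
    print(multi_consensus_agd(grads, W, np.zeros(d)), C.mean(axis=0))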
Snap-Shot Decentralized Stochastic Gradient Tracking Methods
In decentralized optimization, agents form a network and only communicate
with their neighbors, which gives advantages in data ownership, privacy, and
scalability. At the same time, decentralized stochastic gradient descent
(\texttt{SGD}) methods, as popular decentralized algorithms for training
large-scale machine learning models, have shown their superiority over
centralized counterparts. Distributed stochastic gradient
tracking~(\texttt{DSGT})~\citep{pu2021distributed} has been recognized as a
popular and state-of-the-art decentralized \texttt{SGD} method due to its
strong theoretical guarantees. However, the theoretical analysis of
\texttt{DSGT}~\citep{koloskova2021improved} shows that its iteration complexity is
$\tilde{\mathcal{O}}\big(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1-\lambda_2(W))^{1/2}C_W\sqrt{\varepsilon}}\big)$,
where $W$ is a doubly stochastic mixing matrix that represents the
network topology and $C_W$ is a parameter that depends on $W$. Thus, it
indicates that the convergence property of \texttt{DSGT} is heavily affected by
the topology of the communication network. To overcome this weakness of
\texttt{DSGT}, we resort to the snap-shot gradient-tracking technique and propose
two novel algorithms. We further show that the two proposed algorithms are
more robust to the topology of the communication network, under an algorithmic
structure and communication strategy similar to those of \texttt{DSGT}. Compared with \texttt{DSGT},
their iteration complexities are
$\tilde{\mathcal{O}}\big(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1-\lambda_2(W))^{1/2}\sqrt{\varepsilon}}\big)$ and
$\tilde{\mathcal{O}}\big(\frac{\bar{\sigma}^2}{m\mu\varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu\sqrt{\varepsilon}}\big)$, respectively, which reduce the impact of the
network topology (no $C_W$).
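For reference, here is a minimal Python sketch of the standard \texttt{DSGT} recursion of \citep{pu2021distributed} that the snap-shot technique builds on; the quadratic objectives and noise level in the usage example are illustrative assumptions:

    import numpy as np

    def dsgt(stoch_grads, W, x0, step=0.05, iters=500, seed=0):
        # DSGT recursion:
        #   x_{t+1} = W (x_t - step * y_t)
        #   y_{t+1} = W y_t + g(x_{t+1}) - g(x_t)
        # where y_t tracks the network-average stochastic gradient.
        rng = np.random.default_rng(seed)
        m = len(stoch_grads)
        X = np.tile(x0, (m, 1))
        G = np.array([g(x, rng) for g, x in zip(stoch_grads, X)])
        Y = G.copy()                   # trackers start at local gradients
        for _ in range(iters):
            X = W @ (X - step * Y)     # local step, then mixing
            G_new = np.array([g(x, rng) for g, x in zip(stoch_grads, X)])
            Y = W @ Y + G_new - G      # gradient-tracking correction
            G = G_new
        return X.mean(axis=0)

    # Usage: noisy gradients of f_i(x) = 0.5 * ||x - c_i||^2.
    m, d = 5, 3
    C = np.random.randn(m, d)
    stoch_grads = [lambda x, rng, c=c: x - c + 0.1 * rng.standard_normal(d)
                   for c in C]
    W = np.full((m, m), 1.0 / m)       # complete graph: uniform averaging
    print(dsgt(stoch_grads, W, np.zeros(d)), C.mean(axis=0))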
Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices
We study the streaming model for approximate matrix multiplication (AMM). We
are interested in the scenario in which the algorithm can take only one pass over
the data with limited memory. The state-of-the-art deterministic sketching
algorithm for streaming AMM is the co-occurring directions (COD), which has
much smaller approximation errors than randomized algorithms and outperforms
other deterministic sketching methods empirically. In this paper, we provide a
tighter error bound for COD whose leading term accounts for the potential
approximate low-rank structure and the correlation of the input matrices. We prove
COD is space optimal with respect to our improved error bound. We also propose
a variant of COD for sparse matrices with theoretical guarantees. The
experiments on real-world sparse datasets show that the proposed algorithm is
more efficient than baseline methods.
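For context, here is a compact Python sketch of the basic dense COD routine as described in the sketching literature (not the sparse variant proposed above); when the buffers fill, both are compressed with QR factorizations and a shared shrunken SVD:

    import numpy as np

    def cod(X, Y, ell):
        # Co-occurring directions: one pass over paired columns so that
        # Bx @ By.T approximates X @ Y.T using O((mx + my) * ell) memory.
        # Assumes both row dimensions are at least ell.
        Bx = np.zeros((X.shape[0], ell))
        By = np.zeros((Y.shape[0], ell))
        filled = 0
        for i in range(X.shape[1]):
            if filled == ell:          # buffers full: compress
                Qx, Rx = np.linalg.qr(Bx)
                Qy, Ry = np.linalg.qr(By)
                U, s, Vt = np.linalg.svd(Rx @ Ry.T)
                s = np.sqrt(np.maximum(s - s[ell // 2], 0.0))  # shrinkage
                Bx = Qx @ U * s        # rescale left sketch columns
                By = Qy @ Vt.T * s     # rescale right sketch columns
                filled = ell // 2      # the tail columns are now zero
            Bx[:, filled] = X[:, i]
            By[:, filled] = Y[:, i]
            filled += 1
        return Bx, By

    # Usage: approximate X @ Y.T in one pass with a sketch of 20 columns.
    X = np.random.randn(100, 1000)
    Y = np.random.randn(80, 1000)
    Bx, By = cod(X, Y, 20)
    err = np.linalg.norm(X @ Y.T - Bx @ By.T, 2)
    print(err / (np.linalg.norm(X, 2) * np.linalg.norm(Y, 2)))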