CORE
Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming
Authors
Gregory Dexter
Petros Drineas
David P. Woodruff
Taisuke Yasuda
Publication date
29 October 2023
Publisher
arXiv
Abstract
Sketching algorithms have recently proven to be a powerful approach both for designing low-space streaming algorithms as well as fast polynomial time approximation schemes (PTAS). In this work, we develop new techniques to extend the applicability of sketching-based approaches to the sparse dictionary learning and the Euclidean $k$-means clustering problems. In particular, we initiate the study of the challenging setting where the dictionary/clustering assignment for each of the $n$ input points must be output, which has surprisingly received little attention in prior work. On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem, which generalizes to the first PTAS for the sparse dictionary learning problem. On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering. In particular, given a design matrix $\mathbf A\in\mathbb R^{n\times d}$ in a turnstile stream, we show an $\tilde O(nr/\epsilon^2 + dk/\epsilon)$ space upper bound for $r$-sparse dictionary learning of size $k$, an $\tilde O(n/\epsilon^2 + dk/\epsilon)$ space upper bound for $k$-means clustering, as well as an $\tilde O(n)$ space upper bound for $k$-means clustering on random order row insertion streams with a natural "bounded sensitivity" assumption. On the lower bounds side, we obtain a general $\tilde\Omega(n/\epsilon + dk/\epsilon)$ lower bound for $k$-means clustering, as well as an $\tilde\Omega(n/\epsilon^2)$ lower bound for algorithms which can estimate the cost of a single fixed set of candidate centers.

Comment: To appear in NeurIPS 202
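The turnstile stream model referenced in the abstract delivers arbitrary additive updates to the entries of the design matrix, and linear sketches can be maintained under such updates because sketching commutes with addition. The sketch below is a generic illustration of that mechanism only (a random-sign sketch with illustrative dimensions), not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 6, 5, 4  # matrix dimensions and sketch size (illustrative values)

# A turnstile stream delivers additive updates (i, j, delta) to entries of A,
# possibly revisiting the same entry and possibly with negative deltas.
updates = [(0, 1, 2.0), (3, 2, -1.5), (0, 1, 0.5), (5, 4, 3.0), (3, 2, 1.5)]

# A linear sketch S @ A can be maintained on the fly, since
# S @ (A + delta * e_i e_j^T) = S @ A + delta * (S e_i) e_j^T.
S = rng.choice([-1.0, 1.0], size=(s, n)) / np.sqrt(s)  # random sign sketch

SA = np.zeros((s, d))  # the only state kept: s*d numbers instead of n*d
for i, j, delta in updates:
    SA += delta * np.outer(S[:, i], np.eye(d)[j])

# Sanity check: identical to sketching the fully materialized matrix.
A = np.zeros((n, d))
for i, j, delta in updates:
    A[i, j] += delta
assert np.allclose(SA, S @ A)
```

The point of the sketch is the space saving: the stream processor stores only the $s \times d$ sketch rather than the full $n \times d$ matrix, which is what makes space bounds such as those in the abstract possible.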
oai:arXiv.org:2310.19068
Last time updated on 18/01/2024