Near-optimal Coresets For Least-Squares Regression
We study (constrained) least-squares regression as well as multiple response
least-squares regression and ask the question of whether a subset of the data,
a coreset, suffices to compute a good approximate solution to the regression.
We give deterministic, low order polynomial-time algorithms to construct such
coresets with approximation guarantees, together with lower bounds indicating
that there is not much room for improvement upon our results.
Comment: To appear in IEEE Transactions on Information Theory.
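For intuition, the reweighted-subset idea behind such coresets can be sketched with randomized leverage-score sampling (the paper's constructions are deterministic; this numpy toy, with made-up data and sizes, is only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Leverage scores: squared row norms of U from the thin SVD A = U S V^T.
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = (U ** 2).sum(axis=1)          # these sum to d
p = lev / lev.sum()                 # sampling probabilities

m = 200                             # coreset size, m << n
idx = rng.choice(n, size=m, p=p)
w = 1.0 / (m * p[idx])              # reweighting keeps the objective unbiased

# Solve the weighted least-squares problem on the coreset only.
Aw = A[idx] * np.sqrt(w)[:, None]
bw = b[idx] * np.sqrt(w)
x_core, *_ = np.linalg.lstsq(Aw, bw, rcond=None)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# Evaluate both solutions on the FULL data; a good coreset gives ratio ~ 1.
cost_full = np.sum((A @ x_full - b) ** 2)
cost_core = np.sum((A @ x_core - b) ** 2)
ratio = cost_core / cost_full
```

Sampling proportionally to leverage concentrates on the rows that most influence the fit, so a small reweighted subset nearly preserves the least-squares objective.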
Coreset Clustering on Small Quantum Computers
Many quantum algorithms for machine learning require access to classical data
in superposition. However, for many natural data sets and algorithms, the
overhead required to load the data set in superposition can erase any potential
quantum speedup over classical algorithms. Recent work by Harrow introduces a
new paradigm in hybrid quantum-classical computing to address this issue,
relying on coresets to minimize the data loading overhead of quantum
algorithms. We investigate using this paradigm to perform k-means clustering
on near-term quantum computers, by casting it as a QAOA optimization instance
over a small coreset. We compare the performance of this approach to classical
k-means clustering both numerically and experimentally on IBM Q hardware. We
are able to find data sets where coresets work well relative to random sampling
and where QAOA could potentially outperform standard k-means on a coreset.
However, finding data sets where both coresets and QAOA work well--which is
necessary for a quantum advantage over k-means on the entire data
set--appears to be challenging.
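As a classical point of comparison, k-means on a small coreset can be sketched in numpy. A real coreset would use importance sampling, so the uniform sample below is only a stand-in; the blob data, sizes, and the k-means++/Lloyd helpers are all made up for illustration:

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: spread the initial centers apart."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(X, k, iters=30, seed=0):
    """Plain Lloyd's iteration; returns the final centers."""
    rng = np.random.default_rng(seed)
    centers = kmeanspp_init(X, k, rng)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def cost(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    return ((X[:, None, :] - centers[None]) ** 2).sum(-1).min(1).sum()

rng = np.random.default_rng(1)
# Three well-separated Gaussian blobs as a toy data set.
X = np.concatenate([rng.normal(c, 0.3, size=(300, 2))
                    for c in ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])])

coreset = X[rng.choice(len(X), size=60, replace=False)]  # uniform toy "coreset"
# Centers fit on 60 points score nearly as well on all 900 points.
ratio = cost(X, lloyd(coreset, 3)) / cost(X, lloyd(X, 3))
```

On well-separated data the centers found on the 60-point sample cost almost the same as those found on the full set, which is exactly what makes a coreset a useful drop-in input for an expensive (here, quantum) optimizer.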
Coresets-Methods and History: A Theoretician's Design Pattern for Approximation and Streaming Algorithms
We present a technical survey on the state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.
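Of the techniques surveyed, sketching is perhaps the quickest to illustrate: compress a tall least-squares problem with a short random matrix and solve the compressed problem instead. A minimal numpy sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 5000, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.05 * rng.normal(size=n)

m = 300                                    # sketch size, m << n
S = rng.normal(size=(m, n)) / np.sqrt(m)   # Gaussian sketching matrix

# Solve the m x d sketched problem instead of the n x d original.
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

# Evaluate both on the ORIGINAL problem; ratio close to 1 means little is lost.
ratio = np.sum((A @ x_sk - b) ** 2) / np.sum((A @ x_opt - b) ** 2)
```

A dense Gaussian sketch is the simplest choice conceptually; in practice faster sketches (subsampled randomized transforms, sparse embeddings) give similar guarantees at lower cost.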
Coresets for Regressions with Panel Data
This paper introduces the study of coresets for regression problems in
panel data settings. We first define coresets for several variants of
regression problems with panel data and then present efficient algorithms to
construct coresets whose size depends polynomially on 1/ε (where ε
is the error parameter) and the number of regression parameters,
independent of the number of individuals in the panel data or the time units
each individual is observed for. Our approach is based on the Feldman-Langberg
framework in which a key step is to upper bound the "total sensitivity" that is
roughly the sum of maximum influences of all individual-time pairs taken over
all possible choices of regression parameters. Empirically, we assess our
approach with synthetic and real-world datasets; the coreset sizes constructed
using our approach are much smaller than the full dataset and coresets indeed
accelerate the running time of computing the regression objective.
Comment: This is a full version of a paper to appear in NeurIPS 2020. The code
can be found in
https://github.com/huanglx12/Coresets-for-regressions-with-panel-dat
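To see why bounding the total sensitivity yields coresets whose size is independent of the number of data points: for plain (non-panel) least squares with the homogeneous loss f_i(x) = (a_i^T x)^2, each point's sensitivity equals its leverage score, and the leverage scores always sum to the number of parameters d, regardless of n. A toy numpy check of this n-independence (an assumed simplified setup, not the paper's panel-data construction):

```python
import numpy as np

rng = np.random.default_rng(7)
d, totals = 4, []
for n in (500, 5000, 50000):
    A = rng.normal(size=(n, d))
    # For f_i(x) = (a_i^T x)^2, the sensitivity of row i is
    # sup_x f_i(x) / sum_j f_j(x), which equals its leverage score:
    # the squared i-th row norm of U in the thin SVD A = U S V^T.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    lev = (U ** 2).sum(axis=1)
    totals.append(lev.sum())   # total sensitivity: always d, independent of n
```

Because the total stays bounded by d as n grows, sensitivity-proportional sampling in the Feldman-Langberg framework needs only a number of samples depending on d and 1/ε, not on n, which is the analogue of the individual-count independence claimed above.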