CountSketches, Feature Hashing and the Median of Three
In this paper, we revisit the classic CountSketch method, which is a sparse,
random projection that transforms a (high-dimensional) Euclidean vector v to
a vector of dimension (2t-1)s, where t, s > 0 are integer parameters. It
is known that even for t = 1, a CountSketch allows estimating coordinates of
v with variance bounded by ||v||_2^2 / s. For t > 1, the estimator takes
the median of 2t-1 independent estimates, and the probability that the
estimate is off by more than 2||v||_2 / sqrt(s) is exponentially small in
t. This suggests choosing t to be logarithmic in a desired inverse failure
probability. However, implementations of CountSketch often use a small,
constant t. Previous work only predicts a constant factor improvement in this
setting.
Our main contribution is a new analysis of CountSketch, showing an
improvement in variance to O(||v||_2^2 / (st)) when t > 1.
That is, the variance decreases proportionally to t, asymptotically for
large enough t. We also study the variance in the setting where an inner
product is to be estimated from two CountSketches. This finding suggests that
the Feature Hashing method, which is essentially identical to CountSketch but
does not make use of the median estimator, can be made more reliable at a small
cost in settings where using a median estimator is possible.
We confirm our theoretical findings in experiments and thereby help justify
why a small constant number of estimates often suffice in practice. Our
improved variance bounds are based on new general theorems about the variance
and higher moments of the median of i.i.d. random variables that may be of
independent interest.
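To make the median-of-(2t-1) estimator concrete, here is a minimal CountSketch sketch in pure Python. The explicit random tables, function names, and parameter choices are ours, for illustration only: each of the 2t-1 rows yields an unbiased estimate of a coordinate, and the final estimate is the median of the row estimates.

```python
import random
from statistics import median

def make_countsketch(d, s, t, seed=0):
    """Build explicit bucket/sign tables for 2t-1 independent rows of width s."""
    rng = random.Random(seed)
    rows = 2 * t - 1
    buckets = [[rng.randrange(s) for _ in range(d)] for _ in range(rows)]
    signs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(rows)]
    return buckets, signs

def sketch(v, buckets, signs, s):
    """Project v to a (2t-1) x s table of counters."""
    table = [[0.0] * s for _ in buckets]
    for r, (b, g) in enumerate(zip(buckets, signs)):
        for i, x in enumerate(v):
            table[r][b[i]] += g[i] * x
    return table

def estimate(i, table, buckets, signs):
    """Median of the 2t-1 unbiased per-row estimates of v[i]."""
    return median(signs[r][i] * table[r][buckets[r][i]]
                  for r in range(len(table)))
```

With t = 1 this degenerates to a single row (plain Feature Hashing); any t > 1 adds the median step the paper analyzes.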
Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
In federated frequency estimation (FFE), multiple clients work together to
estimate the frequencies of their collective data by communicating with a
server that respects the privacy constraints of Secure Summation (SecSum), a
cryptographic multi-party computation protocol that ensures that the server can
only access the sum of client-held vectors. For single-round FFE, it is known
that count sketching is nearly information-theoretically optimal for achieving
the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However,
we show that under the more practical multi-round FFE setting, simple
adaptations of count sketching are strictly sub-optimal, and we propose a novel
hybrid sketching algorithm that is provably more accurate. We also address the
following fundamental question: how should a practitioner set the sketch size
in a way that adapts to the hardness of the underlying problem? We propose a
two-phase approach that allows for the use of a smaller sketch size for simpler
problems (e.g., near-sparse or light-tailed distributions). We conclude our
work by showing how differential privacy can be added to our algorithm and
verifying its superior performance through extensive experiments conducted on
large-scale datasets.
Comment: NeurIPS 2023 camera-ready version
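The SecSum constraint pairs naturally with linear sketches: if all clients build sketches with shared hash functions, a sketch of the aggregate frequencies is recoverable from nothing but the coordinate-wise sum of the client tables. A minimal single-round illustration with a CountMin-style table (names and parameters are ours; this is not the paper's hybrid sketch):

```python
import zlib

def bucket(x, row, width):
    # Deterministic per-row hash shared by all clients and the server.
    return zlib.crc32(f"{row}:{x}".encode()) % width

def local_sketch(items, rows=3, width=32):
    """Each client sketches its own items with the shared hash functions."""
    table = [[0] * width for _ in range(rows)]
    for x in items:
        for r in range(rows):
            table[r][bucket(x, r, width)] += 1
    return table

def secure_sum(tables):
    # Stand-in for SecSum: the server only ever sees this elementwise sum.
    rows, width = len(tables[0]), len(tables[0][0])
    return [[sum(t[r][c] for t in tables) for c in range(width)]
            for r in range(rows)]

def query(x, table):
    # CountMin estimate on the summed table: never underestimates.
    return min(row[bucket(x, r, len(row))] for r, row in enumerate(table))
```

Because sketching is linear, summing the tables is exactly equivalent to sketching the union of all client data, which is what makes the server's view compatible with SecSum.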
Correlation Aware Sparsified Mean Estimation Using Random Projection
We study the problem of communication-efficient distributed vector mean
estimation, a commonly used subroutine in distributed optimization and
Federated Learning (FL). Rand- sparsification is a commonly used technique
to reduce communication cost, where each client sends of its
coordinates to the server. However, Rand- is agnostic to any correlations,
that might exist between clients in practical scenarios. The recently proposed
Rand--Spatial estimator leverages the cross-client correlation information
at the server to improve Rand-'s performance. Yet, the performance of
Rand--Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with
a more flexible encoding-decoding procedure, which generalizes the encoding of
Rand- by projecting the client vectors to a random -dimensional subspace.
We utilize Subsampled Randomized Hadamard Transform (SRHT) as the projection
matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand--Spatial,
using the correlation information more efficiently. Furthermore, we propose an
approach to incorporate varying degrees of correlation and suggest a practical
variant of Rand-Proj-Spatial when the correlation information is not available
to the server. Experiments on real-world distributed optimization tasks
showcase the superior performance of Rand-Proj-Spatial compared to
Rand-k-Spatial and other more sophisticated sparsification techniques.
Comment: 32 pages, 13 figures. Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, USA
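For reference, the plain Rand-k baseline that the paper generalizes can be sketched in a few lines: each client keeps k uniformly random coordinates, scaled by d/k so the decoded vector is an unbiased estimate of the client's vector. Names and parameters here are ours, for illustration:

```python
import random

def rand_k_encode(v, k, rng):
    """Keep k random coordinates of v, scaled by d/k for unbiasedness."""
    d = len(v)
    idx = rng.sample(range(d), k)
    return idx, [v[i] * d / k for i in idx]

def rand_k_decode(d, idx, vals):
    """Place the scaled coordinates back into a zero-padded d-vector."""
    out = [0.0] * d
    for i, x in zip(idx, vals):
        out[i] = x
    return out
```

Rand-Proj-Spatial replaces this coordinate subsampling with a projection onto a random k-dimensional subspace (e.g., via SRHT), which is what lets the server exploit cross-client correlation at decode time.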
Improved Frequency Estimation Algorithms with and without Predictions
Estimating frequencies of elements appearing in a data stream is a key task
in large-scale data analysis. Popular sketching approaches to this problem
(e.g., CountMin and CountSketch) come with worst-case guarantees that
probabilistically bound the error of the estimated frequencies for any possible
input. The work of Hsu et al. (2019) introduced the idea of using machine
learning to tailor sketching algorithms to the specific data distribution they
are being run on. In particular, their learning-augmented frequency estimation
algorithm uses a learned heavy-hitter oracle which predicts which elements will
appear many times in the stream. We give a novel algorithm, which in some
parameter regimes, already theoretically outperforms the learning based
algorithm of Hsu et al. without the use of any predictions. Augmenting our
algorithm with heavy-hitter predictions further reduces the error and improves
upon the state of the art. Empirically, our algorithms achieve superior
performance in all experiments compared to prior approaches.
Comment: NeurIPS 2023
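The learning-augmented recipe of Hsu et al. that this line of work builds on can be illustrated compactly: items the oracle predicts to be heavy hitters get exact counters, and only the (lighter) remainder of the stream enters the sketch, shrinking its error. This is a generic illustration with our own names and a CountMin tail sketch, not the paper's new algorithm:

```python
import zlib
from collections import Counter

def _bucket(x, row, width):
    return zlib.crc32(f"{row}:{x}".encode()) % width

class LearnedSketch:
    """Exact counts for predicted heavy hitters; CountMin for everything else."""

    def __init__(self, predicted_heavy, rows=3, width=16):
        self.heavy = set(predicted_heavy)   # output of the learned oracle
        self.exact = Counter()
        self.table = [[0] * width for _ in range(rows)]

    def update(self, x):
        if x in self.heavy:
            self.exact[x] += 1
        else:
            for r, row in enumerate(self.table):
                row[_bucket(x, r, len(row))] += 1

    def estimate(self, x):
        if x in self.heavy:
            return self.exact[x]
        # CountMin estimate for tail items: never underestimates.
        return min(row[_bucket(x, r, len(row))]
                   for r, row in enumerate(self.table))
```

Predicted heavy hitters are counted exactly, so all sketch error is confined to the tail; the quality of the oracle then directly controls the overall estimation error.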