
    CountSketches, Feature Hashing and the Median of Three

    In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1)s$, where $t, s > 0$ are integer parameters. It is known that even for $t=1$, a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2\|v\|_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2, \|v\|_2^2/s\})$ when $t > 1$. That is, the variance decreases proportionally to $s^{-2}$, asymptotically for large enough $s$. We also study the variance in the setting where an inner product is to be estimated from two CountSketches. This finding suggests that the Feature Hashing method, which is essentially identical to CountSketch but does not make use of the median estimator, can be made more reliable at a small cost in settings where using a median estimator is possible. We confirm our theoretical findings in experiments and thereby help justify why a small constant number of estimates often suffices in practice. Our improved variance bounds are based on new general theorems about the variance and higher moments of the median of i.i.d. random variables that may be of independent interest.
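    To make the construction concrete, the following is a minimal, illustrative Python sketch of CountSketch with the median-of-$(2t-1)$ estimator as described above; the bucket and sign hashes are drawn explicitly as random tables purely for readability, and all names are illustrative rather than taken from the paper.

```python
# Illustrative CountSketch: 2t-1 rows of s buckets with random signs;
# each coordinate is estimated as the median of its 2t-1 signed bucket values.
import numpy as np

def count_sketch(v, s, t, seed=0):
    rng = np.random.default_rng(seed)
    d = len(v)
    rows = 2 * t - 1
    buckets = rng.integers(0, s, size=(rows, d))        # bucket hash h_r(i)
    signs = rng.choice([-1.0, 1.0], size=(rows, d))     # sign hash g_r(i)
    table = np.zeros((rows, s))
    for r in range(rows):
        np.add.at(table[r], buckets[r], signs[r] * v)   # C[r, h_r(i)] += g_r(i) * v_i
    return table, buckets, signs

def estimate_coordinate(table, buckets, signs, i):
    # one unbiased estimate per row, then the median over the 2t-1 rows
    rows = table.shape[0]
    ests = [signs[r, i] * table[r, buckets[r, i]] for r in range(rows)]
    return np.median(ests)

# toy usage: a vector with a few heavy coordinates, t=2 gives a median of three
v = np.zeros(10_000)
v[:20] = np.arange(20, 0, -1)
table, b, g = count_sketch(v, s=256, t=2)
print(estimate_coordinate(table, b, g, 0), v[0])
```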

    Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

    In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketching is nearly information-theoretically optimal for achieving the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However, we show that under the more practical multi-round FFE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size in a way that adapts to the hardness of the underlying problem? We propose a two-phase approach that allows for the use of a smaller sketch size for simpler problems (e.g., near-sparse or light-tailed distributions). We conclude our work by showing how differential privacy can be added to our algorithm and verifying its superior performance through extensive experiments conducted on large-scale datasets. Comment: NeurIPS 2023 camera-ready version.
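    For context, a toy Python sketch of the single-round count-sketch baseline referenced above (not the paper's hybrid algorithm) is shown below; each client sketches its local counts with shared hash tables, and a plain sum of sketches stands in for SecSum. All names and parameters are illustrative assumptions.

```python
# Toy federated frequency estimation with summed count sketches.
import numpy as np

def make_hashes(d, width, rows, seed=0):
    rng = np.random.default_rng(seed)            # shared randomness across clients
    return rng.integers(0, width, (rows, d)), rng.choice([-1.0, 1.0], (rows, d))

def client_sketch(local_counts, buckets, signs, width):
    rows = buckets.shape[0]
    table = np.zeros((rows, width))
    for r in range(rows):
        np.add.at(table[r], buckets[r], signs[r] * local_counts)
    return table

def server_estimate(summed_table, buckets, signs, item):
    # the server only ever sees the summed sketch, never individual clients
    ests = [signs[r, item] * summed_table[r, buckets[r, item]]
            for r in range(summed_table.shape[0])]
    return np.median(ests)

d, width, rows = 1000, 64, 5
buckets, signs = make_hashes(d, width, rows)
clients = [np.random.default_rng(c).poisson(0.1, d).astype(float) for c in range(20)]
summed = sum(client_sketch(c, buckets, signs, width) for c in clients)  # SecSum stand-in
print(server_estimate(summed, buckets, signs, 7), sum(c[7] for c in clients))
```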

    Correlation Aware Sparsified Mean Estimation Using Random Projection

    We study the problem of communication-efficient distributed vector mean estimation, a commonly used subroutine in distributed optimization and Federated Learning (FL). Rand-$k$ sparsification is a commonly used technique to reduce communication cost, where each client sends $k < d$ of its coordinates to the server. However, Rand-$k$ is agnostic to any correlations that might exist between clients in practical scenarios. The recently proposed Rand-$k$-Spatial estimator leverages cross-client correlation information at the server to improve Rand-$k$'s performance. Yet, the performance of Rand-$k$-Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with a more flexible encoding-decoding procedure, which generalizes the encoding of Rand-$k$ by projecting the client vectors to a random $k$-dimensional subspace. We utilize the Subsampled Randomized Hadamard Transform (SRHT) as the projection matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand-$k$-Spatial, using the correlation information more efficiently. Furthermore, we propose an approach to incorporate varying degrees of correlation and suggest a practical variant of Rand-Proj-Spatial for when the correlation information is not available to the server. Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-$k$-Spatial and other more sophisticated sparsification techniques. Comment: 32 pages, 13 figures. Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, USA.
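    As background for the abstract above, the Rand-$k$ baseline admits a very short sketch; the illustrative Python below (not the proposed Rand-Proj-Spatial estimator, which replaces coordinate sampling with an SRHT projection) shows the $d/k$ rescaling that keeps the decoded vector unbiased and the simple server-side average. Function names and parameters are assumptions made for the example.

```python
# Illustrative Rand-k sparsification baseline for distributed mean estimation.
import numpy as np

def rand_k_encode(v, k, rng):
    idx = rng.choice(len(v), size=k, replace=False)   # each client samples k coordinates
    return idx, v[idx]

def rand_k_decode(idx, vals, d, k):
    out = np.zeros(d)
    out[idx] = vals * (d / k)                         # rescale for an unbiased estimate
    return out

d, k, n_clients = 512, 32, 8
rng = np.random.default_rng(0)
clients = [rng.normal(size=d) for _ in range(n_clients)]
decoded = [rand_k_decode(*rand_k_encode(v, k, rng), d, k) for v in clients]
mean_est = np.mean(decoded, axis=0)                   # server averages decoded vectors
true_mean = np.mean(clients, axis=0)
print(np.linalg.norm(mean_est - true_mean))
```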

    Improved Frequency Estimation Algorithms with and without Predictions

    Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. Comment: NeurIPS 2023.
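    To make the referenced setup concrete, here is a hedged Python sketch in the spirit of the learning-augmented scheme of Hsu et al. (2019) mentioned above (not the new algorithm of this paper): items flagged by the heavy-hitter oracle are counted exactly, while all other items are routed into a CountMin sketch. The oracle below is a toy stand-in predicate, and all names are illustrative.

```python
# Illustrative learning-augmented CountMin: exact counters for predicted
# heavy hitters, a CountMin sketch for everything else.
import numpy as np

class LearnedCountMin:
    def __init__(self, width, depth, oracle, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.seeds = rng.integers(0, 2**31, size=depth)  # one hash seed per row
        self.width = width
        self.oracle = oracle                  # predicts whether an item is heavy
        self.exact = {}                       # exact counters for predicted heavies

    def _bucket(self, item, row):
        return hash((int(self.seeds[row]), item)) % self.width

    def update(self, item, count=1):
        if self.oracle(item):
            self.exact[item] = self.exact.get(item, 0) + count
        else:
            for r in range(self.table.shape[0]):
                self.table[r, self._bucket(item, r)] += count

    def query(self, item):
        if self.oracle(item):
            return self.exact.get(item, 0)
        return min(self.table[r, self._bucket(item, r)]
                   for r in range(self.table.shape[0]))

# toy usage: small item ids are (pretend-)predicted to be heavy hitters
cm = LearnedCountMin(width=256, depth=4, oracle=lambda x: x < 10)
for x in np.random.default_rng(1).zipf(1.5, 10_000):
    cm.update(int(x))
print(cm.query(1), cm.query(500))
```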