
    CountSketches, Feature Hashing and the Median of Three

    In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1)s$, where $t, s > 0$ are integer parameters. It is known that even for $t=1$, a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2\|v\|_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2, \|v\|_2^2/s\})$ when $t > 1$. That is, the variance decreases proportionally to $s^{-2}$, asymptotically for large enough $s$. We also study the variance in the setting where an inner product is to be estimated from two CountSketches. This finding suggests that the Feature Hashing method, which is essentially identical to CountSketch but does not make use of the median estimator, can be made more reliable at a small cost in settings where using a median estimator is possible. We confirm our theoretical findings in experiments and thereby help justify why a small constant number of estimates often suffices in practice. Our improved variance bounds are based on new general theorems about the variance and higher moments of the median of i.i.d. random variables that may be of independent interest.
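    To make the construction concrete, the following is a minimal, illustrative Python sketch of CountSketch with the median-of-$(2t-1)$ estimator as described above; the bucket and sign hashes are drawn explicitly as random tables purely for readability, and all names are illustrative rather than taken from the paper.

```python
# Illustrative CountSketch: 2t-1 rows of s buckets with random signs;
# each coordinate is estimated as the median of its 2t-1 signed bucket values.
import numpy as np

def count_sketch(v, s, t, seed=0):
    rng = np.random.default_rng(seed)
    d = len(v)
    rows = 2 * t - 1
    buckets = rng.integers(0, s, size=(rows, d))        # bucket hash h_r(i)
    signs = rng.choice([-1.0, 1.0], size=(rows, d))     # sign hash g_r(i)
    table = np.zeros((rows, s))
    for r in range(rows):
        np.add.at(table[r], buckets[r], signs[r] * v)   # C[r, h_r(i)] += g_r(i) * v_i
    return table, buckets, signs

def estimate_coordinate(table, buckets, signs, i):
    # one unbiased estimate per row, then the median over the 2t-1 rows
    rows = table.shape[0]
    ests = [signs[r, i] * table[r, buckets[r, i]] for r in range(rows)]
    return np.median(ests)

# toy usage: a vector with a few heavy coordinates, t=2 gives a median of three
v = np.zeros(10_000)
v[:20] = np.arange(20, 0, -1)
table, b, g = count_sketch(v, s=256, t=2)
print(estimate_coordinate(table, b, g, 0), v[0])
```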

    Private Federated Frequency Estimation: Adapting to the Hardness of the Instance

    In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketching is nearly information-theoretically optimal for achieving the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However, we show that under the more practical multi-round FFE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size in a way that adapts to the hardness of the underlying problem? We propose a two-phase approach that allows for the use of a smaller sketch size for simpler problems (e.g., near-sparse or light-tailed distributions). We conclude our work by showing how differential privacy can be added to our algorithm and verifying its superior performance through extensive experiments conducted on large-scale datasets. Comment: NeurIPS 2023 camera-ready version.
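    For context, a toy Python sketch of the single-round count-sketch baseline referenced above (not the paper's hybrid algorithm) is shown below; each client sketches its local counts with shared hash tables, and a plain sum of sketches stands in for SecSum. All names and parameters are illustrative assumptions.

```python
# Toy federated frequency estimation with summed count sketches.
import numpy as np

def make_hashes(d, width, rows, seed=0):
    rng = np.random.default_rng(seed)            # shared randomness across clients
    return rng.integers(0, width, (rows, d)), rng.choice([-1.0, 1.0], (rows, d))

def client_sketch(local_counts, buckets, signs, width):
    rows = buckets.shape[0]
    table = np.zeros((rows, width))
    for r in range(rows):
        np.add.at(table[r], buckets[r], signs[r] * local_counts)
    return table

def server_estimate(summed_table, buckets, signs, item):
    # the server only ever sees the summed sketch, never individual clients
    ests = [signs[r, item] * summed_table[r, buckets[r, item]]
            for r in range(summed_table.shape[0])]
    return np.median(ests)

d, width, rows = 1000, 64, 5
buckets, signs = make_hashes(d, width, rows)
clients = [np.random.default_rng(c).poisson(0.1, d).astype(float) for c in range(20)]
summed = sum(client_sketch(c, buckets, signs, width) for c in clients)  # SecSum stand-in
print(server_estimate(summed, buckets, signs, 7), sum(c[7] for c in clients))
```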

    Correlation Aware Sparsified Mean Estimation Using Random Projection

    We study the problem of communication-efficient distributed vector mean estimation, a commonly used subroutine in distributed optimization and Federated Learning (FL). Rand-$k$ sparsification is a commonly used technique to reduce communication cost, where each client sends $k < d$ of its coordinates to the server. However, Rand-$k$ is agnostic to any correlations that might exist between clients in practical scenarios. The recently proposed Rand-$k$-Spatial estimator leverages cross-client correlation information at the server to improve Rand-$k$'s performance. Yet, the performance of Rand-$k$-Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with a more flexible encoding-decoding procedure, which generalizes the encoding of Rand-$k$ by projecting the client vectors to a random $k$-dimensional subspace. We utilize the Subsampled Randomized Hadamard Transform (SRHT) as the projection matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand-$k$-Spatial, using the correlation information more efficiently. Furthermore, we propose an approach to incorporate varying degrees of correlation and suggest a practical variant of Rand-Proj-Spatial for when the correlation information is not available to the server. Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-$k$-Spatial and other more sophisticated sparsification techniques. Comment: 32 pages, 13 figures. Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, USA.
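    As background for the abstract above, the Rand-$k$ baseline admits a very short sketch; the illustrative Python below (not the proposed Rand-Proj-Spatial estimator, which replaces coordinate sampling with an SRHT projection) shows the $d/k$ rescaling that keeps the decoded vector unbiased and the simple server-side average. Function names and parameters are assumptions made for the example.

```python
# Illustrative Rand-k sparsification baseline for distributed mean estimation.
import numpy as np

def rand_k_encode(v, k, rng):
    idx = rng.choice(len(v), size=k, replace=False)   # each client samples k coordinates
    return idx, v[idx]

def rand_k_decode(idx, vals, d, k):
    out = np.zeros(d)
    out[idx] = vals * (d / k)                         # rescale for an unbiased estimate
    return out

d, k, n_clients = 512, 32, 8
rng = np.random.default_rng(0)
clients = [rng.normal(size=d) for _ in range(n_clients)]
decoded = [rand_k_decode(*rand_k_encode(v, k, rng), d, k) for v in clients]
mean_est = np.mean(decoded, axis=0)                   # server averages decoded vectors
true_mean = np.mean(clients, axis=0)
print(np.linalg.norm(mean_est - true_mean))
```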

    Improved Frequency Estimation Algorithms with and without Predictions

    Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. Comment: NeurIPS 2023.
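    To make the referenced setup concrete, here is a hedged Python sketch in the spirit of the learning-augmented scheme of Hsu et al. (2019) mentioned above (not the new algorithm of this paper): items flagged by the heavy-hitter oracle are counted exactly, while all other items are routed into a CountMin sketch. The oracle below is a toy stand-in predicate, and all names are illustrative.

```python
# Illustrative learning-augmented CountMin: exact counters for predicted
# heavy hitters, a CountMin sketch for everything else.
import numpy as np

class LearnedCountMin:
    def __init__(self, width, depth, oracle, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.seeds = rng.integers(0, 2**31, size=depth)  # one hash seed per row
        self.width = width
        self.oracle = oracle                  # predicts whether an item is heavy
        self.exact = {}                       # exact counters for predicted heavies

    def _bucket(self, item, row):
        return hash((int(self.seeds[row]), item)) % self.width

    def update(self, item, count=1):
        if self.oracle(item):
            self.exact[item] = self.exact.get(item, 0) + count
        else:
            for r in range(self.table.shape[0]):
                self.table[r, self._bucket(item, r)] += count

    def query(self, item):
        if self.oracle(item):
            return self.exact.get(item, 0)
        return min(self.table[r, self._bucket(item, r)]
                   for r in range(self.table.shape[0]))

# toy usage: small item ids are (pretend-)predicted to be heavy hitters
cm = LearnedCountMin(width=256, depth=4, oracle=lambda x: x < 10)
for x in np.random.default_rng(1).zipf(1.5, 10_000):
    cm.update(int(x))
print(cm.query(1), cm.query(500))
```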