
    Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state of the art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to $13.43\times$, and also achieves a $2.36\times$-$12.65\times$ speedup over the state-of-the-art straggler mitigation strategies.
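
    To make the coding scheme concrete, here is a minimal numerical sketch of the LCC encode/compute/decode cycle over the reals. The dataset is split into K chunks, a degree-(K-1) Lagrange interpolation polynomial u is threaded through the chunks, each worker evaluates the target polynomial f at one coded point u(alpha_i), and the master interpolates f(u(z)) from enough worker results to recover f on every chunk. The paper works over finite fields and additionally provides Byzantine security and privacy against colluding workers, all of which this toy omits; the chunk shapes, evaluation points, and the quadratic f are illustrative assumptions.

        import numpy as np

        def lagrange_basis(z, points, j):
            """Evaluate the j-th Lagrange basis polynomial over `points` at z."""
            num, den = 1.0, 1.0
            for m, p in enumerate(points):
                if m != j:
                    num *= (z - p)
                    den *= (points[j] - p)
            return num / den

        def lcc_encode(chunks, betas, alphas):
            """Give worker i the coded chunk u(alphas[i]), where u is the unique
            degree-(K-1) polynomial with u(betas[j]) = chunks[j]."""
            return [sum(lagrange_basis(a, betas, j) * chunks[j]
                        for j in range(len(chunks))) for a in alphas]

        def lcc_decode(results, alphas, betas):
            """Interpolate f(u(z)) through the worker results, then read off
            f(u(betas[j])) = f(chunks[j])."""
            return [sum(lagrange_basis(b, alphas, i) * results[i]
                        for i in range(len(results))) for b in betas]

        rng = np.random.default_rng(0)
        chunks = [rng.standard_normal((2, 3)) for _ in range(2)]   # K = 2 chunks
        f = lambda X: X @ X.T         # degree-2 polynomial, so deg f(u(z)) = 2
        betas = [0.0, 1.0]            # interpolation points carrying the data
        alphas = [2.0, 3.0, 4.0]      # deg + 1 = 3 worker results suffice; extra
                                      # workers would provide straggler slack
        coded = lcc_encode(chunks, betas, alphas)
        results = [f(c) for c in coded]            # each worker applies f locally
        decoded = lcc_decode(results, alphas, betas)
        assert all(np.allclose(d, f(X)) for d, X in zip(decoded, chunks))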

    List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians

    We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. List-Decodable Mean Estimation: Fix any $d \in \mathbb{Z}_+$ and $0 < \alpha < 1/2$. We design an algorithm with runtime $O(\mathrm{poly}(n/\alpha)^{d})$ that outputs a list of $O(1/\alpha)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O(\alpha^{-1/(2d)})$ from the true mean. The only previous algorithm for this problem achieved error $\tilde{O}(\alpha^{-1/2})$ under second moment conditions. For $d = O(1/\epsilon)$, our algorithm runs in polynomial time and achieves error $O(\alpha^{\epsilon})$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible. Learning Mixtures of Spherical Gaussians: We give a learning algorithm for mixtures of spherical Gaussians that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity covariance Gaussians we obtain: For any $\epsilon > 0$, if the pairwise separation between the means is at least $\Omega(k^{\epsilon} + \sqrt{\log(1/\delta)})$, our algorithm learns the unknown parameters within accuracy $\delta$ with sample complexity and running time $\mathrm{poly}(n, 1/\delta, (k/\epsilon)^{1/\epsilon})$. The previously best known polynomial time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/\delta)$. Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
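
    For intuition about the input/output contract of list-decodable mean estimation (deliberately not the paper's polynomial-based outlier-removal algorithm), the sketch below generates data in which only an alpha-fraction is drawn from $N(\mu, I)$ and produces an $O(1/\alpha)$-size candidate list with a naive k-means-style baseline; the function name, data model, and parameters are illustrative assumptions.

        import numpy as np

        def list_decode_means_naive(X, alpha, rng, iters=20):
            """Naive baseline: return a list of ceil(1/alpha) candidate means via
            a few k-means-style updates. The paper's algorithm instead filters
            outliers with degree-d multivariate polynomials; this toy only shows
            the contract: output a short list such that some candidate lands near
            the mean of the alpha-fraction of genuine samples."""
            k = int(np.ceil(1 / alpha))
            centers = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(iters):
                # Assign each point to its nearest center, then recompute means.
                lbl = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
                centers = np.stack([X[lbl == j].mean(0) if (lbl == j).any()
                                    else centers[j] for j in range(k)])
            return centers

        rng = np.random.default_rng(1)
        n, dim, alpha = 5000, 10, 0.2
        mu = np.full(dim, 3.0)
        inliers = rng.standard_normal((int(alpha * n), dim)) + mu
        outliers = rng.uniform(-10, 10, size=(n - int(alpha * n), dim))
        X = np.concatenate([inliers, outliers])
        cands = list_decode_means_naive(X, alpha, rng)
        print("best candidate error:", min(np.linalg.norm(c - mu) for c in cands))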

    Identifying Shared Decodable Concepts in the Human Brain Using Image-Language Foundation Models

    We introduce a method that takes advantage of high-quality pretrained multimodal representations to explore fine-grained semantic networks in the human brain. Previous studies have documented evidence of functional localization in the brain, with different anatomical regions preferentially activating for different types of sensory input. Many such localized structures are known, including the fusiform face area and parahippocampal place area. This raises the question of whether additional brain regions (or conjunctions of brain regions) are also specialized for other important semantic concepts. To identify such brain regions, we developed a data-driven approach to uncover visual concepts that are decodable from a massive functional magnetic resonance imaging (fMRI) dataset. Our analysis is broadly split into three sections. First, a fully connected neural network is trained to map brain responses to the outputs of an image-language foundation model, CLIP (Radford et al., 2021). Subsequently, a contrastive-learning dimensionality reduction method reveals the brain-decodable components of CLIP space. In the final section of our analysis, we localize shared decodable concepts in the brain using a voxel-masking optimization method to produce a shared decodable concept (SDC) space. The accuracy of our procedure is validated by comparing it to previous localization experiments that identify regions for faces, bodies, and places. In addition to these concepts, whose corresponding brain regions were already known, we localize novel concept representations which are shared across participants to other areas of the human brain. We also demonstrate how this method can be used to inspect fine-grained semantic networks for individual participants. We envisage that this extensible method can also be adapted to explore other questions at the intersection of AI and neuroscience.
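
    A minimal sketch of the first stage of such a pipeline: the paper trains a fully connected network from voxel responses to CLIP embeddings on a massive fMRI dataset, whereas this toy substitutes a closed-form ridge-regression map on synthetic data to show the shape of the brain-to-CLIP regression problem, plus a simple identification-accuracy check; every dimension and array below is an illustrative assumption.

        import numpy as np

        rng = np.random.default_rng(0)
        n_trials, n_voxels, clip_dim = 800, 500, 512

        # Synthetic stand-ins: V holds one voxel-response row per viewed image,
        # E holds the matching CLIP image embeddings (the regression targets).
        W_true = rng.standard_normal((n_voxels, clip_dim)) / np.sqrt(n_voxels)
        V = rng.standard_normal((n_trials, n_voxels))
        E = V @ W_true + 0.1 * rng.standard_normal((n_trials, clip_dim))

        # Ridge regression in closed form (the paper uses a trained MLP instead).
        lam = 10.0
        W = np.linalg.solve(V.T @ V + lam * np.eye(n_voxels), V.T @ E)

        # Identification check (on the training trials; a real analysis would use
        # a held-out split): does each predicted embedding match its own CLIP
        # vector better than the other trials' vectors under cosine similarity?
        pred = V @ W
        sims = (pred / np.linalg.norm(pred, axis=1, keepdims=True)) @ \
               (E / np.linalg.norm(E, axis=1, keepdims=True)).T
        top1 = (sims.argmax(1) == np.arange(n_trials)).mean()
        print(f"top-1 identification accuracy: {top1:.2f}")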

    Optimal modeling for complex system design

    The article begins with a brief introduction to the theory describing optimal data compression systems and their performance. A brief outline is then given of a representative algorithm that employs these lessons for optimal data compression system design. The implications of rate-distortion theory for practical data compression system design are then described, followed by a description of the tensions between theoretical optimality and system practicality and a discussion of common tools used in current algorithms to resolve these tensions. Next, the generalization of rate-distortion principles to the design of optimal collections of models is presented. The discussion focuses initially on data compression systems, but later widens to describe how rate-distortion theory principles generalize to model design for a wide variety of modeling applications. The article ends with a discussion of the performance benefits to be achieved using the multiple-model design algorithms.
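
    The article's own algorithms are not reproduced here, but a textbook instance of distortion-minimizing codebook design, the Lloyd algorithm for a fixed-rate scalar quantizer, illustrates the rate-distortion tradeoff the article builds on: each added bit of rate buys a lower mean squared distortion. The training data and parameters below are illustrative.

        import numpy as np

        def lloyd_quantizer(samples, rate, iters=50):
            """Design a 2**rate-level scalar quantizer by the Lloyd algorithm:
            alternate nearest-codeword assignment with centroid updates, each of
            which can only decrease mean squared distortion."""
            levels = 2 ** rate
            # Initialize the codebook at evenly spaced sample quantiles.
            codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
            for _ in range(iters):
                idx = np.abs(samples[:, None] - codebook[None, :]).argmin(1)
                for j in range(levels):
                    if (idx == j).any():
                        codebook[j] = samples[idx == j].mean()
            idx = np.abs(samples[:, None] - codebook[None, :]).argmin(1)
            return codebook, ((samples - codebook[idx]) ** 2).mean()

        rng = np.random.default_rng(0)
        x = rng.standard_normal(20000)
        for r in (1, 2, 3):
            _, d = lloyd_quantizer(x, r)
            print(f"rate {r} bit/sample -> MSE {d:.4f}")  # distortion falls with rate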

    Linear Regression using Heterogeneous Data Batches

    In many learning applications, data are collected from multiple sources, each providing a batch of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work (Kong et al., 2020) showed that with abundant small batches, the regression vectors can be learned with only a few, $\tilde\Omega(k^{3/2})$, medium-size batches of $\tilde\Omega(\sqrt{k})$ samples each. However, that work requires the input distribution for all $k$ subgroups to be isotropic Gaussian, and states that removing this assumption is an "interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
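
    For intuition about the batch setting (not the paper's gradient-based algorithm or its guarantees), the sketch below fits k regression vectors by plain alternating minimization over synthetic batches that are individually far too small to learn from; the function name, sizes, and data model are illustrative assumptions.

        import numpy as np

        def fit_batches(batches, k, iters=30, rng=None):
            """Alternating-minimization baseline: assign each batch to the
            regression vector with the smallest residual, then refit each vector
            by least squares over its assigned batches."""
            rng = rng or np.random.default_rng(0)
            dim = batches[0][0].shape[1]
            W = rng.standard_normal((k, dim))
            for _ in range(iters):
                lbl = [min(range(k), key=lambda j: ((y - X @ W[j]) ** 2).mean())
                       for X, y in batches]
                for j in range(k):
                    grp = [b for b, l in zip(batches, lbl) if l == j]
                    if grp:
                        A = np.vstack([X for X, _ in grp])
                        b = np.concatenate([y for _, y in grp])
                        W[j] = np.linalg.lstsq(A, b, rcond=None)[0]
            return W

        # k = 2 subgroups; batches of 5 samples in 10 dimensions, so no single
        # batch can pin down its own 10-dimensional regression vector.
        rng = np.random.default_rng(1)
        k, dim, n_batches, batch_size = 2, 10, 300, 5
        W_true = rng.standard_normal((k, dim)) * 3
        batches = []
        for _ in range(n_batches):
            j = rng.integers(k)
            X = rng.standard_normal((batch_size, dim))
            batches.append((X, X @ W_true[j] + 0.1 * rng.standard_normal(batch_size)))
        W = fit_batches(batches, k, rng=rng)
        err = min(np.linalg.norm(W - W_true[list(p)]) for p in ([0, 1], [1, 0]))
        print("recovery error (up to relabeling):", err)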