
    Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state of the art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to $13.43\times$, and also achieves a $2.36\times$-$12.65\times$ speedup over the state-of-the-art straggler mitigation strategies.
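
    To make the coding scheme concrete, here is a minimal numerical sketch of the LCC encode/compute/decode cycle over the reals. The dataset is split into K chunks, a degree-(K-1) Lagrange interpolation polynomial u is threaded through the chunks, each worker evaluates the target polynomial f at one coded point u(alpha_i), and the master interpolates f(u(z)) from enough worker results to recover f on every chunk. The paper works over finite fields and additionally provides Byzantine security and privacy against colluding workers, all of which this toy omits; the chunk shapes, evaluation points, and the quadratic f are illustrative assumptions.

        import numpy as np

        def lagrange_basis(z, points, j):
            """Evaluate the j-th Lagrange basis polynomial over `points` at z."""
            num, den = 1.0, 1.0
            for m, p in enumerate(points):
                if m != j:
                    num *= (z - p)
                    den *= (points[j] - p)
            return num / den

        def lcc_encode(chunks, betas, alphas):
            """Give worker i the coded chunk u(alphas[i]), where u is the unique
            degree-(K-1) polynomial with u(betas[j]) = chunks[j]."""
            return [sum(lagrange_basis(a, betas, j) * chunks[j]
                        for j in range(len(chunks))) for a in alphas]

        def lcc_decode(results, alphas, betas):
            """Interpolate f(u(z)) through the worker results, then read off
            f(u(betas[j])) = f(chunks[j])."""
            return [sum(lagrange_basis(b, alphas, i) * results[i]
                        for i in range(len(results))) for b in betas]

        rng = np.random.default_rng(0)
        chunks = [rng.standard_normal((2, 3)) for _ in range(2)]   # K = 2 chunks
        f = lambda X: X @ X.T         # degree-2 polynomial, so deg f(u(z)) = 2
        betas = [0.0, 1.0]            # interpolation points carrying the data
        alphas = [2.0, 3.0, 4.0]      # deg + 1 = 3 worker results suffice; extra
                                      # workers would provide straggler slack
        coded = lcc_encode(chunks, betas, alphas)
        results = [f(c) for c in coded]            # each worker applies f locally
        decoded = lcc_decode(results, alphas, betas)
        assert all(np.allclose(d, f(X)) for d, X in zip(decoded, chunks))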

    List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians

    We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. List-Decodable Mean Estimation: Fix any $d \in \mathbb{Z}_+$ and $0 < \alpha < 1/2$. We design an algorithm with runtime $O(\mathrm{poly}(n/\alpha)^{d})$ that outputs a list of $O(1/\alpha)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O(\alpha^{-1/(2d)})$ from the true mean. The only previous algorithm for this problem achieved error $\tilde{O}(\alpha^{-1/2})$ under second moment conditions. For $d = O(1/\epsilon)$, our algorithm runs in polynomial time and achieves error $O(\alpha^{\epsilon})$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible. Learning Mixtures of Spherical Gaussians: We give a learning algorithm for mixtures of spherical Gaussians that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity covariance Gaussians we obtain: For any $\epsilon > 0$, if the pairwise separation between the means is at least $\Omega(k^{\epsilon} + \sqrt{\log(1/\delta)})$, our algorithm learns the unknown parameters within accuracy $\delta$ with sample complexity and running time $\mathrm{poly}(n, 1/\delta, (k/\epsilon)^{1/\epsilon})$. The previously best known polynomial time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/\delta)$. Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
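
    For intuition about the input/output contract of list-decodable mean estimation (deliberately not the paper's polynomial-based outlier-removal algorithm), the sketch below generates data in which only an alpha-fraction is drawn from $N(\mu, I)$ and produces an $O(1/\alpha)$-size candidate list with a naive k-means-style baseline; the function name, data model, and parameters are illustrative assumptions.

        import numpy as np

        def list_decode_means_naive(X, alpha, rng, iters=20):
            """Naive baseline: return a list of ceil(1/alpha) candidate means via
            a few k-means-style updates. The paper's algorithm instead filters
            outliers with degree-d multivariate polynomials; this toy only shows
            the contract: output a short list such that some candidate lands near
            the mean of the alpha-fraction of genuine samples."""
            k = int(np.ceil(1 / alpha))
            centers = X[rng.choice(len(X), size=k, replace=False)]
            for _ in range(iters):
                # Assign each point to its nearest center, then recompute means.
                lbl = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
                centers = np.stack([X[lbl == j].mean(0) if (lbl == j).any()
                                    else centers[j] for j in range(k)])
            return centers

        rng = np.random.default_rng(1)
        n, dim, alpha = 5000, 10, 0.2
        mu = np.full(dim, 3.0)
        inliers = rng.standard_normal((int(alpha * n), dim)) + mu
        outliers = rng.uniform(-10, 10, size=(n - int(alpha * n), dim))
        X = np.concatenate([inliers, outliers])
        cands = list_decode_means_naive(X, alpha, rng)
        print("best candidate error:", min(np.linalg.norm(c - mu) for c in cands))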

    Identifying Shared Decodable Concepts in the Human Brain Using Image-Language Foundation Models

    We introduce a method that takes advantage of high-quality pretrained multimodal representations to explore fine-grained semantic networks in the human brain. Previous studies have documented evidence of functional localization in the brain, with different anatomical regions preferentially activating for different types of sensory input. Many such localized structures are known, including the fusiform face area and parahippocampal place area. This raises the question of whether additional brain regions (or conjunctions of brain regions) are also specialized for other important semantic concepts. To identify such brain regions, we developed a data-driven approach to uncover visual concepts that are decodable from a massive functional magnetic resonance imaging (fMRI) dataset. Our analysis is broadly split into three sections. First, a fully connected neural network is trained to map brain responses to the outputs of an image-language foundation model, CLIP (Radford et al., 2021). Subsequently, a contrastive-learning dimensionality reduction method reveals the brain-decodable components of CLIP space. In the final section of our analysis, we localize shared decodable concepts in the brain using a voxel-masking optimization method to produce a shared decodable concept (SDC) space. The accuracy of our procedure is validated by comparing it to previous localization experiments that identify regions for faces, bodies, and places. In addition to these concepts, whose corresponding brain regions were already known, we localize novel concept representations which are shared across participants to other areas of the human brain. We also demonstrate how this method can be used to inspect fine-grained semantic networks for individual participants. We envisage that this extensible method can also be adapted to explore other questions at the intersection of AI and neuroscience.
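
    A minimal sketch of the first stage of such a pipeline: the paper trains a fully connected network from voxel responses to CLIP embeddings on a massive fMRI dataset, whereas this toy substitutes a closed-form ridge-regression map on synthetic data to show the shape of the brain-to-CLIP regression problem, plus a simple identification-accuracy check; every dimension and array below is an illustrative assumption.

        import numpy as np

        rng = np.random.default_rng(0)
        n_trials, n_voxels, clip_dim = 800, 500, 512

        # Synthetic stand-ins: V holds one voxel-response row per viewed image,
        # E holds the matching CLIP image embeddings (the regression targets).
        W_true = rng.standard_normal((n_voxels, clip_dim)) / np.sqrt(n_voxels)
        V = rng.standard_normal((n_trials, n_voxels))
        E = V @ W_true + 0.1 * rng.standard_normal((n_trials, clip_dim))

        # Ridge regression in closed form (the paper uses a trained MLP instead).
        lam = 10.0
        W = np.linalg.solve(V.T @ V + lam * np.eye(n_voxels), V.T @ E)

        # Identification check (on the training trials; a real analysis would use
        # a held-out split): does each predicted embedding match its own CLIP
        # vector better than the other trials' vectors under cosine similarity?
        pred = V @ W
        sims = (pred / np.linalg.norm(pred, axis=1, keepdims=True)) @ \
               (E / np.linalg.norm(E, axis=1, keepdims=True)).T
        top1 = (sims.argmax(1) == np.arange(n_trials)).mean()
        print(f"top-1 identification accuracy: {top1:.2f}")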

    Optimal modeling for complex system design

    The article begins with a brief introduction to the theory describing optimal data compression systems and their performance. A brief outline is then given of a representative algorithm that employs these lessons for optimal data compression system design. The implications of rate-distortion theory for practical data compression system design are then described, followed by a description of the tensions between theoretical optimality and system practicality and a discussion of common tools used in current algorithms to resolve these tensions. Next, the generalization of rate-distortion principles to the design of optimal collections of models is presented. The discussion focuses initially on data compression systems, but later widens to describe how rate-distortion theory principles generalize to model design for a wide variety of modeling applications. The article ends with a discussion of the performance benefits to be achieved using the multiple-model design algorithms.
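
    The article's own algorithms are not reproduced here, but a textbook instance of distortion-minimizing codebook design, the Lloyd algorithm for a fixed-rate scalar quantizer, illustrates the rate-distortion tradeoff the article builds on: each added bit of rate buys a lower mean squared distortion. The training data and parameters below are illustrative.

        import numpy as np

        def lloyd_quantizer(samples, rate, iters=50):
            """Design a 2**rate-level scalar quantizer by the Lloyd algorithm:
            alternate nearest-codeword assignment with centroid updates, each of
            which can only decrease mean squared distortion."""
            levels = 2 ** rate
            # Initialize the codebook at evenly spaced sample quantiles.
            codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
            for _ in range(iters):
                idx = np.abs(samples[:, None] - codebook[None, :]).argmin(1)
                for j in range(levels):
                    if (idx == j).any():
                        codebook[j] = samples[idx == j].mean()
            idx = np.abs(samples[:, None] - codebook[None, :]).argmin(1)
            return codebook, ((samples - codebook[idx]) ** 2).mean()

        rng = np.random.default_rng(0)
        x = rng.standard_normal(20000)
        for r in (1, 2, 3):
            _, d = lloyd_quantizer(x, r)
            print(f"rate {r} bit/sample -> MSE {d:.4f}")  # distortion falls with rate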

    Linear Regression using Heterogeneous Data Batches

    In many learning applications, data are collected from multiple sources, each providing a batch of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work (Kong et al., 2020) showed that with abundant small batches, the regression vectors can be learned with only a few, $\tilde\Omega(k^{3/2})$, medium-size batches of $\tilde\Omega(\sqrt{k})$ samples each. However, that work requires the input distribution for all $k$ subgroups to be isotropic Gaussian, and states that removing this assumption is an "interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
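
    For intuition about the batch setting (not the paper's gradient-based algorithm or its guarantees), the sketch below fits k regression vectors by plain alternating minimization over synthetic batches that are individually far too small to learn from; the function name, sizes, and data model are illustrative assumptions.

        import numpy as np

        def fit_batches(batches, k, iters=30, rng=None):
            """Alternating-minimization baseline: assign each batch to the
            regression vector with the smallest residual, then refit each vector
            by least squares over its assigned batches."""
            rng = rng or np.random.default_rng(0)
            dim = batches[0][0].shape[1]
            W = rng.standard_normal((k, dim))
            for _ in range(iters):
                lbl = [min(range(k), key=lambda j: ((y - X @ W[j]) ** 2).mean())
                       for X, y in batches]
                for j in range(k):
                    grp = [b for b, l in zip(batches, lbl) if l == j]
                    if grp:
                        A = np.vstack([X for X, _ in grp])
                        b = np.concatenate([y for _, y in grp])
                        W[j] = np.linalg.lstsq(A, b, rcond=None)[0]
            return W

        # k = 2 subgroups; batches of 5 samples in 10 dimensions, so no single
        # batch can pin down its own 10-dimensional regression vector.
        rng = np.random.default_rng(1)
        k, dim, n_batches, batch_size = 2, 10, 300, 5
        W_true = rng.standard_normal((k, dim)) * 3
        batches = []
        for _ in range(n_batches):
            j = rng.integers(k)
            X = rng.standard_normal((batch_size, dim))
            batches.append((X, X @ W_true[j] + 0.1 * rng.standard_normal(batch_size)))
        W = fit_batches(batches, k, rng=rng)
        err = min(np.linalg.norm(W - W_true[list(p)]) for p in ([0, 1], [1, 0]))
        print("recovery error (up to relabeling):", err)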