Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy
We consider a scenario involving computations over a massive dataset stored
in a distributed fashion across multiple workers, which is at the core of distributed
learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework
to simultaneously provide (1) resiliency against stragglers that may prolong
computations; (2) security against Byzantine (or malicious) workers that
deliberately modify the computation for their benefit; and (3)
(information-theoretic) privacy of the dataset amidst possible collusion of
workers. LCC, which leverages the well-known Lagrange polynomial to create
computation redundancy in a novel coded form across workers, can be applied to
any computation scenario in which the function of interest is an arbitrary
multivariate polynomial of the input dataset, hence covering many computations
of interest in machine learning. LCC significantly generalizes prior works to
go beyond linear computations. It also enables secure and private computing in
distributed settings, improving the computation and communication efficiency of
the state-of-the-art. Furthermore, we prove the optimality of LCC by showing
that it achieves the optimal tradeoff between resiliency, security, and
privacy, i.e., in terms of tolerating the maximum number of stragglers and
adversaries, and providing data privacy against the maximum number of colluding
workers. Finally, we show via experiments on Amazon EC2 that LCC substantially
speeds up the conventional uncoded implementation of distributed least-squares
linear regression, and also outperforms the state-of-the-art straggler
mitigation strategies.
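To make the encoding concrete, here is a minimal numerical sketch of the Lagrange-interpolation idea behind LCC, written over the reals for readability. The actual scheme works over a finite field, adds random padding for privacy, and uses error-correcting decoding for Byzantine robustness; the toy function f(X) = XXᵀ and all names below are illustrative, not taken from the paper.

```python
import numpy as np

def lagrange_basis(points, z):
    """Evaluate all Lagrange basis polynomials for `points` at z."""
    out = []
    for j, pj in enumerate(points):
        c = 1.0
        for l, pl in enumerate(points):
            if l != j:
                c *= (z - pl) / (pj - pl)
        out.append(c)
    return out

def encode(blocks, betas, alphas):
    """Worker i receives u(alpha_i), where u is the unique low-degree
    polynomial with u(beta_j) = X_j (the j-th data block)."""
    return [sum(c * X for c, X in zip(lagrange_basis(betas, a), blocks))
            for a in alphas]

def decode(results, alphas, betas):
    """Interpolate the returned values f(u(alpha_i)) and evaluate the
    interpolant at the betas, recovering f(X_j) = f(u(beta_j))."""
    return [sum(c * y for c, y in zip(lagrange_basis(alphas, b), results))
            for b in betas]

# f(X) = X X^T is a degree-2 polynomial of the data, so f(u(z)) has
# degree 2*(K-1), and 2*(K-1)+1 worker results suffice; any extra
# workers form the straggler/adversary budget.
K = 3
blocks = [np.random.randn(2, 2) for _ in range(K)]
betas = [1.0, 2.0, 3.0]             # encoding points for the data blocks
alphas = [4.0, 5.0, 6.0, 7.0, 8.0]  # one evaluation point per worker
shares = encode(blocks, betas, alphas)
results = [u @ u.T for u in shares]  # each worker applies f locally
decoded = decode(results, alphas, betas)
assert np.allclose(decoded[0], blocks[0] @ blocks[0].T)
```

Because f(u(z)) is itself a polynomial of degree deg(f)·(K-1), any sufficiently large subset of worker results determines it by interpolation, which is exactly what yields straggler resiliency.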
List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians
We study the problem of list-decodable Gaussian mean estimation and the
related problem of learning mixtures of separated spherical Gaussians. We
develop a set of techniques that yield new efficient algorithms with
significantly improved guarantees for these problems.
{\bf List-Decodable Mean Estimation.} Fix any fraction $\alpha \in (0, 1/2)$ of
inlier samples. We design an algorithm that outputs a small list of candidate
vectors such that, with high probability, one of the candidates is within small
$\ell_2$-distance of the true mean. The only previous algorithm for this
problem achieved a quantitatively weaker error guarantee, under second moment
conditions. For $\alpha$ bounded below by a constant, our algorithm runs in
polynomial time and achieves correspondingly improved error. We also give a
Statistical Query lower bound suggesting that the complexity of our algorithm
is qualitatively close to best possible.
{\bf Learning Mixtures of Spherical Gaussians.} We give a learning algorithm
for mixtures of spherical Gaussians that succeeds under significantly weaker
separation assumptions compared to prior work. For the prototypical case of a
uniform mixture of $k$ identity-covariance Gaussians we obtain the following:
for any $\epsilon > 0$, if the pairwise separation between the means is at
least $k^{\epsilon}$ (up to lower-order terms), our algorithm learns the
unknown parameters to the desired accuracy, with sample complexity and running
time that are polynomial for any fixed $\epsilon$. The previously best known
polynomial-time algorithm required separation at least $k^{1/4}$, up to
polylogarithmic factors.
Our main technical contribution is a new technique, based on higher-degree
multivariate polynomials, to remove outliers from high-dimensional datasets
in which the majority of the points are corrupted.
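As a rough illustration of the outlier-removal theme, the toy sketch below implements the classical degree-2 (spectral) filter for the easier majority-inlier regime: score points along the top eigenvector of the empirical covariance and trim the most extreme scores. The paper's contribution replaces this quadratic test with higher-degree multivariate polynomials and returns a list of candidate means to cope with a corrupted majority; nothing below is taken from the paper itself.

```python
import numpy as np

def filter_step(X, trim=0.05):
    """One spectral filtering pass: trim points that project most
    extremely onto the direction of largest empirical variance."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    _, V = np.linalg.eigh(cov)
    v = V[:, -1]                   # direction of largest variance
    scores = ((X - mu) @ v) ** 2   # outliers inflate variance along v
    return X[scores <= np.quantile(scores, 1.0 - trim)]

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(900, 20))   # N(0, I) samples
outliers = rng.normal(5.0, 1.0, size=(100, 20))  # a planted corruption
X = np.vstack([inliers, outliers])
for _ in range(10):
    X = filter_step(X)
print("norm of estimated mean:", np.linalg.norm(X.mean(axis=0)))
```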
Identifying Shared Decodable Concepts in the Human Brain Using Image-Language Foundation Models
We introduce a method that takes advantage of high-quality pretrained
multimodal representations to explore fine-grained semantic networks in the
human brain. Previous studies have documented evidence of functional
localization in the brain, with different anatomical regions preferentially
activating for different types of sensory input. Many such localized structures
are known, including the fusiform face area and parahippocampal place area.
This raises the question of whether additional brain regions (or conjunctions
of brain regions) are also specialized for other important semantic concepts.
To identify such brain regions, we developed a data-driven approach to uncover
visual concepts that are decodable from a massive functional magnetic resonance
imaging (fMRI) dataset. Our analysis is broadly split into three sections.
First, a fully connected neural network is trained to map brain responses to
the outputs of an image-language foundation model, CLIP (Radford et al., 2021).
Subsequently, a contrastive-learning dimensionality reduction method reveals
the brain-decodable components of CLIP space. In the final section of our
analysis, we localize shared decodable concepts in the brain using a
voxel-masking optimization method to produce a shared decodable concept (SDC)
space. The accuracy of our procedure is validated by comparing it to previous
localization experiments that identify regions for faces, bodies, and places.
In addition to these concepts, whose corresponding brain regions were already
known, we localize novel concept representations, shared across participants,
in other areas of the human brain. We also demonstrate how this
method can be used to inspect fine-grained semantic networks for individual
participants. We envisage that this extensible method can also be adapted to
explore other questions at the intersection of AI and neuroscience.
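As a sketch of the first stage of this pipeline, the following hypothetical PyTorch snippet trains a fully connected network to map stand-in voxel responses to stand-in CLIP embeddings. The layer sizes, the cosine objective, and all tensor names are illustrative assumptions, not details from the paper.

```python
import torch
from torch import nn

# Hypothetical stand-ins: `voxels` plays the role of per-trial fMRI
# responses, `clip_emb` the CLIP image embeddings for the same trials.
n_trials, n_voxels, clip_dim = 1024, 8000, 512
voxels = torch.randn(n_trials, n_voxels)
clip_emb = torch.randn(n_trials, clip_dim)

decoder = nn.Sequential(
    nn.Linear(n_voxels, 2048), nn.ReLU(),
    nn.Linear(2048, clip_dim),
)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)

for step in range(100):
    pred = decoder(voxels)
    # push each prediction toward its matching CLIP embedding; a
    # contrastive (InfoNCE-style) loss would be a natural alternative
    loss = 1.0 - nn.functional.cosine_similarity(pred, clip_emb).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```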
Optimal modeling for complex system design
The article begins with a brief introduction to the theory describing optimal data compression systems and their performance. A brief outline is then given of a representative algorithm that employs these lessons for optimal data compression system design. The implications of rate-distortion theory for practical data compression system design are then described, followed by a description of the tensions between theoretical optimality and system practicality and a discussion of common tools used in current algorithms to resolve these tensions. Next, the generalization of rate-distortion principles to the design of optimal collections of models is presented. The discussion focuses initially on data compression systems but later widens to describe how rate-distortion principles generalize to model design for a wide variety of modeling applications. The article ends with a discussion of the performance benefits to be achieved using multiple-model design algorithms.
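The summary does not name a specific algorithm, but the generalized Lloyd algorithm is a standard representative of rate-distortion-guided compression system design. The sketch below alternates nearest-codeword assignment with centroid updates, descending toward a locally optimal fixed-rate scalar quantizer.

```python
import numpy as np

def lloyd_quantizer(samples, n_levels, iters=50):
    """Generalized Lloyd algorithm: alternate nearest-codeword
    assignment (encoder update) with centroid recomputation
    (decoder update) to reduce mean-squared distortion."""
    codebook = np.quantile(samples, np.linspace(0.05, 0.95, n_levels))
    for _ in range(iters):
        idx = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
        for j in range(n_levels):
            cell = samples[idx == j]
            if cell.size:
                codebook[j] = cell.mean()  # centroid minimizes MSE
    return codebook

rng = np.random.default_rng(1)
data = rng.normal(size=100_000)
cb = lloyd_quantizer(data, n_levels=8)     # 3 bits per sample
quantized = cb[np.abs(data[:, None] - cb[None, :]).argmin(axis=1)]
print("MSE distortion:", np.mean((data - quantized) ** 2))
```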
Linear Regression using Heterogeneous Data Batches
In many learning applications, data are collected from multiple sources, each
providing a \emph{batch} of samples that by itself is insufficient to learn its
input-output relationship. A common approach assumes that the sources fall in
one of several unknown subgroups, each with an unknown input distribution and
input-output relationship. We consider one of this setup's most fundamental and
important manifestations where the output is a noisy linear combination of the
inputs, and there are $k$ subgroups, each with its own regression vector. Prior
work~\cite{kong2020meta} showed that, given abundant small batches, the
regression vectors can be learned using only a few medium-size batches.
However, that work requires the input distribution for all subgroups to be
isotropic Gaussian, and states that removing this assumption is an
``interesting and challenging problem''. We propose a novel gradient-based
algorithm that improves on the existing results in several ways. It extends
the applicability of prior approaches by: (1) allowing the subgroups'
underlying input distributions to be different, unknown, and heavy-tailed;
(2) recovering all subgroups that account for a significant proportion of the
batches, even when the number of subgroups $k$ is unbounded; (3) removing the
separation requirement between the regression vectors; and (4) reducing the
number of batches required and allowing smaller batch sizes.
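As a rough sketch of the alternating, gradient-based idea (not the paper's actual algorithm, which comes with recovery guarantees and careful initialization), one can hard-assign each batch to the regressor that currently fits it best and then take a gradient step per regressor; everything below is an illustrative toy.

```python
import numpy as np

# Generate batches from k hidden regressors, then alternate
# (a) assigning each batch to its best-fitting regressor and
# (b) a gradient step per regressor on its assigned batches.
rng = np.random.default_rng(0)
k, d, n_batches, batch_size = 3, 10, 300, 8
true_w = rng.normal(size=(k, d))
batches = []
for _ in range(n_batches):
    g = rng.integers(k)                 # latent subgroup of the source
    X = rng.normal(size=(batch_size, d))
    batches.append((X, X @ true_w[g] + 0.1 * rng.normal(size=batch_size)))

W = rng.normal(size=(k, d))             # random initialization
lr = 0.05
for _ in range(200):
    grads, counts = np.zeros_like(W), np.zeros(k)
    for X, y in batches:
        j = int(np.argmin([np.mean((y - X @ w) ** 2) for w in W]))
        grads[j] += X.T @ (X @ W[j] - y) / len(y)
        counts[j] += 1
    for j in range(k):
        if counts[j]:
            W[j] -= lr * grads[j] / counts[j]

# best-match recovery error (label order is not identifiable)
print([round(min(np.linalg.norm(w - t) for t in true_w), 2) for w in W])
```

With heavy-tailed inputs or poor initialization, this naive loop can mis-assign batches early on; that harder regime is precisely what the paper's guarantees address.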