Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis
Database theory and database practice are typically the domain of computer
scientists who adopt what may be termed an algorithmic perspective on their
data. This perspective is very different from the more statistical perspective
adopted by statisticians, scientific computing researchers, machine learners,
and others who work on what may be broadly termed statistical data analysis.
In this article,
I will address fundamental aspects of this algorithmic-statistical disconnect,
with an eye to bridging the gap between these two very different approaches. A
concept that lies at the heart of this disconnect is that of statistical
regularization, a notion that concerns how robust the output of an algorithm
is to the noise properties of the input data. Although it is nearly
completely absent from computer science, which historically has taken the input
data as given and modeled algorithms discretely, regularization in one form or
another is central to nearly every application domain that applies algorithms
to noisy data. By using several case studies, I will illustrate, both
theoretically and empirically, the nonobvious fact that approximate
computation, in and of itself, can implicitly lead to statistical
regularization. This and other recent work suggests that, by exploiting in a
more principled way the statistical properties implicit in worst-case
algorithms, one can in many cases satisfy the bicriteria of having algorithms
that are scalable to very large-scale databases and that also have good
inferential or predictive properties.
Comment: To appear in the Proceedings of the 2012 ACM Symposium on Principles of Database Systems (PODS 2012).
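The abstract's central claim, that approximate computation can by itself induce statistical regularization, can be illustrated outside the paper with a standard example: early-stopped gradient descent on a least-squares problem shrinks the solution much as an explicit ridge penalty would. A minimal sketch (not the paper's own experiments; all data and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.normal(size=(n, d)) @ np.diag(np.logspace(0, -3, d))  # ill-conditioned
x_true = rng.normal(size=d)
b = A @ x_true + 0.5 * rng.normal(size=n)

# Exact least-squares solution (fits the noise in the weak directions).
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

# "Approximate computation": only a few gradient-descent steps from zero.
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
eta = 1.0 / L
x_gd = np.zeros(d)
for _ in range(50):
    x_gd -= eta * A.T @ (A @ x_gd - b)

# Early stopping shrinks the iterate, much like an explicit ridge penalty.
print(np.linalg.norm(x_gd) < np.linalg.norm(x_ls))  # True
```

In the eigenbasis of A^T A the early-stopped iterate is a filtered version of the exact solution, with the weak (noise-dominated) directions damped most, which is the implicit-regularization effect the abstract describes.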
Survey propagation at finite temperature: application to a Sourlas code as a toy model
In this paper we investigate a finite temperature generalization of survey
propagation, by applying it to the problem of finite temperature decoding of a
biased finite connectivity Sourlas code for temperatures lower than the
Nishimori temperature. We observe that the result is a shift of the location of
the dynamical critical channel noise to larger values than the corresponding
dynamical transition for belief propagation, as suggested recently by
Migliorini and Saad for LDPC codes. We show how the finite temperature 1-RSB SP
gives accurate results in the regime where competing approaches fail to
converge or fail to recover the retrieval state.
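For readers unfamiliar with the setup: a Sourlas code encodes a message of N spins as products over random K-tuples, and finite-temperature decoding means computing local magnetizations of the Gibbs posterior at some inverse temperature beta. A toy sketch that decodes by exhaustive marginalization rather than survey propagation (all parameters illustrative; feasible only for very small N):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, K, M, p, beta = 10, 3, 30, 0.05, 1.0   # beta: decoding inverse temperature

# Random message and K-spin Sourlas encoding: one coupling per random K-tuple.
s = rng.choice([-1, 1], size=N)
idx = np.array([rng.choice(N, size=K, replace=False) for _ in range(M)])
J = np.array([np.prod(s[t]) for t in idx])

# Binary symmetric channel: flip each transmitted coupling with probability p.
flips = rng.random(M) < p
J_rx = np.where(flips, -J, J)

# Finite-temperature decoding: P(sigma) prop. to exp(beta * sum_a J_a prod_i sigma_i).
# Exhaustive marginalization over all 2^N states is feasible for N = 10.
states = np.array(list(itertools.product([-1, 1], repeat=N)))
energy = np.array([-(J_rx * np.prod(st[idx], axis=1)).sum() for st in states])
w = np.exp(-beta * (energy - energy.min()))
m = (states * w[:, None]).sum(axis=0) / w.sum()   # local magnetizations

decoded = np.sign(m)                 # decode each bit by the sign of its field
overlap = (decoded == s).mean()
print(overlap)
```

Survey propagation replaces this brute-force marginalization with message passing over surveys of local fields, which is what makes the approach scale to large N.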
Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions
In decentralized networks (of sensors, connected objects, etc.), there is an
important need for efficient algorithms to optimize a global cost function, for
instance to learn a global model from the local data collected by each
computing unit. In this paper, we address the problem of decentralized
minimization of pairwise functions of the data points, where these points are
distributed over the nodes of a graph defining the communication topology of
the network. This general problem finds applications in ranking, distance
metric learning and graph inference, among others. We propose new gossip
algorithms based on dual averaging which aim to solve such problems in both
synchronous and asynchronous settings. The proposed framework is flexible
enough to deal with constrained and regularized variants of the optimization
problem. Our theoretical analysis reveals that the proposed algorithms preserve
the convergence rate of centralized dual averaging up to an additive bias term.
We present numerical simulations on Area Under the ROC Curve (AUC) maximization
and metric learning problems which illustrate the practical interest of our
approach.
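The update structure of gossip dual averaging can be sketched on a toy separable problem (a simplification, not the paper's pairwise objective): each node gossips its accumulated-gradient dual variable through a doubly stochastic matrix, then maps it to a primal iterate with a decaying step size. All parameters here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, gamma = 8, 1000, 0.5

# Ring topology: doubly stochastic gossip matrix (self + two neighbours).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

a = rng.uniform(0.5, 1.5, size=n)   # local data: one scalar per node
z = np.zeros(n)                     # dual (accumulated-gradient) variables
x = np.zeros(n)                     # primal iterates

for t in range(1, T + 1):
    g = x - a                       # local gradient of (x_i - a_i)^2 / 2
    z = W @ z + g                   # gossip step on the dual variables
    x = -(gamma / np.sqrt(t)) * z   # dual-averaging primal map

# Every node approaches the network-wide optimum, the mean of the a_i.
print(np.max(np.abs(x - a.mean())))
```

The same skeleton carries over to the pairwise setting by letting each node's gradient depend on pairs of data points exchanged along the graph, which is where the additive bias term in the abstract's rate comparison arises.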
Clustering Algorithms for Scale-free Networks and Applications to Cloud Resource Management
In this paper we introduce algorithms for the construction of scale-free
networks and for clustering around the nerve centers, nodes with high
connectivity in a scale-free network. We argue that such overlay networks
could support self-organization in a complex system like a cloud computing
infrastructure and allow the implementation of optimal resource management
policies.
Comment: 14 pages, 8 Figures, Journal
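A minimal sketch of the two ingredients the abstract names, using standard Barabási-Albert-style preferential attachment for construction and a naive highest-degree-hub assignment for clustering (both are generic stand-ins, not the paper's algorithms):

```python
import random
from collections import defaultdict

random.seed(3)

def preferential_attachment(n, m):
    """Scale-free graph: each new node links to m existing nodes chosen
    with probability proportional to their degree (Barabasi-Albert style)."""
    adj = defaultdict(set)
    targets = list(range(m))      # seed nodes
    repeated = []                 # each node appears once per incident edge
    for new in range(m, n):
        for t in set(targets):    # dedup: may attach to fewer than m nodes
            adj[new].add(t)
            adj[t].add(new)
            repeated.extend([new, t])
        targets = random.sample(repeated, m)  # degree-proportional sampling
    return adj

def hub_clusters(adj, k):
    """Cluster around the k highest-degree hubs ('nerve centers'): each node
    joins the highest-degree hub among its neighbours, if it has one."""
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    hubset = set(hubs)
    clusters = {h: {h} for h in hubs}
    for v in adj:
        if v in hubset:
            continue
        near = [h for h in adj[v] if h in hubset]
        if near:
            clusters[max(near, key=lambda h: len(adj[h]))].add(v)
    return clusters

g = preferential_attachment(200, 3)
clusters = hub_clusters(g, 5)
print(sum(len(c) for c in clusters.values()))
```

Because preferential attachment concentrates degree on a few hubs, most nodes sit within one hop of a nerve center, which is what makes hub-centered clustering a plausible substrate for self-organizing resource management.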
The random K-satisfiability problem: from an analytic solution to an efficient algorithm
We study the problem of satisfiability of randomly chosen clauses, each with
K Boolean variables. Using the cavity method at zero temperature, we find the
phase diagram for the K=3 case. We show the existence of an intermediate phase
in the satisfiable region, where the proliferation of metastable states is at
the origin of the slowdown of search algorithms. The fundamental order
parameter introduced in the cavity method, which consists of surveys of local
magnetic fields in the various possible states of the system, can be computed
for one given sample. These surveys can be used to invent new types of
algorithms for solving hard combinatorial optimization problems. One such
algorithm is shown here for the 3-SAT problem, with very good performance.
Comment: 38 pages, 13 figures; corrected typos
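The survey-propagation-based solver itself is involved; as a simple point of reference, the kind of local search that slows down in the metastable phase can be sketched with a random 3-SAT generator and WalkSAT (a standard baseline, not the paper's algorithm; instance size and parameters are illustrative):

```python
import random

random.seed(4)

def random_3sat(n_vars, n_clauses):
    """Random 3-SAT: each clause has 3 distinct variables, each negated
    with probability 1/2 (literal v: variable v true; -v: variable v false)."""
    clauses = []
    for _ in range(n_clauses):
        vs = random.sample(range(1, n_vars + 1), 3)
        clauses.append([v if random.random() < 0.5 else -v for v in vs])
    return clauses

def walksat(clauses, n_vars, p=0.5, max_flips=100_000):
    """WalkSAT: repeatedly pick an unsatisfied clause and flip either a random
    variable in it or the variable whose flip leaves the fewest unsat clauses."""
    assign = [random.random() < 0.5 for _ in range(n_vars + 1)]  # 1-indexed

    def sat(c):
        return any(assign[abs(l)] == (l > 0) for l in c)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return assign
        c = random.choice(unsat)
        if random.random() < p:
            v = abs(random.choice(c))
        else:
            def broken(v):
                assign[v] = not assign[v]
                count = sum(not sat(cl) for cl in clauses)
                assign[v] = not assign[v]
                return count
            v = min((abs(l) for l in c), key=broken)
        assign[v] = not assign[v]
    return None

n = 50
f = random_3sat(n, 3 * n)   # clause density 3.0, below the SAT threshold ~4.27
sol = walksat(f, n)
print(sol is not None)
```

At low clause density such local search succeeds easily; as the density approaches the intermediate phase described in the abstract, the proliferation of metastable states stalls it, which is the regime where the survey-based decimation algorithm retains its advantage.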