31,045 research outputs found
Sketching for Large-Scale Learning of Mixture Models
Learning parameters from voluminous data can be prohibitive in terms of
memory and computational requirements. We propose a "compressive learning"
framework where we estimate model parameters from a sketch of the training
data. This sketch is a collection of generalized moments of the underlying
probability distribution of the data. It can be computed in a single pass on
the training set, and is easily computable on streams or distributed datasets.
The proposed framework shares similarities with compressive sensing, which aims
at drastically reducing the dimension of high-dimensional signals while
preserving the ability to reconstruct them. To perform the estimation task, we
derive an iterative algorithm analogous to sparse reconstruction algorithms in
the context of linear inverse problems. We exemplify our framework with the
compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics
on the choice of the sketching procedure and theoretical guarantees of
reconstruction. We experimentally show on synthetic data that the proposed
algorithm yields results comparable to the classical Expectation-Maximization
(EM) technique while requiring significantly less memory and fewer computations
when the number of database elements is large. We further demonstrate the
potential of the approach on real large-scale data (over 10^8 training samples)
for the task of model-based speaker verification. Finally, we draw some
connections between the proposed framework and approximate Hilbert space
embedding of probability distributions using random features. We show that the
proposed sketching operator can be seen as an innovative method to design
translation-invariant kernels adapted to the analysis of GMMs. We also use this
theoretical framework to derive information preservation guarantees, in the
spirit of infinite-dimensional compressive sensing.
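For intuition, one common realization of such a sketch uses random Fourier features: average the complex exponentials of the data at a set of random frequencies, which yields empirical generalized moments of the underlying distribution. A minimal NumPy sketch under stated assumptions (Gaussian frequencies are one heuristic choice; all names and sizes here are illustrative, not the authors' exact procedure):

```python
import numpy as np

def compute_sketch(X, W):
    """Empirical sketch: the generalized moments E[exp(i w^T x)] of the data,
    evaluated at the random frequencies in W (one frequency per row). This is
    a single pass over X and is trivially mergeable across chunks or streams."""
    return np.exp(1j * X @ W.T).mean(axis=0)  # (m,) complex vector

rng = np.random.default_rng(0)
n, d, m = 100_000, 10, 500           # n samples of dimension d, sketch size m
X = rng.standard_normal((n, d))      # stand-in for the training set
W = rng.standard_normal((m, d))      # random frequencies (Gaussian heuristic)
z = compute_sketch(X, W)             # m numbers now summarize the whole set

# Distributed/streaming use: sketches of disjoint chunks combine by a
# size-weighted average into the sketch of their union.
z_a = compute_sketch(X[:60_000], W)
z_b = compute_sketch(X[60_000:], W)
assert np.allclose(z, 0.6 * z_a + 0.4 * z_b)
```

Estimation then amounts to searching for GMM parameters whose analytical sketch best matches z, in analogy with sparse recovery from compressive measurements.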
An ultrahigh-speed digitizer for the Harvard College Observatory astronomical plates
A machine capable of digitizing two 8 inch by 10 inch (203 mm by 254 mm)
glass astrophotographic plates or a single 14 inch by 17 inch (356 mm by 432
mm) plate at a resolution of 11 microns per pixel or 2309 dots per inch (dpi)
in 92 seconds is described. The purpose of the machine is to digitize the
the ~500,000-plate collection of the Harvard College Observatory in a five-year
time frame. The digitization must meet the requirements for scientific work in
astrometry, photometry, and archival preservation of the plates. This paper
describes the requirements for and the design of the subsystems of the machine
that was developed specifically for this task.
Comment: 12 pages, 9 figures, 1 table; presented at SPIE (July 2006) and
published in the Proceedings.
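A back-of-the-envelope check of the quoted figures (illustrative arithmetic only, assuming continuous operation):

```python
# One 14 in x 17 in plate at 2309 dpi (11 microns per pixel)
pixels = (14 * 2309) * (17 * 2309)       # ~1.27 gigapixels per plate
rate = pixels / 92                       # ~13.8 Mpixel/s sustained

# ~500,000 plates, optimistically scanned two 8x10 plates per 92 s pass
seconds = (500_000 / 2) * 92
years_scanning = seconds / (3600 * 24 * 365)
print(f"{pixels / 1e9:.2f} Gpx/plate, {rate / 1e6:.1f} Mpx/s, "
      f"~{years_scanning:.2f} years of pure scan time")
```

Pure scan time comes out well under a year, so plate handling, calibration, and cataloguing, not scanning, dominate the five-year schedule.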
Machine Covering in the Random-Order Model
In the Online Machine Covering problem, jobs, defined by their sizes, arrive
one by one and have to be assigned to $m$ parallel and identical machines, with
the goal of maximizing the load of the least-loaded machine. In this work, we
study the Machine Covering problem in the recently popular random-order model.
Here no extra resources are present, but instead the adversary is weakened in
that it can only decide upon the input set while jobs are revealed uniformly at
random. It is particularly relevant to Machine Covering where lower bounds are
usually associated with highly structured input sequences.
We first analyze Graham's Greedy strategy in this context and establish that
its competitive ratio decreases slightly to $\Theta(m/\log(m))$, which is
asymptotically tight. Then, as our main result, we present an improved
$\tilde{O}(\sqrt[4]{m})$-competitive
algorithm for the problem. This result is achieved by exploiting the extra
information coming from the random order of the jobs, using sampling techniques
to devise an improved mechanism to distinguish jobs that are relatively large
from small ones. We complement this result with a first lower bound showing
that no algorithm can have a competitive ratio of $O(\log(m)/\log\log(m))$ in
the random-order model. This
lower bound is achieved by studying a novel variant of the Secretary problem,
which could be of independent interest.
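For intuition, Graham's Greedy strategy simply places every arriving job on a currently least-loaded machine. A minimal simulation in the random-order model (a sketch for experimentation only; this is not the paper's improved algorithm):

```python
import random

def greedy_cover(jobs, m):
    """Graham's Greedy for Machine Covering: assign each arriving job to a
    least-loaded machine; the objective is the final minimum machine load."""
    loads = [0.0] * m
    for size in jobs:
        i = min(range(m), key=loads.__getitem__)  # a least-loaded machine
        loads[i] += size
    return min(loads)

# The adversary fixes the multiset of jobs, but arrival order is uniform.
m = 10
jobs = [1.0] * (m - 1) + [0.01] * 500  # structured instance, hard adversarially
random.seed(1)
random.shuffle(jobs)                   # random-order model
print(greedy_cover(jobs, m))           # compare against the offline optimum
```

On such instances an adversarial arrival order can starve one machine entirely; revealing the same multiset in uniformly random order is precisely the extra leverage that the improved analysis and algorithm exploit.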
Dagstuhl Reports: Volume 1, Issue 2, February 2011
Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061): Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer
Self-Repairing Programs (Dagstuhl Seminar 11062): Mauro Pezzè, Martin C. Rinard, Westley Weimer and Andreas Zeller
Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071): Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos
Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081): Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka
Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091): Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Young
Algorithmic Complexity for Short Binary Strings Applied to Psychology: A Primer
Since human randomness production has been studied and widely used to assess
executive functions (especially inhibition), many measures have been suggested
to assess the degree to which a sequence is random-like. However, each of them
focuses on a single feature of randomness, forcing authors to rely on several
measures at once. Here we describe and advocate for the use of the accepted
universal measure of randomness based on algorithmic complexity, by means of a
novel, previously presented technique using the definition of algorithmic
probability. A re-analysis of the classical Radio Zenith data in the light of
the proposed measure and methodology is provided as a case study of an
application.
Comment: To appear in Behavior Research Methods.
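The measure in question rests on the coding theorem, which ties a string's algorithmic probability D(s) to its complexity via K(s) ~ -log2 D(s), with D(s) estimated from the output frequencies of large enumerations of small Turing machines. A hedged sketch of how such an estimate is used (the frequency table below is a made-up stand-in, not the published values):

```python
import math

# Hypothetical output frequencies D(s) for a few 4-bit strings; real values
# come from enumerating millions of small Turing machines, not from here.
D = {"0000": 0.0100, "0010": 0.0041, "0110": 0.0038, "0101": 0.0065}

def ctm_complexity(s):
    """Coding-theorem estimate of algorithmic complexity: K(s) ~ -log2 D(s)."""
    return -math.log2(D[s])

for s in D:
    print(s, round(ctm_complexity(s), 2))
# Strings produced by many machines (regular ones like "0000") get a low
# complexity score; rarely produced strings score high. One scale replaces
# the many ad-hoc randomness measures mentioned above.
```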
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Current learning machines have successfully solved hard application problems,
reaching high accuracy and displaying seemingly "intelligent" behavior. Here we
apply recent techniques for explaining decisions of state-of-the-art learning
machines and analyze various tasks from computer vision and arcade games. This
showcases a spectrum of problem-solving behaviors ranging from naive and
short-sighted, to well-informed and strategic. We observe that standard
performance evaluation metrics can fail to distinguish between these diverse
problem-solving behaviors. Furthermore, we propose our semi-automated Spectral
Relevance Analysis that provides a practically effective way of characterizing
and validating the behavior of nonlinear learning machines. This helps to
assess whether a learned model indeed delivers reliably for the problem that it
was conceived for. Furthermore, our work intends to add a voice of caution to
the ongoing excitement about machine intelligence and pledges to evaluate and
judge some of these recent successes in a more nuanced manner.
Comment: Accepted for publication in Nature Communications.
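A rough sketch of the kind of pipeline Spectral Relevance Analysis describes, under stated assumptions: per-sample relevance heatmaps (e.g. from layer-wise relevance propagation) are taken as given, and scikit-learn's SpectralClustering stands in for the paper's exact spectral embedding:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def spray(heatmaps, n_clusters=4):
    """Cluster relevance heatmaps; small or atypical clusters flag candidate
    'Clever Hans' strategies (e.g. decisions driven by a watermark)."""
    X = heatmaps.reshape(len(heatmaps), -1)              # flatten each heatmap
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="nearest_neighbors",
                              n_neighbors=10,
                              random_state=0).fit_predict(X)

# Toy stand-in: random 16x16 "heatmaps"; in practice these come from
# explaining a trained model's own predictions on its training data.
rng = np.random.default_rng(0)
labels = spray(rng.random((200, 16, 16)))
print(np.bincount(labels))  # inspect cluster sizes; tiny clusters merit review
```

Manual inspection of each cluster's typical heatmaps then shows whether it reflects a valid strategy or an artifact of the training data.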
Planters and their Components: Types, attributes, functional requirements, classification and description
Crop Production/Industries, Farm Management
- …