Online Learning of Quantum States
Suppose we have many copies of an unknown $n$-qubit state $\rho$. We measure
some copies of $\rho$ using a known two-outcome measurement $E_1$, then other
copies using a measurement $E_2$, and so on. At each stage $t$, we generate a
current hypothesis $\omega_t$ about the state $\rho$, using the outcomes of
the previous measurements. We show that it is possible to do this in a way that
guarantees that $|\mathrm{Tr}(E_i \omega_t) - \mathrm{Tr}(E_i \rho)|$, the error in our prediction for the next
measurement, is at least $\varepsilon$ at most $O(n/\varepsilon^2)$ times. Even in the "non-realizable" setting---where
there could be arbitrary noise in the measurement outcomes---we show how to
output hypothesis states that do significantly worse than the best possible
states at most $O(\sqrt{Tn})$ times on the first $T$
measurements. These results generalize a 2007 theorem by Aaronson on the
PAC-learnability of quantum states, to the online and regret-minimization
settings. We give three different ways to prove our results---using convex
optimization, quantum postselection, and sequential fat-shattering
dimension---which have different advantages in terms of parameters and
portability.
Comment: 18 pages
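The paper's convex-optimization route suggests a simple concrete picture. Below is a minimal sketch (not the paper's exact algorithm) of a matrix-exponentiated-gradient learner for this setting; the function name, the absolute loss, and the learning rate eta are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

def mmw_online_learning(measurements, outcomes, n_qubits, eta=0.5):
    """Matrix-exponentiated-gradient sketch for online learning of an
    n-qubit state: predict Tr(E_t w_t), observe a (possibly noisy)
    outcome b_t, and update the hypothesis via multiplicative weights."""
    d = 2 ** n_qubits
    grad_sum = np.zeros((d, d), dtype=complex)  # accumulated loss gradients
    w = np.eye(d) / d                           # maximally mixed start
    predictions = []
    for E, b in zip(measurements, outcomes):
        p = np.trace(E @ w).real                # prediction for this round
        predictions.append(p)
        grad_sum += np.sign(p - b) * E          # subgradient of |Tr(Ew) - b|
        w = expm(-eta * grad_sum)               # exponentiated-gradient step
        w = (w + w.conj().T) / 2                # re-Hermitise numerically
        w /= np.trace(w).real                   # renormalise to unit trace
    return predictions

# toy demo: one qubit, repeatedly measured with the |0><0| projector
E = np.array([[1, 0], [0, 0]], dtype=complex)
preds = mmw_online_learning([E] * 50, [0.8] * 50, n_qubits=1)
print(preds[0], preds[-1])   # drifts from 0.5 toward the observed 0.8
```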
The Shannon Cipher System with a Guessing Wiretapper: General Sources
The Shannon cipher system is studied in the context of general sources using
a notion of computational secrecy introduced by Merhav & Arikan. Bounds are
derived on limiting exponents of guessing moments for general sources. The
bounds are shown to be tight for iid, Markov, and unifilar sources, thus
recovering some known results. A close relationship between error exponents and
correct decoding exponents for fixed rate source compression on the one hand
and exponents for guessing moments on the other hand is established.Comment: 24 pages, Submitted to IEEE Transactions on Information Theor
On empirical cumulant generating functions of code lengths for individual sequences
We consider the problem of lossless compression of individual sequences using
finite-state (FS) machines, from the perspective of the best achievable
empirical cumulant generating function (CGF) of the code length, i.e., the
normalized logarithm of the empirical average of the exponentiated code length.
Since the probabilistic CGF is minimized in terms of the Rényi entropy of the
source, one of the motivations of this study is to derive an
individual-sequence analogue of the Rényi entropy, in the same way that the
FS compressibility is the individual-sequence counterpart of the Shannon
entropy. We consider the CGF of the code length both from the perspective of
fixed-to-variable (F-V) length coding and the perspective of
variable-to-variable (V-V) length coding, where the latter turns out to yield a
better result, that coincides with the FS compressibility. We also extend our
results to compression with side information, available at both the encoder and
decoder. In this case, the V-V version no longer coincides with the FS
compressibility, but results in a different complexity measure.
Comment: 15 pages; submitted for publication
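The probabilistic benchmark behind this program is Campbell's classical result: the minimum achievable CGF of the code length equals the Rényi entropy of order 1/(1+s). A minimal sketch of that baseline for an iid source, assuming ideal non-integer code lengths:

```python
import numpy as np

def optimal_cgf(p, s):
    """Campbell's problem: the code lengths minimising the CGF
    (1/s) * log2 E[2^(s*len)] are l_i = -log2 q_i for the tilted
    distribution q_i ~ p_i^beta with beta = 1/(1+s); returns that CGF
    (ideal, non-integer lengths)."""
    p = np.asarray(p, dtype=float)
    beta = 1.0 / (1.0 + s)
    q = p ** beta / np.sum(p ** beta)
    lengths = -np.log2(q)
    return float(np.log2(np.sum(p * 2.0 ** (s * lengths))) / s)

def renyi_entropy_bits(p, alpha):
    p = np.asarray(p, dtype=float)
    return float(np.log2(np.sum(p ** alpha)) / (1 - alpha))

p, s = [0.5, 0.25, 0.125, 0.125], 0.7
print(optimal_cgf(p, s))                       # minimum achievable CGF
print(renyi_entropy_bits(p, 1.0 / (1.0 + s)))  # matches the Renyi entropy
```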
A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs with a Costly max-Oracle
Structural support vector machines (SSVMs) are amongst the best performing
models for structured computer vision tasks, such as semantic image
segmentation or human pose estimation. Training SSVMs, however, is
computationally costly, because it requires repeated calls to a structured
prediction subroutine (the "max-oracle"), which has to solve an
optimization problem itself, e.g., a graph cut.
In this work, we introduce a new algorithm for SSVM training that is more
efficient than earlier techniques when the max-oracle is computationally
expensive, as is frequently the case in computer vision tasks. The main idea
is to (i) combine the recent stochastic Block-Coordinate Frank-Wolfe algorithm
with efficient hyperplane caching, and (ii) use an automatic selection rule for
deciding whether to call the exact max-oracle or to rely on an approximate one
based on the cached hyperplanes.
We show experimentally that this strategy leads to faster convergence to the
optimum with respect to the number of required oracle calls, and that this
translates into faster convergence with respect to the total runtime when the
max-oracle is slow compared to the other steps of the algorithm.
A publicly available C++ implementation is provided at
http://pub.ist.ac.at/~vnk/papers/SVM.html
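For context, the sketch below shows the plain Block-Coordinate Frank-Wolfe step of Lacoste-Julien et al. that the new algorithm builds on; the hyperplane cache and the exact-versus-approximate oracle selection rule are omitted, and feature_map / max_oracle are hypothetical callables standing in for the task-specific components.

```python
import numpy as np

def bcfw_ssvm(data, feature_map, max_oracle, lam=0.01, n_iters=1000):
    """Plain Block-Coordinate Frank-Wolfe for the structural SVM dual.
    max_oracle(w, x, y) must return the most violated labeling and its
    task loss; feature_map(x, y) returns the joint feature vector."""
    n = len(data)
    d = len(feature_map(*data[0]))
    w = np.zeros(d)                 # primal weights, w = sum_i w_blocks[i]
    w_blocks = np.zeros((n, d))     # per-example contributions
    l, l_blocks = 0.0, np.zeros(n)
    for _ in range(n_iters):
        i = np.random.randint(n)
        x, y = data[i]
        y_hat, loss = max_oracle(w, x, y)          # the costly call
        # corner of block i's feasible set
        w_s = (feature_map(x, y) - feature_map(x, y_hat)) / (lam * n)
        l_s = loss / n
        # closed-form line search along the Frank-Wolfe direction
        diff = w_blocks[i] - w_s
        denom = lam * diff @ diff
        gamma = np.clip((lam * diff @ w - l_blocks[i] + l_s) / denom,
                        0.0, 1.0) if denom > 0 else 1.0
        w += gamma * (w_s - w_blocks[i])
        l += gamma * (l_s - l_blocks[i])
        w_blocks[i] += gamma * (w_s - w_blocks[i])
        l_blocks[i] += gamma * (l_s - l_blocks[i])
    return w
```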
Construction of a Large Class of Deterministic Sensing Matrices that Satisfy a Statistical Isometry Property
Compressed Sensing aims to capture attributes of $k$-sparse signals using
very few measurements. In the standard Compressed Sensing paradigm, the
$m \times n$ measurement matrix $A$ is required to act as a near isometry on
the set of all $k$-sparse signals (Restricted Isometry Property or RIP).
Although it is known that certain probabilistic processes generate $m \times
n$ matrices that satisfy RIP with high probability, there is no practical
algorithm for verifying whether a given sensing matrix $A$ has this property,
which is crucial for the feasibility of the standard recovery algorithms. In
contrast, this paper provides simple criteria that guarantee that a
deterministic sensing matrix satisfying these criteria acts as a near isometry
on an overwhelming majority of $k$-sparse signals; in particular, most such
signals have a unique
representation in the measurement domain. Probability still plays a critical
role, but it enters the signal model rather than the construction of the
sensing matrix. We require the columns of the sensing matrix to form a group
under pointwise multiplication. The construction allows recovery methods for
which the expected performance is sub-linear in $n$, and only quadratic in
$m$; the focus on expected performance is more typical of mainstream signal
processing than the worst-case analysis that prevails in standard Compressed
Sensing. Our framework encompasses many families of deterministic sensing
matrices, including those formed from discrete chirps, Delsarte-Goethals codes,
and extended BCH codes.
Comment: 16 pages, 2 figures; to appear in IEEE Journal of Selected Topics in
Signal Processing, special issue on Compressed Sensing
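As a concrete member of this family, the discrete-chirp construction can be written down in a few lines. The sketch below (illustrative; the parameters are arbitrary) builds the p x p^2 chirp matrix, whose unnormalised columns are closed under pointwise multiplication, and checks the statistical isometry on random sparse inputs rather than in the worst case:

```python
import numpy as np

def chirp_matrix(p):
    """p x p^2 deterministic sensing matrix of discrete chirps
    exp(2*pi*i*(a*m^2 + b*m)/p) / sqrt(p) for (a, b) in Z_p x Z_p;
    unnormalised columns are closed under pointwise multiplication
    (p should be an odd prime)."""
    m = np.arange(p)
    cols = [np.exp(2j * np.pi * (a * m ** 2 + b * m) / p) / np.sqrt(p)
            for a in range(p) for b in range(p)]
    return np.stack(cols, axis=1)

# statistical (not worst-case) isometry check on random k-sparse signals
p, k, trials = 31, 4, 200
A = chirp_matrix(p)
rng = np.random.default_rng(0)
ratios = []
for _ in range(trials):
    x = np.zeros(p * p, dtype=complex)
    idx = rng.choice(p * p, size=k, replace=False)
    x[idx] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
print(np.mean(ratios), np.std(ratios))   # concentrates near 1
```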
Universal Codes from Switching Strategies
We discuss algorithms for combining sequential prediction strategies, a task
which can be viewed as a natural generalisation of the concept of universal
coding. We describe a graphical language based on Hidden Markov Models for
defining prediction strategies, and we provide both existing and new models as
examples. The models include efficient, parameterless models for switching
between the input strategies over time, including a model for the case where
switches tend to occur in clusters, and finally a new model for the scenario
where the prediction strategies have a known relationship, and where jumps are
typically between strongly related ones. This last model is relevant for coding
time series data where parameter drift is expected. As theoretical contributions
we introduce an interpolation construction that is useful in the development
and analysis of new algorithms, and we establish a new sophisticated lemma for
analysing the individual sequence regret of parameterised models.
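As a concrete instance of an HMM-defined switching model, here is the classical fixed-share mixture, one of the simpler members of this family (the clustered-switch and related-strategies models use richer transition structure); names and parameters are illustrative:

```python
import numpy as np

def fixed_share(expert_probs, alpha=0.05):
    """Fixed-share mixture: an HMM whose hidden state is the active
    strategy and which switches to a uniformly random strategy with
    probability alpha per step. expert_probs[t, k] is the probability
    strategy k assigned to the symbol actually observed at time t;
    returns the mixture's per-step probabilities."""
    T, K = expert_probs.shape
    w = np.full(K, 1.0 / K)                  # posterior over strategies
    out = np.empty(T)
    for t in range(T):
        out[t] = w @ expert_probs[t]         # predict by mixing
        w = w * expert_probs[t] / out[t]     # Bayesian update
        w = (1 - alpha) * w + alpha / K      # switching transition
    return out

# a source that changes regime mid-way, and two constant Bernoulli experts
x = np.array([1] * 30 + [0] * 30)
e = np.stack([np.where(x == 1, 0.9, 0.1),    # expert favouring ones
              np.where(x == 1, 0.1, 0.9)], axis=1)
print(-np.log2(fixed_share(e)).sum())        # total code length in bits
```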
Sequential Predictions based on Algorithmic Complexity
This paper studies sequence prediction based on the monotone Kolmogorov
complexity Km = -log m, i.e., based on universal deterministic/one-part MDL. m is
extremely close to Solomonoff's universal prior M, the latter being an
excellent predictor in deterministic as well as probabilistic environments,
where performance is measured in terms of convergence of posteriors or losses.
Despite this closeness to M, it is difficult to assess the prediction quality
of m, since little is known about the closeness of their posteriors, which are
the important quantities for prediction. We show that for deterministic
computable environments, the "posterior" and losses of m converge, but rapid
convergence could only be shown on-sequence; the off-sequence convergence can
be slow. In probabilistic environments, neither the posterior nor the losses
converge, in general.
Comment: 26 pages, LaTeX
Offline to Online Conversion
We consider the problem of converting offline estimators into an online
predictor or estimator with small extra regret. Formally this is the problem of
merging a collection of probability measures over strings of length 1,2,3,...
into a single probability measure over infinite sequences. We describe various
approaches and their pros and cons on various examples. As a side result, we
give an elementary, non-heuristic, purely combinatorial derivation of Turing's
famous estimator. Our main technical contribution is to determine the
computational complexity of online estimators with good guarantees in general.
Comment: 20 LaTeX pages
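One naive conversion makes the problem concrete: predict the next symbol by evaluating the offline estimator on every one-symbol extension of the observed prefix and normalising. The sketch below (names hypothetical) applies this to the Krichevsky-Trofimov estimator, for which the offline measures are already consistent across lengths, so the conversion incurs no extra regret; the paper's question is what happens for general collections.

```python
import numpy as np

def offline_to_online(q, x, alphabet=(0, 1)):
    """Convert a family of offline estimators into an online predictor:
    at each step, evaluate the offline joint probability q on every
    one-symbol extension of the prefix and normalise. Returns the
    probability assigned to each observed symbol."""
    probs, prefix = [], []
    for sym in x:
        scores = np.array([q(prefix + [a]) for a in alphabet])
        probs.append(scores[alphabet.index(sym)] / scores.sum())
        prefix.append(sym)
    return np.array(probs)

def kt_joint(s):
    """Offline Krichevsky-Trofimov (add-1/2) joint probability of a
    binary string; consistent across lengths, so the conversion above
    reproduces the usual online KT predictor exactly."""
    p, counts = 1.0, [0, 0]
    for sym in s:
        p *= (counts[sym] + 0.5) / (counts[0] + counts[1] + 1.0)
        counts[sym] += 1
    return p

x = [1, 1, 0, 1, 1, 1, 0, 1]
print(-np.log2(offline_to_online(kt_joint, x)).sum())  # code length, bits
```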