131 research outputs found
Fat-shattering dimension of k-fold maxima
We provide improved estimates on the fat-shattering dimension of the k-fold
maximum of real-valued function classes. The latter consists of all ways of
choosing k functions, one from each of the k classes, and computing their
pointwise maximum. The bound is stated in terms of the fat-shattering
dimensions of the component classes. For linear and affine function classes, we
provide a considerably sharper upper bound and a matching lower bound,
achieving, in particular, an optimal dependence on k. Along the way, we point
out and correct a number of erroneous claims in the literature.
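The k-fold maximum construction is easy to state concretely. A minimal sketch for finite classes (the function names and the toy classes are illustrative, not from the paper):

```python
# Sketch of the k-fold maximum: given classes F_1, ..., F_k, the class
# consists of all pointwise maxima x -> max(f_1(x), ..., f_k(x)) with
# one f_i chosen from each F_i.
from itertools import product

def k_fold_maximum(*classes):
    """Enumerate the k-fold maximum of finite classes of callables:
    one resulting function per way of choosing one member per class."""
    maxima = []
    for choice in product(*classes):
        # bind the chosen tuple via a default argument
        maxima.append(lambda x, fs=choice: max(f(x) for f in fs))
    return maxima

# Two tiny affine classes on the real line.
F1 = [lambda x: x, lambda x: -x]      # note: |x| arises as max(x, -x)
F2 = [lambda x: 0.5 * x + 1]

H = k_fold_maximum(F1, F2)            # |F1| * |F2| = 2 functions
```

With |F_i| functions per class the maximum class has the product of the sizes, which is why bounds expressed through the component classes' fat-shattering dimensions are the useful currency here.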
Error Bounds for Piecewise Smooth and Switching Regression
The paper deals with regression problems in which the nonsmooth target is
assumed to switch between different operating modes. Specifically, piecewise
smooth (PWS) regression considers target functions switching deterministically
via a partition of the input space, while switching regression considers
arbitrary switching laws. The paper derives generalization error bounds in
these two settings by following the approach based on Rademacher complexities.
For PWS regression, our derivation involves a chaining argument and a
decomposition of the covering numbers of PWS classes in terms of the ones of
their component functions and the capacity of the classifier partitioning the
input space. This yields error bounds with a radical dependency on the number
of modes. For switching regression, the decomposition can be performed directly
at the level of the Rademacher complexities, which yields bounds with a linear
dependency on the number of modes. By using once more chaining and a
decomposition at the level of covering numbers, we show how to recover a
radical dependency. Examples of applications are given, in particular for PWS
and switching regression with linear and kernel-based component functions.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
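A piecewise smooth regression model in the sense above pairs a partition of the input space with one smooth component per mode. A minimal sketch with affine components (the class name and the partition rule are illustrative assumptions, not the paper's setup):

```python
class PiecewiseSmoothModel:
    """Minimal PWS regressor: a hard partition of the input space
    selects, for each input, one of several affine component functions."""

    def __init__(self, components, partition):
        self.components = components   # list of (weights, bias), one per mode
        self.partition = partition     # callable: x -> mode index

    def predict(self, x):
        w, b = self.components[self.partition(x)]
        return sum(wi * xi for wi, xi in zip(w, x)) + b

# Two modes, split deterministically by the sign of the first coordinate.
model = PiecewiseSmoothModel(
    components=[([1.0, 0.0], 0.0), ([0.0, 2.0], 1.0)],
    partition=lambda x: 0 if x[0] >= 0 else 1,
)
print(model.predict([2.0, 3.0]))    # mode 0: 1*2 + 0*3 + 0 = 2.0
print(model.predict([-1.0, 3.0]))   # mode 1: 0*(-1) + 2*3 + 1 = 7.0
```

The covering-number decomposition in the abstract mirrors this structure: the capacity of the whole model splits into the capacity of the components and that of the partitioning classifier.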
Scale-sensitive Psi-dimensions: the Capacity Measures for Classifiers Taking Values in R^Q
Bounds on the risk play a crucial role in statistical learning theory. They
usually involve, as a capacity measure of the model studied, the VC dimension
or one of its extensions. In classification, such "VC dimensions" exist for
models taking values in {0, 1}, {1,..., Q} and R. We introduce the
generalizations appropriate for the missing case, that of models taking values
in R^Q. This provides us with a new guaranteed risk for M-SVMs which appears
superior to the existing one.
The learnability of unknown quantum measurements
© Rinton Press. In this work, we provide an elegant framework to analyze learning matrices in the Schatten class by taking advantage of a recently developed methodology, matrix concentration inequalities. We establish the fat-shattering dimension, Rademacher/Gaussian complexity, and the entropy number of learning bounded operators and trace class operators. By characterising the tasks of learning quantum states and two-outcome quantum measurements as learning matrices in the Schatten-1 and ∞ classes, our proposed approach directly solves the sample complexity problems of learning quantum states and quantum measurements. Our main result in the paper is that, for learning an unknown quantum measurement, the upper bound, given by the fat-shattering dimension, is linearly proportional to the dimension of the underlying Hilbert space. Learning an unknown quantum state becomes a dual problem to ours, and as a byproduct, we can recover Aaronson's famous result [Proc. R. Soc. A 463, 3089–3144 (2007)] solely using a classical machine learning technique. In addition, other famous complexity measures like covering numbers and Rademacher/Gaussian complexities are derived explicitly under the same framework. We are able to connect measures of sample complexity with various areas in quantum information science, e.g. quantum state/measurement tomography, quantum state discrimination and quantum random access codes, which may be of independent interest. Lastly, with the assistance of the general Bloch-sphere representation, we show that learning quantum measurements/states can be mathematically formulated as a neural network. Consequently, classical ML algorithms can be applied to efficiently accomplish the two quantum learning tasks.
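The Rademacher complexities used above have a directly computable empirical analogue. A minimal Monte Carlo sketch for a finite function class, given its values on a fixed sample (illustrative only; the paper works with infinite operator classes):

```python
import random

def empirical_rademacher(function_values, trials=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite function class, given each function's values on a fixed sample:
        R_hat = E_sigma [ sup_f (1/n) * sum_i sigma_i * f(x_i) ],
    with sigma_i independent uniform signs."""
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += max(sum(s * v for s, v in zip(sigma, vals)) / n
                     for vals in function_values)
    return total / trials

# A class containing only the zero function has complexity exactly 0,
# while a class that can correlate with the signs gives a larger value.
zero_only = [[0.0] * 8]
signs = [[1.0] * 8, [-1.0] * 8]
print(empirical_rademacher(zero_only), empirical_rademacher(signs))
```

The estimate for the richer class is strictly positive, which is the pattern the sample complexity bounds in the abstract quantify.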
Oracle Efficient Online Multicalibration and Omniprediction
A recent line of work has shown a surprising connection between
multicalibration, a multi-group fairness notion, and omniprediction, a learning
paradigm that provides simultaneous loss minimization guarantees for a large
family of loss functions. Prior work studies omniprediction in the batch
setting. We initiate the study of omniprediction in the online adversarial
setting. Although there exist algorithms for obtaining notions of
multicalibration in the online adversarial setting, unlike batch algorithms,
they work only for small finite classes of benchmark functions, because they
require enumerating every function in the class at every round. In contrast,
omniprediction is most interesting for learning theoretic hypothesis classes,
which are generally continuously large.
We develop a new online multicalibration algorithm that is well defined for
infinite benchmark classes and is oracle efficient (i.e. for any class, the
algorithm takes the form of an efficient reduction to a no-regret learning
algorithm for that class). The result is the first efficient online
omnipredictor -- an oracle efficient prediction algorithm that can be used to
simultaneously obtain no-regret guarantees with respect to all Lipschitz convex
loss functions. For the class of linear functions, we show how to make our
algorithm efficient in the worst case. We also show upper and lower bounds on
the extent to which our rates can be improved: our oracle efficient algorithm
actually promises a stronger guarantee called swap-omniprediction, and we prove
a lower bound showing that obtaining the optimal rates for swap-omniprediction
is impossible in the online setting. On the other hand, we give a
(non-oracle-efficient) algorithm which can obtain the optimal omniprediction
bounds without going through multicalibration, giving an information theoretic
separation between these two solution concepts.
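The oracle-efficiency notion above assumes black-box access to a no-regret learner for the benchmark class. A minimal sketch of such an oracle, here multiplicative weights over a finite class (illustrative; the paper's reduction is stated for general classes):

```python
import math

class ExponentialWeights:
    """Multiplicative-weights no-regret learner over a finite set of
    benchmark functions ("experts") -- the kind of black-box oracle an
    oracle-efficient reduction invokes each round."""

    def __init__(self, n_experts, eta=0.1):
        self.weights = [1.0] * n_experts
        self.eta = eta

    def distribution(self):
        """Current probability distribution over experts."""
        z = sum(self.weights)
        return [w / z for w in self.weights]

    def update(self, losses):
        """Exponentially downweight each expert by its observed loss."""
        self.weights = [w * math.exp(-self.eta * loss)
                        for w, loss in zip(self.weights, losses)]

learner = ExponentialWeights(n_experts=2, eta=0.5)
for _ in range(50):              # expert 0 always suffers more loss
    learner.update([1.0, 0.0])
# the distribution concentrates on the better expert
```

The reduction treats such a learner as a subroutine: the multicalibration algorithm only interacts with the class through the oracle's predictions and updates, never by enumerating the class.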
Optimal Efficiency-Envy Trade-Off via Optimal Transport
We consider the problem of allocating a distribution of items to a set of
recipients, where each recipient must be allocated a fixed, prespecified
fraction of all items, while ensuring that no recipient experiences
too much envy. We show that this problem can be formulated as a variant of the
semi-discrete optimal transport (OT) problem, whose solution structure in this
case has a concise representation and a simple geometric interpretation. Unlike
existing literature that treats envy-freeness as a hard constraint, our
formulation allows us to optimally trade off efficiency and envy
continuously. Additionally, we study the statistical properties of the space of
our OT based allocation policies by showing a polynomial bound on the number of
samples needed to approximate the optimal solution. Our approach is suitable
for large-scale fair allocation problems, such as the blood donation matching
problem, and we show numerically that it performs well on a realistic data
simulator from prior work.
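Quantifying envy is the basic primitive behind the trade-off above. A minimal sketch computing the maximum pairwise envy of a discrete allocation (the data and function name are illustrative; this is not the paper's OT formulation):

```python
def max_envy(valuations, allocation):
    """Maximum pairwise envy of an allocation.

    valuations[i][j] : value recipient i assigns to item j
    allocation[j]    : index of the recipient who receives item j
    Recipient i envies recipient k by (i's value for k's bundle) minus
    (i's value for i's own bundle); the maximum is always >= 0 since
    the i == k terms contribute 0.
    """
    n = len(valuations)
    bundle_value = [[sum(v[j] for j in range(len(allocation))
                         if allocation[j] == k)
                     for k in range(n)]
                    for v in valuations]
    return max(bundle_value[i][k] - bundle_value[i][i]
               for i in range(n) for k in range(n))

# Two recipients, two items, each valuing a different item more.
vals = [[3.0, 1.0], [1.0, 3.0]]
print(max_envy(vals, [0, 1]))   # each gets their favourite: max envy 0.0
print(max_envy(vals, [1, 0]))   # swapped allocation: max envy 2.0
```

Treating this quantity as a penalty rather than a hard constraint is what lets the OT formulation trade envy off against efficiency continuously.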
I-theory on depth vs width: hierarchical function composition
Deep learning networks with convolution, pooling and subsampling are a special case of hierarchical architectures, which can be represented by trees (such as binary trees). Hierarchical as well as shallow networks can approximate functions of several variables, in particular those that are compositions of low dimensional functions. We show that the power of a deep network architecture with respect to a shallow network is rather independent of the specific nonlinear operations in the network and depends instead on the behavior of the VC-dimension. A shallow network can approximate compositional functions with the same error as a deep network, but at the cost of a VC-dimension that is exponential instead of quadratic in the dimensionality of the function. To complete the argument, we argue that there exist visual computations that are intrinsically compositional. In particular, we prove that recognition invariant to translation cannot be computed by shallow networks in the presence of clutter. Finally, a general framework that includes the compositional case is sketched. The key condition that allows tall, thin networks to be nicer than short, fat networks is that the target input-output function must be sparse in a certain technical sense. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
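The compositional targets in the argument above can be made concrete. A minimal sketch of a function built as a binary tree of two-input constituents (the constituent g is an illustrative choice, not from the paper):

```python
def g(a, b):
    """A low-dimensional constituent: a function of only 2 variables."""
    return a * b

def deep_target(x):
    """Compositional target structured as a binary tree of 2-input
    constituents -- the hierarchy a deep (tall, thin) network can
    mirror node by node, so no unit ever sees more than 2 inputs."""
    return g(g(x[0], x[1]), g(x[2], x[3]))

# A shallow network must instead treat the same map as a single
# 4-dimensional function, which is where the exponential VC-dimension
# cost in the abstract comes from.
print(deep_target([1.0, 2.0, 3.0, 4.0]))  # (1*2) * (3*4) = 24.0
```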