Joint Mixability of Elliptical Distributions and Related Families
In this paper, we further develop the theory of complete mixability and joint
mixability for some distribution families. We generalize a result of
R\"uschendorf and Uckelmann (2002) related to the complete mixability of continuous
distribution functions having a symmetric and unimodal density. Two different
proofs of a result of Wang and Wang (2016) on the joint
mixability of elliptical distributions with the same characteristic generator
are presented. We solve Open Problem 7 in Wang (2015) by constructing a
bimodal-symmetric distribution. The joint mixability of slash-elliptical
distributions and skew-elliptical distributions is studied, and the extension to
multivariate distributions is also investigated. Comment: 15 pages
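For readers new to this literature, the block below restates the standard definitions of complete and joint mixability that this abstract relies on; it is background material in standard notation, not a quotation from the paper.

```latex
% Standard definitions from the complete/joint mixability literature
% (background, not quoted from the abstract above).
\[
  F \text{ is } n\text{-completely mixable}
  \iff
  \exists\, X_1,\dots,X_n \sim F \ \text{ such that } \
  X_1 + \cdots + X_n = n\mu \ \text{ a.s. for some constant } \mu,
\]
\[
  (F_1,\dots,F_n) \text{ is jointly mixable}
  \iff
  \exists\, X_i \sim F_i,\ i=1,\dots,n, \ \text{ such that } \
  X_1 + \cdots + X_n \ \text{ is a.s. constant.}
\]
```

In these terms, the elliptical-family result concerns when marginals sharing the same characteristic generator can be coupled so that their sum is constant.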
From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for
statistical learning problems where the data is generated according to some
unknown distribution $P$ and returns a hypothesis $f$ chosen from a
fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting,
depending upon $(\ell, P, \mathcal{F})$, ERM can have slow ($O(n^{-1/2})$)
or fast ($O(n^{-1})$) rates of convergence of the excess risk as a
function of the sample size $n$. There exist several results that give
sufficient conditions for fast rates in terms of joint properties of $\ell$,
$P$, and $\mathcal{F}$, such as the margin condition and the Bernstein
condition. In the non-statistical prediction with expert advice setting, there
is an analogous slow and fast rate phenomenon, and it is entirely characterized
in terms of the mixability of the loss $\ell$ (there being no role there for
$P$ or $\mathcal{F}$). The notion of stochastic mixability builds a
bridge between these two models of learning, reducing to classical mixability
in a special case. The present paper presents a direct proof of fast rates for
ERM in terms of stochastic mixability of $(\ell, P, \mathcal{F})$, and
in so doing provides new insight into the fast-rates phenomenon. The proof
exploits an old result of Kemperman on the solution to the general moment
problem. We also show a partial converse that suggests a characterization of
fast rates for ERM in terms of stochastic mixability is possible. Comment: 21 pages, accepted to NIPS 2014
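As a hedged aside (standard notation from the fast-rates literature, with $f^*$ denoting the risk minimizer within the class; not quoted from the abstract), the $\eta$-stochastic-mixability condition referred to above is usually written as:

```latex
% eta-stochastic mixability of (ell, P, F), stated in a standard form;
% f* denotes the (assumed) risk minimizer within the class F, eta > 0.
\[
  \exists\, f^* \in \mathcal{F} \ \ \forall f \in \mathcal{F}: \qquad
  \mathbb{E}_{Z \sim P}\!\left[
    \exp\!\bigl( -\eta \,\bigl( \ell(f, Z) - \ell(f^*, Z) \bigr) \bigr)
  \right] \le 1 .
\]
```

When such a condition holds, the excess risk of ERM decays at a fast, $O(1/n)$-type rate in the sample size, which is the regime the paper's direct proof targets.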
Bounding Stochastic Dependence, Complete Mixability of Matrices, and Multidimensional Bottleneck Assignment Problems
We call a matrix completely mixable if the entries in its columns can be
permuted so that all row sums are equal. If it is not completely mixable, we
want to determine the smallest maximal and largest minimal row sum attainable.
These values provide a discrete approximation of minimum variance problems
for discrete distributions, a problem motivated by the question of how to estimate
the $\alpha$-quantile of an aggregate random variable with unknown dependence
structure given the marginals of the constituent random variables. We relate
this problem to the multidimensional bottleneck assignment problem and show
that there exists a polynomial 2-approximation algorithm if the matrix has
only 3 columns. In general, deciding complete mixability is
NP-complete. In particular, the swapping algorithm of Puccetti et
al. is not an exact method unless P = NP. For a
fixed number of columns the problem remains NP-complete, but there exists a
PTAS. The problem can be solved in pseudopolynomial time for a fixed number of
rows, and even in polynomial time if all columns furthermore contain entries
from the same multiset.
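To make the matrix notion above concrete, here is a minimal Python sketch (illustrative only, and not the paper's swapping algorithm) that brute-forces column permutations to test whether a tiny matrix is completely mixable; the NP-completeness result above is exactly why such enumeration cannot scale.

```python
from itertools import permutations, product

def is_completely_mixable(matrix, tol=1e-9):
    """Brute-force check: can the entries within each column be permuted
    so that all row sums are equal?  Only feasible for tiny matrices."""
    n_rows = len(matrix)
    cols = list(zip(*matrix))                   # column view of the matrix
    target = sum(map(sum, matrix)) / n_rows     # common row sum, if one exists
    # The first column can stay fixed; permute the remaining columns.
    for perm in product(*(permutations(c) for c in cols[1:])):
        row_sums = (sum(r) for r in zip(cols[0], *perm))
        if all(abs(s - target) < tol for s in row_sums):
            return True
    return False

# Example: permuting the second column to (3, 2, 1) makes every row sum to 4.
print(is_completely_mixable([[1, 2], [2, 1], [3, 3]]))  # True
```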
Current Open Questions in Complete Mixability
Complete and joint mixability have raised considerable interest in recent
years, both in the theory of distributions with given margins and in applications
in discrete optimization and quantitative risk management. We list various open
questions in the theory of complete and joint mixability which are
mathematically concrete, and yet accessible to a broad range of researchers
without specific background knowledge. In addition to the discussions on open
questions, some results contained in this paper are new.
Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with
more data is a central problem in machine learning --- a fast rate of
convergence means less data is needed for the same level of performance. The
pursuit of fast rates in online and statistical learning has led to the
discovery of many conditions in learning theory under which fast learning is
possible. We show that most of these conditions are special cases of a single,
unifying condition that comes in two forms: the central condition for 'proper'
learning algorithms that always output a hypothesis in the given model, and
stochastic mixability for online algorithms that may make predictions outside
of the model. We show that under surprisingly weak assumptions both conditions
are, in a certain sense, equivalent. The central condition has a
re-interpretation in terms of convexity of a set of pseudoprobabilities,
linking it to density estimation under misspecification. For bounded losses, we
show how the central condition enables a direct proof of fast rates and we
prove its equivalence to the Bernstein condition, itself a generalization of
the Tsybakov margin condition, both of which have played a central role in
obtaining fast rates in statistical learning. Yet, while the Bernstein
condition is two-sided, the central condition is one-sided, making it more
suitable to deal with unbounded losses. In its stochastic mixability form, our
condition generalizes both a stochastic exp-concavity condition identified by
Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying
conditions thus provide a substantial step towards a characterization of fast
rates in statistical learning, similar to how classical mixability
characterizes constant regret in the sequential prediction with expert advice
setting. Comment: 69 pages, 3 figures
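For orientation only (a standard statement of the condition, not text from the paper), the Bernstein condition mentioned above is typically written as the following two-sided moment inequality, with the Tsybakov margin condition as its 0-1 loss special case:

```latex
% Bernstein condition with exponent beta in (0, 1]; f* is the risk
% minimizer in F and ell_f(Z) is shorthand for the loss of f on Z.
\[
  \mathbb{E}\!\left[ \bigl( \ell_f(Z) - \ell_{f^*}(Z) \bigr)^2 \right]
  \;\le\;
  B \,\Bigl( \mathbb{E}\!\left[ \ell_f(Z) - \ell_{f^*}(Z) \right] \Bigr)^{\beta}
  \qquad \text{for all } f \in \mathcal{F} .
\]
```

As the abstract notes, for bounded losses this two-sided condition is equivalent to the one-sided central condition, which remains usable when losses are unbounded.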