Laplace's rule of succession in information geometry
Laplace's "add-one" rule of succession modifies the observed frequencies in a
sequence of heads and tails by adding one to the observed counts. This improves
prediction by avoiding zero probabilities and corresponds to a uniform Bayesian
prior on the parameter. The canonical Jeffreys prior corresponds to the
"add-one-half" rule. We prove that, for exponential families of distributions,
such Bayesian predictors can be approximated by taking the average of the
maximum likelihood predictor and the \emph{sequential normalized maximum
likelihood} predictor from information theory. Thus, in this case, it is possible
to approximate Bayesian predictors without the cost of integrating or sampling
in parameter space.
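Both the add-one and add-one-half rules are instances of pseudocount smoothing of the empirical frequency. A minimal sketch in Python (the function name is illustrative, not from the paper):

```python
def smoothed_estimate(heads, tails, pseudo=1.0):
    """Predictive probability of heads after observing counts.

    pseudo=1.0  -> Laplace's add-one rule (uniform prior)
    pseudo=0.5  -> add-one-half rule (Jeffreys prior)
    pseudo=0.0  -> plain maximum-likelihood estimate
    """
    return (heads + pseudo) / (heads + tails + 2 * pseudo)

# After 3 heads and 0 tails, the ML estimate assigns zero
# probability to tails; the smoothed rules do not.
print(smoothed_estimate(3, 0, pseudo=0.0))  # 1.0
print(smoothed_estimate(3, 0, pseudo=1.0))  # 0.8
print(smoothed_estimate(3, 0, pseudo=0.5))  # 0.875
```

The `pseudo` parameter is exactly the symmetric Beta prior's concentration: with a Beta(c, c) prior, the posterior-mean prediction is (heads + c) / (n + 2c), recovering Laplace's rule at c = 1 and the Jeffreys rule at c = 1/2.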
Bandit Online Optimization Over the Permutahedron
The permutahedron is the convex polytope whose vertices are the vectors obtained
by permuting the coordinates of the identity permutation, one vertex for each
permutation (bijection) of the ground set. We study a bandit game in which, at
each step, an adversary chooses a hidden weight vector, a player chooses a
vertex of the permutahedron, and the player suffers an observed loss given by
the inner product of the two vectors.
A previous algorithm, CombBand of Cesa-Bianchi et al. (2009), guarantees regret
growing as the square root of the time horizon. Unfortunately, CombBand requires
at each step a matrix permanent approximation to within an accuracy that
improves as the horizon grows, resulting in a total running time that is
superlinear in the horizon, making it impractical for large time horizons.
We provide an algorithm with a comparable regret guarantee and a much lower
total time complexity. The ideas are a combination of CombBand and a recent
algorithm by Ailon (2013) for online optimization over the permutahedron in the
full-information setting. The technical core is a bound on the variance of the
"pseudo-loss" of the Plackett-Luce noisy sorting process. The bound is obtained
by establishing positive semidefiniteness of a family of 3-by-3 matrices
generated from rational functions of exponentials of 3 parameters.
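The Plackett-Luce noisy sorting process draws a ranking by repeatedly selecting the next item with probability proportional to its weight among the items not yet placed. A minimal sketch of that sampling process (not the paper's bandit algorithm):

```python
import random

def plackett_luce_sample(weights, rng=random):
    """Draw a ranking from the Plackett-Luce model: repeatedly pick
    the next item with probability proportional to its weight among
    the items not yet placed."""
    items = list(range(len(weights)))
    ranking = []
    while items:
        total = sum(weights[i] for i in items)
        r = rng.random() * total
        for idx, i in enumerate(items):
            r -= weights[i]
            if r <= 0:
                ranking.append(items.pop(idx))
                break
        else:
            # guard against floating-point underflow: take the last item
            ranking.append(items.pop())
    return ranking

ranking = plackett_luce_sample([1.0, 2.0, 4.0], random.Random(0))
```

Items with larger weights tend to appear earlier in the ranking, which is what makes the model a natural noisy-sorting distribution over permutahedron vertices.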
Statistical Mechanics of Linear and Nonlinear Time-Domain Ensemble Learning
Conventional ensemble learning combines students in the space domain. In this
paper, by contrast, we combine students in the time domain and call this
time-domain ensemble learning. We analyze, compare, and discuss the
generalization performance of time-domain ensemble learning for both a linear
model and a nonlinear model. Analyzing them in the framework of online learning
using a statistical-mechanical method, we show qualitatively different behaviors
of the two models. In the linear model, the dynamical behavior of the
generalization error is monotonic, and we analytically show that time-domain
ensemble learning is twice as effective as conventional ensemble learning. In
the nonlinear model, the generalization error exhibits nonmonotonic dynamical
behavior when the learning rate is small. We numerically show that the
generalization performance can be improved remarkably by exploiting this
phenomenon together with the divergence of students in the time domain.
Comment: 11 pages, 7 figures
IR ion spectroscopy in a combined approach with MS/MS and IM-MS to discriminate epimeric anthocyanin glycosides (cyanidin 3-O-glucoside and -galactoside)
Anthocyanins are widespread in plants and flowers, being responsible for much of their varied colouring. Two representative members of this family, cyanidin 3-O-β-glucopyranoside and cyanidin 3-O-β-galactopyranoside, were selected and probed by mass spectrometry-based methods to test their ability to discriminate between the two epimers. The native anthocyanins, delivered into the gas phase by electrospray ionization, display comparable drift times in ion mobility mass spectrometry (IM-MS) and a common fragment, corresponding to loss of the sugar moiety, in their collision-induced dissociation (CID) patterns. However, the IR multiple photon dissociation (IRMPD) spectra in the fingerprint range show a feature that is particularly evident in the case of the glucoside. This signature is used to identify the presence of cyanidin 3-O-β-glucopyranoside in a natural extract of pomegranate. In an effort to increase the differentiation between the two epimers, aluminum complexes were prepared and their elemental composition confirmed by FT-ICR-MS. CID experiments on these complexes display an extensive fragmentation pattern, with a few product ions peculiar to each species. More noteworthy is the IRMPD behavior in the OH stretching range, which shows significant differences between the spectra of the two epimers. DFT calculations allow us to interpret the observed distinct bands in terms of a varied hydrogen-bonding network and relative conformer stability.
Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in an
adversarial setting where, out of the machines that allegedly compute
stochastic gradients in every iteration, a fixed fraction are Byzantine and
may behave arbitrarily and adversarially. Our main result is a variant of
stochastic gradient descent (SGD) that finds approximate minimizers of convex
functions with an iteration complexity comparable to that of traditional
mini-batch SGD, which in contrast cannot tolerate Byzantine failures at all.
Further, we provide a lower bound showing that, up to logarithmic factors, our
algorithm is information-theoretically optimal both in terms of sample
complexity and time complexity.
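To illustrate why plain gradient averaging fails in this setting, here is a sketch of one standard robust-aggregation idea, the coordinate-wise median; this is illustrative only and is not the paper's actual SGD variant:

```python
def byzantine_robust_aggregate(gradients):
    """Aggregate per-machine gradient vectors with a coordinate-wise
    median instead of a mean. A single adversarial machine can move
    the mean arbitrarily far, but as long as honest machines are a
    majority, the per-coordinate median stays near honest values."""
    def median(values):
        s = sorted(values)
        n = len(s)
        mid = n // 2
        return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

    dim = len(gradients[0])
    return [median([g[j] for g in gradients]) for j in range(dim)]

honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]   # three honest machines
byzantine = [[1e6, -1e6]]                        # one adversarial machine
agg = byzantine_robust_aggregate(honest + byzantine)
```

The mean of these four vectors would be dominated by the adversarial report, while the median aggregate remains close to the honest gradients in every coordinate.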
On the Prior Sensitivity of Thompson Sampling
The empirically successful Thompson Sampling algorithm for stochastic bandits
has drawn much interest in understanding its theoretical properties. One
important benefit of the algorithm is that it allows domain knowledge to be
conveniently encoded as a prior distribution to balance exploration and
exploitation more effectively. While it is generally believed that the
algorithm's regret is low (high) when the prior is good (bad), little is known
about the exact dependence. In this paper, we fully characterize the
algorithm's worst-case dependence of regret on the choice of prior, focusing on
a special yet representative case. These results also provide insights into the
general sensitivity of the algorithm to the choice of priors. In particular, in
terms of the prior probability mass assigned to the true reward-generating
model, we prove regret upper bounds for the bad-prior and good-prior cases, as
well as \emph{matching} lower bounds. Our proofs rely on the discovery of a
fundamental property of Thompson Sampling and make heavy use of martingale
theory, both of which appear novel in the literature, to the best of our
knowledge.
Comment: Appears in the 27th International Conference on Algorithmic Learning
Theory (ALT), 2016
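For concreteness, here is a minimal sketch of Thompson Sampling for Bernoulli bandits with Beta priors; the prior parameters are exactly where domain knowledge enters, and all names and settings below are illustrative, not from the paper:

```python
import random

def thompson_bernoulli(true_means, priors, n_rounds=5000, seed=0):
    """Thompson Sampling for Bernoulli bandits.

    `priors` is a list of (alpha, beta) pairs, one Beta prior per arm.
    Each round: sample a mean from every arm's posterior, pull the arm
    with the largest sample, and update that arm's posterior.
    Returns the number of pulls of each arm."""
    rng = random.Random(seed)
    params = [list(p) for p in priors]
    pulls = [0] * len(true_means)
    for _ in range(n_rounds):
        samples = [rng.betavariate(a, b) for a, b in params]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_means[arm] else 0
        params[arm][0] += reward       # alpha counts successes
        params[arm][1] += 1 - reward   # beta counts failures
        pulls[arm] += 1
    return pulls

# Uniform Beta(1, 1) priors: the better arm (mean 0.6) dominates.
pulls = thompson_bernoulli([0.4, 0.6], [(1, 1), (1, 1)])
```

Replacing the uniform priors with ones sharply concentrated on the wrong arm, e.g. `[(50, 1), (1, 50)]`, visibly slows the shift toward the better arm, which is the prior sensitivity the abstract quantifies.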