Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a set
of input policies, perhaps learned from prior experience or provided by
advisors. We present a reinforcement learning with policy advice (RLPA)
algorithm which leverages this input set and learns to use the best policy in
the set for the reinforcement learning task at hand. We prove that RLPA has a
sub-linear regret of \tilde O(\sqrt{T}) relative to the best input policy, and
that both this regret and its computational complexity are independent of the
size of the state and action space. Our empirical simulations support our
theoretical analysis. This suggests RLPA may offer significant advantages in
large domains where good prior policies are provided.
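The core idea of leveraging a given policy set can be sketched as a successive-elimination loop: run the candidate policies in turn and discard any whose confidence interval falls below that of the current best. The toy below is an illustration of that idea in a simplified bandit-style setting, not the RLPA algorithm itself; the Bernoulli reward model, `policy_means`, and the Hoeffding-style radius are assumptions made for the sketch.

```python
import math
import random

def select_best_policy(policy_means, horizon, delta=0.05, seed=0):
    """Toy successive elimination over a set of input policies.

    `policy_means` are hypothetical per-step expected rewards in [0, 1];
    the real RLPA algorithm interleaves policy execution inside an MDP.
    """
    rng = random.Random(seed)
    active = list(range(len(policy_means)))
    pulls = [0] * len(policy_means)
    total = [0.0] * len(policy_means)
    for t in range(horizon):
        i = active[t % len(active)]  # round-robin over surviving policies
        total[i] += rng.random() < policy_means[i]  # Bernoulli reward
        pulls[i] += 1
        # Hoeffding-style confidence radius for each active policy
        rad = {j: math.sqrt(math.log(2 * horizon / delta) / (2 * max(pulls[j], 1)))
               for j in active}
        mean = {j: total[j] / max(pulls[j], 1) for j in active}
        best = max(active, key=lambda j: mean[j] - rad[j])
        # keep only policies whose upper bound still reaches the best's lower bound
        active = [j for j in active
                  if mean[j] + rad[j] >= mean[best] - rad[best]]
    return max(active, key=lambda j: total[j] / max(pulls[j], 1))
```

With a clear gap between policies and a moderate horizon, the loop concentrates its pulls on the best policy, mirroring the regret guarantee's independence from the state and action spaces (only the policy set matters here).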
An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm. Comment: submitted to ALT 201
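The Geometric Resampling idea, estimating the inverse selection probability of each played coordinate by counting how many fresh draws it takes for that coordinate to reappear, can be sketched as follows. This is an illustrative toy, assuming a caller-supplied `sample_action` sampler and a truncation cap `cap`; the paper's actual estimator and its bias analysis differ in detail.

```python
import random

def geometric_resampling_estimate(sample_action, played, losses, cap=100, rng=None):
    """Toy Geometric Resampling (GR) loss estimate for semi-bandit feedback.

    For each coordinate i with played[i] == 1, draw fresh binary action
    vectors from the same distribution until coordinate i reappears; the
    number of draws K_i is geometric with mean 1/p_i, so losses[i] * K_i
    is a (capped, hence slightly biased) estimate of losses[i] / p_i.
    `cap` truncates the wait, trading a small bias for bounded computation.
    """
    rng = rng or random.Random(0)
    d = len(played)
    estimate = [0.0] * d
    for i in range(d):
        if not played[i]:
            continue  # semi-bandit feedback: only played coordinates reveal losses
        k = 1
        while k < cap and not sample_action(rng)[i]:
            k += 1
        estimate[i] = losses[i] * k
    return estimate
```

The appeal is that `sample_action` only needs to produce actions, never explicit probabilities, which is what lets the method run on any decision set with an efficient offline optimization oracle.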
IR ion spectroscopy in a combined approach with MS/MS and IM-MS to discriminate epimeric anthocyanin glycosides (cyanidin 3-O-glucoside and -galactoside)
Anthocyanins are widespread in plants and flowers, being responsible for their different colouring. Two representative members of this family, cyanidin 3-O-β-glucopyranoside and 3-O-β-galactopyranoside, were selected and probed by mass-spectrometry-based methods, testing their performance in discriminating between the two epimers. The native anthocyanins, delivered into the gas phase by electrospray ionization, display a comparable drift time in ion mobility mass spectrometry (IM-MS) and a common fragment, corresponding to loss of the sugar moiety, in their collision-induced dissociation (CID) pattern. However, the IR multiple photon dissociation (IRMPD) spectra in the fingerprint range show a feature particularly evident in the case of the glucoside. This signature is used to identify the presence of cyanidin 3-O-β-glucopyranoside in a natural extract of pomegranate. In an effort to increase any differentiation between the two epimers, aluminum complexes were prepared and sampled for elemental composition by FT-ICR-MS. CID experiments now display an extensive fragmentation pattern, showing a few product ions peculiar to each species. More noteworthy is the IRMPD behavior in the OH stretching range, which shows significant differences between the spectra of the two epimers. DFT calculations allow us to interpret the observed distinct bands in terms of differing hydrogen-bonding networks and relative conformer stabilities.
Trading-off payments and accuracy in online classification with paid stochastic experts
We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and the upfront payments for all experts. We introduce an online learning algorithm whose total cost after T rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most O(K^2 (\ln T) \sqrt{T}), where K is the number of experts. To achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order T^{2/3} one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data.
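The standard Lipschitz-bandit baseline that the paper improves on can be sketched by discretizing the payment range into a uniform grid and running UCB1 on the net reward (accuracy minus payment). Everything here is an illustrative assumption: the `productivity` function, the single-expert setup, and the grid size; the paper's algorithm additionally exploits surrogate classification losses to beat this T^{2/3}-style approach.

```python
import math
import random

def ucb_over_payment_grid(productivity, horizon, n_arms=10, seed=0):
    """Toy discretized Lipschitz bandit for a single paid expert.

    Treat each payment on a uniform grid in [0, 1] as a bandit arm and
    learn, via UCB1, which payment maximizes accuracy minus cost.
    `productivity` maps a payment to Pr[correct prediction] and is
    assumed Lipschitz, so a fine enough grid nearly contains the optimum.
    """
    rng = random.Random(seed)
    grid = [a / (n_arms - 1) for a in range(n_arms)]  # candidate payments
    pulls = [0] * n_arms
    reward = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # pull each arm once before applying UCB1
        else:
            a = max(range(n_arms),
                    key=lambda j: reward[j] / pulls[j]
                    + math.sqrt(2 * math.log(t) / pulls[j]))
        correct = rng.random() < productivity(grid[a])
        pulls[a] += 1
        reward[a] += float(correct) - grid[a]  # accuracy minus payment
    return grid[max(range(n_arms), key=lambda j: reward[j] / pulls[j])]
```

When accuracy does not improve with payment, the learner should converge to paying nothing, which is the sanity check this sketch satisfies.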
Identification of tetrahydrogeranylgeraniol and dihydrogeranylgeraniol in extra virgin olive oils
Olive oil contains many different compounds which are responsible for its nutritional and sensorial value. However, some compounds present in olive oil at very low amounts have not yet been identified. Here, the detection of tetrahydrogeranylgeraniol and dihydrogeranylgeraniol, in both the total aliphatic alcohol and waxy fractions of extra virgin olive oil, is reported for the first time using GC and GC-MS methodologies. It is suggested that tetrahydrogeranylgeraniol and dihydrogeranylgeraniol do not originate from the hydrolysis of chlorophyll but are present as diterpenic esters.