Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a set
of input policies, perhaps learned from prior experience or provided by
advisors. We present a reinforcement learning with policy advice (RLPA)
algorithm which leverages this input set and learns to use the best policy in
the set for the reinforcement learning task at hand. We prove that RLPA has a
sub-linear regret of \tilde O(\sqrt{T}) relative to the best input policy, and
that both this regret and its computational complexity are independent of the
size of the state and action space. Our empirical simulations support our
theoretical analysis. This suggests RLPA may offer significant advantages in
large domains where good prior policies are provided.
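The core idea of leveraging a given policy set can be sketched as a successive-elimination loop: run the candidate policies in turn and discard any whose confidence interval falls below that of the current best. The toy below is an illustration of that idea in a simplified bandit-style setting, not the RLPA algorithm itself; the Bernoulli reward model, `policy_means`, and the Hoeffding-style radius are assumptions made for the sketch.

```python
import math
import random

def select_best_policy(policy_means, horizon, delta=0.05, seed=0):
    """Toy successive elimination over a set of input policies.

    `policy_means` are hypothetical per-step expected rewards in [0, 1];
    the real RLPA algorithm interleaves policy execution inside an MDP.
    """
    rng = random.Random(seed)
    active = list(range(len(policy_means)))
    pulls = [0] * len(policy_means)
    total = [0.0] * len(policy_means)
    for t in range(horizon):
        i = active[t % len(active)]  # round-robin over surviving policies
        total[i] += rng.random() < policy_means[i]  # Bernoulli reward
        pulls[i] += 1
        # Hoeffding-style confidence radius for each active policy
        rad = {j: math.sqrt(math.log(2 * horizon / delta) / (2 * max(pulls[j], 1)))
               for j in active}
        mean = {j: total[j] / max(pulls[j], 1) for j in active}
        best = max(active, key=lambda j: mean[j] - rad[j])
        # keep only policies whose upper bound still reaches the best's lower bound
        active = [j for j in active
                  if mean[j] + rad[j] >= mean[best] - rad[best]]
    return max(active, key=lambda j: total[j] / max(pulls[j], 1))
```

With a clear gap between policies and a moderate horizon, the loop concentrates its pulls on the best policy, mirroring the regret guarantee's independence from the state and action spaces (only the policy set matters here).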
An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm. Comment: submitted to ALT 201
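The Geometric Resampling idea, estimating the inverse selection probability of each played coordinate by counting how many fresh draws it takes for that coordinate to reappear, can be sketched as follows. This is an illustrative toy, assuming a caller-supplied `sample_action` sampler and a truncation cap `cap`; the paper's actual estimator and its bias analysis differ in detail.

```python
import random

def geometric_resampling_estimate(sample_action, played, losses, cap=100, rng=None):
    """Toy Geometric Resampling (GR) loss estimate for semi-bandit feedback.

    For each coordinate i with played[i] == 1, draw fresh binary action
    vectors from the same distribution until coordinate i reappears; the
    number of draws K_i is geometric with mean 1/p_i, so losses[i] * K_i
    is a (capped, hence slightly biased) estimate of losses[i] / p_i.
    `cap` truncates the wait, trading a small bias for bounded computation.
    """
    rng = rng or random.Random(0)
    d = len(played)
    estimate = [0.0] * d
    for i in range(d):
        if not played[i]:
            continue  # semi-bandit feedback: only played coordinates reveal losses
        k = 1
        while k < cap and not sample_action(rng)[i]:
            k += 1
        estimate[i] = losses[i] * k
    return estimate
```

The appeal is that `sample_action` only needs to produce actions, never explicit probabilities, which is what lets the method run on any decision set with an efficient offline optimization oracle.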
IR ion spectroscopy in a combined approach with MS/MS and IM-MS to discriminate epimeric anthocyanin glycosides (cyanidin 3-O-glucoside and -galactoside)
Anthocyanins are widespread in plants and flowers, being responsible for their different colouring. Two representative members of this family, cyanidin 3-O-β-glucopyranoside and 3-O-β-galactopyranoside, were selected and probed by mass-spectrometry-based methods, testing their performance in discriminating between the two epimers. The native anthocyanins, delivered into the gas phase by electrospray ionization, display a comparable drift time in ion mobility mass spectrometry (IM-MS) and a common fragment, corresponding to loss of the sugar moiety, in their collision-induced dissociation (CID) pattern. However, the IR multiple photon dissociation (IRMPD) spectra in the fingerprint range show a feature particularly evident in the case of the glucoside. This signature is used to identify the presence of cyanidin 3-O-β-glucopyranoside in a natural extract of pomegranate. In an effort to increase any differentiation between the two epimers, aluminum complexes were prepared and sampled for elemental composition by FT-ICR-MS. CID experiments now display an extensive fragmentation pattern, showing a few product ions peculiar to each species. More noteworthy is the IRMPD behavior in the OH stretching range, which shows significant differences between the spectra of the two epimers. DFT calculations allow us to interpret the observed distinct bands in terms of differing hydrogen-bonding networks and relative conformer stabilities.
Trading-off payments and accuracy in online classification with paid stochastic experts
We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and the upfront payments for all experts. We introduce an online learning algorithm whose total cost after T rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most O(K^2 (\ln T) \sqrt{T}), where K is the number of experts. To achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order T^{2/3} one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data.
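The standard Lipschitz-bandit baseline that the paper improves on can be sketched by discretizing the payment range into a uniform grid and running UCB1 on the net reward (accuracy minus payment). Everything here is an illustrative assumption: the `productivity` function, the single-expert setup, and the grid size; the paper's algorithm additionally exploits surrogate classification losses to beat this T^{2/3}-style approach.

```python
import math
import random

def ucb_over_payment_grid(productivity, horizon, n_arms=10, seed=0):
    """Toy discretized Lipschitz bandit for a single paid expert.

    Treat each payment on a uniform grid in [0, 1] as a bandit arm and
    learn, via UCB1, which payment maximizes accuracy minus cost.
    `productivity` maps a payment to Pr[correct prediction] and is
    assumed Lipschitz, so a fine enough grid nearly contains the optimum.
    """
    rng = random.Random(seed)
    grid = [a / (n_arms - 1) for a in range(n_arms)]  # candidate payments
    pulls = [0] * n_arms
    reward = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # pull each arm once before applying UCB1
        else:
            a = max(range(n_arms),
                    key=lambda j: reward[j] / pulls[j]
                    + math.sqrt(2 * math.log(t) / pulls[j]))
        correct = rng.random() < productivity(grid[a])
        pulls[a] += 1
        reward[a] += float(correct) - grid[a]  # accuracy minus payment
    return grid[max(range(n_arms), key=lambda j: reward[j] / pulls[j])]
```

When accuracy does not improve with payment, the learner should converge to paying nothing, which is the sanity check this sketch satisfies.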
Identification of tetrahydrogeranylgeraniol and dihydrogeranylgeraniol in extra virgin olive oils
Olive oil contains many different compounds which are responsible for its nutritional and sensorial value. However, some compounds present in olive oil at very low amounts have not yet been identified. Here, the detection of tetrahydrogeranylgeraniol and dihydrogeranylgeraniol, in both the total aliphatic alcohol and waxy fractions of extra virgin olive oil, is reported for the first time using GC and GC-MS methodologies. It is suggested that tetrahydrogeranylgeraniol and dihydrogeranylgeraniol do not originate from the hydrolysis of chlorophyll but are present as diterpenic esters.