377 research outputs found

    Sequential Transfer in Multi-armed Bandit with Finite Set of Models

    Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.

    Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

    We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and discount factor \gamma\in[0,1), only O(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) state-transition samples are required to find an \epsilon-optimal estimation of the action-value function with probability (w.p.) 1-\delta. Further, we prove that, for small values of \epsilon, an order of O(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) samples is required to find an \epsilon-optimal policy w.p. 1-\delta. We also prove a matching lower bound of \Omega(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) on the sample complexity of estimating the optimal action-value function. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bound matches the lower bound in terms of N, \epsilon, \delta and 1/(1-\gamma) up to a constant factor. Also, both our lower bound and upper bound improve on the state-of-the-art in terms of their dependence on 1/(1-\gamma).
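To get a feel for how the rate quoted in this abstract scales, the expression N\log(N/\delta)/((1-\gamma)^3\epsilon^2) can be evaluated numerically. The following is only a sketch: the function name `sample_bound` and the constant `c` are illustrative assumptions, since the abstract states the bound only up to a constant factor, so the absolute numbers are not meaningful and only the ratios between settings are.

```python
import math


def sample_bound(n_pairs: int, gamma: float, epsilon: float,
                 delta: float, c: float = 1.0) -> float:
    """Evaluate c * N * log(N / delta) / ((1 - gamma)^3 * epsilon^2).

    `c` is a placeholder for the unspecified constant hidden by the
    O(.) notation; it is an assumption, not a value from the paper.
    """
    if n_pairs < 1:
        raise ValueError("n_pairs must be a positive integer")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    horizon_term = (1.0 - gamma) ** 3  # cubic dependence on 1 / (1 - gamma)
    return c * n_pairs * math.log(n_pairs / delta) / (horizon_term * epsilon ** 2)


if __name__ == "__main__":
    base = sample_bound(n_pairs=100, gamma=0.9, epsilon=0.1, delta=0.05)
    longer_horizon = sample_bound(n_pairs=100, gamma=0.95, epsilon=0.1, delta=0.05)
    tighter_accuracy = sample_bound(n_pairs=100, gamma=0.9, epsilon=0.05, delta=0.05)
    print(f"halving (1 - gamma) scales the bound by {longer_horizon / base:.2f}")
    print(f"halving epsilon scales the bound by {tighter_accuracy / base:.2f}")
```

Under these assumptions, halving 1-\gamma multiplies the expression by 2^3 = 8, while halving \epsilon multiplies it by 2^2 = 4, which is the cubic and quadratic dependence stated in the abstract.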

    Online Stochastic Optimization under Correlated Bandit Feedback

    In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime \mathcal{X}-armed bandit algorithm, and derive regret bounds matching the performance of the existing state-of-the-art in terms of dependence on the number of steps and the smoothness factor. The main advantage of HCT is that it handles the challenging case of correlated rewards, whereas existing methods require that the reward-generating process of each arm is an independent and identically distributed (iid) random process. HCT also improves on the state-of-the-art in terms of its memory requirement, and it requires a weaker smoothness assumption on the mean-reward function compared to previous anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.

    On the Sample Complexity of Reinforcement Learning with a Generative Model

    We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with N state-action pairs and discount factor \gamma\in[0,1), only O(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) samples are required to find an \epsilon-optimal estimation of the action-value function with probability 1-\delta. We also prove a matching lower bound of \Theta(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first matching result on the sample complexity of estimating the optimal (action-) value function in which the upper bound matches the lower bound of RL in terms of N, \epsilon, \delta and 1/(1-\gamma). Also, both our lower bound and our upper bound significantly improve on the state-of-the-art in terms of 1/(1-\gamma).

    The effects of pollen sources and foliar application of zinc and boron on fruit set and fruit traits of three hazelnut cultivars

    The productivity of plants is generally influenced by the environment, the physiology of plant species and their management, species genetics, and their interactions. The present research aimed to assess the effects of various pollen sources (‘Boliba’, ‘Gerche’, and ‘Daviana’) on physical and chemical traits of nuts in some dominant cultivars (‘Gerde-Eshkevarat’, ‘Fertile’, and ‘Segorbe’) in Iran’s hazelnut production industry. The effects of the application of micronutrients B as borax and Zn as zinc sulfate on improving the productivity of vegetative and reproductive processes, and then the interactive effect of these factors on hazelnut and kernel yield and quality were evaluated. The results showed that there was dichogamy in all studied cultivars and all cultivars were protandrous. The blooming time of male and female flowers was different among cultivars. After the nuts were harvested, nut and kernel traits were assessed. The highest weight of nuts with green husk (7.1 g) was related to the treatment ‘Fertile’ × ‘Gerche’ × borax + zinc sulfate and the lowest (2.9 g) to the treatment ‘Segorbe’ × ‘Daviana’ × borax + zinc. The results indicated that the effect of the pollinizer parent was significant on hazelnut kernel and nut traits. The highest nut and kernel dimensions were obtained from ‘Fertile’. The local variety (‘Gerde-Eshkevarat’) produced the widest kernels. In conclusion, among the assessed cultivars, the foliar application of zinc and boron had a significant effect on the quality (oil, Zn and B) of the hazelnuts.
