148 research outputs found

Select Suppliers from Electronic Markets with Incomplete Information

Author: Chen Li-gang
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/12/2010

An agent that wants to buy products from an e-market often encounters unknown suppliers. It must then choose between maximizing its expected utility based on the known suppliers and learning more about the unknown suppliers, since doing so may improve its future rewards. This issue is known as the trade-off between exploitation and exploration. In this research, we study how an agent should select suppliers from electronic markets with incomplete information. The agent has no prior knowledge about the suppliers, so it must learn about them by consuming their products, and its objective is to maximize total utility. We consider two scenarios. In the first, the agent selects a single supplier in each time period; we show that the agent can achieve the optimal solution by using the Gittins index. In the second, the agent can select several suppliers in each time period; we propose four heuristic policies and evaluate them with a simulation tool.

AIS Electronic Library (AISeL)

R-UCB: a Contextual Bandit Algorithm for Risk-Aware Recommender Systems

Author: Bouneffouf Djallel
Publication date: 10/08/2014

Mobile Context-Aware Recommender Systems can be naturally modelled as an exploration/exploitation trade-off (exr/exp) problem, where the system has to choose between maximizing its expected rewards given its current knowledge (exploitation) and learning more about the unknown user's preferences to improve its knowledge (exploration). This problem has been addressed by the reinforcement learning community, but existing approaches do not consider the risk level of the current user's situation, where it may be dangerous to recommend items the user may not desire if the risk level is high. We introduce in this paper an algorithm named R-UCB that considers the risk level of the user's situation to adaptively balance between exr and exp. The detailed analysis of the experimental results reveals several important discoveries in the exr/exp behaviour.

arXiv.org e-Print Archive

CiteSeerX

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Authors: Galichet Nicolas; Sebag Michèle; Teytaud Olivier
Publication date: 13/11/2013

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems. Comment: 16 page
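The idea of scoring an arm by its conditional value at risk can be illustrated with a minimal Python sketch. This is not the MARAB algorithm itself: it omits any exploration bonus and simply ranks arms by the empirical mean of their worst observed rewards. The function names (`empirical_cvar`, `pick_arm`), the tail-size rule `ceil(alpha * n)`, and the sample data are assumptions made for illustration only.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) observed rewards (lower tail).

    Assumption: rewards are "higher is better", so risk lives in the low tail.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


def pick_arm(history, alpha):
    """Index of the arm with the highest empirical CVaR (no exploration term)."""
    scores = [empirical_cvar(rewards, alpha) for rewards in history]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical reward histories for two arms.
history = [
    [0.0, 1.0, 1.0, 1.0],  # higher mean, but a bad worst case
    [0.4, 0.5, 0.5, 0.6],  # lower mean, mild worst case
]

risk_neutral_choice = pick_arm(history, alpha=1.0)   # tail covers all samples: plain mean
risk_averse_choice = pick_arm(history, alpha=0.25)   # tail is the single worst sample
```

With `alpha = 1.0` the score reduces to the sample mean, so the first arm is preferred. As `alpha` shrinks, the tail collapses to the single worst observation and the ranking becomes a max-min choice, which mirrors the limit toward MIN described in the abstract.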

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Active Sensing as Bayes-Optimal Sequential Decision Making

Authors: Ahmad Sheeraz; Yu Angela J.
Publication date: 28/05/2013

Sensory inference under conditions of uncertainty is a major problem in both machine learning and computational neuroscience. An important but poorly understood aspect of sensory processing is the role of active sensing. Here, we present a Bayes-optimal inference and control framework for active sensing, C-DAC (Context-Dependent Active Controller). Unlike previously proposed algorithms that optimize abstract statistical objectives such as information maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy [Najemnik & Geisler, 2005], our active sensing model directly minimizes a combination of behavioral costs, such as temporal delay, response error, and effort. We simulate these algorithms on a simple visual search task to illustrate scenarios in which context-sensitivity is particularly beneficial and optimization with respect to generic statistical objectives particularly inadequate. Motivated by the geometric properties of the C-DAC policy, we present both parametric and non-parametric approximations, which retain context-sensitivity while significantly reducing computational complexity. These approximations enable us to investigate the more complex problem involving peripheral vision, and we notice that the difference between C-DAC and statistical policies becomes even more evident in this scenario. Comment: Scheduled to appear in UAI 201

arXiv.org e-Print Archive

CiteSeerX