Search CORE


Off-Policy Evaluation with Deficient Support Using Side Information

Author: Cremonesi Paolo
Felicioni Nicolò
Ferrari Dacrema Maurizio
Restelli Marcello
Publication venue: Curran Associates, Inc.
Publication date: 01/01/2022

Archivio istituzionale della ricerca - Politecnico di Milano

Efficient, safe and adaptive learning from user interactions

Author: Jagerman R.M.
Publication date: 01/01/2020

International Migration, Integration and Social Cohesion online publications

A second look at memory: Different Approaches to Understanding Diversity in Memory and Cognition

Author: Jativa Vega Sofia Alejandra
Publication venue: UCL (University College London)
Publication date: 28/05/2020

Memory lies at the heart of human cognitive abilities. Therefore, understanding it from neural, psychological and computational viewpoints is of key importance for computational neuroscience, psychology and beyond. In this thesis, I explore two prominent but different memory systems: episodic memory and working memory. First, I propose a modification to a recent reinforcement learning algorithm for decision making in which single memories of events, i.e., episodic memories, are integrated to compute the long-run value of actions. I argue that these memories are recalled and that their contributions are weighted based on context. Further, I propose that predictions made by this algorithm are combined with those of a standard, model-free reinforcement learning algorithm. I suggest that humans can flexibly choose between these two sources of information to make decisions and guide actions. I show that the resulting combined model best fits data on human choices, outperforming previously proposed models. To complement these algorithmic and psychological suggestions, I present a generative model of the world under which this sort of episodic recall is an appropriate method for making inferences and predicting future rewards. Unlike other accounts of reward-based learning, this generative model can capture events that not only drift continuously in time but also change suddenly to new or repeated events. Turning to working memory, I use information-theoretic analyses to show that dynamic synapses, whose strengths adjust with usage, can increase working-memory capacity. I argue that these components should be included in the study of working memory. The thesis ends with an explanation of the connections between these memory systems.
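The combination of an episodic estimate with a model-free estimate described above can be illustrated with a small, hypothetical sketch. It assumes a scalar context, stores each event as a (context, return) pair per action, recalls all stored events with weights that decay with context distance, and blends the result with a running model-free value using a fixed weight. The class `HybridValue` and its parameters (`alpha`, `w_episodic`, `temperature`) are illustrative names, not the thesis's model, which additionally lets the agent choose flexibly between the two sources.

```python
import math
from collections import defaultdict


class HybridValue:
    """Hypothetical blend of a model-free value with a context-weighted episodic value."""

    def __init__(self, alpha=0.1, w_episodic=0.5, temperature=1.0):
        self.alpha = alpha                 # model-free learning rate
        self.w = w_episodic                # fixed weight on the episodic estimate, in [0, 1]
        self.temperature = temperature     # how sharply recall favours similar contexts
        self.q = defaultdict(float)        # (state, action) -> model-free value
        self.memories = defaultdict(list)  # action -> [(context, return), ...] (assumption: keyed by action only)

    def update(self, state, action, ret, context):
        # Model-free: move the running estimate one step toward the observed return.
        key = (state, action)
        self.q[key] += self.alpha * (ret - self.q[key])
        # Episodic: store this single event together with its scalar context.
        self.memories[action].append((context, ret))

    def episodic_value(self, action, context):
        memories = self.memories[action]
        if not memories:
            return 0.0
        # Recall every stored event, weighted by context similarity
        # (a softmax over negative context distances).
        weights = [math.exp(-abs(c - context) / self.temperature) for c, _ in memories]
        return sum(wt * r for wt, (_, r) in zip(weights, memories)) / sum(weights)

    def value(self, state, action, context):
        # Convex combination of the two sources of information.
        return (1 - self.w) * self.q[(state, action)] + self.w * self.episodic_value(action, context)
```

With a single stored return of 1.0, `alpha=0.5` and `w_episodic=0.5`, the model-free part is 0.5 and the episodic part is 1.0, so the blended value is 0.75.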

UCL Discovery

Discovering Valuable Items from Massive Data

Author: Bubeck S.
Dani V.
Desautels T.
Garnett R.
Kale S.
Krause A.
Kulesza A.
Lawrence N. D.
Lin H.
Nemhauser G.
Rasmussen C. E.
Schölkopf B.
Settles B.
Slivkins A.
Streeter M.
Tran-Thanh L.
Yue Y.
Publication date: 02/06/2015

Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-armed bandits, active search and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-Select to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order of magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) refreshing a repository of prices in a Global Distribution System for the travel industry, (2) identifying diverse, binding-affine peptides in a vaccine design task and (3) maximizing clicks in a web-scale recommender system by recommending items to users.
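To make the exploration/exploitation trade-off under a budget concrete, here is a minimal sketch of a generic Gaussian-process upper-confidence-bound selection loop with a hard budget check. It is not GP-Select itself: it omits the diversity extension and the fast model updates, and it assumes one-dimensional item features, an RBF kernel, a zero prior mean, and a fixed exploration weight `beta`. All function names are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    # Squared-exponential kernel on scalar features; k(x, x) = 1.
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def cholesky(m):
    # Lower-triangular L with m = L L^T (m must be symmetric positive definite).
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution: solve L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def gp_posterior(xs, ys, x_star, noise=1e-2, length_scale=1.0):
    # Posterior mean and variance at x_star given noisy observations (xs, ys).
    if not xs:
        return 0.0, 1.0  # prior mean 0, prior variance k(x, x) = 1
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    v = solve_lower(L, [rbf(a, x_star, length_scale) for a in xs])  # L^{-1} k*
    w = solve_lower(L, ys)                                          # L^{-1} y
    mean = sum(vi * wi for vi, wi in zip(v, w))       # k*^T K^{-1} y
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # k(x*,x*) - k*^T K^{-1} k*
    return mean, var


def ucb_budgeted_select(features, costs, oracle, budget, beta=2.0):
    chosen, xs, ys = [], [], []
    remaining = budget
    while True:
        # Only unselected items that still fit in the budget are eligible.
        candidates = [i for i in range(len(features))
                      if i not in chosen and costs[i] <= remaining]
        if not candidates:
            break

        def score(i):
            # Optimistic score: posterior mean plus beta posterior standard deviations.
            mean, var = gp_posterior(xs, ys, features[i])
            return mean + beta * math.sqrt(var)

        best = max(candidates, key=score)
        chosen.append(best)
        remaining -= costs[best]
        xs.append(features[best])
        ys.append(oracle(best))  # the utility is revealed only after committing
    return chosen
```

For clarity the sketch refits the posterior from scratch for every candidate; the abstract's reported speedup comes from exploiting the structure of these updates instead.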

arXiv.org e-Print Archive

CiteSeerX

Repository for Publications and Research Data

Crossref

Context-Aware Hierarchical Online Learning for Performance Maximization in Mobile Crowdsourcing

Author: Klein Anja
Klos Sabrina
Tekin Cem
van der Schaar Mihaela
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/05/2018

In mobile crowdsourcing (MCS), mobile users accomplish outsourced human intelligence tasks. MCS requires an appropriate task assignment strategy, since different workers may have different performance in terms of acceptance rate and quality. Task assignment is challenging, since a worker's performance (i) may fluctuate, depending on both the worker's current personal context and the task context, and (ii) is not known a priori but has to be learned over time. Moreover, learning context-specific worker performance requires access to context information, which may not be available at a central entity due to communication overhead or privacy concerns. Additionally, evaluating worker performance might require costly quality assessments. In this paper, we propose a context-aware hierarchical online learning algorithm addressing the problem of performance maximization in MCS. In our algorithm, a local controller (LC) in the mobile device of a worker regularly observes the worker's context, her/his decisions to accept or decline tasks and the quality in completing tasks. Based on these observations, the LC regularly estimates the worker's context-specific performance. The mobile crowdsourcing platform (MCSP) then selects workers based on performance estimates received from the LCs. This hierarchical approach enables the LCs to learn context-specific worker performance and enables the MCSP to select suitable workers. In addition, our algorithm preserves worker context locally and keeps the number of required quality assessments low. We prove that our algorithm converges to the optimal task assignment strategy. Moreover, the algorithm outperforms simpler task assignment strategies in experiments based on synthetic and real data.

Comment: 18 pages, 10 figures
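The division of labour between the local controllers and the platform can be sketched as follows. This is a simplified, hypothetical illustration rather than the paper's algorithm: it assumes a scalar context in [0, 1] split into equal-width cells, a fixed minimum sample count (`min_samples`) below which a cell counts as under-explored, and a platform that simply picks the top-k reported estimates. Only the scalar estimate leaves the device, mirroring the abstract's point that worker context stays local.

```python
import math


class LocalController:
    """Hypothetical on-device learner keeping context-specific performance estimates."""

    def __init__(self, n_bins=4, min_samples=2):
        self.n_bins = n_bins
        self.min_samples = min_samples
        self.counts = [0] * n_bins   # observations per context cell
        self.means = [0.0] * n_bins  # running mean performance per context cell

    def _bin(self, context):
        # Assumption: context lies in [0, 1] and is partitioned uniformly.
        return min(int(context * self.n_bins), self.n_bins - 1)

    def observe(self, context, performance):
        # Incremental mean update for the cell containing this context.
        b = self._bin(context)
        self.counts[b] += 1
        self.means[b] += (performance - self.means[b]) / self.counts[b]

    def report(self, context):
        # Only this scalar is sent to the platform; the context itself stays local.
        b = self._bin(context)
        if self.counts[b] < self.min_samples:
            return math.inf  # under-explored cell: request selection to learn more
        return self.means[b]


def platform_select(reports, k):
    # The platform ranks workers purely by their reported estimates.
    return sorted(range(len(reports)), key=lambda i: reports[i], reverse=True)[:k]
```

Reporting an infinite estimate for under-explored cells is one simple way to force exploration; the paper's own exploration rule may differ.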

arXiv.org e-Print Archive

TUbiblio

Autonomous Drug Design with Multi-Armed Bandits

Author: Bjerrum Esben Jannik
Chehreghani Morteza Haghir
Engkvist Ola
Svensson Hampus Gummesson
Tyrchan Christian
Publication date: 01/01/2022

Recent developments in artificial intelligence and automation support a new drug design paradigm: autonomous drug design. Under this paradigm, generative models can provide suggestions on thousands of molecules with specific properties, and automated laboratories can potentially make, test and analyze molecules with minimal human supervision. However, since only a limited number of molecules can still be synthesized and tested, an obvious challenge is how to efficiently select among the provided suggestions in a closed-loop system. We formulate this task as a stochastic multi-armed bandit problem with multiple plays, volatile arms and similarity information. To solve this task, we adapt previous work on multi-armed bandits to this setting and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection strategies. According to our simulation results, our approach has the potential to perform better exploration and exploitation of the chemical space for autonomous drug design.
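One of the baselines named above, decaying-epsilon-greedy selection with multiple plays, can be sketched in a few lines. This is a hedged illustration of that baseline only, not the authors' proposed method: it assumes a fixed set of arms (no volatile arms or similarity information), an exploration probability of min(1, c/t) in round t, and a caller-supplied `pull` function returning a reward; the function name and parameters are hypothetical.

```python
import random


def decaying_eps_greedy(n_arms, plays, rounds, pull, c=1.0, seed=0):
    """Select `plays` distinct arms per round; explore with probability min(1, c / t)."""
    if plays > n_arms:
        raise ValueError("cannot play more distinct arms than exist")
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, rounds + 1):
        eps = min(1.0, c / t)  # assumed 1/t decay schedule
        chosen = set()
        while len(chosen) < plays:
            remaining = [a for a in range(n_arms) if a not in chosen]
            if rng.random() < eps:
                arm = rng.choice(remaining)                    # explore
            else:
                arm = max(remaining, key=lambda a: means[a])   # exploit
            chosen.add(arm)
        for arm in chosen:
            reward = pull(arm)
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts, means
```

Each slot in a round is filled independently, so a single round can mix exploratory and greedy picks while never testing the same molecule twice in that round.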

arXiv.org e-Print Archive

Chalmers Research