
    Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average

    We investigate the accuracy of the two most common estimators for the maximum expected value of a general set of random variables: a generalization of the maximum sample average, and cross validation. No unbiased estimator exists, and we show that it is non-trivial to select a good estimator without knowledge about the distributions of the random variables. We investigate and bound the bias and variance of the aforementioned estimators and prove consistency. The variance of cross validation can be significantly reduced, but not without risking a large bias. The bias and variance of different variants of cross validation are shown to be very problem-dependent, and a wrong choice can lead to very inaccurate estimates.
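    As a concrete illustration (not the paper's code), the sketch below contrasts the two estimators on a toy problem. The particular two-fold cross-validation variant, the variable names, and the toy distributions are assumptions for exposition only.

    import numpy as np

    rng = np.random.default_rng(0)

    def max_sample_average(samples):
        """Maximum sample average: max over variables of each sample mean.
        Tends to be positively biased, since the max picks up noise."""
        return max(s.mean() for s in samples)

    def cv_estimate(samples):
        """A simple two-fold cross-validation variant: select the argmax
        variable on one half of the data, evaluate its mean on the other
        half, and average the two directions."""
        halves = [(s[: len(s) // 2], s[len(s) // 2:]) for s in samples]
        estimates = []
        for select, evaluate in ((0, 1), (1, 0)):
            pick = int(np.argmax([h[select].mean() for h in halves]))
            estimates.append(halves[pick][evaluate].mean())
        return float(np.mean(estimates))

    # Toy example: three Gaussians with true means 0.0, 0.1, 0.2 (true maximum 0.2).
    samples = [rng.normal(mu, 1.0, size=100) for mu in (0.0, 0.1, 0.2)]
    print(max_sample_average(samples))  # typically overestimates 0.2
    print(cv_estimate(samples))         # decorrelates selection and evaluation; can underestimate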

    Bayesian Sampling Algorithms for the Sample Selection and Two-Part Models

    This paper considers two models to deal with an outcome variable that contains a large fraction of zeros, such as individual expenditures on health care: a sample-selection model and a two-part model. The sample-selection model uses two possibly correlated processes to determine the outcome: a decision process and an outcome process; conditional on a favorable decision, the outcome is observed. The two-part model comprises uncorrelated decision and outcome processes. The paper addresses the issue of selecting between these two models. With a Gaussian specification of the likelihood, the models are nested and inference can focus on the correlation coefficient. Using a fully parametric Bayesian approach, I present sampling algorithms for the model parameters that are based on data augmentation. In addition to the sampler output of the correlation coefficient, a Bayes factor can be computed to distinguish between the models. The paper illustrates the methods and their potential pitfalls using simulated data sets. Keywords: Sample Selection, Data Augmentation, Gibbs Sampling
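    To make the model structure concrete, here is a minimal data-generating sketch (not the paper's sampler) for the Gaussian sample-selection model just described; setting rho = 0 removes the correlation between the two processes and yields the two-part case. All coefficients and names are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n, rho = 5000, 0.5  # rho = 0 corresponds to the two-part model

    # Correlated Gaussian errors for the decision and outcome processes.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    x = rng.normal(size=n)
    d_star = 0.3 + 0.8 * x + eps[:, 0]  # latent decision process
    y_star = 1.0 + 0.5 * x + eps[:, 1]  # latent outcome process

    d = (d_star > 0).astype(int)        # favorable decision is observed
    y = np.where(d == 1, y_star, 0.0)   # outcome observed only given d == 1, else zero

    print("fraction of zeros:", 1 - d.mean())

    A Gibbs sampler with data augmentation for this model would treat the latent d_star (and the unobserved y_star where d == 0) as additional parameters to be drawn each iteration, in the spirit of Albert-Chib augmentation for the probit decision equation.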

    Deep Reinforcement Learning with Double Q-learning

    The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but also leads to much better performance on several games. (AAAI 2016)
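    The key change described here is in the learning target: the online network selects the next action and the target network evaluates it, instead of one network doing both. The sketch below is a minimal NumPy rendering of that target computation; the array names and the toy batch are illustrative.

    import numpy as np

    def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
        """Double DQN target: decouple action selection (online network)
        from action evaluation (target network) to damp overestimation."""
        best_actions = np.argmax(q_online_next, axis=1)                 # selection
        evals = q_target_next[np.arange(len(rewards)), best_actions]   # evaluation
        return rewards + gamma * (1.0 - dones) * evals

    # Toy batch: 2 transitions, 3 actions each.
    q_online_next = np.array([[1.0, 2.0, 0.5], [0.2, 0.1, 0.9]])
    q_target_next = np.array([[0.9, 1.5, 0.4], [0.3, 0.2, 0.8]])
    print(double_dqn_targets(q_online_next, q_target_next,
                             rewards=np.array([1.0, 0.0]),
                             dones=np.array([0.0, 1.0])))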

    Effect of linear polarisability and local fields on surface SHG

    A discrete dipole model has been developed to describe surface second harmonic generation (SHG) by centrosymmetric semiconductors. The double cell method, which enables the linear reflection problem to be solved numerically for semi-infinite systems, has been extended to the nonlinear case. It is shown that a single layer of nonlinear electric dipoles at the surface, combined with nonlocal effects, suffices to describe the angle-of-incidence-dependent anisotropic SHG obtained from oxidised Si(001) wafers. The influence of the linear response turns out to be essential for understanding the anisotropic SHG process.
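    For orientation, a discrete dipole model of this kind typically solves self-consistent local-field equations of the following generic form (the notation here is ours, not the paper's):

    \[
      \mathbf{p}_i(\omega) = \alpha(\omega)\Big[\mathbf{E}^{\mathrm{ext}}_i(\omega) + \sum_{j \neq i} \mathbf{T}_{ij}\,\mathbf{p}_j(\omega)\Big],
      \qquad
      \mathbf{p}^{\mathrm{surf}}_i(2\omega) = \boldsymbol{\beta} : \mathbf{E}^{\mathrm{loc}}_i(\omega)\,\mathbf{E}^{\mathrm{loc}}_i(\omega),
    \]

    where \(\mathbf{T}_{ij}\) is the dipole-dipole interaction tensor and \(\mathbf{E}^{\mathrm{loc}}_i\) is the local field including the dipole sum. Because the nonlinear source \(\boldsymbol{\beta}\) is confined to the surface layer while the linear polarisability \(\alpha\) fixes the local fields driving it (and the radiation of the second-harmonic dipoles back out), the linear response directly shapes the SHG anisotropy.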

    Multi-task Deep Reinforcement Learning with PopArt

    The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
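    A minimal single-task sketch of the adaptive rescaling idea, in the spirit of PopArt: track running statistics of the value targets and rescale the linear value head so that unnormalized predictions are preserved exactly when the statistics change. In the multi-task setting one such normalizer would be kept per task, so that tasks with large rewards do not dominate the updates. Class and variable names are illustrative.

    import numpy as np

    class PopArtHead:
        """Running mean/scale of value targets, plus a linear value head
        (w, h -> w * h + b in normalized space) that is rescaled whenever
        the statistics move, so sigma * (w*h + b) + mu is unchanged."""

        def __init__(self, beta=3e-4):
            self.mu, self.nu, self.beta = 0.0, 1.0, beta  # first / second moments
            self.w, self.b = 1.0, 0.0                     # linear value head

        @property
        def sigma(self):
            return max(np.sqrt(self.nu - self.mu ** 2), 1e-4)

        def update_stats(self, target):
            old_mu, old_sigma = self.mu, self.sigma
            self.mu = (1 - self.beta) * self.mu + self.beta * target
            self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
            # Preserve outputs: undo the statistics change inside the head.
            self.w *= old_sigma / self.sigma
            self.b = (old_sigma * self.b + old_mu - self.mu) / self.sigma

        def normalized_target(self, target):
            return (target - self.mu) / self.sigma

    head = PopArtHead()
    head.update_stats(100.0)  # a large in-task reward scale is absorbed here, not in the gradient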

    Clinical Trial Simulation to Optimize a Pediatric Drug-Drug Interaction Study of Vincristine and Azole Antifungals

    BACKGROUND: The aim of the current work was to perform a clinical trial simulation (CTS) analysis to optimize a drug-drug interaction (DDI) study of vincristine in children who also received azole antifungals, taking into account the challenges of conducting clinical trials in this population, and to provide a motivating example of the application of CTS in the design of pediatric oncology clinical trials. PROCEDURE: A pharmacokinetic (PK) model for vincristine in children was used to simulate concentration-time profiles. A continuous model for body surface area versus age was defined based on pediatric growth curves. Informative sampling time windows were derived using D-optimal design. The CTS framework was used to evaluate the impact of different magnitudes of clearance inhibition (10%, 25%, or 40%), sample size (30-500), missing samples or sampling occasions, and the age distribution on the power to detect a significant inhibition effect, as well as on the relative estimation error (REE) of the interaction effect. RESULTS: A minimum group-specific sample size of 38 patients, with a total sample size of 150 patients, was required to detect a clearance inhibition effect of 40% with 80% power; a lower degree of clearance inhibition required a substantially larger sample size. However, for the majority of re-estimated drug effects, the inhibition effect could be estimated precisely (REE < 25%) even with smaller sample sizes and lower effect sizes. CONCLUSION: This work demonstrated the utility of CTS for the evaluation of PK clinical trial designs in the pediatric oncology population.
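    As a schematic of the simulation-based power computation described above (heavily simplified: the published analysis uses a full PK model with D-optimal sampling windows, whereas this sketch reduces each subject to a single noisy log-clearance estimate), one might write:

    import numpy as np

    rng = np.random.default_rng(2)

    def simulated_power(n_per_group=75, inhibition=0.40, n_trials=500):
        """Fraction of simulated trials in which the clearance-inhibition
        effect is detected by a two-sided test on log-clearance (alpha = 0.05).
        Baseline clearance and variability values are illustrative."""
        hits = 0
        for _ in range(n_trials):
            log_cl_ctrl = np.log(10.0) + rng.normal(0.0, 0.3, n_per_group)
            log_cl_ddi = np.log(10.0 * (1.0 - inhibition)) + rng.normal(0.0, 0.3, n_per_group)
            diff = log_cl_ddi.mean() - log_cl_ctrl.mean()
            se = np.sqrt(log_cl_ddi.var(ddof=1) / n_per_group
                         + log_cl_ctrl.var(ddof=1) / n_per_group)
            hits += abs(diff / se) > 1.96  # approximate Wald test
        return hits / n_trials

    print(simulated_power())  # approximate power to detect 40% clearance inhibition

    Repeating this over a grid of inhibition magnitudes and sample sizes is the basic loop behind a power analysis of this kind; the REE of the effect would be tracked by comparing each trial's estimated inhibition to the simulated truth.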