
    Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average

    We investigate the accuracy of the two most common estimators for the maximum expected value of a general set of random variables: a generalization of the maximum sample average, and cross validation. No unbiased estimator exists, and we show that it is non-trivial to select a good estimator without knowledge about the distributions of the random variables. We investigate and bound the bias and variance of the aforementioned estimators and prove consistency. The variance of cross validation can be significantly reduced, but not without risking a large bias. The bias and variance of different variants of cross validation are shown to be very problem-dependent, and a wrong choice can lead to very inaccurate estimates.
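    As a concrete illustration (not the paper's code), the sketch below contrasts the two estimators on a toy problem. The particular two-fold cross-validation variant, the variable names, and the toy distributions are assumptions for exposition only.

    import numpy as np

    rng = np.random.default_rng(0)

    def max_sample_average(samples):
        """Maximum sample average: max over variables of each sample mean.
        Tends to be positively biased, since the max picks up noise."""
        return max(s.mean() for s in samples)

    def cv_estimate(samples):
        """A simple two-fold cross-validation variant: select the argmax
        variable on one half of the data, evaluate its mean on the other
        half, and average the two directions."""
        halves = [(s[: len(s) // 2], s[len(s) // 2:]) for s in samples]
        estimates = []
        for select, evaluate in ((0, 1), (1, 0)):
            pick = int(np.argmax([h[select].mean() for h in halves]))
            estimates.append(halves[pick][evaluate].mean())
        return float(np.mean(estimates))

    # Toy example: three Gaussians with true means 0.0, 0.1, 0.2 (true maximum 0.2).
    samples = [rng.normal(mu, 1.0, size=100) for mu in (0.0, 0.1, 0.2)]
    print(max_sample_average(samples))  # typically overestimates 0.2
    print(cv_estimate(samples))         # decorrelates selection and evaluation; can underestimate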

    Bayesian Sampling Algorithms for the Sample Selection and Two-Part Models

    This paper considers two models to deal with an outcome variable that contains a large fraction of zeros, such as individual expenditures on health care: a sample-selection model and a two-part model. The sample-selection model uses two possibly correlated processes to determine the outcome: a decision process and an outcome process; conditional on a favorable decision, the outcome is observed. The two-part model comprises uncorrelated decision and outcome processes. The paper addresses the issue of selecting between these two models. With a Gaussian specification of the likelihood, the models are nested and inference can focus on the correlation coefficient. Using a fully parametric Bayesian approach, I present sampling algorithms for the model parameters that are based on data augmentation. In addition to the sampler output of the correlation coefficient, a Bayes factor can be computed to distinguish between the models. The paper illustrates the methods and their potential pitfalls using simulated data sets. Keywords: Sample Selection, Data Augmentation, Gibbs Sampling
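    To make the model structure concrete, here is a minimal data-generating sketch (not the paper's sampler) for the Gaussian sample-selection model just described; setting rho = 0 removes the correlation between the two processes and yields the two-part case. All coefficients and names are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n, rho = 5000, 0.5  # rho = 0 corresponds to the two-part model

    # Correlated Gaussian errors for the decision and outcome processes.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    x = rng.normal(size=n)
    d_star = 0.3 + 0.8 * x + eps[:, 0]  # latent decision process
    y_star = 1.0 + 0.5 * x + eps[:, 1]  # latent outcome process

    d = (d_star > 0).astype(int)        # favorable decision is observed
    y = np.where(d == 1, y_star, 0.0)   # outcome observed only given d == 1, else zero

    print("fraction of zeros:", 1 - d.mean())

    A Gibbs sampler with data augmentation for this model would treat the latent d_star (and the unobserved y_star where d == 0) as additional parameters to be drawn each iteration, in the spirit of Albert-Chib augmentation for the probit decision equation.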

    Deep Reinforcement Learning with Double Q-learning

    The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but also leads to much better performance on several games. (AAAI 2016)
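    The key change described here is in the learning target: the online network selects the next action and the target network evaluates it, instead of one network doing both. The sketch below is a minimal NumPy rendering of that target computation; the array names and the toy batch are illustrative.

    import numpy as np

    def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
        """Double DQN target: decouple action selection (online network)
        from action evaluation (target network) to damp overestimation."""
        best_actions = np.argmax(q_online_next, axis=1)                 # selection
        evals = q_target_next[np.arange(len(rewards)), best_actions]   # evaluation
        return rewards + gamma * (1.0 - dones) * evals

    # Toy batch: 2 transitions, 3 actions each.
    q_online_next = np.array([[1.0, 2.0, 0.5], [0.2, 0.1, 0.9]])
    q_target_next = np.array([[0.9, 1.5, 0.4], [0.3, 0.2, 0.8]])
    print(double_dqn_targets(q_online_next, q_target_next,
                             rewards=np.array([1.0, 0.0]),
                             dones=np.array([0.0, 1.0])))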

    Effect of linear polarisability and local fields on surface SHG

    A discrete dipole model has been developed to describe surface second harmonic generation (SHG) by centrosymmetric semiconductors. The double cell method, which enables the linear reflection problem to be solved numerically for semi-infinite systems, has been extended to the nonlinear case. It is shown that a single layer of nonlinear electric dipoles at the surface, combined with nonlocal effects, suffices to describe the angle-of-incidence-dependent anisotropic SHG obtained from oxidised Si(001) wafers. The influence of the linear response turns out to be essential for understanding the anisotropic SHG process.
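    For orientation, a discrete dipole model of this kind typically solves self-consistent local-field equations of the following generic form (the notation here is ours, not the paper's):

    \[
      \mathbf{p}_i(\omega) = \alpha(\omega)\Big[\mathbf{E}^{\mathrm{ext}}_i(\omega) + \sum_{j \neq i} \mathbf{T}_{ij}\,\mathbf{p}_j(\omega)\Big],
      \qquad
      \mathbf{p}^{\mathrm{surf}}_i(2\omega) = \boldsymbol{\beta} : \mathbf{E}^{\mathrm{loc}}_i(\omega)\,\mathbf{E}^{\mathrm{loc}}_i(\omega),
    \]

    where \(\mathbf{T}_{ij}\) is the dipole-dipole interaction tensor and \(\mathbf{E}^{\mathrm{loc}}_i\) is the local field including the dipole sum. Because the nonlinear source \(\boldsymbol{\beta}\) is confined to the surface layer while the linear polarisability \(\alpha\) fixes the local fields driving it (and the radiation of the second-harmonic dipoles back out), the linear response directly shapes the SHG anisotropy.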

    Multi-task Deep Reinforcement Learning with PopArt

    The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
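    A minimal single-task sketch of the adaptive rescaling idea, in the spirit of PopArt: track running statistics of the value targets and rescale the linear value head so that unnormalized predictions are preserved exactly when the statistics change. In the multi-task setting one such normalizer would be kept per task, so that tasks with large rewards do not dominate the updates. Class and variable names are illustrative.

    import numpy as np

    class PopArtHead:
        """Running mean/scale of value targets, plus a linear value head
        (w, h -> w * h + b in normalized space) that is rescaled whenever
        the statistics move, so sigma * (w*h + b) + mu is unchanged."""

        def __init__(self, beta=3e-4):
            self.mu, self.nu, self.beta = 0.0, 1.0, beta  # first / second moments
            self.w, self.b = 1.0, 0.0                     # linear value head

        @property
        def sigma(self):
            return max(np.sqrt(self.nu - self.mu ** 2), 1e-4)

        def update_stats(self, target):
            old_mu, old_sigma = self.mu, self.sigma
            self.mu = (1 - self.beta) * self.mu + self.beta * target
            self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
            # Preserve outputs: undo the statistics change inside the head.
            self.w *= old_sigma / self.sigma
            self.b = (old_sigma * self.b + old_mu - self.mu) / self.sigma

        def normalized_target(self, target):
            return (target - self.mu) / self.sigma

    head = PopArtHead()
    head.update_stats(100.0)  # a large in-task reward scale is absorbed here, not in the gradient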

    Clinical Trial Simulation to Optimize a Pediatric Drug-Drug Interaction Study of Vincristine and Azole Antifungals

    BACKGROUND: The aim of the current work was to perform a clinical trial simulation (CTS) analysis to optimize a drug-drug interaction (DDI) study of vincristine in children who also received azole antifungals, taking into account the challenges of conducting clinical trials in this population, and to provide a motivating example of the application of CTS in the design of pediatric oncology clinical trials. PROCEDURE: A pharmacokinetic (PK) model for vincristine in children was used to simulate concentration-time profiles. A continuous model for body surface area versus age was defined based on pediatric growth curves. Informative sampling time windows were derived using D-optimal design. The CTS framework was used to evaluate the impact of different magnitudes of clearance inhibition (10%, 25%, or 40%), sample size (30-500), missing samples or sampling occasions, and the age distribution on the power to detect a significant inhibition effect, as well as on the relative estimation error (REE) of the interaction effect. RESULTS: A minimum group-specific sample size of 38 patients, with a total sample size of 150 patients, was required to detect a clearance inhibition effect of 40% with 80% power; a lower degree of clearance inhibition required a substantially larger sample size. However, for the majority of re-estimated drug effects, the inhibition effect could be estimated precisely (REE < 25%) even with smaller sample sizes and lower effect sizes. CONCLUSION: This work demonstrated the utility of CTS for the evaluation of PK clinical trial designs in the pediatric oncology population.
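    As a schematic of the simulation-based power computation described above (heavily simplified: the published analysis uses a full PK model with D-optimal sampling windows, whereas this sketch reduces each subject to a single noisy log-clearance estimate), one might write:

    import numpy as np

    rng = np.random.default_rng(2)

    def simulated_power(n_per_group=75, inhibition=0.40, n_trials=500):
        """Fraction of simulated trials in which the clearance-inhibition
        effect is detected by a two-sided test on log-clearance (alpha = 0.05).
        Baseline clearance and variability values are illustrative."""
        hits = 0
        for _ in range(n_trials):
            log_cl_ctrl = np.log(10.0) + rng.normal(0.0, 0.3, n_per_group)
            log_cl_ddi = np.log(10.0 * (1.0 - inhibition)) + rng.normal(0.0, 0.3, n_per_group)
            diff = log_cl_ddi.mean() - log_cl_ctrl.mean()
            se = np.sqrt(log_cl_ddi.var(ddof=1) / n_per_group
                         + log_cl_ctrl.var(ddof=1) / n_per_group)
            hits += abs(diff / se) > 1.96  # approximate Wald test
        return hits / n_trials

    print(simulated_power())  # approximate power to detect 40% clearance inhibition

    Repeating this over a grid of inhibition magnitudes and sample sizes is the basic loop behind a power analysis of this kind; the REE of the effect would be tracked by comparing each trial's estimated inhibition to the simulated truth.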