375 research outputs found

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges.

Author: Bowden Jack
Villar Sofía S
Wason James
Publication venue: Stat Sci
Publication date: 01/01/2015

Multi-armed bandit problems (MABPs) are a special type of optimal control problem well suited to model resource allocation under uncertainty in a wide variety of contexts. Since the first publication of the optimal solution of the classic MABP by a dynamic index rule, the bandit literature quickly diversified and emerged as an active research topic. Across this literature, the use of bandit models to optimally design clinical trials became a typical motivating application, yet little of the resulting theory has ever been used in the actual design and analysis of clinical trials. To this end, we review two MABP decision-theoretic approaches to the optimal allocation of treatments in a clinical trial: the infinite-horizon Bayesian Bernoulli MABP and the finite-horizon variant. These models possess distinct theoretical properties and lead to separate allocation rules in a clinical trial design context. We evaluate their performance compared to other allocation rules, including fixed randomization. Our results indicate that bandit approaches offer significant advantages, in terms of assigning more patients to better treatments, and severe limitations, in terms of their resulting statistical power. We propose a novel bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier for their use in practice.
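To make the idea of response-adaptive allocation concrete, the following is a minimal sketch of a Bernoulli bandit trial simulation. It uses Thompson sampling with uniform Beta(1, 1) priors, which is a different and simpler rule than the index rules reviewed in the paper; the function name `thompson_trial`, its parameters, and the success probabilities in the usage note are all assumptions made for illustration.

```python
import random


def thompson_trial(success_probs, n_patients, seed=0):
    """Simulate sequential patient allocation with Thompson sampling.

    success_probs: true (normally unknown) success probability of each arm.
    Returns (patients allocated per arm, successes observed per arm).
    Hypothetical helper for illustration; not the rule proposed in the paper.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_patients):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Allocate the next patient to the arm with the largest draw.
        arm = max(range(n_arms), key=draws.__getitem__)
        # Observe a simulated binary outcome and update that arm's counts.
        if rng.random() < success_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    allocation = [s + f for s, f in zip(successes, failures)]
    return allocation, successes
```

Calling `thompson_trial([0.2, 0.8], 500)` returns the number of simulated patients assigned to each arm. An adaptive rule of this kind tends to send most patients to the better arm, which is the benefit described in the abstract, while leaving few observations on the inferior arm, which is the source of the reduced statistical power it also mentions.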

arXiv.org e-Print Archive

Crossref

PubMed Central

Apollo (Cambridge)

Explore Bristol Research

Reinforcement Learning: A Survey

Author: Kaelbling L. P.
Littman M. L.
Moore A. W.
Publication date: 01/01/1996

This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Predictive maintenance for the heated hold-up tank

Author: Aldemir
Aven
Benoîte de Saporta
Cojazzi
Davis
de Saporta
Devooght
Dutuit
Grall
Gugerli
Huilong Zhang
Li
Marseguerra
Pagès
Schoenig
Siu
van Noortwijk
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

We present a numerical method to compute an optimal maintenance date for the test case of the heated hold-up tank. The system consists of a tank containing a fluid whose level is controlled by three components: two inlet pumps and one outlet valve. A thermal power source heats up the fluid. The failure rates of the components depend on the temperature, the positions of the three components control the liquid level in the tank, and the liquid level determines the temperature. Therefore, this system can be modeled by a hybrid process where the discrete (components) and continuous (level, temperature) parts interact in a closed loop. We model the system by a piecewise deterministic Markov process, and propose and implement a numerical method to compute the optimal maintenance date for repairing the components before the total failure of the system. Comment: arXiv admin note: text overlap with arXiv:1101.174

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Oskar Bordeaux

Theory of Resource Allocation for Robust Distributed Computing

Author: Pezoa Jorge E.
Publication venue: UNM Digital Repository
Publication date: 09/02/2011

Lately, distributed computing (DC) has emerged in several application scenarios such as grid computing, high-performance and reconfigurable computing, wireless sensor networks, battle management systems, peer-to-peer networks, and donation grids. When DC is performed in these scenarios, the distributed computing system (DCS) supporting the applications not only exhibits heterogeneous computing resources and a significant communication latency, but also becomes highly dynamic because the communication network and the computing servers are affected by a wide class of anomalies that change the topology of the system in a random fashion. These anomalies exhibit spatial and/or temporal correlation when they result, for instance, from wide-area power or network outages. These correlated failures may not only inflict a large amount of damage to the system, but they may also induce further failures in other servers as a result of the lack of reliable communication between the components of the DCS. In order to provide a robust DC environment in the presence of component failures, it is key to develop a general framework for accurately modeling the complex dynamics of a DCS. In this dissertation a novel approach has been undertaken for modeling a general class of DCSs and for analytically characterizing the performance and reliability of parallel applications executed on such systems. A general probabilistic model has been constructed by assuming that the random times governing the dynamics of the DCS follow arbitrary probability distributions with heterogeneous parameters. Auxiliary age variables have been introduced in the modeling of a DCS, and a hybrid continuous and discrete state-space model of the system has been constructed.
This hybrid model has enabled the development of an age-dependent stochastic regeneration theory, which, in turn, has been employed to analytically characterize the average execution time, the quality-of-service and the reliability in serving an application. These are three metrics of performance and reliability of practical interest in DC. Analytical approximations as well as mathematical lower and upper bounds for these metrics have also been derived in an attempt to reduce the amount of computational resources demanded by the exact characterizations. In order to systematically assess the reliability of DCSs in the presence of correlated component failures, a novel probabilistic model for spatially correlated failures has been developed. The model, based on graph theory and Markov random fields, captures both geographical and logical correlations induced by the arbitrary topology of the communication network of a DCS. The modeling framework, in conjunction with a general class of dynamic task reallocation (DTR) control policies, has been used to optimize the performance and reliability of applications in the presence of independent as well as spatially correlated anomalies. Theoretical predictions, Monte-Carlo simulations as well as experimental results have shown that optimizing these metrics can significantly impact the performance of a DCS. Moreover, the general setting developed here has yielded insights into: (i) the effect of different stochastic models on the accuracy of the performance and reliability metrics, (ii) the dependence of the DTR policies on system parameters such as failure rates and task-processing rates, (iii) the severe impact of correlated failures on the reliability of DCSs, (iv) the dependence of the DTR policies on the degree of correlation in the failures, and (v) the fundamental trade-off between minimizing the execution time of an application and maximizing its reliability.

Artificial Sequences and Complexity Measures

In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. We introduce in particular a class of methods which use in a crucial way data compression techniques in order to define a measure of remoteness and distance between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques could be used to introduce the notion of dictionary of a given sequence and of Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpora of character strings independently of the type of coding behind them. We consider as a case study linguistically motivated problems and we present results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figure
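The compression-based notion of remoteness can be illustrated with a rough sketch: compress a reference text alone, compress it again with a probe text appended, and treat the growth in compressed size as a measure of how foreign the probe is to the reference. The sketch below uses Python's standard `zlib` module as a stand-in compressor; the helper names `cross_cost` and `closest_reference` are hypothetical, and the exact distance defined in the paper may differ from this simplified version.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def cross_cost(reference: str, probe: str) -> int:
    """Extra compressed bytes needed to encode `probe` once `reference` is known.

    A small value suggests the probe shares many substrings with the reference.
    Illustrative assumption: this is a simplified proxy, not the paper's measure.
    """
    ref_bytes = reference.encode("utf-8")
    probe_bytes = probe.encode("utf-8")
    return compressed_size(ref_bytes + probe_bytes) - compressed_size(ref_bytes)


def closest_reference(references: dict[str, str], probe: str) -> str:
    """Return the label of the reference text with the lowest cross cost."""
    return min(references, key=lambda label: cross_cost(references[label], probe))
```

Given a dictionary mapping language labels to sample texts, `closest_reference(samples, unknown_text)` returns the label whose sample makes the unknown text cheapest to encode, which is the intuition behind compression-based language recognition and authorship attribution. With very short texts the fixed overhead of the compressor dominates, so the comparison is only meaningful when the references are reasonably long.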

arXiv.org e-Print Archive

City Research Online

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Drought impact on regional economy

Author: Millán Jaime
Publication venue: 'Colorado State University Libraries'
Publication date: 01/01/1972

October 1972. Bibliography: pages 65-68.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Bayesian methods and optimal experimental design for gene mapping by radiation hybrids

Author: Bishop D. T.
Boehnke M.
Chakravarti A.
Chernoff H.
Dempster A. P.
Geman S.
Green P.
Guerra R.
Haldane J. B. S.
Kalos M. H.
Karlin S.
Meng X.-L.
Press W. H.
Rao C. R.
Read C. B.
Reingold E. M.
Richard C. W.
Rosenblatt M.
Rubin D. B.
Shorack G. R.
Weeks D. E.
Publication venue: 'Wiley'
Publication date: 01/05/1992

Radiation hybrid mapping is a somatic cell technique for ordering human loci along a chromosome and estimating the physical distance between adjacent loci. The present paper considers a realistic model of fragment generation and retention. This model assumes that fragments are generated in the ancestral cell of a clone according to a Poisson breakage process along the chromosome. Once generated, fragments are independently retained in the clone with a common retention probability. Based on this and less restrictive models, statistical criteria such as minimum obligate breaks, maximum likelihood, and Bayesian posterior probabilities can be used to decide order. Distances can be estimated by either maximum likelihood or Bayesian posterior means. The model also permits rational design of radiation dose for optimal statistical precision. A brief examination of some real data illustrates our criteria and computational algorithms. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/65749/1/j.1469-1809.1992.tb01139.x.pd

Crossref

Deep Blue Documents at the University of Michigan

Smoothing Policies and Safe Policy Gradients

Author: Papini Matteo
Pirotta Matteo
Restelli Marcello
Publication date: 08/05/2019

Policy gradient algorithms are among the best candidates for the much anticipated application of reinforcement learning to real-world control tasks, such as the ones arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify those meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By a joint, adaptive selection of these meta-parameters, we obtain a safe policy gradient algorithm.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

UPF Digital Repository

Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems.

Author: Laurent Guillaume J.
Le Fort-Piat Nadine
Matignon Laëtitia
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 06/03/2012

In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties to manage to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, nonstationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predators pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive FMQ and WoLF PHC. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.

HAL - Université de Franche-Comté

HAL Descartes

Hal-Diderot