55,271 research outputs found

Sequential Design for Optimal Stopping Problems

Author: Gramacy Robert B.
Ludkovski Mike
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 29/07/2014

We propose a new approach to solving optimal stopping problems via simulation. Working within the backward dynamic programming/Snell envelope framework, we augment the methodology of Longstaff-Schwartz, which focuses on approximating the stopping strategy. Namely, we introduce adaptive generation of the stochastic grids anchoring the simulated sample paths of the underlying state process. This allows for active learning of the classifiers partitioning the state space into the continuation and stopping regions. To this end, we examine sequential design schemes that adaptively place new design points close to the stopping boundaries. We then discuss dynamic regression algorithms that can implement such recursive estimation and local refinement of the classifiers. The new algorithm is illustrated with a variety of numerical experiments, showing that order-of-magnitude savings in design size can be achieved. We also compare with existing benchmarks in the context of pricing multi-dimensional Bermudan options. Comment: 24 page

arXiv.org e-Print Archive

CiteSeerX

Performance Dynamics and Termination Errors in Reinforcement Learning: A Unifying Perspective

Author: Kuang Nikki Lijing
Leung Clement H. C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/02/2019

In reinforcement learning, a decision needs to be made at some point as to whether it is worthwhile to carry on with the learning process or to terminate it. In many such situations, stochastic elements govern the occurrence of rewards, with the sequential occurrences of positive rewards randomly interleaved with negative rewards. For most practical learners, the learning is considered useful if the number of positive rewards always exceeds the number of negative ones. A situation that often calls for learning termination is when the number of negative rewards exceeds the number of positive rewards. However, while this seems reasonable, the error of premature termination, whereby learning is terminated and declared a failure even though the positive rewards would eventually far outnumber the negative ones, can be significant. In this paper, using combinatorial analysis, we study the probability of wrongly terminating a reinforcement learning activity, an error that undermines the effectiveness of an optimal policy, and we show that this probability can be quite high. Whilst we demonstrate mathematically that such errors can never be eliminated, we propose some practical mechanisms that can effectively reduce them. Simulation experiments have been carried out, the results of which are in close agreement with our theoretical findings. Comment: Short Paper in AIKE 201

arXiv.org e-Print Archive

Crossref

A particle filtering approach for joint detection/estimation of multipath effects on GPS measurements

Author: Calmettes Vincent
Giremus Audrey
Tourneret Jean-Yves
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Multipath propagation causes major impairments to Global Positioning System (GPS) based navigation. Multipath results in biased GPS measurements and hence inaccurate position estimates. In this work, multipath effects are considered as abrupt changes affecting the navigation system. A multiple model formulation is proposed whereby the changes are represented by a discrete-valued process. The detection of the errors induced by multipath is handled by a Rao-Blackwellized particle filter (RBPF). The RBPF estimates the indicator process jointly with the navigation states and multipath biases. The appeal of this approach is its ability to integrate a priori constraints about the propagation environment. The detection is improved by using information from near-future GPS measurements at the particle filter (PF) sampling step. A computationally modest delayed sampling scheme is developed, based on a minimal duration assumption for multipath effects. Finally, the standard PF resampling stage is modified to include a decision step based on a hypothesis test.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Mechanisms for the generation and regulation of sequential behaviour

Author: Cooper Richard P.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2003

A critical aspect of much human behaviour is the generation and regulation of sequential activities. Such behaviour is seen both in naturalistic settings, such as routine action and language production, and in laboratory tasks, such as serial recall and many reaction time experiments. There are a variety of computational mechanisms that may support the generation and regulation of sequential behaviours, ranging from those underlying Turing machines to those employed by recurrent connectionist networks. This paper surveys a range of such mechanisms, together with a range of empirical phenomena related to human sequential behaviour. It is argued that the empirical phenomena pose difficulties for most sequencing mechanisms, but that converging evidence from behavioural flexibility, from error data arising when the system is stressed or damaged following brain injury, and from between-trial effects in reaction time tasks points to a hybrid symbolic activation-based mechanism for the generation and regulation of sequential behaviour. Some implications of this view for the nature of mental computation are highlighted.

Crossref

Birkbeck Institutional Research Online

Synthesizing SystemC Code from Delay Hybrid CSP

Author: A Bellen
A Deshpande
A Girard
D Angeli
D Harel
E Ahmad
E Lee
G Pola
G Yan
L Zou
M Anand
M Chen
M Fränzle
N Zhan
R Alur
TA Henzinger
Y Hur
Z Huang
Publication date: 19/09/2017

Delay is omnipresent in modern control systems; it can prompt oscillations, degrade control performance, and invalidate both stability and safety properties. This implies that safety or stability certificates obtained on idealized, delay-free models of systems prone to delayed coupling may be erratic, and that the executable code generated from these models may in turn be incorrect. However, automated methods for system verification and code generation that address models of system dynamics reflecting delays have not yet received enough attention in the computer science community. In our previous work, on the one hand, we investigated the verification of delay dynamical and hybrid systems; on the other hand, we addressed how to synthesize SystemC code from a verified hybrid system modelled by Hybrid CSP (HCSP) without delay. In this paper, we make a first attempt to synthesize SystemC code from a verified delay hybrid system modelled by Delay HCSP (dHCSP), an extension of HCSP that replaces ordinary differential equations (ODEs) with delay differential equations (DDEs). We implement a tool to support the automatic translation from dHCSP to SystemC.

arXiv.org e-Print Archive

Crossref

On detecting jumps in time series: Nonparametric setting

Author: Pawlak Mirek
Rafajlowicz Ewaryst
Steland Ansgar

Motivated by applications in statistical quality control and signal analysis, we propose a sequential detection procedure designed to detect structural changes, in particular jumps, immediately. This is achieved by modifying a median filter with appropriate kernel-based jump-preserving weights (shrinking) and a clipping mechanism. We aim at both robustness and immediate detection of jumps. Whereas the median approach ensures robust smooths when there are no jumps, the modification ensures immediate reaction to jumps. For general clipping location estimators we show that the procedure can detect jumps of certain heights with no delay, even when applied to Banach space valued data. For shrinking medians we provide an asymptotic upper bound for the normed delay. The finite sample properties are studied by simulations, which show that our proposal outperforms classical procedures in certain respects. Keywords: Edge Detection, Nonparametric Estimation, Quality Control, Statistical Process Control

Research Papers in Economics