TBQ(σ): Improving efficiency of trace utilization for off-policy reinforcement learning
© 2019 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved. Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between the target policy and the behavior policy. One common approach is to measure the difference between the two policies probabilistically, as in importance sampling and tree-backup. However, existing off-policy learning methods based on probabilistic policy measurement are inefficient at utilizing traces under a greedy target policy, which is ineffective for control problems: the traces are cut immediately when a non-greedy action is taken, which may lose the advantage of eligibility traces and slow down the learning process. Alternatively, some non-probabilistic measurement methods, such as General Q(λ) and Naive Q(λ), never cut traces, but face convergence problems in practice. To address the above issues, this paper introduces a new method named TBQ(σ), which effectively unifies the tree-backup algorithm and Naive Q(λ). By introducing a new parameter σ to express the degree of trace utilization, TBQ(σ) creates an effective integration of TB(λ) and Naive Q(λ) and a continuous role shift between them. The contraction property of TBQ(σ) is theoretically analyzed for both policy evaluation and control settings. We also derive the online version of TBQ(σ) and give a convergence proof. We empirically show that, for ε ∈ (0, 1) in ε-greedy policies, there exists some degree of trace utilization, σ ∈ (0, 1), that improves the efficiency of trace utilization for off-policy reinforcement learning, both accelerating the learning process and improving performance.
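As a rough illustration of the interpolation described in the abstract, here is a minimal tabular sketch of a TBQ(σ)-style backup. The exact update in the paper may differ; the interpolated trace-decay form and all names here are illustrative assumptions.

```python
import numpy as np

def tbq_sigma_update(Q, E, s, a, r, s_next, pi, gamma=0.99, lam=0.9,
                     sigma=0.5, alpha=0.1):
    """One tabular TBQ(sigma)-style backup (illustrative sketch).

    sigma = 0 recovers a TB(lambda)-style decay (traces scaled by the
    target policy's probability of the taken action, so traces are cut
    under a greedy target policy); sigma = 1 keeps traces uncut, as in
    Naive Q(lambda).
    """
    # Expected value of the next state under the target policy pi.
    v_next = np.dot(pi[s_next], Q[s_next])
    delta = r + gamma * v_next - Q[s, a]          # TD error
    E[s, a] += 1.0                                 # accumulating trace
    Q += alpha * delta * E                         # backup along all traces
    # Interpolated decay: sigma blends "never cut" with pi-weighted decay.
    E *= gamma * lam * (sigma + (1.0 - sigma) * pi[s, a])
    return delta
```

With σ between 0 and 1, a non-greedy action only attenuates the trace instead of zeroing it, which is the continuous role shift between TB(λ) and Naive Q(λ) that the abstract refers to.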
Self-reflective deep reinforcement learning
© 2016 IEEE. In this paper we present a new concept of self-reflection learning to support a deep reinforcement learning model. The self-reflective process occurs offline, between episodes, to help the agent learn to navigate towards a goal location and to boost its online performance. In particular, the best experience so far is recalled and compared with other similar but suboptimal episodes to reemphasize worthy decisions and deemphasize unworthy ones, using eligibility and learning traces. At the same time, relatively bad experience is forgotten to remove its confusing effect. We set up a layer-wise deep actor-critic architecture and apply the self-reflection process to help train it. We show that the self-reflective model seems to work well, and an initial experimental result on a real robot shows that the agent achieved a good success rate in reaching a goal location.
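The recall-and-compare idea can be caricatured in a few lines. The following is a hypothetical sketch of the reweighting step only, not the paper's actual algorithm; the weighting scheme and all names are assumptions.

```python
def reflect_weights(best, others, boost=1.5, damp=0.5):
    """Assign per-(state, action) replay weights by self-reflection
    (illustrative sketch): decisions shared with the best-so-far episode
    are reemphasized, while decisions that appear only in similar but
    suboptimal episodes are deemphasized.

    `best` and each element of `others` are lists of (state, action) pairs.
    """
    best_sa = {(s, a) for (s, a) in best}
    weights = {}
    for episode in others:
        for sa in episode:
            # Worthy if the best episode also made this decision.
            weights[sa] = boost if sa in best_sa else damp
    for sa in best_sa:
        weights[sa] = boost
    return weights
```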
An investigation into hazard-centric analysis of complex autonomous systems
This thesis proposes a hypothesis that a conventional, and essentially manual, HAZOP process can be
improved with information obtained with model-based dynamic simulation, using a Monte Carlo
approach, to update a Bayesian belief model representing the expected relations between causes and
effects – and thereby produce an enhanced HAZOP. The work considers how the expertise of a
hazard and operability study team might be augmented with access to behavioural models,
simulations and belief inference models. This incorporates models of dynamically complex system
behaviour, considering where these might contribute to the expertise of a hazard and operability study
team, and how these might bolster trust in the portrayal of system behaviour. With a questionnaire
containing behavioural outputs from a representative systems model, responses were collected from a
group with relevant domain expertise. From this it is argued that the quality of analysis depends
upon the experience and expertise of the participants, but that it might be artificially augmented using
probabilistic data derived from a system dynamics model. Consequently, Monte Carlo simulations of
an improved exemplar system dynamics model are used to condition a behavioural inference model
and also to generate measures of emergence associated with the deviation parameter used in the study.
A Bayesian approach towards probability is adopted where particular events and combinations of
circumstances are effectively unique or hypothetical, and perhaps irreproducible in practice.
Therefore, it is shown that a Bayesian model, representing beliefs expressed in a hazard and
operability study and conditioned on the likely occurrence of flaw events causing specific deviant
behaviour, as observed in the system's dynamical behaviour, may combine intuitive estimates,
based upon experience and expertise, with quantitative statistical information representing
plausible evidence of safety-constraint violation. A further behavioural measure identifies potential
emergent behaviour by way of a Lyapunov Exponent. Together these improvements enhance the
awareness of potential hazard cases.
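One way to picture how Monte Carlo output from a system dynamics model could condition a belief elicited from a HAZOP team is a simple Beta-Binomial update. This is an illustrative sketch, not the thesis's actual model; the `simulate` callable and the conjugate-prior choice are assumptions.

```python
import numpy as np

def update_hazard_belief(prior_alpha, prior_beta, simulate, n_runs=1000,
                         rng=None):
    """Condition an expert Beta(prior_alpha, prior_beta) belief about the
    probability that a deviation leads to a safety-constraint violation,
    using Monte Carlo runs of a behavioural model.

    `simulate(rng)` is a hypothetical callable returning True when a run
    exhibits the violation. Returns the posterior parameters and mean.
    """
    rng = rng or np.random.default_rng(0)
    violations = sum(bool(simulate(rng)) for _ in range(n_runs))
    # Conjugate Beta-Binomial update: counts simply add to the prior.
    alpha = prior_alpha + violations
    beta = prior_beta + n_runs - violations
    return alpha, beta, alpha / (alpha + beta)
```

The prior encodes the team's intuitive estimate; the simulated counts supply the quantitative statistical evidence, so the posterior mean combines both, as described above.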
Renewable Estimation and Incremental Inference with Streaming Health Datasets
The overarching objective of my dissertation is to develop a new methodology that allows parameter estimates and their standard errors to be sequentially updated along with data streams. The key technical novelty is that the proposed estimation method, termed renewable estimation in my dissertation, uses current data and summary statistics of historical data, but no historical subject-level data. To implement renewable estimation, I utilize the powerful Lambda architecture in Apache Spark to design a new paradigm that includes an inference layer in addition to the existing speed layer. This expanded architecture is named the Rho architecture; it accommodates inference-related statistics and facilitates sequential updating of quantities involved in estimation and inference.
The first project focuses on renewable estimation in the setting of generalized linear models (RenewGLM), in which I develop a new sequential updating algorithm to calculate numerical solutions of parameter estimates and related inferential quantities. The proposed procedure aggregates both score functions and information matrices over streaming data batches through summary statistics. I show that the resulting estimator is asymptotically equivalent to the maximum likelihood estimator (MLE) obtained with the entire dataset at once. I demonstrate this new methodology on the analysis of the National Automotive Sampling System-Crashworthiness Data System (NASS CDS) to evaluate the effectiveness of graduated driver licensing (GDL) in the USA.
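For the Gaussian case, the idea of renewing an estimate from summary statistics alone can be sketched exactly: accumulating the information matrix X'X and the quantity X'y across batches reproduces the full-data least-squares solution without retaining any subject-level history. This is an illustrative sketch under that simplifying assumption, not the dissertation's implementation.

```python
import numpy as np

class RenewableLS:
    """Renewable least-squares sketch: stores only summary statistics
    (X'X and X'y) across data batches; no subject-level data are kept.
    For the Gaussian linear model this recovers the full-data MLE exactly;
    for other GLMs the renewable update is only asymptotically equivalent.
    """

    def __init__(self, p):
        self.info = np.zeros((p, p))   # accumulated information matrix X'X
        self.xty = np.zeros(p)         # accumulated X'y

    def renew(self, X, y):
        """Fold a new data batch into the summary statistics."""
        self.info += X.T @ X
        self.xty += X.T @ y

    def estimate(self):
        """Current point estimate from the accumulated summaries."""
        return np.linalg.solve(self.info, self.xty)

    def std_errors(self, sigma2=1.0):
        """Standard errors, assuming a known error variance sigma2."""
        return np.sqrt(sigma2 * np.diag(np.linalg.inv(self.info)))
```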
The second project focuses on a substantial extension of the first project to analyze streaming datasets with correlated outcomes, such as clustered data and longitudinal data. I establish theoretical guarantees for the proposed renewable quadratic inference function (RenewQIF) for dependent outcomes and implement it within the Rho architecture. Furthermore, I relax the homogeneity assumption of the first project and consider regime-switching regression models with a structural change point. I propose a real-time hypothesis testing procedure based on a goodness-of-fit test statistic that is shown to achieve both proper type I error control and desirable change-point detection power.
The third project concerns data streams that involve both inter-batch correlation and dynamic heterogeneity, arising typically from various types of electronic health records (EHR) and mobile health data. This project is built in the framework of state space models, in which the observed data stream is driven by a latent state process that may incorporate trend, seasonal, or time-varying covariate effects. In this setting, calculating the online MLE is challenging due to the involvement of high-dimensional integrals and complex covariance structures. In this project, I develop a Kalman filter to facilitate a multivariate online regression analysis (MORA) in the context of linear state space mixed models. MORA enables renewal of both point estimates and standard errors of the fixed effects. We also apply the MORA method to analyze an EHR data example, adjusting for some heterogeneous batch-specific effects.
PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163085/1/luolsph_1.pd
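The Kalman-filter recursion underlying this kind of online analysis can be sketched generically. The following is a textbook predict-update step for a linear Gaussian state space model, not the MORA implementation itself; with a static state (F = I, Q = 0) it reduces to recursive least squares, where the diagonal of the covariance P renews the standard errors batch by batch.

```python
import numpy as np

def kalman_step(m, P, y, H, R, F=None, Q=None):
    """One predict + update step of a Kalman filter.

    m, P : prior state mean (p,) and covariance (p, p)
    y    : new observation vector (d,)
    H    : observation matrix (d, p)
    R    : observation noise covariance (d, d)
    F, Q : state transition and state noise; identity / zero by default,
           i.e. a static state, which yields recursive least squares.
    """
    p = len(m)
    F = np.eye(p) if F is None else F
    Q = np.zeros((p, p)) if Q is None else Q
    # Predict the state forward.
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update with the new observation batch.
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(p) - K @ H) @ P_pred
    return m_new, P_new
```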
First Time Measurements of Polarization Observables for the Charged Cascade Hyperon in Photoproduction
The parity-violating weak decay of hyperons offers a valuable means of measuring their polarization, providing insight into the production of strange quarks and the matter they compose. Jefferson Lab's CLAS collaboration has utilized this property of hyperons, publishing the most precise polarization measurements for the Λ and Σ in both photoproduction and electroproduction to date. In contrast, cascades, which contain two strange quarks, can only be produced through indirect processes and, as a result, exhibit low cross sections, thus remaining experimentally elusive.
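For context, the textbook relation behind such polarization measurements is the decay angular distribution of the daughter baryon in the hyperon rest frame:

```latex
\frac{dN}{d\cos\theta_i} \;\propto\; 1 + \alpha\, P_i \cos\theta_i ,
```

where \(\alpha\) is the weak-decay asymmetry parameter of the hyperon, \(P_i\) is the polarization component along axis \(i\), and \(\theta_i\) is the angle between the daughter baryon's momentum and that axis. Because the parity-violating decay acts as a self-analyzer, fitting this distribution yields the polarization.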
At present, there are two aspects of cascade physics where progress has been minimal: characterizing the production mechanism, which lacks theoretical and experimental development, and observing the numerous excited cascade resonances that are required to exist by flavor SU(3) symmetry. However, CLAS data were collected in 2008 with a luminosity of 68 pb−1 using a circularly polarized photon beam with energies up to 5.45 GeV, incident on a liquid hydrogen target. This dataset is, at present, the world's largest for meson photoproduction in its energy range and provides a unique opportunity to study cascade physics with polarization measurements.
The current analysis explores hyperon production through the γp → K+K+Ξ− reaction, providing the first-ever determination of the spin observables P, Cx, and Cz for the cascade. Three of our primary goals are to test the only cascade photoproduction model in existence, to examine the underlying processes that give rise to hyperon polarization, and to stimulate future theoretical developments while providing constraints for their parameters. Our research is part of a broader program to understand the production of strange quarks and hadrons with strangeness. The remainder of this document discusses the motivation behind such research, the method of data collection, the details of the analysis, and the significance of our results.