665 research outputs found

    TBQ(σ): Improving efficiency of trace utilization for off-policy reinforcement learning

    Full text link
    © 2019 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved. Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between the target policy and the behavior policy. One common approach is to measure the difference between the two policies probabilistically, as in importance sampling and tree-backup. However, existing off-policy learning methods based on probabilistic policy measurement are inefficient at utilizing traces under a greedy target policy, which is ineffective for control problems. The traces are cut immediately when a non-greedy action is taken, which may forfeit the advantage of eligibility traces and slow down the learning process. Alternatively, some non-probabilistic measurement methods, such as General Q(λ) and Naive Q(λ), never cut traces but face convergence problems in practice. To address these issues, this paper introduces a new method named TBQ(σ), which effectively unifies the tree-backup algorithm and Naive Q(λ). By introducing a new parameter σ to express the degree of trace utilization, TBQ(σ) creates an effective integration of TB(λ) and Naive Q(λ) and a continuous role shift between them. The contraction property of TBQ(σ) is theoretically analyzed for both policy evaluation and control settings. We also derive the online version of TBQ(σ) and give its convergence proof. We empirically show that, for ε ∈ (0, 1) in ε-greedy policies, there exists some degree of trace utilization, λ ∈ (0, 1), that improves the efficiency of trace utilization for off-policy reinforcement learning, both accelerating the learning process and improving performance
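    The abstract does not reproduce the update rule, but a minimal tabular sketch of the idea it describes might look as follows, assuming the trace decay simply interpolates, via σ, between the TB(λ) factor π(a|s) and the Naive Q(λ) factor 1. All names are illustrative reconstructions, not taken from the paper:

    ```python
    import numpy as np

    def tbq_sigma_update(Q, e, s, a, r, s_next, pi_target,
                         alpha=0.1, gamma=0.99, lam=0.9, sigma=0.5):
        """One tabular TBQ(sigma) step (hypothetical reconstruction).

        sigma = 0 recovers TB(lambda): traces are scaled by pi(a|s), so a
        greedy target policy cuts them on any non-greedy action.
        sigma = 1 recovers Naive Q(lambda): traces are never cut.
        """
        # Tree-backup-style TD error toward the expected value of the
        # next state under the target policy.
        td_error = r + gamma * np.dot(pi_target[s_next], Q[s_next]) - Q[s, a]

        # Decay all traces, interpolating between pi(a|s) (TB) and 1 (Naive).
        e *= gamma * lam * ((1.0 - sigma) * pi_target[s, a] + sigma)
        e[s, a] += 1.0  # accumulating trace for the visited pair

        Q += alpha * td_error * e
        return Q, e
    ```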

    Self-reflective deep reinforcement learning

    Get PDF
    © 2016 IEEE. In this paper we present a new concept of self-reflection learning to support a deep reinforcement learning model. The self-reflective process occurs offline, between episodes, to help the agent learn to navigate towards a goal location and to boost its online performance. In particular, the so-far-optimal experience is recalled and compared with other similar but suboptimal episodes, to reemphasize worthy decisions and deemphasize unworthy ones using eligibility and learning traces. At the same time, relatively bad experience is forgotten to remove its confusing effect. We set up a layer-wise deep actor-critic architecture and apply the self-reflection process to help train it. We show that the self-reflective model seems to work well, and an initial experimental result on a real robot shows that the agent achieved a good success rate in reaching a goal location
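    The paper's model is a layer-wise deep actor-critic, but the reemphasize/deemphasize idea can be illustrated with a small tabular sketch. Everything below (the pairing against the best episode, the backward trace decay) is an assumed reading of the abstract, not the authors' algorithm:

    ```python
    def self_reflect(best_episode, similar_episodes, Q,
                     alpha=0.05, gamma=0.99, lam=0.9):
        """Offline self-reflection between episodes (hypothetical sketch).

        Replays similar but suboptimal episodes against the so-far-best one:
        state-action pairs that also lie on the best trajectory are
        reinforced, diverging choices are weakened, with an eligibility-style
        decay so decisions near the outcome carry the most credit.
        """
        best_pairs = {(s, a) for s, a, _ in best_episode}
        for episode in similar_episodes:
            trace = 1.0
            # walk backwards so late decisions get the largest updates
            for s, a, _ in reversed(episode):
                if (s, a) in best_pairs:
                    Q[s][a] += alpha * trace   # reemphasize worthy decision
                else:
                    Q[s][a] -= alpha * trace   # deemphasize unworthy one
                trace *= gamma * lam
        return Q
    ```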

    An investigation into hazard-centric analysis of complex autonomous systems

    Get PDF
    This thesis proposes the hypothesis that a conventional, and essentially manual, HAZOP process can be improved with information obtained from model-based dynamic simulation, using a Monte Carlo approach to update a Bayesian belief model representing the expected relations between causes and effects, and thereby produce an enhanced HAZOP. The work considers how the expertise of a hazard and operability study team might be augmented with access to behavioural models, simulations, and belief inference models. This incorporates models of dynamically complex system behaviour, considering where these might contribute to the expertise of a hazard and operability study team, and how they might bolster trust in the portrayal of system behaviour. Using a questionnaire containing behavioural outputs from a representative system model, responses were collected from a group with relevant domain expertise. From this it is argued that the quality of analysis depends upon the experience and expertise of the participants, but that it might be artificially augmented using probabilistic data derived from a system dynamics model. Consequently, Monte Carlo simulations of an improved exemplar system dynamics model are used to condition a behavioural inference model and also to generate measures of emergence associated with the deviation parameter used in the study. A Bayesian approach to probability is adopted because particular events and combinations of circumstances are effectively unique or hypothetical, and perhaps irreproducible in practice. It is then shown that a Bayesian model, representing beliefs expressed in a hazard and operability study and conditioned on the likely occurrence of flaw events causing specific deviant behaviour, using evidence observed in the system's dynamic behaviour, may combine intuitive estimates based upon experience and expertise with quantitative statistical information representing plausible evidence of safety constraint violation. A further behavioural measure identifies potential emergent behaviour by way of a Lyapunov exponent. Together these improvements enhance awareness of potential hazard cases
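    As a toy illustration of conditioning a belief with Monte Carlo output, consider the simplest conjugate case: a Beta prior over the probability that an injected flaw event leads to a safety-constraint violation, updated with simulated run counts. The thesis's Bayesian belief model is far richer than this; the sketch and its numbers are purely hypothetical:

    ```python
    from dataclasses import dataclass

    @dataclass
    class BetaBelief:
        """Beta prior over P(constraint violation | flaw event)."""
        alpha: float = 1.0  # prior pseudo-count of violating runs
        beta: float = 1.0   # prior pseudo-count of safe runs

        def update(self, violations: int, runs: int) -> None:
            """Condition the belief on Monte Carlo simulation outcomes."""
            self.alpha += violations
            self.beta += runs - violations

        @property
        def mean(self) -> float:
            return self.alpha / (self.alpha + self.beta)

    # e.g. 1000 simulated runs of the system dynamics model with the
    # flaw injected, 37 of which violated the safety constraint:
    belief = BetaBelief()
    belief.update(violations=37, runs=1000)
    print(f"P(violation | flaw) = {belief.mean:.3f}")
    ```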

    Renewable Estimation and Incremental Inference with Streaming Health Datasets

    Full text link
    The overarching objective of my dissertation is to develop a new methodology for sequentially updating parameter estimates and their standard errors along with data streams. The key technical novelty is that the proposed estimation method, termed renewable estimation in my dissertation, uses current data and summary statistics of historical data, but no historical subject-level data. To implement renewable estimation, I utilize the powerful Lambda architecture in Apache Spark to design a new paradigm that adds an inference layer to the existing speed layer. This expanded architecture, named the Rho architecture, accommodates inference-related statistics and facilitates sequential updating of the quantities involved in estimation and inference. The first project focuses on renewable estimation in the setting of generalized linear models (RenewGLM), in which I develop a new sequential updating algorithm to calculate numerical solutions of parameter estimates and related inferential quantities. The proposed procedure aggregates both score functions and information matrices over streaming data batches through summary statistics. I show that the resulting estimation is asymptotically equivalent to the maximum likelihood estimation (MLE) obtained by processing the entire dataset at once. I demonstrate this new methodology on the analysis of the National Automotive Sampling System-Crashworthiness Data System (NASS CDS) to evaluate the effectiveness of graduated driver licensing (GDL) in the USA. The second project substantially extends the first to analyze streaming datasets with correlated outcomes, such as clustered and longitudinal data. I establish theoretical guarantees for the proposed renewable quadratic inference function (RenewQIF) for dependent outcomes and implement it within the Rho architecture. Furthermore, I relax the homogeneity assumption of the first project and consider regime-switching regression models with a structural change-point. I propose a real-time hypothesis testing procedure based on a goodness-of-fit test statistic that is shown to achieve both proper type I error control and desirable change-point detection power. The third project concerns data streams that involve both inter-batch correlation and dynamic heterogeneity, arising typically from electronic health records (EHR) and mobile health data. This project is built in the framework of state space models, in which the observed data stream is driven by a latent state process that may incorporate trend, seasonal, or time-varying covariate effects. In this setting, calculating the online MLE is challenging due to high-dimensional integrals and complex covariance structures. I develop a Kalman filter to facilitate multivariate online regression analysis (MORA) in the context of linear state space mixed models. MORA renews both point estimates and standard errors of the fixed effects. I also apply the MORA method to an EHR data example, adjusting for heterogeneous batch-specific effects.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163085/1/luolsph_1.pd
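    For intuition, the aggregation of score functions and information matrices is exact in the linear-model special case, where keeping only running sufficient statistics reproduces the full-data estimate. The sketch below shows that special case only, not the dissertation's RenewGLM algorithm (which handles general GLMs iteratively); class and variable names are illustrative:

    ```python
    import numpy as np

    class RenewableOLS:
        """Streaming least squares from batch summary statistics only."""

        def __init__(self, p):
            self.xtx = np.zeros((p, p))  # aggregated information matrix
            self.xty = np.zeros(p)       # aggregated score contribution
            self.n = 0

        def renew(self, X_batch, y_batch):
            # Only the batch's summary statistics are retained; the
            # subject-level data can be discarded after this call.
            self.xtx += X_batch.T @ X_batch
            self.xty += X_batch.T @ y_batch
            self.n += len(y_batch)

        def estimate(self):
            # Identical to the OLS fit on the pooled data.
            beta = np.linalg.solve(self.xtx, self.xty)
            cov_unscaled = np.linalg.inv(self.xtx)  # scale by sigma^2 for SEs
            return beta, cov_unscaled
    ```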

    First Time Measurements of Polarization Observables for the Charged Cascade Hyperon in Photoproduction

    Get PDF
    The parity-violating weak decay of hyperons offers a valuable means of measuring their polarization, providing insight into the production of strange quarks and the matter they compose. Jefferson Lab's CLAS collaboration has utilized this property of hyperons, publishing the most precise polarization measurements to date for the Λ and Σ in both photoproduction and electroproduction. In contrast, cascades, which contain two strange quarks, can only be produced through indirect processes and, as a result, exhibit low cross sections, so they have remained experimentally elusive. At present, there are two aspects of cascade physics in which progress has been minimal: characterizing their production mechanism, which lacks theoretical and experimental development, and observing the numerous excited cascade resonances required to exist by flavor SU(3)_F symmetry. However, CLAS data were collected in 2008 with an integrated luminosity of 68 pb−1, using a circularly polarized photon beam with energies up to 5.45 GeV incident on a liquid hydrogen target. This dataset is, at present, the world's largest for meson photoproduction in its energy range and provides a unique opportunity to study cascade physics with polarization measurements. The current analysis explores hyperon production through the γp → K+K+Ξ− reaction, providing the first ever determination of the spin observables P, C_x, and C_z for the cascade. Our three primary goals are to test the only existing cascade photoproduction model, to examine the underlying processes that give rise to hyperon polarization, and to stimulate future theoretical developments while providing constraints for model parameters. Our research is part of a broader program to understand the production of strange quarks and hadrons with strangeness. The remainder of this document discusses the motivation behind this research, the method of data collection, the details of the analysis, and the significance of our results
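    The "valuable means" in the opening sentence is the standard self-analyzing property of hyperon weak decays: in the hyperon rest frame the decay baryon is emitted preferentially along the hyperon spin, so the polarization appears as the slope of a linear angular distribution. Schematically:

    ```latex
    % In the hyperon rest frame, the decay-baryon yield is linear in
    % cos(theta) along the chosen polarization axis, so P is extracted
    % from the measured slope.
    \[
      \frac{dN}{d\cos\theta} \;\propto\; 1 + \alpha\, P \cos\theta
    \]
    % alpha : weak-decay asymmetry parameter of the hyperon decay
    % P     : hyperon polarization component along the chosen axis
    ```

    Here α is the weak-decay asymmetry parameter and P the polarization component along the chosen axis; the double-polarization observables C_x and C_z enter analogously once the circular polarization of the photon beam is included.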

    ROCK PROPERTIES MODEL ANALYSIS MODEL REPORT

    Full text link