948 research outputs found

    On utilizing weak estimators to achieve the online classification of data streams

    Author's accepted version (post-print). Available from 03/09/2021.

    Statistical Computations Underlying the Dynamics of Memory Updating

    Psychophysical and neurophysiological studies have suggested that memory is not simply a carbon copy of our experience: Memories are modified or new memories are formed depending on the dynamic structure of our experience, and specifically, on how gradually or abruptly the world changes. We present a statistical theory of memory formation in a dynamic environment, based on a nonparametric generalization of the switching Kalman filter. We show that this theory can qualitatively account for several psychophysical and neural phenomena, and present results of a new visual memory experiment aimed at testing the theory directly. Our experimental findings suggest that humans can use temporal discontinuities in the structure of the environment to determine when to form new memory traces. The statistical perspective we offer provides a coherent account of the conditions under which new experience is integrated into an old memory versus forming a new memory, and shows that memory formation depends on inferences about the underlying structure of our experience. Funding: Templeton Foundation; Alfred P. Sloan Foundation (Fellowship); National Science Foundation (U.S.) (NSF Graduate Research Fellowship); National Institute of Mental Health (U.S.) (NIH Award Number R01MH098861)

    Advancing the Applicability of Reinforcement Learning to Autonomous Control

    Data-efficient reinforcement learning (RL) methods have achieved impressive results, e.g., in gas turbine control. In practice, however, applying RL still requires much human intervention, which hinders its use for autonomous control. This thesis addresses some of the remaining problems, particularly regarding the reliability of the policy generation process. The thesis first discusses RL problems with discrete state and action spaces. In that context, an MDP is often estimated from observations. The thesis describes how to incorporate the estimators' uncertainties into the policy generation process. This information can then be used to reduce the risk of obtaining a poor policy due to flawed MDP estimates. Moreover, it discusses how to use the knowledge of uncertainty for efficient exploration and for assessing policy quality without requiring the policy's execution. The thesis then moves on to continuous state problems and focuses on methods based on fitted Q-iteration (FQI), particularly neural fitted Q-iteration (NFQ). Although NFQ has proven to be very data-efficient, it is not as reliable as required for autonomous control. The thesis proposes using ensembles to increase reliability. Several ways of using ensembles in an NFQ context are discussed and evaluated on a number of benchmark domains. In all considered domains, ensembles produce good policies more reliably. Next, policy assessment in continuous domains is discussed. The thesis proposes using fitted policy evaluation (FPE), an adaptation of FQI to policy evaluation, combined with a different function approximator and/or a different dataset to obtain a measure of policy quality. Experimental results show that extra-tree FPE, applied to policies generated by NFQ, produces value functions that can be used to reason about the true policy quality. Finally, the thesis combines ensembles and policy assessment to derive methods that can deal with changing environments. The major contribution is the evolving ensemble, whose policy changes slowly as new policies are added and old policies removed. The evolving ensemble approaches work considerably better than simpler approaches such as single policies learned from recent observations or simple ensembles.

    Intelligent Learning Automata-based Strategies Applied to Personalized Service Provisioning in Pervasive Environments

    Doctoral dissertation in information and communication technology, Universitetet i Agder, Grimstad, 201

    Escapist policy rules

    We study a simple, microfounded macroeconomic system in which the monetary authority employs a Taylor-type policy rule. We analyze situations in which the self-confirming equilibrium is unique and learnable according to Bullard and Mitra (2002). We explore the prospects for the use of 'large deviation' theory in this context, as employed by Sargent (1999) and Cho, Williams, and Sargent (2002). We show that our system can sometimes depart from the self-confirming equilibrium towards a non-equilibrium outcome characterized by persistently low nominal interest rates and persistently low inflation. Thus we generate events that have some of the properties of "liquidity traps" observed in the data, even though the policymaker remains committed to a Taylor-type policy rule that otherwise has desirable stabilization properties.