Quantum-enhanced reinforcement learning for finite-episode games with discrete state spaces

Authors: Compostella Gabriele; Neukart Florian; Seidel Christian; Von Dollen David
Publication date: 14/09/2017

Quantum annealing algorithms belong to the class of metaheuristic tools applicable to solving binary optimization problems. Hardware implementations of quantum annealing, such as the quantum annealing machines produced by D-Wave Systems, have been the subject of multiple analyses aimed at characterizing the technology's usefulness for optimization and sampling tasks. Here, we present a way to partially embed Monte Carlo policy iteration for finding an optimal policy on random observations, and a way to embed n sub-optimal state-value functions for approximating an improved state-value function given a policy, for finite-horizon games with discrete state spaces on a D-Wave 2000Q quantum processing unit (QPU). We explain how both problems can be expressed as quadratic unconstrained binary optimization (QUBO) problems, and show that quantum-enhanced Monte Carlo policy evaluation finds equivalent or better state-value functions for a given policy with the same number of episodes compared to a purely classical Monte Carlo algorithm. Additionally, we describe a quantum-classical policy learning algorithm. Our first and foremost aim is to explain how to represent and solve parts of these problems with the help of the QPU, and not to prove supremacy over every existing classical policy evaluation algorithm. Comment: 17 pages, 7 figures
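
For readers unfamiliar with the QUBO form mentioned in the abstract, the following is a minimal sketch of what such a problem looks like: minimize x^T Q x over binary vectors x. It does not reproduce the paper's embedding; the toy instance (three options with rewards 1, 3, 2 and a pairwise penalty of 10) and the function names are assumptions made for illustration, and a brute-force search stands in for the annealer.

```python
from itertools import product

def qubo_energy(x, Q):
    """Energy x^T Q x for a binary tuple x; Q maps index pairs (i, j) to coefficients."""
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

def brute_force_qubo(Q, n):
    """Enumerate all 2^n binary assignments and return one with the lowest energy."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))

# Hypothetical toy instance: choose at most one of three options.
# Diagonal terms reward a selection; off-diagonal terms penalize selecting two at once.
rewards = [1.0, 3.0, 2.0]
penalty = 10.0
Q = {(i, i): -r for i, r in enumerate(rewards)}
Q.update({(i, j): penalty for i in range(3) for j in range(i + 1, 3)})

best = brute_force_qubo(Q, 3)
best_energy = qubo_energy(best, Q)
```

Exhaustive enumeration is only feasible for a handful of variables; it is used here purely to make the objective concrete.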

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector

Recommended from our members

Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales

Author: Sutton Richard S.
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/1998

Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options: whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions to improve over SMDP methods with no additional cost. We also introduce intra-option temporal-difference methods that are able to learn from fragments of an option's execution. Finally, we propose a notion of subgoal which can be used to improve the options themselves. Overall, we argue that options and their models provide hitherto missing aspects of a powerful, clear, and expressive framework for representing and organizing knowledge.
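
To make the idea of an option concrete, here is a minimal sketch that treats an option as an initiation set, an internal policy, and a termination probability, then runs it to completion and reports the discounted reward and duration that an SMDP-level learner would consume. The five-state corridor, the `step` function, and all names are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation_set: Set[int]                  # states where the option may start
    policy: Callable[[int], int]              # action chosen while the option runs
    termination_prob: Callable[[int], float]  # chance of stopping in each state

def step(state, action):
    """Hypothetical corridor with states 0..4; reward 1 on arriving at state 4."""
    next_state = min(max(state + action, 0), 4)
    return next_state, (1.0 if next_state == 4 else 0.0)

def run_option(option, state, rng, gamma=0.9):
    """Execute the option until it terminates; return (final state, discounted reward, steps)."""
    if state not in option.initiation_set:
        raise ValueError("option cannot be initiated in this state")
    total, discount, steps = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        steps += 1
        if rng.random() < option.termination_prob(state):
            return state, total, steps

# "Walk to the right end": always move +1, terminate only in state 4.
go_right = Option({0, 1, 2, 3}, lambda s: 1, lambda s: 1.0 if s == 4 else 0.0)
final_state, discounted_reward, duration = run_option(go_right, 0, random.Random(0))
```

From the caller's point of view the whole walk behaves like a single temporally extended action with a variable duration.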

ScholarWorks@UMass Amherst

Unifying Consciousness and Time to Enhance Artificial Intelligence

Author: Samarawickrama Mahendra
Publication date: 10/01/2023

Consciousness is a sequential process of awareness which can focus on one piece of information at a time. This process of awareness experiences causation which underpins the notion of time while it interplays with matter and energy, forming reality. The study of Consciousness, time and reality is complex and evolving fast in many fields, including metaphysics and fundamental physics. Reality composes patterns in human Consciousness in response to the regularities in nature. These regularities could be physical (e.g., astronomical, environmental), biological, chemical, mental, social, etc. The patterns that emerged in Consciousness were correlated to the environment, life and social behaviours followed by constructed frameworks, systems and structures. The complex constructs evolved as cultures, customs, norms and values, which created a diverse society. In the evolution of responsible AI, it is important to be attuned to the evolved cultural, ethical and moral values through Consciousness. This requires the advocated design of self-learning AI aware of time perception and human ethics. Comment: This discussion paper has been submitted to Cognitive Neuroscience of Routledge, part of the Taylor & Francis publication

arXiv.org e-Print Archive

Fuzzy Algorithms and Reinforcement Learning in Decision Making

Author: TIA DIANTI HAJIZAH OKTAVIA NINGSIH
Publication venue: Universitas Telkom
Publication date: 02/11/2017

The number of people using the internet as a communication medium keeps growing, which makes anomalies that can disrupt network traffic more likely. An anomaly is a potential attack on, or threat to, a computer or server. Many types of attack occur on an internet network, such as DoS (Denial of Service), DDoS (Distributed Denial of Service), flash crowds, and so on. The negative effects of anomalies harm many parties, both users and internet service providers. To address this problem, a system was built that carries out a learning process to reduce the number of anomalies gradually. To reduce traffic anomalies, the system uses one learning technique, Reinforcement Learning (RL). RL is learning carried out by an agent (learner) through interaction with an unfamiliar environment, with the aim of making decisions directly in that environment. The agent interacts by selecting and executing an action. The environment returns a new state together with feedback in the form of a positive or negative reward. The reward signal is given according to an evaluation of the quality of the action's performance. Learning takes place as the agent selects an action expressed as a percentage reduction in the number of anomalies, because anomalies are not reduced drastically but in stages through RL. A fuzzy algorithm determines the number of services to be controlled before the RL process begins. This final-project research produced a system that can learn to reduce the anomaly rate. Every 100 incoming traffic records are evaluated, and the evaluation affects the Q-value of each service. Testing, adjusted to the training results, yielded a 64% reduction in anomalies. The anomalies were dominated by the ftp-data and telnet services.
Keywords: traffic anomaly, Reinforcement Learning, reward, action, Fuzzy Algorithm
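
The abstract does not state which update rule adjusts each service's Q-value. As a rough illustration only, the sketch below assumes the standard tabular Q-learning update; the state name, the action names, the reward, and the learning-rate and discount values are all hypothetical.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', a')."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Hypothetical example: the service is the state, and each action is a
# percentage by which anomalous traffic for that service is reduced.
actions = ["reduce_10_percent", "reduce_20_percent"]
q_table = {}
new_value = q_update(q_table, "telnet", "reduce_10_percent", 1.0, "telnet", actions)
```

Unseen state-action pairs default to a Q-value of 0, so the first positive reward moves the entry a fraction `alpha` of the way toward the observed return.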

Open Library