Quantum-enhanced reinforcement learning for finite-episode games with discrete state spaces
Quantum annealing algorithms belong to the class of metaheuristic tools,
applicable for solving binary optimization problems. Hardware implementations
of quantum annealing, such as the quantum annealing machines produced by D-Wave
Systems, have been subject to multiple analyses in research, with the aim of
characterizing the technology's usefulness for optimization and sampling tasks.
Here, we present a way to partially embed Monte Carlo policy iteration for
finding an optimal policy on random observations, and show how to embed (n)
sub-optimal state-value functions to approximate an improved state-value
function for a given policy, for finite-horizon games with discrete state
spaces on a D-Wave 2000Q quantum processing unit (QPU). We explain how both problems can
be expressed as a quadratic unconstrained binary optimization (QUBO) problem,
and show that quantum-enhanced Monte Carlo policy evaluation allows for finding
equivalent or better state-value functions for a given policy with the same
number of episodes compared to a purely classical Monte Carlo algorithm.
Additionally, we describe a quantum-classical policy learning algorithm. Our
first and foremost aim is to explain how to represent and solve parts of these
problems with the help of the QPU, and not to prove supremacy over every
existing classical policy evaluation algorithm. Comment: 17 pages, 7 figures
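To make the QUBO form mentioned in this abstract concrete, here is a minimal sketch that minimizes the energy sum_ij Q[i][j] * x_i * x_j over binary vectors by exhaustive search. The matrix `Q_EXAMPLE` and the function name `solve_qubo_brute_force` are hypothetical and are not taken from the paper. A QPU would sample low-energy solutions rather than enumerate all of them, so this only illustrates the problem format, not the authors' embedding.

```python
from itertools import product


def solve_qubo_brute_force(Q):
    """Return (best_x, best_energy) minimizing sum_ij Q[i][j] * x_i * x_j
    over all binary vectors x. Exhaustive, so only usable for tiny n."""
    n = len(Q)
    best_x, best_energy = None, float("inf")
    for x in product((0, 1), repeat=n):
        energy = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


# Hypothetical 3-variable QUBO in upper-triangular form (an assumption for
# illustration): diagonal entries reward selecting a variable, off-diagonal
# entries penalize selecting two neighbouring variables together.
Q_EXAMPLE = [
    [-1.0, 2.0, 0.0],
    [0.0, -1.0, 2.0],
    [0.0, 0.0, -1.0],
]

if __name__ == "__main__":
    print(solve_qubo_brute_force(Q_EXAMPLE))
```

With this example matrix, the minimum selects the first and third variables and avoids both pairwise penalties.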
Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options: whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions to improve over SMDP methods with no additional cost. We also introduce intra-option temporal-difference methods that are able to learn from fragments of an option's execution. Finally, we propose a notion of subgoal which can be used to improve the options themselves. Overall, we argue that options and their models provide hitherto missing aspects of a powerful, clear, and expressive framework for representing and organizing knowledge.
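A rough sketch of how an option might be represented and executed follows. It assumes the common formulation of an option as an initiation set, an internal policy, and a termination condition, which this abstract does not spell out. The `Option` class, `run_option`, and the five-state corridor environment are hypothetical names introduced only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set, Tuple


@dataclass
class Option:
    initiation_set: Set[int]                  # states where the option may start
    policy: Callable[[int], int]              # maps state -> primitive action
    termination_prob: Callable[[int], float]  # probability of stopping in a state


def run_option(option: Option, state: int,
               step: Callable[[int, int], Tuple[int, float]],
               gamma: float = 0.9, max_steps: int = 100):
    """Run the option until it terminates.

    Returns (final_state, discounted_reward, duration), the quantities an
    SMDP-level learner would treat as the outcome of one temporally
    extended action."""
    if state not in option.initiation_set:
        raise ValueError("option is not available in this state")
    total, discount, duration = 0.0, 1.0, 0
    while duration < max_steps:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        duration += 1
        if random.random() < option.termination_prob(state):
            break
    return state, total, duration


def corridor_step(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical corridor with states 0..4; every move costs -1."""
    return max(0, min(4, state + action)), -1.0


# Hypothetical option: walk right until state 4 is reached.
GO_RIGHT = Option(
    initiation_set={0, 1, 2, 3},
    policy=lambda s: 1,
    termination_prob=lambda s: 1.0 if s == 4 else 0.0,
)
```

Starting `GO_RIGHT` in state 1 takes three primitive steps, so a learner working at the SMDP level would see a single transition from state 1 to state 4 with one accumulated discounted reward.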
Unifying Consciousness and Time to Enhance Artificial Intelligence
Consciousness is a sequential process of awareness which can focus on one
piece of information at a time. This process of awareness experiences causation
which underpins the notion of time while it interplays with matter and energy,
forming reality. The study of Consciousness, time and reality is complex and
evolving fast in many fields, including metaphysics and fundamental physics.
Reality composes patterns in human Consciousness in response to the
regularities in nature. These regularities could be physical (e.g.,
astronomical, environmental), biological, chemical, mental, social, etc. The
patterns that emerged in Consciousness were correlated to the environment, life
and social behaviours followed by constructed frameworks, systems and
structures. The complex constructs evolved as cultures, customs, norms and
values, which created a diverse society. In the evolution of responsible AI, it
is important to be attuned to the evolved cultural, ethical and moral values
through Consciousness. This requires the advocated design of self-learning AI
aware of time perception and human ethics. Comment: This discussion paper has been submitted to Cognitive Neuroscience of
Routledge, part of the Taylor & Francis publication
Fuzzy Algorithm and Reinforcement Learning in Decision Making (Algoritma Fuzzy dan Reinforcement Learning dalam Pengambilan Keputusan)
The number of people using the internet as a communication medium keeps growing, which makes anomalies that can disrupt network traffic more likely. An anomaly can indicate an attack or threat against a computer or server. Many types of attack occur on internet networks, such as DoS (Denial of Service), DDoS (Distributed Denial of Service), flash crowds, and so on. The negative effects of anomalies harm many parties, both users and internet service providers. To address this problem, a system was built that can carry out a learning process to reduce the number of anomalies gradually.
To reduce traffic anomalies, a single learning technique is used: Reinforcement Learning (RL). RL is learning carried out by an agent (learner) through interaction with an unfamiliar environment, with the aim of making decisions directly in that environment. The agent interacts by selecting and executing an action. The environment returns a new state along with feedback in the form of a positive or negative reward. The reward signal is given according to an evaluation of the quality of the action's performance. Learning takes place as the agent selects an action expressed as a percentage of anomaly reduction, because anomalies are not reduced drastically but in stages through RL. A fuzzy algorithm is used to determine the number of services to be controlled before they enter the RL process.
This final-project research produced a system that can learn to reduce the anomaly rate. Every 100 incoming traffic records are evaluated, and the result affects the Q-value of each service. Testing, configured according to the training results, yielded a 64% reduction in anomalies. Anomalies were dominated by the ftp-data and telnet services.
Keywords: traffic anomaly, Reinforcement Learning, reward, action, Fuzzy Algorithm
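The reward-driven Q-value adjustment described in this abstract can be illustrated with a minimal sketch. It assumes the standard tabular Q-learning update, which the abstract does not confirm as the rule actually used. The action set `ACTIONS` (reduction percentages), the learning rate `alpha`, the discount `gamma`, and the use of the service name as the state are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical action set: percentage by which anomalous traffic is reduced.
ACTIONS = (10, 20, 30)


def q_update(q, state, action, reward, next_state,
             actions=ACTIONS, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q maps (state, action) pairs to values; unseen pairs default to 0.0.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q_table = defaultdict(float)
    # Assumed scenario: after a batch of traffic for the "telnet" service is
    # evaluated, a positive reward of 1.0 is given for the chosen action.
    print(q_update(q_table, "telnet", 20, 1.0, "telnet"))
```

Repeating the update after each evaluated batch moves the Q-value of a rewarded action upward in small steps, which matches the gradual, staged reduction the abstract describes.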