    Quantum-enhanced reinforcement learning for finite-episode games with discrete state spaces

    Quantum annealing algorithms belong to the class of metaheuristic tools applicable to solving binary optimization problems. Hardware implementations of quantum annealing, such as the quantum annealing machines produced by D-Wave Systems, have been the subject of multiple analyses in research, with the aim of characterizing the technology's usefulness for optimization and sampling tasks. Here, we present a way to partially embed Monte Carlo policy iteration for finding an optimal policy on random observations, and a way to embed (n) sub-optimal state-value functions for approximating an improved state-value function given a policy, for finite-horizon games with discrete state spaces on a D-Wave 2000Q quantum processing unit (QPU). We explain how both problems can be expressed as quadratic unconstrained binary optimization (QUBO) problems, and show that quantum-enhanced Monte Carlo policy evaluation allows for finding equivalent or better state-value functions for a given policy with the same number of episodes compared to a purely classical Monte Carlo algorithm. Additionally, we describe a quantum-classical policy learning algorithm. Our first and foremost aim is to explain how to represent and solve parts of these problems with the help of the QPU, and not to prove supremacy over every existing classical policy evaluation algorithm. Comment: 17 pages, 7 figures
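
    To make the QUBO form referred to in this abstract concrete: a QUBO problem asks for a binary vector x that minimizes the quadratic form x^T Q x. The following is a minimal sketch only. It is not the authors' embedding of policy iteration or state-value functions, and it calls no D-Wave API; it solves a tiny instance by classical brute force. The matrix `Q_EXAMPLE` and the function names are hypothetical, chosen purely for illustration.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return sum over i, j of Q[i][j] * x[i] * x[j] for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_qubo_brute_force(Q):
    """Enumerate all 2^n binary vectors and return (best_x, best_energy)."""
    n = len(Q)
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n):
        e = qubo_energy(Q, x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


# Hypothetical 3-variable instance in upper-triangular form.
# Diagonal entries reward setting a variable to 1; off-diagonal
# entries penalize setting two neighbouring variables to 1 together.
Q_EXAMPLE = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]

best_x, best_energy = solve_qubo_brute_force(Q_EXAMPLE)
print(best_x, best_energy)
```

    Brute force is only feasible for a handful of variables, since the search space doubles with each one. A quantum annealer samples low-energy assignments of the same objective instead of enumerating them.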

    Unifying Consciousness and Time to Enhance Artificial Intelligence

    Consciousness is a sequential process of awareness which can focus on one piece of information at a time. This process of awareness experiences causation which underpins the notion of time while it interplays with matter and energy, forming reality. The study of Consciousness, time and reality is complex and evolving fast in many fields, including metaphysics and fundamental physics. Reality composes patterns in human Consciousness in response to the regularities in nature. These regularities could be physical (e.g., astronomical, environmental), biological, chemical, mental, social, etc. The patterns that emerged in Consciousness were correlated to the environment, life and social behaviours followed by constructed frameworks, systems and structures. The complex constructs evolved as cultures, customs, norms and values, which created a diverse society. In the evolution of responsible AI, it is important to be attuned to the evolved cultural, ethical and moral values through Consciousness. This requires the advocated design of self-learning AI aware of time perception and human ethics. Comment: This discussion paper has been submitted to Cognitive Neuroscience of Routledge, part of the Taylor & Francis publication

    Fuzzy Algorithms and Reinforcement Learning in Decision Making

    The number of people using the internet as a communication medium continues to grow, which makes anomalies that can disrupt network traffic more likely. An anomaly may indicate an attack on, or a threat to, a computer or server. Attacks on an internet network come in many types, such as DoS (Denial of Service), DDoS (Distributed Denial of Service), flash crowds, and others. The negative effects of anomalies harm many parties, both users and internet service providers. To address this problem, a system was built that learns to reduce the number of anomalies gradually. Traffic anomalies are reduced using a single learning technique, Reinforcement Learning (RL). RL is a form of learning in which an agent (the learner) interacts with an unfamiliar environment, with the goal of making decisions directly in that environment. The agent interacts by selecting and executing an action. The environment returns a new state together with feedback in the form of a positive or negative reward. The reward signal is assigned according to an evaluation of how well the action performed. Learning takes place as the agent selects an action expressed as a percentage by which to reduce anomalies, because anomalies are not reduced drastically but in stages through RL. A fuzzy algorithm determines the number of services to be controlled before the RL process begins. This final-project research produced a system that learns to reduce the anomaly rate. Every 100 incoming traffic records are evaluated, and the result affects the Q-value of each service. Testing, configured according to the training results, yielded a 64% reduction in anomalies. Anomalies were concentrated in the ftp-data and telnet services.
Keywords: traffic anomaly, Reinforcement Learning, reward, action, fuzzy algorithm
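
The way a reward feeds back into a per-service Q-value can be sketched as follows. The abstract does not state which update rule the system uses, so this sketch assumes the standard tabular Q-learning rule. The state encoding (a service name), the action set (reduction percentages), and the values of `alpha` and `gamma` are all hypothetical and chosen only for illustration.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to values; missing pairs count as 0.0.
    alpha is the learning rate and gamma the discount factor (both assumed).
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    q[(state, action)] = new
    return new


# Hypothetical setup: the state is a service name and each action is a
# percentage by which anomalous traffic for that service is reduced.
ACTIONS = [10, 20, 30]
q_table = {}

# A positive reward after an evaluation window raises the Q-value ...
first = q_update(q_table, "telnet", 20, 1.0, "telnet", ACTIONS)
# ... and a negative reward in a later window lowers it again.
second = q_update(q_table, "telnet", 20, -1.0, "telnet", ACTIONS)
print(first, second)
```

In this sketch one call to `q_update` would correspond to one evaluation window, such as the 100 incoming traffic records mentioned in the abstract; how the reward itself is computed from those records is not specified in the source and is left out here.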
