Search CORE

1,180 research outputs found

A Survey of Monte Carlo Tree Search Methods

Author: Browne Cameron B
Colton Simon
Cowling Peter I
Lucas Simon M
Perez Diego
Powley Edward
Rohlfshagen Philipp
Samothrakis Spyridon
Tavener Stephen
Whitehouse Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

Author: Matsutani Hiroki
Tsukada Mineto
Watanabe Hirohisa
Publication venue
Publication date: 23/03/2021
Field of study

DQN (Deep Q-Network) is a method to perform Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for an experience replay and rely on a backpropagation based iterative optimization, making them difficult to be implemented on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network based on-device learning approach that does not rely on the backpropagation method but uses OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for the on-device reinforcement learning so that output values of the neural network can be fit into a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for PYNQ-Z1 board as a low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster than a conventional DQN-based approach when the number of hidden-layer nodes is 64

arXiv.org e-Print Archive

Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives

Author: Ai Yunfeng
Chen Long
Deng Peng
Hu Xuemin
Li Bai
Li Lingxi
Li Yuchen
Teng Siyu
Xuanyuan Zhe
Yang Dongsheng
Zhu Fenghua
Publication venue
Publication date: 29/03/2023
Field of study

Thanks to the augmented convenience, safety advantages, and potential commercial value, Intelligent vehicles (IVs) have attracted wide attention throughout the world. Although a few autonomous driving unicorns assert that IVs will be commercially deployable by 2025, their implementation is still restricted to small-scale validation due to various issues, among which precise computation of control commands or trajectories by planning methods remains a prerequisite for IVs. This paper aims to review state-of-the-art planning methods, including pipeline planning and end-to-end planning methods. In terms of pipeline methods, a survey of selecting algorithms is provided along with a discussion of the expansion and optimization mechanisms, whereas in end-to-end methods, the training approaches and verification scenarios of driving tasks are points of concern. Experimental platforms are reviewed to facilitate readers in selecting suitable training and validation methods. Finally, the current challenges and future directions are discussed. The side-by-side comparison presented in this survey not only helps to gain insights into the strengths and limitations of the reviewed methods but also assists with system-level design choices.Comment: 20 pages, 14 figures and 5 table

arXiv.org e-Print Archive

Cyber Deception for Critical Infrastructure Resiliency

Author: Al Amin Md Ali Reza
Publication venue: ODU Digital Commons
Publication date: 01/08/2022
Field of study

The high connectivity of modern cyber networks and devices has brought many improvements to the functionality and efficiency of networked systems. Unfortunately, these benefits have come with many new entry points for attackers, making systems much more vulnerable to intrusions. Thus, it is critically important to protect cyber infrastructure against cyber attacks. The static nature of cyber infrastructure leads to adversaries performing reconnaissance activities and identifying potential threats. Threats related to software vulnerabilities can be mitigated upon discovering a vulnerability and-, developing and releasing a patch to remove the vulnerability. Unfortunately, the period between discovering a vulnerability and applying a patch is long, often lasting five months or more. These delays pose significant risks to the organization while many cyber networks are operational. This concern necessitates the development of an active defense system capable of thwarting cyber reconnaissance missions and mitigating the progression of the attacker through the network. Thus, my research investigates how to develop an efficient defense system to address these challenges. First, we proposed the framework to show how the defender can use the network of decoys along with the real network to introduce mistrust. However, another research problem, the defender’s choice of whether to save resources or spend more (number of decoys) resources in a resource-constrained system, needs to be addressed. We developed a Dynamic Deception System (DDS) that can assess various attacker types based on the attacker’s knowledge, aggression, and stealthiness level to decide whether the defender should spend or save resources. In our DDS, we leveraged Software Defined Networking (SDN) to differentiate the malicious traffic from the benign traffic to deter the cyber reconnaissance mission and redirect malicious traffic to the deception server. Experiments conducted on the prototype implementation of our DDS confirmed that the defender could decide whether to spend or save resources based on the attacker types and thwarted cyber reconnaissance mission. Next, we addressed the challenge of efficiently placing network decoys by predicting the most likely attack path in Multi-Stage Attacks (MSAs). MSAs are cyber security threats where the attack campaign is performed through several attack stages and adversarial lateral movement is one of the critical stages. Adversaries can laterally move into the network without raising an alert. To prevent lateral movement, we proposed an approach that combines reactive (graph analysis) and proactive (cyber deception technology) defense. The proposed approach is realized through two phases. The first phase predicts the most likely attack path based on Intrusion Detection System (IDS) alerts and network trace. The second phase determines the optimal deployment of decoy nodes along the predicted path. We employ transition probabilities in a Hidden Markov Model to predict the path. In the second phase, we utilize the predicted attack path to deploy decoy nodes. The evaluation results show that our approach can predict the most likely attack paths and thwart adversarial lateral movement

Old Dominion University

Trusted Artificial Intelligence in Manufacturing; Trusted Artificial Intelligence in Manufacturing

Author
Publication venue: 'Now Publishers'
Publication date: 28/01/2022
Field of study

The successful deployment of AI solutions in manufacturing environments hinges on their security, safety and reliability which becomes more challenging in settings where multiple AI systems (e.g., industrial robots, robotic cells, Deep Neural Networks (DNNs)) interact as atomic systems and with humans. To guarantee the safe and reliable operation of AI systems in the shopfloor, there is a need to address many challenges in the scope of complex, heterogeneous, dynamic and unpredictable environments. Specifically, data reliability, human machine interaction, security, transparency and explainability challenges need to be addressed at the same time. Recent advances in AI research (e.g., in deep neural networks security and explainable AI (XAI) systems), coupled with novel research outcomes in the formal specification and verification of AI systems provide a sound basis for safe and reliable AI deployments in production lines. Moreover, the legal and regulatory dimension of safe and reliable AI solutions in production lines must be considered as well. To address some of the above listed challenges, fifteen European Organizations collaborate in the scope of the STAR project, a research initiative funded by the European Commission in the scope of its H2020 program (Grant Agreement Number: 956573). STAR researches, develops, and validates novel technologies that enable AI systems to acquire knowledge in order to take timely and safe decisions in dynamic and unpredictable environments. Moreover, the project researches and delivers approaches that enable AI systems to confront sophisticated adversaries and to remain robust against security attacks. This book is co-authored by the STAR consortium members and provides a review of technologies, techniques and systems for trusted, ethical, and secure AI in manufacturing. The different chapters of the book cover systems and technologies for industrial data reliability, responsible and transparent artificial intelligence systems, human centered manufacturing systems such as human-centred digital twins, cyber-defence in AI systems, simulated reality systems, human robot collaboration systems, as well as automated mobile robots for manufacturing environments. A variety of cutting-edge AI technologies are employed by these systems including deep neural networks, reinforcement learning systems, and explainable artificial intelligence systems. Furthermore, relevant standards and applicable regulations are discussed. Beyond reviewing state of the art standards and technologies, the book illustrates how the STAR research goes beyond the state of the art, towards enabling and showcasing human-centred technologies in production lines. Emphasis is put on dynamic human in the loop scenarios, where ethical, transparent, and trusted AI systems co-exist with human workers. The book is made available as an open access publication, which could make it broadly and freely available to the AI and smart manufacturing communities

Directory of Open Access Books (DOAB)

On monte carlo tree search and reinforcement learning

Author: Brank Ster
Samothrakis Spyridon
Tom Vodopivec
Publication venue: 'AI Access Foundation'
Publication date: 20/12/2017
Field of study

Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not been thoroughly studied yet. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, for which the traditional MCTS is only one of the variants. We confirm that planning methods inspired by RL in conjunction with online search demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search

University of Essex Research Repository

Crossref

Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT

Author: Du Jing
Ye Yang
You Hengxu
Zhou Tianyu
Zhu Qi
Publication venue
Publication date: 21/04/2023
Field of study

Robot-based assembly in construction has emerged as a promising solution to address numerous challenges such as increasing costs, labor shortages, and the demand for safe and efficient construction processes. One of the main obstacles in realizing the full potential of these robotic systems is the need for effective and efficient sequence planning for construction tasks. Current approaches, including mathematical and heuristic techniques or machine learning methods, face limitations in their adaptability and scalability to dynamic construction environments. To expand the ability of the current robot system in sequential understanding, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks. The proposed system adapts ChatGPT for construction sequence planning and demonstrate its feasibility and effectiveness through experimental evaluation including Two case studies and 80 trials about real construction tasks. The results show that RoboGPT-driven robots can handle complex construction operations and adapt to changes on the fly. This paper contributes to the ongoing efforts to enhance the capabilities and performance of robot-based assembly systems in the construction industry, and it paves the way for further integration of large language model technologies in the field of construction robotics.Comment: 14 pages, 20 figures, submitted to IEEE Acces

arXiv.org e-Print Archive