1,180 research outputs found
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work
An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
DQN (Deep Q-Network) is a method to perform Q-learning for reinforcement
learning using deep neural networks. DQNs require a large buffer and batch
processing for an experience replay and rely on a backpropagation based
iterative optimization, making them difficult to be implemented on
resource-limited edge devices. In this paper, we propose a lightweight
on-device reinforcement learning approach for low-cost FPGA devices. It
exploits a recently proposed neural-network based on-device learning approach
that does not rely on the backpropagation method but uses OS-ELM (Online
Sequential Extreme Learning Machine) based training algorithm. In addition, we
propose a combination of L2 regularization and spectral normalization for the
on-device reinforcement learning so that output values of the neural network
can be fit into a certain range and the reinforcement learning becomes stable.
The proposed reinforcement learning approach is designed for PYNQ-Z1 board as a
low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate
that the proposed algorithm and its FPGA implementation complete a CartPole-v0
task 29.77x and 89.40x faster than a conventional DQN-based approach when the
number of hidden-layer nodes is 64
Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives
Thanks to the augmented convenience, safety advantages, and potential
commercial value, Intelligent vehicles (IVs) have attracted wide attention
throughout the world. Although a few autonomous driving unicorns assert that
IVs will be commercially deployable by 2025, their implementation is still
restricted to small-scale validation due to various issues, among which precise
computation of control commands or trajectories by planning methods remains a
prerequisite for IVs. This paper aims to review state-of-the-art planning
methods, including pipeline planning and end-to-end planning methods. In terms
of pipeline methods, a survey of selecting algorithms is provided along with a
discussion of the expansion and optimization mechanisms, whereas in end-to-end
methods, the training approaches and verification scenarios of driving tasks
are points of concern. Experimental platforms are reviewed to facilitate
readers in selecting suitable training and validation methods. Finally, the
current challenges and future directions are discussed. The side-by-side
comparison presented in this survey not only helps to gain insights into the
strengths and limitations of the reviewed methods but also assists with
system-level design choices.Comment: 20 pages, 14 figures and 5 table
Cyber Deception for Critical Infrastructure Resiliency
The high connectivity of modern cyber networks and devices has brought many improvements to the functionality and efficiency of networked systems. Unfortunately, these benefits have come with many new entry points for attackers, making systems much more vulnerable to intrusions. Thus, it is critically important to protect cyber infrastructure against cyber attacks. The static nature of cyber infrastructure leads to adversaries performing reconnaissance activities and identifying potential threats. Threats related to software vulnerabilities can be mitigated upon discovering a vulnerability and-, developing and releasing a patch to remove the vulnerability. Unfortunately, the period between discovering a vulnerability and applying a patch is long, often lasting five months or more. These delays pose significant risks to the organization while many cyber networks are operational. This concern necessitates the development of an active defense system capable of thwarting cyber reconnaissance missions and mitigating the progression of the attacker through the network. Thus, my research investigates how to develop an efficient defense system to address these challenges. First, we proposed the framework to show how the defender can use the network of decoys along with the real network to introduce mistrust. However, another research problem, the defender’s choice of whether to save resources or spend more (number of decoys) resources in a resource-constrained system, needs to be addressed. We developed a Dynamic Deception System (DDS) that can assess various attacker types based on the attacker’s knowledge, aggression, and stealthiness level to decide whether the defender should spend or save resources. In our DDS, we leveraged Software Defined Networking (SDN) to differentiate the malicious traffic from the benign traffic to deter the cyber reconnaissance mission and redirect malicious traffic to the deception server. Experiments conducted on the prototype implementation of our DDS confirmed that the defender could decide whether to spend or save resources based on the attacker types and thwarted cyber reconnaissance mission. Next, we addressed the challenge of efficiently placing network decoys by predicting the most likely attack path in Multi-Stage Attacks (MSAs). MSAs are cyber security threats where the attack campaign is performed through several attack stages and adversarial lateral movement is one of the critical stages. Adversaries can laterally move into the network without raising an alert. To prevent lateral movement, we proposed an approach that combines reactive (graph analysis) and proactive (cyber deception technology) defense. The proposed approach is realized through two phases. The first phase predicts the most likely attack path based on Intrusion Detection System (IDS) alerts and network trace. The second phase determines the optimal deployment of decoy nodes along the predicted path. We employ transition probabilities in a Hidden Markov Model to predict the path. In the second phase, we utilize the predicted attack path to deploy decoy nodes. The evaluation results show that our approach can predict the most likely attack paths and thwart adversarial lateral movement
Trusted Artificial Intelligence in Manufacturing; Trusted Artificial Intelligence in Manufacturing
The successful deployment of AI solutions in manufacturing environments hinges on their security, safety and reliability which becomes more challenging in settings where multiple AI systems (e.g., industrial robots, robotic cells, Deep Neural Networks (DNNs)) interact as atomic systems and with humans. To guarantee the safe and reliable operation of AI systems in the shopfloor, there is a need to address many challenges in the scope of complex, heterogeneous, dynamic and unpredictable environments. Specifically, data reliability, human machine interaction, security, transparency and explainability challenges need to be addressed at the same time. Recent advances in AI research (e.g., in deep neural networks security and explainable AI (XAI) systems), coupled with novel research outcomes in the formal specification and verification of AI systems provide a sound basis for safe and reliable AI deployments in production lines. Moreover, the legal and regulatory dimension of safe and reliable AI solutions in production lines must be considered as well. To address some of the above listed challenges, fifteen European Organizations collaborate in the scope of the STAR project, a research initiative funded by the European Commission in the scope of its H2020 program (Grant Agreement Number: 956573). STAR researches, develops, and validates novel technologies that enable AI systems to acquire knowledge in order to take timely and safe decisions in dynamic and unpredictable environments. Moreover, the project researches and delivers approaches that enable AI systems to confront sophisticated adversaries and to remain robust against security attacks. This book is co-authored by the STAR consortium members and provides a review of technologies, techniques and systems for trusted, ethical, and secure AI in manufacturing. The different chapters of the book cover systems and technologies for industrial data reliability, responsible and transparent artificial intelligence systems, human centered manufacturing systems such as human-centred digital twins, cyber-defence in AI systems, simulated reality systems, human robot collaboration systems, as well as automated mobile robots for manufacturing environments. A variety of cutting-edge AI technologies are employed by these systems including deep neural networks, reinforcement learning systems, and explainable artificial intelligence systems. Furthermore, relevant standards and applicable regulations are discussed. Beyond reviewing state of the art standards and technologies, the book illustrates how the STAR research goes beyond the state of the art, towards enabling and showcasing human-centred technologies in production lines. Emphasis is put on dynamic human in the loop scenarios, where ethical, transparent, and trusted AI systems co-exist with human workers. The book is made available as an open access publication, which could make it broadly and freely available to the AI and smart manufacturing communities
On monte carlo tree search and reinforcement learning
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread
adoption within the games community. Its links to traditional reinforcement learning (RL)
methods have been outlined in the past; however, the use of RL techniques within tree search has
not been thoroughly studied yet. In this paper we re-examine in depth this close relation between
the two fields; our goal is to improve the cross-awareness between the two communities. We show
that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new
algorithms, for which the traditional MCTS is only one of the variants. We confirm that planning
methods inspired by RL in conjunction with online search demonstrate encouraging results on
several classic board games and in arcade video game competitions, where our algorithm recently
ranked first. Our study promotes a unified view of learning, planning, and search
Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT
Robot-based assembly in construction has emerged as a promising solution to
address numerous challenges such as increasing costs, labor shortages, and the
demand for safe and efficient construction processes. One of the main obstacles
in realizing the full potential of these robotic systems is the need for
effective and efficient sequence planning for construction tasks. Current
approaches, including mathematical and heuristic techniques or machine learning
methods, face limitations in their adaptability and scalability to dynamic
construction environments. To expand the ability of the current robot system in
sequential understanding, this paper introduces RoboGPT, a novel system that
leverages the advanced reasoning capabilities of ChatGPT, a large language
model, for automated sequence planning in robot-based assembly applied to
construction tasks. The proposed system adapts ChatGPT for construction
sequence planning and demonstrate its feasibility and effectiveness through
experimental evaluation including Two case studies and 80 trials about real
construction tasks. The results show that RoboGPT-driven robots can handle
complex construction operations and adapt to changes on the fly. This paper
contributes to the ongoing efforts to enhance the capabilities and performance
of robot-based assembly systems in the construction industry, and it paves the
way for further integration of large language model technologies in the field
of construction robotics.Comment: 14 pages, 20 figures, submitted to IEEE Acces
- …