Multi-criteria Hardware Trojan Detection: A Reinforcement Learning Approach
Hardware Trojans (HTs) are undesired design or manufacturing modifications
that can severely alter the security and functionality of digital integrated
circuits. HTs can be inserted according to various design criteria, e.g., net
switching activity, observability, or controllability. However, to our
knowledge, most HT detection methods are based on only a single criterion,
i.e., net switching activity. This paper proposes a multi-criteria
reinforcement learning (RL) HT detection tool that features a tunable reward
function for different HT detection scenarios. The tool allows for exploring
existing detection strategies and can adapt to new detection scenarios with
minimal effort. We also propose a generic methodology for comparing HT
detection methods fairly. Our preliminary results show an average of 84.2%
successful HT detection in the ISCAS-85 benchmarks.
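The tunable multi-criteria reward described above can be sketched as a weighted combination of per-net detection criteria. This is only an illustrative sketch: the criterion names, feature encoding, and weights below are assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a tunable multi-criteria reward for HT detection.
# Criterion names and weights are illustrative, not the paper's exact design.

def trojan_reward(net_features: dict, weights: dict) -> float:
    """Score a candidate net as a weighted sum of detection criteria."""
    return sum(weights[name] * net_features.get(name, 0.0) for name in weights)

# Re-weighting the criteria adapts the detector to a new scenario
# without changing the learning algorithm itself.
stealthy_scenario = {"low_switching": 0.7,
                     "low_observability": 0.2,
                     "low_controllability": 0.1}
score = trojan_reward({"low_switching": 0.9, "low_observability": 0.5},
                      stealthy_scenario)  # 0.7*0.9 + 0.2*0.5 = 0.73
```

Because the reward is a plain weighted sum, switching from a switching-activity-only detector to a multi-criteria one amounts to changing the weight dictionary.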
Multi-criteria average reward reinforcement learning
Reinforcement learning (RL) is the study of systems that learn from interaction with their environment. The current framework of reinforcement learning is based on receiving scalar rewards, which the agent aims to maximize. But in many real-world situations, tradeoffs must be made among multiple objectives. This necessitates vector representations of values and rewards, and weights to represent the relative importance of the different objectives.
In this thesis, we consider the problem of learning in the presence of time-varying preferences among multiple objectives. Learning a new policy for every possible weight vector is wasteful. Instead, we propose a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector, and improve upon it. The idea is that although there are infinitely many weight vectors, many of them share the same optimal policy. We demonstrate this empirically in two domains: a version of the Buridan's ass problem and network routing. We show that while learning is required for the first few weight vectors, the agent later settles for an already-learnt policy and thus converges very quickly.
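The policy-reuse idea above can be sketched as a small library of learned policies, each tagged with the weight vector it was trained for, from which the closest entry serves as a warm start for a new preference. The nearest-neighbour selection rule here is an illustrative assumption, not the thesis's exact mechanism.

```python
import numpy as np

# Sketch: store a finite set of policies keyed by weight vectors and reuse
# the closest one for a new preference. Euclidean-distance selection is an
# illustrative assumption.

class PolicyLibrary:
    def __init__(self):
        self.entries = []  # list of (weight_vector, policy)

    def add(self, weights, policy):
        self.entries.append((np.asarray(weights, dtype=float), policy))

    def closest(self, weights):
        w = np.asarray(weights, dtype=float)
        return min(self.entries, key=lambda e: np.linalg.norm(e[0] - w))[1]

lib = PolicyLibrary()
lib.add([0.9, 0.1], "policy_A")  # trained mostly for objective 1
lib.add([0.2, 0.8], "policy_B")  # trained mostly for objective 2
start = lib.closest([0.8, 0.2])  # reuses policy_A as a warm start
```

If the reused policy is already optimal for the new weight vector, no further learning is needed, which is the source of the fast convergence the abstract reports.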
Multi-agent Learning For Game-theoretical Problems
Multi-agent systems are prevalent in the real world across various domains. In many multi-agent systems, interaction among agents is inevitable, and some form of cooperation among agents is needed to deal with the task at hand. We model the type of multi-agent system in which autonomous agents inhabit an environment with no global control or global knowledge, decentralized in the true sense. In particular, we consider game-theoretical problems such as hedonic coalition formation games, matching problems, and Cournot games. We propose novel decentralized learning and multi-agent reinforcement learning approaches to train agents in learning behaviors and adapting to their environments. We use game-theoretic evaluation criteria such as optimality, stability, and the resulting equilibria.
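As one concrete instance of the Cournot games mentioned above, two firms that repeatedly best-respond to each other's last quantity converge to the Cournot-Nash equilibrium. The demand and cost parameters below are illustrative assumptions, not values from the thesis.

```python
# Illustrative two-firm Cournot game with linear demand P = a - (q1 + q2)
# and constant unit cost c. Each firm best-responds to the other's last
# quantity; parameters are illustrative assumptions.

def best_response(q_other, a=12.0, c=0.0):
    # maximizer of q * (a - q - q_other) - c * q, clipped at zero
    return max((a - c - q_other) / 2.0, 0.0)

q1, q2 = 0.0, 0.0
for _ in range(100):
    q1, q2 = best_response(q2), best_response(q1)

# Iterated best response converges to the symmetric Cournot-Nash
# equilibrium q* = (a - c) / 3 = 4 for each firm.
```

Decentralized learning approaches of the kind the abstract proposes aim to reach such equilibria without any agent knowing the other's payoff function.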
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
Many real-world multi-agent interactions consider multiple distinct criteria,
i.e. the payoffs are multi-objective in nature. However, the same
multi-objective payoff vector may lead to different utilities for each
participant. Therefore, it is essential for an agent to learn about the
behaviour of other agents in the system. In this work, we present the first
study of the effects of such opponent modelling on multi-objective multi-agent
interactions with non-linear utilities. Specifically, we consider two-player
multi-objective normal form games with non-linear utility functions under the
scalarised expected returns optimisation criterion. We contribute novel
actor-critic and policy gradient formulations to allow reinforcement learning
of mixed strategies in this setting, along with extensions that incorporate
opponent policy reconstruction and learning with opponent learning awareness
(i.e., learning while considering the impact of one's policy when anticipating
the opponent's learning step). Empirical results in five different MONFGs
demonstrate that opponent learning awareness and modelling can drastically
alter the learning dynamics in this setting. When equilibria are present,
opponent modelling can confer significant benefits on agents that implement it.
When there are no Nash equilibria, opponent learning awareness and modelling
allow agents to still converge to meaningful solutions that approximate
equilibria.
Comment: Under review since 14 November 202
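Under the scalarised expected returns (SER) criterion named above, a player's utility is its (possibly non-linear) utility function applied to the expected payoff vector, u(E[P]), rather than the expected utility E[u(P)]. A minimal sketch of the distinction, with an illustrative non-linear utility:

```python
import numpy as np

# SER vs ESR for a two-objective payoff under a mixed strategy.
# The utility u(v) = v[0] * v[1] is an illustrative non-linear choice.

payoffs = np.array([[4.0, 0.0],   # payoff vector of action 0
                    [0.0, 4.0]])  # payoff vector of action 1
strategy = np.array([0.5, 0.5])   # mixed strategy over the two actions

u = lambda v: v[0] * v[1]

ser = u(strategy @ payoffs)                             # u(E[P]) = u([2, 2]) = 4
esr = sum(p * u(v) for p, v in zip(strategy, payoffs))  # E[u(P)]  = 0
```

Under SER the mixed strategy has positive utility while under ESR it is worthless, which is why mixed strategies (and hence the actor-critic formulations in this paper) matter in this setting.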
Enhancing Exploration and Safety in Deep Reinforcement Learning
A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interaction with the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria that must be considered when designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms that foster efficient exploration and safer behaviors in simulated and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and on robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial for assessing the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample-efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitates the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors against desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance in both single- and multi-agent applications.
For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, and propose an architecture that favors cooperation without affecting exploration.
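The population-plus-gradient combination described above can be sketched as an evolutionary loop that keeps an elite, injects one gradient-refined copy, and mutates the rest. Everything below is a toy illustration on a 1-D objective, not the thesis's actual algorithm.

```python
import random

# Toy sketch of combining a gradient-free population with a gradient step.
# Individuals are scalars maximizing f(x) = -(x - 3)^2; illustrative only.

def fitness(x):
    return -(x - 3.0) ** 2

def grad_step(x, lr=0.5):
    # analytic gradient-ascent step: d fitness / dx = -2 (x - 3)
    return x + lr * (-2.0 * (x - 3.0))

def evolve(pop, generations=20, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best = pop[0]
        # keep the elite, add one gradient-refined copy, mutate the rest
        pop = [best, grad_step(best)] + [best + rng.gauss(0.0, 0.3)
                                         for _ in pop[2:]]
    return max(pop, key=fitness)

init_rng = random.Random(1)
best = evolve([init_rng.uniform(-5.0, 5.0) for _ in range(8)])
```

The mutations diversify exploration while the gradient step exploits local structure, which is the division of labor the abstract describes.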
Simulation of a market of rational agents from a collective intelligence perspective
Master's dissertation in Industrial Electronics and Computer Engineering. In this work, a model was implemented using the multi-agent programming paradigm; the model is the one proposed by Garrido (2010c). Two criteria were established to evaluate the behaviour of the agents in the artificial collective society: first, that every agent holds a level of wealth at or above a minimum subsistence value; second, that the total wealth of the society grows over time. In this context, wealth means the quantity of consumer goods held by the agents.
The objective of this work is to endow the agents with rationality and to investigate to what extent this influences the satisfaction of the model's behavioural criteria.
The algorithm chosen to endow the agents with rationality is single-agent Q-learning from Reinforcement Learning, chosen because Reinforcement Learning is known to approximate human learning computationally.
Two platforms were evaluated for implementing the model, LSD and ScicosLab, and ScicosLab was preferred. The model was studied both without and with rationality. With regard to rationality, the influence of exploration of the environment, of the way agents acquire information, and of the learning rate and discount factor were tested.
We conclude that the single Q-learning algorithm does not guarantee compliance with the criteria.
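The single-agent Q-learning update at the core of the thesis is the standard tabular rule. A minimal sketch follows; the state and action names are illustrative, not the model's actual state space.

```python
from collections import defaultdict

# Minimal tabular Q-learning update, the rule used to give agents rationality:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)          # unvisited (state, action) pairs default to 0
actions = ["buy", "sell"]       # illustrative action names
q_update(Q, "poor", "buy", 1.0, "rich", actions)  # Q[("poor","buy")] -> 0.1
```

The learning rate alpha and discount factor gamma in this update are exactly the parameters whose influence on the wealth criteria the thesis investigates.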