128,047 research outputs found

    Multi-criteria Hardware Trojan Detection: A Reinforcement Learning Approach

    Hardware Trojans (HTs) are undesired design or manufacturing modifications that can severely alter the security and functionality of digital integrated circuits. HTs can be inserted according to various design criteria, e.g., nets' switching activity, observability, controllability, etc. To our knowledge, however, most HT detection methods are based on a single criterion, i.e., nets' switching activity. This paper proposes a multi-criteria reinforcement learning (RL) HT detection tool that features a tunable reward function for different HT detection scenarios. The tool allows for exploring existing detection strategies and can adapt to new detection scenarios with minimal effort. We also propose a generic methodology for comparing HT detection methods fairly. Our preliminary results show an average of 84.2% successful HT detection on the ISCAS-85 benchmarks.
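
    As an illustration of the tunable-reward idea, the minimal sketch below combines several net-level criteria into one weighted reward; the criterion names, weights, and scoring are assumptions for illustration, not the paper's actual implementation.

        # Sketch: weighted multi-criteria reward for an RL-based HT detector.
        # Criterion names and weights are illustrative assumptions.
        from dataclasses import dataclass

        @dataclass
        class NetFeatures:
            switching_activity: float  # normalised transition probability of the net
            observability: float       # ease of propagating the net's value to an output
            controllability: float     # ease of driving the net from primary inputs

        def reward(net: NetFeatures, w: dict) -> float:
            """Weighted combination of criteria; retuning the weights
            re-targets the agent to a different HT insertion scenario."""
            return (w["switching"] * (1.0 - net.switching_activity)   # rare nets are suspicious
                    + w["observability"] * (1.0 - net.observability)
                    + w["controllability"] * (1.0 - net.controllability))

        # A scenario that prioritises low switching activity.
        weights = {"switching": 0.6, "observability": 0.2, "controllability": 0.2}
        print(reward(NetFeatures(0.05, 0.3, 0.4), weights))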

    Multi-agent Learning For Game-theoretical Problems

    Multi-agent systems are prevalent in various real-world domains. In many multi-agent systems, interaction among agents is inevitable, and some form of cooperation is needed among agents to deal with the task at hand. We model the type of multi-agent systems in which autonomous agents inhabit an environment with no global control or global knowledge: decentralized in the true sense. In particular, we consider game-theoretical problems such as hedonic coalition formation games, matching problems, and Cournot games. We propose novel decentralized learning and multi-agent reinforcement learning approaches to train agents to learn behaviors and adapt to their environments, and we use game-theoretic evaluation criteria such as optimality, stability, and the resulting equilibria.
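
    A minimal sketch of the decentralized-learning idea: independent Q-learners in a repeated Cournot duopoly, each seeing only its own payoff, with no global control. The demand and cost parameters and the stateless bandit formulation are assumptions, not the thesis's exact setup.

        # Sketch: independent (decentralized) Q-learners in a repeated Cournot
        # duopoly; parameters and the stateless formulation are assumptions.
        import random

        A, B, C = 10.0, 1.0, 1.0                 # inverse demand P = A - B*(q1+q2), unit cost C
        ACTIONS = [0.5 * i for i in range(11)]   # discretised quantities 0.0 .. 5.0
        ALPHA, EPS = 0.1, 0.1                    # learning rate, exploration rate

        q_tables = [[0.0] * len(ACTIONS) for _ in range(2)]   # one table per agent, no shared state

        def choose(qt):
            if random.random() < EPS:            # epsilon-greedy exploration
                return random.randrange(len(ACTIONS))
            return max(range(len(ACTIONS)), key=lambda a: qt[a])

        for _ in range(50_000):
            acts = [choose(qt) for qt in q_tables]
            price = max(A - B * sum(ACTIONS[a] for a in acts), 0.0)
            for i, qt in enumerate(q_tables):
                profit = (price - C) * ACTIONS[acts[i]]        # each agent sees only its own payoff
                qt[acts[i]] += ALPHA * (profit - qt[acts[i]])  # bandit-style value update

        # Quantities learned by each agent (the Cournot-Nash quantity here is 3.0).
        print([ACTIONS[max(range(len(ACTIONS)), key=lambda a: qt[a])] for qt in q_tables])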

    Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

    Many real-world multi-agent interactions consider multiple distinct criteria, i.e., the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of the other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allow agents to still converge to meaningful solutions that approximate equilibria.
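
    A minimal sketch of the scalarised expected returns (SER) criterion the abstract optimises: the non-linear utility is applied to the expected payoff vector, not to per-episode payoffs, which is exactly what makes mixed strategies valuable here. The game payoffs, the fixed opponent strategy, and the utility function are assumptions for illustration.

        # Sketch: gradient ascent on the scalarised expected returns (SER)
        # objective for a two-action MONFG; payoffs, opponent strategy, and
        # utility function are illustrative assumptions.
        import math

        PAYOFF = {(0, 0): (4, 1), (0, 1): (3, 1),   # 2-objective payoff vectors,
                  (1, 0): (1, 3), (1, 1): (0, 4)}   # indexed by (my action, opponent action)
        OPP = [0.5, 0.5]                             # fixed opponent mixed strategy

        def utility(v):
            return v[0] * v[1]        # non-linear: rewards balancing both objectives

        def ser(theta):
            p = 1.0 / (1.0 + math.exp(-theta))       # sigmoid policy: P(action 0)
            probs = [p, 1.0 - p]
            expected = [sum(probs[a] * OPP[b] * PAYOFF[a, b][k]
                            for a in range(2) for b in range(2)) for k in range(2)]
            return utility(expected)  # SER: utility of the expectation, not expectation of utility

        theta, lr, h = 0.0, 0.5, 1e-4
        for _ in range(200):          # finite-difference gradient ascent on SER
            theta += lr * (ser(theta + h) - ser(theta - h)) / (2 * h)

        print("P(action 0) =", 1.0 / (1.0 + math.exp(-theta)))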

    Enhancing Exploration and Safety in Deep Reinforcement Learning

    A Deep Reinforcement Learning (DRL) agent tries to learn a policy that maximizes a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interaction with the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria that must be considered when designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms that foster efficient exploration and safer behaviors in simulated and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and on robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitate the development of further optimizations for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors against desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free, complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance in both single- and multi-agent applications. For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, and propose an architecture that favors cooperation without affecting exploration.
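
    A minimal sketch of the population-based half of such a hybrid scheme: evolutionary mutation diversifies exploration over policy parameters, and a comment marks where gradient-based DRL updates would be interleaved. The toy fitness function and all hyperparameters are assumptions, not the thesis's architecture.

        # Sketch: population-based search over policy parameters, the
        # evolutionary half of a hybrid EA + DRL scheme; the toy fitness
        # function and all hyperparameters are assumptions.
        import random

        def fitness(params):                      # stand-in for an episode return
            return -sum((p - 3.0) ** 2 for p in params)

        def mutate(params, sigma=0.3):            # Gaussian perturbation diversifies exploration
            return [p + random.gauss(0.0, sigma) for p in params]

        population = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(20)]
        for generation in range(100):
            population.sort(key=fitness, reverse=True)
            elites = population[:5]               # keep the best "policies"
            # Refill with mutated elites; in the hybrid scheme, gradient-based
            # DRL updates of the elites would be interleaved here.
            population = elites + [mutate(random.choice(elites)) for _ in range(15)]

        print(fitness(population[0]))             # approaches 0 as params approach 3.0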

    Simulation of a Market of Rational Agents from a Collective Intelligence Perspective

    Integrated Master's dissertation in Industrial Electronics and Computer Engineering. This work implements a model based on the multi-agent programming paradigm, following the multi-agent model proposed by Garrido (2010c). Two criteria were established to evaluate the behavior of the agents within the artificial collective society: first, that every agent holds a level of wealth at or above a minimum subsistence value; second, that the total sum of wealth grows over time. In this context, wealth means the quantity of consumer goods held by the agents. The main objective of this work was to endow the agents with rationality and to investigate to what extent this influences the satisfaction of the model's behavioral criteria. The algorithm chosen to endow the agents with rationality is single Q-learning, a Reinforcement Learning method; Reinforcement Learning was chosen because it is known to approximate human learning computationally. Two platforms were evaluated for implementing the model, LSD and ScicosLab, and ScicosLab was preferred. The model was studied both without and with rationality. Regarding rationality, we examined the influence of exploration of the environment, of how the agents acquire information, and of the learning rate and discount factor. We conclude that the single Q-learning algorithm does not guarantee compliance with the criteria.
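
    A minimal sketch of the tabular single Q-learning update studied in the dissertation, showing where the three knobs it varies (exploration rate, learning rate, discount factor) enter; the state/action sizes and the example transition are assumptions.

        # Sketch: tabular single Q-learning with the three knobs the thesis
        # varies; state/action sizes and the example transition are assumptions.
        import random

        N_STATES, N_ACTIONS = 10, 4
        EPSILON, ALPHA, GAMMA = 0.1, 0.1, 0.95    # exploration, learning rate, discount factor

        Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

        def select_action(state):
            if random.random() < EPSILON:          # occasional exploration of the environment
                return random.randrange(N_ACTIONS)
            return max(range(N_ACTIONS), key=lambda a: Q[state][a])

        def update(state, action, reward, next_state):
            target = reward + GAMMA * max(Q[next_state])      # GAMMA discounts the future
            Q[state][action] += ALPHA * (target - Q[state][action])

        # One illustrative transition: earn reward 1.0 moving from state 0 to 1.
        s = 0
        a = select_action(s)
        update(s, a, 1.0, 1)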