469 research outputs found

    How Computer Networks Can Become Smart

    Get PDF

    Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches

    Full text link
    This paper surveys the field of multiagent deep reinforcement learning. The combination of deep neural networks with reinforcement learning has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex as (a) the future rewards depend on the joint actions of multiple players and (b) the computational complexity of functions increases. We present the most common multiagent problem representations and their main challenges, and identify five research areas that address one or more of these challenges: centralised training and decentralised execution, opponent modelling, communication, efficient coordination, and reward shaping. We find that many computational studies rely on unrealistic assumptions or are not generalisable to other settings; they struggle to overcome the curse of dimensionality or nonstationarity. Approaches from psychology and sociology capture promising relevant behaviours such as communication and coordination. We suggest that, for multiagent reinforcement learning to be successful, future research addresses these challenges with an interdisciplinary approach to open up new possibilities for more human-oriented solutions in multiagent reinforcement learning.Comment: 37 pages, 6 figure

    Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration

    Full text link
    Future AI applications require performance, reliability and privacy that the existing, cloud-dependant system architectures cannot provide. In this article, we study orchestration in the device-edge-cloud continuum, and focus on AI for edge, that is, the AI methods used in resource orchestration. We claim that to support the constantly growing requirements of intelligent applications in the device-edge-cloud computing continuum, resource orchestration needs to embrace edge AI and emphasize local autonomy and intelligence. To justify the claim, we provide a general definition for continuum orchestration, and look at how current and emerging orchestration paradigms are suitable for the computing continuum. We describe certain major emerging research themes that may affect future orchestration, and provide an early vision of an orchestration paradigm that embraces those research themes. Finally, we survey current key edge AI methods and look at how they may contribute into fulfilling the vision of future continuum orchestration.Comment: 50 pages, 8 figures (Revised content in all sections, added figures and new section

    Distributed strategy adaptation with a prediction function in multi-agent task allocation

    Get PDF
    Coordinating multiple agents to complete a set of tasks under time constraints is a complex problem. Distributed consensus-based task allocation algorithms address this problem without the need for human supervision. With such algorithms, agents add tasks to their own schedule according to specified allocation strategies. Various factors, such as the available resources and number of tasks, may affect the efficiency of a particular allocation strategy. The novel idea we suggest is that each individual agent can predict locally the best task inclusion strategy, based on the limited task assignment information communicated among networked agents. Using supervised classification learning, a function is trained to predict the most appropriate strategy between two well known insertion heuristics. Using the proposed method, agents are shown to correctly predict and select the optimal insertion heuristic to achieve the overall highest number of task allocations. The adaptive agents consistently match the performances of the best non-adaptive agents across a variety of scenarios. This study aims to demonstrate the possibility and potential performance benefits of giving agents greater decision making capabilities to independently adapt the task allocation process in line with the problem of interest

    Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

    Full text link
    Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the state transition sample as the signal. Hence, we propose a scalable observation model based on fitting state transition functions of source tasks from only a small number of samples, which can generalize to any signals observed in the target task. Moreover, we extend the offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which can avoid negative transfer when faced with new unknown tasks. Experimental results show that our method can consistently facilitate faster and more efficient policy transfer.Comment: 16 pages, 6 figures, under revie

    A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

    Get PDF
    The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, which make a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principle approaches how algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research
    • …
    corecore