81 research outputs found

    Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

    Multi-simulator training has contributed to the recent success of Deep Reinforcement Learning by stabilizing learning and allowing for higher training throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where several actor-learners (such as A2C agents) are organized in a peer-to-peer communication topology, and exchange information through asynchronous gossip in order to take advantage of a large number of distributed simulators. We prove that GALA agents remain within an epsilon-ball of one another during training when using loosely coupled asynchronous communication. By reducing the amount of synchronization between agents, GALA is more computationally efficient and scalable compared to A2C, its fully-synchronous counterpart. GALA also outperforms A2C, being more robust and sample efficient. We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws
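The gossip pattern this abstract refers to can be illustrated with a minimal sketch of pairwise gossip averaging. This is not the GALA update rule: the ring topology, the symmetric pairwise averaging, and the names `gossip_step`, `ring_neighbors`, and `max_disagreement` are all assumptions made for illustration, and the parameter vectors are plain lists rather than network weights.

```python
import random


def ring_neighbors(n):
    """Hypothetical peer-to-peer topology: each agent talks to its two ring neighbors."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def gossip_step(params, neighbors, rng):
    """One asynchronous gossip step.

    A single randomly chosen agent wakes up, picks one random neighbor,
    and both replace their parameters with the pairwise average.
    No global synchronization barrier is involved.
    """
    i = rng.randrange(len(params))
    j = rng.choice(neighbors[i])
    avg = [(a + b) / 2.0 for a, b in zip(params[i], params[j])]
    params[i] = list(avg)
    params[j] = list(avg)


def max_disagreement(params):
    """Largest coordinate-wise gap between any two agents' parameter vectors."""
    dim = len(params[0])
    return max(
        max(p[k] for p in params) - min(p[k] for p in params)
        for k in range(dim)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    params = [[0.0, 4.0], [1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
    neighbors = ring_neighbors(len(params))
    print("disagreement before:", max_disagreement(params))
    for _ in range(500):
        gossip_step(params, neighbors, rng)
    print("disagreement after:", max_disagreement(params))
```

In this toy setting the agents drift toward a common average without any step at which all of them wait for each other, which is the property that loosely coupled communication is meant to exploit.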

    State-of-the-art Techniques in Deep Edge Intelligence

    The potential held by the gargantuan volumes of data being generated across networks worldwide has been truly unlocked by machine learning techniques and more recently Deep Learning. The advantages offered by the latter have seen it rapidly becoming a framework of choice for various applications. However, the centralization of computational resources and the need for data aggregation have long been limiting factors in the democratization of Deep Learning applications. Edge Computing is an emerging paradigm that aims to utilize the hitherto untapped processing resources available at the network periphery. Edge Intelligence (EI) has quickly emerged as a powerful alternative to enable learning using the concepts of Edge Computing. Deep Learning-based Edge Intelligence or Deep Edge Intelligence (DEI) lies in this rapidly evolving domain. In this article, we provide an overview of the major constraints in operationalizing DEI. The major research avenues in DEI have been consolidated under Federated Learning, Distributed Computation, Compression Schemes and Conditional Computation. We also present some of the prevalent challenges and highlight prospective research avenues. Comment: 13 pages, 5 figures, 1 tabl

    A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

    This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm

    A Dynamic Service Trading in a DLT-Assisted Industrial IoT Marketplace

    With the increasing demand for digitalization and participation in Industry 4.0, new challenges have emerged concerning the market of digital services to compensate for the lack of processing, computation, and other resources within Industrial Internet of Things (IIoTs). At the same time, the complexity of interplay among stakeholders has grown in size, granularity, and variation of trust. In this paper, we consider an IIoT resource market with heterogeneous buyers such as manufacturer owners. The buyers interact with the resource supplier dynamically with specific resource demands. This work introduces a broker between the supplier and the buyers, equipped with Distributed Ledger Technologies (DLT) providing a service for market security and trustworthiness. We first model the DLT-assisted IIoT market analytically to determine an offline solution and understand the selfish interactions among different entities (buyers, supplier, broker). Considering the non-cooperative heterogeneous buyers in the dynamic market, we then follow an independent learners framework to determine an online solution. In particular, the decision-making procedures of buyers are modeled as a Partially Observable Markov Decision Process which is solved using independent Q-learning. We evaluate both the offline and online solutions with analytical simulations, and the results show that the proposed approaches successfully maximize players’ satisfaction. The results further demonstrate that independent Q-learners achieve equilibrium in a dynamic market even without the availability of complete information and communication, and reach a better solution compared to that of centralized Q-learning
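The independent learners framework mentioned in this abstract can be sketched with a generic tabular Q-learner that each buyer would run on its own. This is a minimal sketch under stated assumptions, not the paper's model: the class name `IndependentQLearner`, the hyperparameters, and the use of hashable local observations as table keys are all hypothetical, and no market dynamics are modeled.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """One agent's tabular Q-learner; it ignores all other agents.

    Each agent keeps its own table keyed by its *local* observation and
    learns only from its own reward, with no communication.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, obs):
        """Epsilon-greedy action choice from the agent's own table."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[obs]
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, obs, action, reward, next_obs):
        """Standard Q-learning update on the agent's private experience."""
        td_target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (td_target - self.q[obs][action])


if __name__ == "__main__":
    # Two buyers learning side by side, each with a private table.
    buyers = [IndependentQLearner(n_actions=3, seed=s) for s in range(2)]
    for buyer in buyers:
        a = buyer.act("low_demand")
        buyer.update("low_demand", a, reward=1.0, next_obs="high_demand")
```

Because each learner treats the other agents as part of the environment, the environment appears non-stationary from its point of view; whether the learners reach equilibrium is an empirical result of the paper, not something this sketch demonstrates.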

    Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration

    Future AI applications require performance, reliability and privacy that the existing, cloud-dependent system architectures cannot provide. In this article, we study orchestration in the device-edge-cloud continuum, and focus on AI for edge, that is, the AI methods used in resource orchestration. We claim that to support the constantly growing requirements of intelligent applications in the device-edge-cloud computing continuum, resource orchestration needs to embrace edge AI and emphasize local autonomy and intelligence. To justify the claim, we provide a general definition for continuum orchestration, and look at how current and emerging orchestration paradigms are suitable for the computing continuum. We describe certain major emerging research themes that may affect future orchestration, and provide an early vision of an orchestration paradigm that embraces those research themes. Finally, we survey current key edge AI methods and look at how they may contribute to fulfilling the vision of future continuum orchestration. Comment: 50 pages, 8 figures (Revised content in all sections, added figures and new section

    Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research

    Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally emergent curriculum, which we term an autocurriculum. The solution of one social task often begets new social tasks, continually generating novel challenges, and thereby promoting innovation. Under certain conditions these challenges may become increasingly complex over time, demanding that agents accumulate ever more innovations. Comment: 16 pages, 2 figure

    Cooperative scheduling and load balancing techniques in fog and edge computing

    Fog and Edge Computing are two models that reached maturity in the last decade. Today they are solid concepts that a large body of literature has worked to develop. Supported by the development of technologies such as 5G, they can now be considered de facto standards for building low and ultra-low latency applications, privacy-oriented solutions, Industry 4.0 and smart city infrastructures. The common trait of Fog and Edge computing environments is their inherently distributed and heterogeneous nature, in which multiple (Fog or Edge) nodes interact with each other, essentially to pre-process data gathered by the vast number of sensors to which they are connected, even by running significant ML models and relying upon specific processors (TPU). However, nodes are often placed in a geographic domain, like a smart city, and the traffic dynamics during the day may cause some nodes to be overwhelmed by requests while others become completely idle. To achieve optimal usage of the system and to guarantee the best possible QoS across all the users connected to the Fog or Edge nodes, load balancing and scheduling algorithms are needed. In particular, a reasonable solution is to enable nodes to cooperate. This capability is the main objective of this thesis: the design of fully distributed algorithms and solutions that balance the load across all the nodes while following, where possible, QoS requirements in terms of latency, or imposing constraints in terms of power consumption when the nodes are powered by green energy sources. Unfortunately, when a central orchestrator is missing, a crucial element that makes the design of such algorithms difficult is that nodes need to know the state of the others in order to make the best possible scheduling decision. However, it is not possible to retrieve that state without introducing further latency while the request is served. Furthermore, the retrieved state information is always old, so the decision always relies on imprecise data. In this thesis, the problem is circumvented in two main ways. The first considers randomised algorithms that avoid probing all of the neighbour nodes in favour of at most two nodes picked at random. This is proven to bring an exponential improvement in performance with respect to probing a single node. The second approach instead considers Reinforcement Learning as a technique for inferring the state of the other nodes from the reward received by the agents when requests are forwarded. Moreover, the thesis also focuses on the energy aspect of Edge devices. In particular, it analyses a Green Edge Computing scenario, where devices are powered only by photovoltaic panels, and a mobile offloading scenario targeting ML image inference applications. Lastly, it looks at a series of infrastructural studies that lay the foundations for implementing the proposed algorithms on real devices, in particular Single Board Computers (SBCs). It presents a structural scheme of a testbed of Raspberry Pi boards and a fully-fledged framework called "P2PFaaS", which allows the implementation of load balancing and scheduling algorithms based on the Function-as-a-Service (FaaS) paradigm
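The randomised probing idea in this abstract resembles the classical "power of two choices" scheme, which can be sketched as a balls-into-bins simulation. This is a minimal sketch under simplifying assumptions, not the thesis's algorithm: requests never complete, probed loads are exact rather than stale, any node may be probed rather than only neighbours, and the function name `assign_requests` and the problem sizes are hypothetical.

```python
import random


def assign_requests(n_nodes, n_requests, d, rng):
    """Place each request on the least loaded of d randomly probed nodes.

    d=1 is purely random placement; d=2 probes two nodes and picks the
    one with the smaller current load.
    """
    loads = [0] * n_nodes
    for _ in range(n_requests):
        probed = [rng.randrange(n_nodes) for _ in range(d)]
        target = min(probed, key=lambda i: loads[i])
        loads[target] += 1
    return loads


if __name__ == "__main__":
    n = 10_000
    one_probe = assign_requests(n, n, d=1, rng=random.Random(1))
    two_probes = assign_requests(n, n, d=2, rng=random.Random(1))
    print("max load with one probe: ", max(one_probe))
    print("max load with two probes:", max(two_probes))
```

Running the script compares the most loaded node under one probe and under two probes for the same number of requests; the second probe costs one extra state lookup per request, which is the latency trade-off the abstract describes.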

    Federated Bandit: A Gossiping Approach

    In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{ \texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max \{\frac{\texttt{poly}(N,M)}{\epsilon}\log^{2.5} T, \texttt{poly}(N,M) (\log_{\lambda_2^{-1}} N + \log T) \})$ regret. Comment: Accepted by ACM SIGMETRICS 202
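One ingredient this abstract names, the Upper Confidence Bound rule, can be illustrated with the classical single-agent UCB1 index. This is a minimal sketch of textbook UCB1 only: it is not Gossip_UCB or Fed_UCB, it contains no gossiping or differential privacy, and the function names `ucb1_index` and `select_arm` are hypothetical.

```python
import math


def ucb1_index(mean_reward, pulls, t):
    """Classical UCB1 index: empirical mean plus an exploration bonus.

    An arm that has never been pulled gets an infinite index so that it
    is tried at least once.
    """
    if pulls == 0:
        return math.inf
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def select_arm(means, counts, t):
    """Pick the arm with the largest UCB1 index at round t (t >= 1)."""
    return max(
        range(len(means)),
        key=lambda a: ucb1_index(means[a], counts[a], t),
    )


if __name__ == "__main__":
    # Arm 1 has never been pulled, so it is selected despite having no estimate.
    print(select_arm(means=[0.5, 0.0], counts=[10, 0], t=11))
```

In a decentralized variant, each agent's `means` and `counts` would be replaced by estimates mixed with those of its neighbors; how that mixing is done and analysed is the contribution of the paper and is not reproduced here.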

    Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

    Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally falls into the realm of multi-agent RL (MARL), a domain with a relatively long history that has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as continuing stimulus for researchers interested in working on this exciting yet challenging topic. Comment: Invited Chapter in Handbook on RL and Control (Springer Studies in Systems, Decision and Control

    Foundations of Trusted Autonomy

    Trusted Autonomy; Automation Technology; Autonomous Systems; Self-Governance; Trusted Autonomous Systems; Design of Algorithms and Methodologie