81 research outputs found
Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
Multi-simulator training has contributed to the recent success of Deep
Reinforcement Learning by stabilizing learning and allowing for higher training
throughputs. We propose Gossip-based Actor-Learner Architectures (GALA) where
several actor-learners (such as A2C agents) are organized in a peer-to-peer
communication topology, and exchange information through asynchronous gossip in
order to take advantage of a large number of distributed simulators. We prove
that GALA agents remain within an epsilon-ball of one another during training
when using loosely coupled asynchronous communication. By reducing the amount
of synchronization between agents, GALA is more computationally efficient and
scalable than A2C, its fully-synchronous counterpart. GALA also
outperforms A2C, being more robust and sample efficient. We show that we can
run several loosely coupled GALA agents in parallel on a single GPU and achieve
significantly higher hardware utilization and frame-rates than vanilla A2C at
comparable power draws.
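The gossip step described in this abstract can be illustrated with a small sketch. It is not the GALA implementation: it omits the learning updates entirely and only shows how asynchronous pairwise mixing on a peer-to-peer topology pulls parameter vectors together. The ring topology, the mixing weight `mix`, and all function names are assumptions made for illustration.

```python
import random


def gossip_round(params, neighbors, rng, mix=0.5):
    """One asynchronous gossip step: a random agent mixes in one neighbor's parameters."""
    i = rng.randrange(len(params))
    j = rng.choice(neighbors[i])
    # Agent i replaces its vector with a convex combination of its own and j's.
    params[i] = [(1 - mix) * a + mix * b for a, b in zip(params[i], params[j])]


def max_spread(params):
    """Largest per-coordinate gap between any two agents."""
    return max(max(col) - min(col) for col in zip(*params))


def simulate(num_agents=4, dim=3, rounds=200, seed=0):
    """Return the parameter spread before and after gossiping on a ring (assumed topology)."""
    rng = random.Random(seed)
    neighbors = {
        i: [(i - 1) % num_agents, (i + 1) % num_agents] for i in range(num_agents)
    }
    params = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_agents)]
    start = max_spread(params)
    for _ in range(rounds):
        gossip_round(params, neighbors, rng)
    return start, max_spread(params)


if __name__ == "__main__":
    before, after = simulate()
    print(f"spread before: {before:.4f}, after: {after:.6f}")
```

Because each step is a convex combination, the spread between agents can never grow in this sketch. In an actual actor-learner setting, local gradient updates would push agents apart between gossip steps.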
State-of-the-art Techniques in Deep Edge Intelligence
The potential held by the gargantuan volumes of data being generated across
networks worldwide has been truly unlocked by machine learning techniques and
more recently Deep Learning. The advantages offered by the latter have seen it
rapidly becoming a framework of choice for various applications. However, the
centralization of computational resources and the need for data aggregation
have long been limiting factors in the democratization of Deep Learning
applications. Edge Computing is an emerging paradigm that aims to utilize the
hitherto untapped processing resources available at the network periphery. Edge
Intelligence (EI) has quickly emerged as a powerful alternative to enable
learning using the concepts of Edge Computing. Deep Learning-based Edge
Intelligence or Deep Edge Intelligence (DEI) lies in this rapidly evolving
domain. In this article, we provide an overview of the major constraints in
operationalizing DEI. The major research avenues in DEI have been consolidated
under Federated Learning, Distributed Computation, Compression Schemes and
Conditional Computation. We also present some of the prevalent challenges and
highlight prospective research avenues. Comment: 13 pages, 5 figures, 1 table
A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper extends off-policy reinforcement learning to the multi-agent case
in which a set of networked agents communicating with their neighbors according
to a time-varying graph collaboratively evaluates and improves a target policy
while following a distinct behavior policy. To this end, the paper develops a
multi-agent version of emphatic temporal difference learning for off-policy
policy evaluation, and proves convergence under linear function approximation.
The paper then leverages this result, in conjunction with a novel multi-agent
off-policy policy gradient theorem and recent work in both multi-agent
on-policy and single-agent off-policy actor-critic methods, to develop and give
convergence guarantees for a new multi-agent off-policy actor-critic algorithm.
A Dynamic Service Trading in a DLT-Assisted Industrial IoT Marketplace
With the increasing demand for digitalization and participation in Industry 4.0, new challenges have emerged concerning the market of digital services to compensate for the lack of processing, computation, and other resources within the Industrial Internet of Things (IIoT). At the same time, the complexity of the interplay among stakeholders has grown in size, granularity, and variation of trust. In this paper, we consider an IIoT resource market with heterogeneous buyers such as manufacturer owners. The buyers interact with the resource supplier dynamically with specific resource demands. This work introduces a broker between the supplier and the buyers, equipped with Distributed Ledger Technologies (DLT), that provides a service for market security and trustworthiness. We first model the DLT-assisted IIoT market analytically to determine an offline solution and understand the selfish interactions among the different entities (buyers, supplier, broker). Considering the non-cooperative heterogeneous buyers in the dynamic market, we then follow an independent learners framework to determine an online solution. In particular, the decision-making procedures of buyers are modeled as a Partially Observable Markov Decision Process, which is solved using independent Q-learning. We evaluate both the offline and online solutions with analytical simulations, and the results show that the proposed approaches successfully maximize players’ satisfaction. The results further demonstrate that independent Q-learners achieve equilibrium in a dynamic market even without complete information and communication, and reach a better solution than centralized Q-learning.
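The independent learners idea mentioned in this abstract can be sketched in a few lines. This is a simplified, stateless illustration, not the paper's market model: each agent keeps its own Q-values and treats the other agent as part of the environment. The two-player payoff table `PAYOFF`, the hyperparameters, and the function name are hypothetical.

```python
import random

# Hypothetical 2x2 payoff table: PAYOFF[a0][a1] = (reward to agent 0, reward to agent 1).
# Action 1 strictly dominates action 0 for both agents.
PAYOFF = [[(1, 1), (0, 3)], [(3, 0), (2, 2)]]


def independent_q_learning(payoff, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Two agents learn separate Q-values from their own rewards only."""
    rng = random.Random(seed)
    n_agents = 2
    n_actions = len(payoff)
    q = [[0.0] * n_actions for _ in range(n_agents)]
    for _ in range(episodes):
        actions = []
        for agent in range(n_agents):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                actions.append(rng.randrange(n_actions))
            else:
                # Exploit: pick the action with the highest current estimate.
                actions.append(max(range(n_actions), key=lambda a: q[agent][a]))
        rewards = payoff[actions[0]][actions[1]]
        for agent in range(n_agents):
            a = actions[agent]
            # Stateless Q-update: move the estimate toward the observed reward.
            q[agent][a] += alpha * (rewards[agent] - q[agent][a])
    return q


if __name__ == "__main__":
    for agent, values in enumerate(independent_q_learning(PAYOFF)):
        print(f"agent {agent} Q-values: {values}")
```

Neither agent observes the other's action or reward, which is the defining property of independent learners. A fuller model would add states and observations.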
Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration
Future AI applications require performance, reliability and privacy that the
existing, cloud-dependent system architectures cannot provide. In this article,
we study orchestration in the device-edge-cloud continuum, and focus on AI for
edge, that is, the AI methods used in resource orchestration. We claim that to
support the constantly growing requirements of intelligent applications in the
device-edge-cloud computing continuum, resource orchestration needs to embrace
edge AI and emphasize local autonomy and intelligence. To justify the claim, we
provide a general definition for continuum orchestration, and look at how
current and emerging orchestration paradigms are suitable for the computing
continuum. We describe certain major emerging research themes that may affect
future orchestration, and provide an early vision of an orchestration paradigm
that embraces those research themes. Finally, we survey current key edge AI
methods and look at how they may contribute to fulfilling the vision of
future continuum orchestration. Comment: 50 pages, 8 figures (Revised content in all sections, added figures
and new section
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research
Evolution has produced a multi-scale mosaic of interacting adaptive units.
Innovations arise when perturbations push parts of the system away from stable
equilibria into new regimes where previously well-adapted solutions no longer
work. Here we explore the hypothesis that multi-agent systems sometimes display
intrinsic dynamics arising from competition and cooperation that provide a
naturally emergent curriculum, which we term an autocurriculum. The solution of
one social task often begets new social tasks, continually generating novel
challenges, and thereby promoting innovation. Under certain conditions these
challenges may become increasingly complex over time, demanding that agents
accumulate ever more innovations. Comment: 16 pages, 2 figures
Cooperative scheduling and load balancing techniques in fog and edge computing
Fog and Edge Computing are two models that reached maturity in the last decade. Today they are solid concepts, developed in a large body of literature. Supported by enabling technologies such as 5G, they can now be considered de facto standards for building low and ultra-low latency applications, privacy-oriented solutions, Industry 4.0 and smart city infrastructures. The common trait of Fog and Edge computing environments is their inherently distributed and heterogeneous nature: multiple (Fog or Edge) nodes interact with each other, essentially to pre-process the data gathered by the countless sensors to which they are connected, even by running significant ML models on dedicated processors (TPUs). However, nodes are often placed across a geographic domain, such as a smart city, and the traffic dynamics during the day may cause some nodes to be overwhelmed by requests while others become completely idle. To make the best use of the system and to guarantee the best possible QoS to all users connected to the Fog or Edge nodes, load balancing and scheduling algorithms are needed. In particular, a reasonable solution is to enable nodes to cooperate. Enabling this cooperation is the main objective of this thesis: the design of fully distributed algorithms and solutions that balance the load across all nodes while meeting, where possible, QoS requirements in terms of latency, or imposing constraints on power consumption when the nodes are powered by green energy sources. Unfortunately, when there is no central orchestrator, a crucial difficulty in designing such algorithms is that each node needs to know the state of the others in order to make the best possible scheduling decision.
However, the state cannot be retrieved without adding latency to the service of the request. Furthermore, the retrieved state information is always stale, so the decision always relies on imprecise data. In this thesis, the problem is circumvented in two main ways. The first considers randomised algorithms that, instead of probing all neighbour nodes, probe at most two nodes picked at random. This is proven to bring an exponential improvement in performance over probing a single node. The second approach instead uses Reinforcement Learning to infer the state of the other nodes from the reward that agents receive when requests are forwarded.
Moreover, the thesis also focuses on the energy aspect of Edge devices. In particular, it analyses a scenario of Green Edge Computing, where devices are powered only by photovoltaic panels, and a scenario of mobile offloading targeting ML image inference applications.
Lastly, the thesis gives a final glance at a series of infrastructural studies that lay the foundations for implementing the proposed algorithms on real devices, in particular Single Board Computers (SBCs). It presents the structural scheme of a testbed of Raspberry Pi boards and a fully-fledged framework called "P2PFaaS", which allows the implementation of load balancing and scheduling algorithms based on the Function-as-a-Service (FaaS) paradigm.
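The randomised probing strategy described in this abstract (probe at most two random nodes and forward to the less loaded one) can be sketched as a simple balls-into-bins simulation. This is an idealised illustration, not the thesis's algorithm: it assumes load information is exact and free to obtain, whereas the abstract stresses that real probes add latency and return stale state. Function names and parameters are hypothetical.

```python
import random


def assign(loads, rng, probes):
    """Send one request to the least-loaded of `probes` nodes sampled at random."""
    candidates = rng.sample(range(len(loads)), probes)
    target = min(candidates, key=lambda node: loads[node])
    loads[target] += 1


def max_load(num_nodes, num_requests, probes, seed=0):
    """Return the highest per-node load after dispatching all requests."""
    rng = random.Random(seed)
    loads = [0] * num_nodes
    for _ in range(num_requests):
        assign(loads, rng, probes)
    return max(loads)


if __name__ == "__main__":
    for probes in (1, 2):
        print(f"probes={probes}: max load = {max_load(2000, 2000, probes)}")
```

Running the comparison with one probe versus two shows how a single extra probe flattens the worst-case node load, which is the effect the abstract refers to.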
Federated Bandit: A Gossiping Approach
In this paper, we study Federated Bandit, a decentralized Multi-Armed
Bandit problem with a set of agents who can only communicate their local
data with neighbors described by a connected graph. Each agent makes a
sequence of decisions on selecting an arm from a set of candidates, yet has
access only to local and potentially biased feedback/evaluation of the true
reward for each action taken. Learning only locally will lead agents to
sub-optimal actions, while converging to a no-regret strategy requires a
collection of distributed data. Motivated by the proposal of federated
learning, we aim for a solution in which agents never share their local
observations with a central entity and are allowed to share only a private
copy of their own information with their neighbors. We first propose a
decentralized bandit algorithm, Gossip_UCB, which couples variants of
both the classical gossiping algorithm and the celebrated Upper Confidence
Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local
bandit learning into a global gossiping process for sharing information among
connected agents, and achieves a guaranteed regret bound for all agents that
depends on the second largest eigenvalue of the expected gossip matrix, which
is itself a function of the graph. We then propose
Fed_UCB, a differentially private version of Gossip_UCB, in which the agents
preserve differential privacy of their local data while still achieving a
regret guarantee. Comment: Accepted by ACM SIGMETRICS 202
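For readers unfamiliar with the UCB building block named in this abstract, the following is a minimal single-agent sketch of the classical UCB1 rule on Bernoulli arms. It is not Gossip_UCB or Fed_UCB: there is no gossiping, no biased local feedback, and no privacy mechanism. The arm means, horizon, and function name are assumptions for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run classical UCB1 on Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each has an initial estimate.
            arm = t - 1
        else:
            # Choose the arm with the largest empirical mean plus confidence bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.5, 0.8], horizon=5000))
```

The confidence bonus shrinks as an arm is pulled more often, so under-sampled arms keep being explored while the best arm accumulates most of the pulls.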
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
Recent years have witnessed significant advances in reinforcement learning
(RL), which has registered great success in solving various sequential
decision-making problems in machine learning. Most of the successful RL
applications, e.g., the games of Go and Poker, robotics, and autonomous
driving, involve the participation of more than one agent, and thus
naturally fall into the realm of multi-agent RL (MARL), a domain with a
relatively long history that has recently re-emerged due to advances in
single-agent RL techniques. Though empirically successful, theoretical
foundations for MARL are relatively lacking in the literature. In this chapter,
we provide a selective overview of MARL, with focus on algorithms backed by
theoretical analysis. More specifically, we review the theoretical results of
MARL algorithms mainly within two representative frameworks, Markov/stochastic
games and extensive-form games, in accordance with the types of tasks they
address, i.e., fully cooperative, fully competitive, and a mix of the two. We
also introduce several significant but challenging applications of these
algorithms. Orthogonal to the existing reviews on MARL, we highlight several
new angles and taxonomies of MARL theory, including learning in extensive-form
games, decentralized MARL with networked agents, MARL in the mean-field regime,
(non-)convergence of policy-based methods for learning in games, etc. Some of
the new angles extrapolate from our own research endeavors and interests. Our
overall goal with this chapter is, beyond providing an accurate assessment of the
current state of the field, to identify fruitful future research
directions on theoretical studies of MARL. We expect this chapter to serve as
a continuing stimulus for researchers interested in working on this exciting
yet challenging topic. Comment: Invited Chapter in Handbook on RL and Control (Springer Studies in
Systems, Decision and Control
Foundations of Trusted Autonomy
Trusted Autonomy; Automation Technology; Autonomous Systems; Self-Governance; Trusted Autonomous Systems; Design of Algorithms and Methodologies