Learning Algorithms for Minimizing Queue Length Regret
We consider a system consisting of a single transmitter/receiver pair and $N$
channels over which they may communicate. Packets randomly arrive to the
transmitter's queue and wait to be successfully sent to the receiver. The
transmitter may attempt a frame transmission on one channel at a time, where
each frame includes a packet if one is in the queue. For each channel, an
attempted transmission is successful with an unknown probability. The
transmitter's objective is to quickly identify the best channel to minimize the
number of packets in the queue over $T$ time slots. To analyze system
performance, we introduce queue length regret, which is the expected difference
between the total queue length of a learning policy and a controller that knows
the rates a priori. One approach to designing a transmission policy would be
to apply algorithms from the literature that solve the closely-related
stochastic multi-armed bandit problem. These policies would focus on maximizing
the number of successful frame transmissions over time. However, we show that
these methods have $\Omega(\log{T})$ queue length regret. On the other hand, we
show that there exists a set of queue-length based policies that can obtain
order optimal $O(1)$ queue length regret. We use our theoretical analysis to
devise heuristic methods that are shown to perform well in simulation.
Comment: 28 Pages, 11 figure
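The notion of queue length regret described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: it assumes Bernoulli arrivals, uses the standard UCB1 bandit index as a stand-in learning policy, and reuses the same random seed for the learning policy and the rate-aware controller so that both see the same randomness. All function names and parameter values here are hypothetical.

```python
import math
import random


def simulate_queue(policy, success_probs, arrival_prob, horizon, seed):
    """Return the sum of queue lengths over `horizon` slots for one run."""
    rng = random.Random(seed)
    attempts = [0] * len(success_probs)
    successes = [0] * len(success_probs)
    queue = 0
    total = 0
    for t in range(horizon):
        # Assumed arrival model: at most one packet arrives per slot.
        if rng.random() < arrival_prob:
            queue += 1
        # A frame is attempted every slot, even when the queue is empty.
        channel = policy(t, attempts, successes)
        success = rng.random() < success_probs[channel]
        attempts[channel] += 1
        successes[channel] += int(success)
        if success and queue > 0:
            queue -= 1
        total += queue
    return total


def ucb1(t, attempts, successes):
    """Standard UCB1 index policy (a stand-in, not the paper's policy)."""
    for k, count in enumerate(attempts):
        if count == 0:
            return k  # try every channel once first
    return max(
        range(len(attempts)),
        key=lambda k: successes[k] / attempts[k]
        + math.sqrt(2 * math.log(t + 1) / attempts[k]),
    )


def queue_length_regret(policy, success_probs, arrival_prob, horizon, runs=20):
    """Average gap in total queue length versus a controller that knows the rates."""
    best = max(range(len(success_probs)), key=success_probs.__getitem__)

    def oracle(t, attempts, successes):
        return best

    gaps = []
    for seed in range(runs):
        learned = simulate_queue(policy, success_probs, arrival_prob, horizon, seed)
        known = simulate_queue(oracle, success_probs, arrival_prob, horizon, seed)
        gaps.append(learned - known)
    return sum(gaps) / runs


if __name__ == "__main__":
    print(queue_length_regret(ucb1, [0.4, 0.8], 0.5, 2000))
```

Because both runs share a seed, the rate-aware controller succeeds in every slot where the learning policy succeeds, so the estimated regret in this sketch is never negative.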
Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability
Internet-of-Things (IoT) envisions an intelligent infrastructure of networked
smart devices offering task-specific monitoring and control services. The
unique features of IoT include extreme heterogeneity, a massive number of
devices, and unpredictable dynamics partially due to human interaction. These
call for foundational innovations in network design and management. Ideally, it
should allow efficient adaptation to changing environments, and low-cost
implementation scalable to a massive number of devices, subject to stringent
latency constraints. To this end, the overarching goal of this paper is to
outline a unified framework for online learning and management policies in IoT
through joint advances in communication, networking, learning, and
optimization. From the network architecture vantage point, the unified
framework leverages a promising fog architecture that enables smart devices to
have proximity access to cloud functionalities at the network edge, along the
cloud-to-things continuum. From the algorithmic perspective, key innovations
target online approaches adaptive to different degrees of nonstationarity in
IoT dynamics, and their scalable model-free implementation under limited
feedback that motivates blind or bandit approaches. The proposed framework
aspires to offer a stepping stone that leads to systematic designs and analysis
of task-specific learning and management schemes for IoT, along with a host of
new research directions to build on.
Comment: Submitted on June 15 to Proceeding of IEEE Special Issue on Adaptive
and Scalable Communication Network
Decentralized Learning in Online Queuing Systems
Motivated by packet routing in computer networks, online queuing systems are
composed of queues receiving packets at different rates. Repeatedly, they send
packets to servers, each of which handles at most one packet at a time. In
the centralized case, the number of accumulated packets remains bounded (i.e.,
the system is \textit{stable}) as long as the ratio between service rates and
arrival rates is larger than $1$. In the decentralized case, individual
no-regret strategies ensure stability when this ratio is larger than $2$. Yet,
myopically minimizing regret disregards the long term effects due to the
carryover of packets to further rounds. On the other hand, minimizing long term
costs leads to stable Nash equilibria as soon as the ratio exceeds
$\frac{e}{e-1}$. Stability with decentralized learning strategies with a ratio
below $2$ was a major remaining question. We first argue that for ratios up to
$2$, cooperation is required for stability of learning strategies, as selfish
minimization of policy regret, a \textit{patient} notion of regret, might
indeed still be unstable in this case. We therefore consider cooperative queues
and propose the first learning decentralized algorithm guaranteeing stability
of the system as long as the ratio of rates is larger than $1$, thus reaching
performances comparable to centralized strategies.
Comment: NeurIPS 2021 camera read
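The stability notion used in this abstract (the number of accumulated packets stays bounded) can be seen in a deliberately simplified setting. The sketch below assumes a single queue served by a single server, with Bernoulli arrivals and services; it does not model the multi-queue, multi-server system or any of the learning strategies discussed in the paper. The function name and the rates are hypothetical.

```python
import random


def final_queue_length(arrival_rate, service_rate, horizon, seed=0):
    """Simulate one queue and one server; return the backlog after `horizon` rounds."""
    rng = random.Random(seed)
    queue = 0
    for _ in range(horizon):
        # At most one packet arrives per round.
        if rng.random() < arrival_rate:
            queue += 1
        # The server clears at most one waiting packet per round.
        if queue > 0 and rng.random() < service_rate:
            queue -= 1
    return queue


if __name__ == "__main__":
    # Service-to-arrival ratio above 1: the backlog stays small.
    print(final_queue_length(arrival_rate=0.3, service_rate=0.6, horizon=5000))
    # Ratio below 1: the backlog grows roughly linearly with the horizon.
    print(final_queue_length(arrival_rate=0.6, service_rate=0.3, horizon=5000))
```

In this single-queue case the threshold on the service-to-arrival ratio is 1; the larger thresholds quoted in the abstract arise only when several queues compete for servers without coordination.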