2,454 research outputs found
Spectrum Bandit Optimization
We consider the problem of allocating radio channels to links in a wireless
network. Links interact through interference, modelled as a conflict graph
(i.e., two interfering links cannot be simultaneously active on the same
channel). We aim at identifying the channel allocation maximizing the total
network throughput over a finite time horizon. Should we know the average radio
conditions on each channel and on each link, an optimal allocation would be
obtained by solving an Integer Linear Program (ILP). When radio conditions are
unknown a priori, we look for a sequential channel allocation policy that
converges to the optimal allocation while minimizing on the way the throughput
loss or {\it regret} due to the need for exploring sub-optimal allocations. We
formulate this problem as a generic linear bandit problem, and analyze it first
in a stochastic setting where radio conditions are driven by a stationary
stochastic process, and then in an adversarial setting where radio conditions
can evolve arbitrarily. We provide new algorithms in both settings and derive
upper bounds on their regrets.Comment: 21 page
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Many applications require optimizing an unknown, noisy function that is
expensive to evaluate. We formalize this task as a multi-armed bandit problem,
where the payoff function is either sampled from a Gaussian process (GP) or has
low RKHS norm. We resolve the important open problem of deriving regret bounds
for this setting, which imply novel convergence rates for GP optimization. We
analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its
cumulative regret in terms of maximal information gain, establishing a novel
connection between GP optimization and experimental design. Moreover, by
bounding the latter in terms of operator spectra, we obtain explicit sublinear
regret bounds for many commonly used covariance functions. In some important
cases, our bounds have surprisingly weak dependence on the dimensionality. In
our experiments on real sensor data, GP-UCB compares favorably with other
heuristical GP optimization approaches
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims for assisting the readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks.Comment: 46 pages, 22 fig
Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multi-Player Multi-Armed Bandit Framework
This paper deals with the problem of efficient resource allocation in dynamic
infrastructureless wireless networks. Assuming a reactive interference-limited
scenario, each transmitter is allowed to select one frequency channel (from a
common pool) together with a power level at each transmission trial; hence, for
all transmitters, not only the fading gain, but also the number of interfering
transmissions and their transmit powers are varying over time. Due to the
absence of a central controller and time-varying network characteristics, it is
highly inefficient for transmitters to acquire global channel and network
knowledge. Therefore a reasonable assumption is that transmitters have no
knowledge of fading gains, interference, and network topology. Each
transmitting node selfishly aims at maximizing its average reward (or
minimizing its average cost), which is a function of the action of that
specific transmitter as well as those of all other transmitters. This scenario
is modeled as a multi-player multi-armed adversarial bandit game, in which
multiple players receive an a priori unknown reward with an arbitrarily
time-varying distribution by sequentially pulling an arm, selected from a known
and finite set of arms. Since players do not know the arm with the highest
average reward in advance, they attempt to minimize their so-called regret,
determined by the set of players' actions, while attempting to achieve
equilibrium in some sense. To this end, we design in this paper two joint power
level and channel selection strategies. We prove that the gap between the
average reward achieved by our approaches and that based on the best fixed
strategy converges to zero asymptotically. Moreover, the empirical joint
frequencies of the game converge to the set of correlated equilibria. We
further characterize this set for two special cases of our designed game
Online Influence Maximization in Non-Stationary Social Networks
Social networks have been popular platforms for information propagation. An
important use case is viral marketing: given a promotion budget, an advertiser
can choose some influential users as the seed set and provide them free or
discounted sample products; in this way, the advertiser hopes to increase the
popularity of the product in the users' friend circles by the world-of-mouth
effect, and thus maximizes the number of users that information of the
production can reach. There has been a body of literature studying the
influence maximization problem. Nevertheless, the existing studies mostly
investigate the problem on a one-off basis, assuming fixed known influence
probabilities among users, or the knowledge of the exact social network
topology. In practice, the social network topology and the influence
probabilities are typically unknown to the advertiser, which can be varying
over time, i.e., in cases of newly established, strengthened or weakened social
ties. In this paper, we focus on a dynamic non-stationary social network and
design a randomized algorithm, RSB, based on multi-armed bandit optimization,
to maximize influence propagation over time. The algorithm produces a sequence
of online decisions and calibrates its explore-exploit strategy utilizing
outcomes of previous decisions. It is rigorously proven to achieve an
upper-bounded regret in reward and applicable to large-scale social networks.
Practical effectiveness of the algorithm is evaluated using both synthetic and
real-world datasets, which demonstrates that our algorithm outperforms previous
stationary methods under non-stationary conditions.Comment: 10 pages. To appear in IEEE/ACM IWQoS 2016. Full versio
- …