Online learning and decision-making from implicit feedback
This thesis focuses on designing learning and control algorithms for emerging resource allocation platforms such as recommender systems, 5G wireless networks, and online marketplaces. These systems operate in an environment that is only partially known. The controllers must therefore make resource allocation decisions using implicit feedback that the environment returns in response to past actions. The goal is to sequentially select actions using incremental feedback so as to optimize performance while simultaneously learning about the environment. We study three problems that exemplify this setting. The first is an inference problem that requires identifying sponsored content in recommender systems. Specifically, we ask whether it is possible to detect sponsored content disguised as genuine recommendations using implicit feedback from a subset of users of the recommender system. The second problem is the design of scheduling algorithms for switch networks when the user-server link statistics are unknown (e.g., in wireless networks and online marketplaces). The scheduling algorithm has to trade off scheduling the optimal links against obtaining sufficient feedback about all the links for accurate estimates. We observe the close connection of this problem to the stochastic multi-armed bandit problem and analyze bandit-style explore-exploit algorithms for learning the statistical parameters while simultaneously assigning servers to users. The third is the joint problem of base station activation and rate allocation in an energy-efficient wireless network when the channel statistics are unknown. The controller observes instantaneous channel rates of activated base stations (BSs), and thereby sequentially obtains implicit feedback about the channel. Here again, there is a tradeoff between learning the channel and optimizing the operation cost based on estimated parameters. For each of these systems, we propose algorithms with provable asymptotic guarantees.
These learning algorithms highlight the use of implicit feedback in online decision making and control.
Electrical and Computer Engineerin
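The explore-exploit tradeoff mentioned in the abstract above can be illustrated with a generic bandit index policy. The sketch below is a textbook UCB1 rule on Bernoulli arms, not the algorithm proposed in the thesis; the function name `ucb1`, the parameter `success_probs`, and the framing of each arm as a user-server link with an unknown success probability are assumptions made for illustration.

```python
import math
import random


def ucb1(success_probs, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms (hypothetical links).

    success_probs: true success probability of each arm, hidden from the learner.
    Returns (pulls, totals): per-arm pull counts and summed rewards.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initial round: play every arm once so each has an estimate.
            arm = t - 1
        else:
            # Index = empirical mean (exploit) + confidence bonus (explore).
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        # Implicit feedback: only the chosen arm's outcome is observed.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward

    return pulls, totals


if __name__ == "__main__":
    pulls, totals = ucb1([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls)
```

The confidence bonus shrinks as an arm is sampled more often, so poorly understood arms keep receiving occasional plays while the empirically best arm receives most of them.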
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have substantial potential to support
a broad range of complex, compelling applications in both military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have had great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so that they can be
invoked for hitherto unexplored services and scenarios in future wireless
networks.
Comment: 46 pages, 22 fig
An Online Approach to Dynamic Channel Access and Transmission Scheduling
Making judicious channel access and transmission scheduling decisions is
essential for improving performance as well as energy and spectral efficiency
in multichannel wireless systems. This problem has been a subject of extensive
study in the past decade, and the resulting dynamic and opportunistic channel
access schemes can bring potentially significant improvement over traditional
schemes. However, a common and severe limitation of these dynamic schemes is
that they almost always require some form of a priori knowledge of the channel
statistics. A natural remedy is a learning framework, which has also been
extensively studied in the same context, but a typical learning algorithm in
this literature seeks only the best static policy, with performance measured by
weak regret, rather than learning a good dynamic channel access policy. There
is thus a clear disconnect between what an optimal channel access policy can
achieve with known channel statistics that actively exploits temporal, spatial
and spectral diversity, and what a typical existing learning algorithm aims
for, which is the static use of a single channel devoid of diversity gain. In
this paper we bridge this gap by designing learning algorithms that track known
optimal or sub-optimal dynamic channel access and transmission scheduling
policies, thereby yielding performance measured by a form of strong regret, the
accumulated difference between the reward returned by an optimal solution when
a priori information is available and that by our online algorithm. We do so in
the context of two specific algorithms that appeared in [1] and [2],
respectively, the former for a multiuser single-channel setting and the latter
for a single-user multichannel setting. In both cases we show that our
algorithms achieve sub-linear regret uniform in time and outperform the
standard weak-regret learning algorithms.
Comment: 10 pages, to appear in MobiHoc 201
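The two regret notions contrasted in the abstract above can be written out as follows. This is one possible formalization under assumed notation, not taken from the paper: $r_k(t)$ is the reward of statically using channel $k$ at time $t$, $r_{\pi^\ast}(t)$ is the reward of the optimal dynamic policy computed with known channel statistics, and $r_{\mathcal{A}}(t)$ is the reward of the online learning algorithm.

```latex
% Weak regret: benchmark is the best single static channel.
R_{\mathrm{weak}}(T) = \max_{k} \, \mathbb{E}\!\left[\sum_{t=1}^{T} r_k(t)\right]
                     - \mathbb{E}\!\left[\sum_{t=1}^{T} r_{\mathcal{A}}(t)\right]

% Strong regret: benchmark is the optimal dynamic policy with known statistics.
R_{\mathrm{strong}}(T) = \mathbb{E}\!\left[\sum_{t=1}^{T} r_{\pi^\ast}(t)\right]
                       - \mathbb{E}\!\left[\sum_{t=1}^{T} r_{\mathcal{A}}(t)\right]

% Sub-linear regret: the time-averaged loss vanishes.
\lim_{T \to \infty} \frac{R(T)}{T} = 0
```

Because the optimal dynamic policy earns at least as much as any static channel choice, $R_{\mathrm{strong}}(T) \ge R_{\mathrm{weak}}(T)$ for the same algorithm, which is why sub-linear strong regret is the more demanding guarantee.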
Throughput Optimal Scheduling with Dynamic Channel Feedback
It is well known that opportunistic scheduling algorithms are throughput
optimal under full knowledge of channel and network conditions. However, these
algorithms achieve a hypothetical achievable rate region which does not take
into account the overhead associated with channel probing and feedback required
to obtain the full channel state information at every slot. We adopt a channel
probing model in which a fraction of each time slot is consumed to acquire the
channel state information (CSI) of a single channel. In this work, we design a
joint scheduling and channel probing algorithm named SDF that accounts for the
overhead of obtaining the channel state information. We first analytically
prove that the SDF algorithm can support a fraction of the full rate
region achieved when all users are probed, where the fraction depends on the
expected number of users that are not probed. Then, for homogeneous channels, we
show that when the number of users in the network is greater than 3, this fraction exceeds one, i.e., the rate region is guaranteed to expand. In addition, for
heterogeneous channels, we prove the conditions under which SDF is guaranteed to
increase the rate region. We also demonstrate numerically in a realistic
simulation setting that this rate region can be achieved by probing fewer
than 50% of all channels in a CDMA-based cellular network utilizing a high data
rate protocol under normal channel conditions.
Comment: submitte