Power Allocation for Conventional and Buffer-Aided Link Adaptive Relaying Systems with Energy Harvesting Nodes
Energy harvesting (EH) nodes can play an important role in cooperative
communication systems which do not have a continuous power supply. In this
paper, we consider the optimization of conventional and buffer-aided link
adaptive EH relaying systems, where an EH source communicates with the
destination via an EH decode-and-forward relay. In conventional relaying,
source and relay transmit signals in consecutive time slots whereas in
buffer-aided link adaptive relaying, the state of the source-relay and
relay-destination channels determines whether the source or the relay is
selected for transmission. Our objective is to maximize the system throughput
over a finite number of transmission time slots for both relaying protocols. In
the case of conventional relaying, we propose one offline and several online joint
source and relay transmit power allocation schemes. For offline power
allocation, we formulate an optimization problem which can be solved optimally.
For the online case, we propose a dynamic programming (DP) approach to compute
the optimal online transmit power. To alleviate the complexity inherent to DP,
we also propose several suboptimal online power allocation schemes. For
buffer-aided link adaptive relaying, we show that the joint offline
optimization of the source and relay transmit powers along with the link
selection results in a mixed-integer nonlinear program, which we solve
optimally using the spatial branch-and-bound method. We also propose an
efficient online power allocation scheme and a naive online power allocation
scheme for buffer-aided link adaptive relaying. Our results show that link
adaptive relaying improves performance over conventional relaying
at the expense of higher computational complexity.
Comment: Submitted to IEEE Transactions on Wireless Communications
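The DP idea behind the optimal online scheme can be illustrated on a deliberately simplified model. The sketch below is not the paper's formulation: it assumes a single EH transmitter (not joint source and relay powers), a battery counted in discrete energy units, a per-slot rate of log2(1 + units * gain), i.i.d. harvested energy and channel gains with known discrete distributions, and energy harvested in one slot being usable only from the next. All names (`CHANNEL_GAINS`, `HARVEST`, `B_MAX`, `value`, `policy`) and numbers are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical discrete models (illustrative assumptions, not from the paper):
CHANNEL_GAINS = ((0.5, 0.5), (2.0, 0.5))  # (channel power gain, probability)
HARVEST = ((0, 0.5), (1, 0.5))            # (energy units harvested per slot, probability)
B_MAX = 4                                 # battery capacity in energy units


def rate(units: int, gain: float) -> float:
    """Spectral efficiency (bits/s/Hz) when `units` energy units are spent in one slot."""
    return math.log2(1.0 + units * gain)


def _q_value(slots_left: int, battery: int, gain: float, spend: int) -> float:
    """Immediate rate plus expected optimal throughput-to-go after spending `spend` units."""
    future = 0.0
    if slots_left > 1:
        for energy, p_e in HARVEST:
            # Harvested energy is assumed usable from the next slot; overflow is lost.
            nxt = min(battery - spend + energy, B_MAX)
            for g, p_g in CHANNEL_GAINS:
                future += p_e * p_g * value(slots_left - 1, nxt, g)
    return rate(spend, gain) + future


@lru_cache(maxsize=None)
def value(slots_left: int, battery: int, gain: float) -> float:
    """Optimal expected throughput over the remaining slots (Bellman recursion)."""
    if slots_left == 0:
        return 0.0
    return max(_q_value(slots_left, battery, gain, s) for s in range(battery + 1))


def policy(slots_left: int, battery: int, gain: float) -> int:
    """Optimal number of energy units to spend in the current slot."""
    return max(range(battery + 1),
               key=lambda s: _q_value(slots_left, battery, gain, s))
```

In the last slot the recursion spends the whole battery, since nothing is gained by saving energy. Because the state space grows with the battery size, the number of channel and harvest states, and the horizon, this exact recursion quickly becomes expensive, which is the complexity the suboptimal online schemes are meant to avoid.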
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in several domains, including a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
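One way to read such a meta-algorithm is as a two-player game on the Lagrangian: an online learner adjusts the constraint multipliers while the policy side best-responds, and the output is a mixture of the policies played. The sketch below assumes that formulation. It replaces the batch RL and OPE subroutines with a hypothetical table of pre-estimated objective and constraint costs for a finite set of candidate policies, and uses exponentiated gradient as the online learner; `constrained_mixture`, `bound`, `eta`, and the candidate names are illustrative, not the paper's API.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: each candidate policy is summarized by an estimated objective
# cost and a list of estimated constraint costs (stand-ins for batch RL + OPE).
Candidates = Dict[str, Tuple[float, List[float]]]


def constrained_mixture(candidates: Candidates, tau: List[float],
                        bound: float = 10.0, eta: float = 0.2,
                        rounds: int = 200) -> Dict[str, float]:
    """Approximately solve min_pi max_{lambda >= 0, |lambda|_1 <= bound} of
    cost(pi) + sum_i lambda_i * (g_i(pi) - tau_i); returns the empirical
    mixture over the policy player's best responses."""
    m = len(tau)
    # Exponentiated-gradient weights over m constraint coordinates plus one slack
    # coordinate, so lambda = bound * w[:m] ranges over the scaled simplex.
    w = [1.0 / (m + 1)] * (m + 1)
    picks: Counter = Counter()
    for _ in range(rounds):
        lam = [bound * wi for wi in w[:m]]

        def lagrangian(name: str) -> float:
            cost, g = candidates[name]
            return cost + sum(l * (gi - ti) for l, gi, ti in zip(lam, g, tau))

        # Policy player: best response to the current multipliers.
        best = min(candidates, key=lagrangian)
        picks[best] += 1
        # Lambda player: multiplicative ascent on constraint violation
        # (the slack coordinate has zero gradient).
        _, g = candidates[best]
        grad = [gi - ti for gi, ti in zip(g, tau)] + [0.0]
        w = [wi * math.exp(eta * gr) for wi, gr in zip(w, grad)]
        total = sum(w)
        w = [wi / total for wi in w]
    return {name: picks[name] / rounds for name in candidates}
```

For example, with a cheap policy whose constraint cost (0.9) exceeds a threshold of 0.5 and a costly one that satisfies it (0.1), the returned mixture places roughly half its weight on each, close to the cheapest feasible mixture; neither policy alone is both cheap and feasible, which is why the output is a mixture rather than the last best response.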
A Bandit Approach to Maximum Inner Product Search
There has been substantial research on sub-linear time approximate algorithms
for Maximum Inner Product Search (MIPS). To achieve fast query time,
state-of-the-art techniques require significant preprocessing, which can be a
burden when the number of subsequent queries is not sufficiently large to
amortize the cost. Furthermore, existing methods cannot directly control
the suboptimality of their approximate results with
theoretical guarantees. In this paper, we propose the first approximate
algorithm for MIPS that does not require any preprocessing, and allows users to
control and bound the suboptimality of the results. We cast MIPS as a Best Arm
Identification problem, and introduce a new bandit setting that can fully
exploit the special structure of MIPS. Our approach outperforms
state-of-the-art methods on both synthetic and real-world datasets.
Comment: AAAI 201
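The reduction to best arm identification can be illustrated with a generic successive-elimination sketch: each candidate vector is an arm, revealing a random coordinate j yields the sample q_j * v_j, and d times the sample mean is an unbiased estimate of the inner product. This is not the paper's algorithm. It assumes every product satisfies |q_j * v_j| <= `bound`, uses a shared random coordinate order with Hoeffding confidence radii union-bounded over arms and rounds, and needs no preprocessing; `bandit_mips`, `delta`, `batch`, and `seed` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def bandit_mips(query: Sequence[float], vectors: List[Sequence[float]],
                bound: float, delta: float = 0.05, batch: int = 4,
                seed: int = 0) -> int:
    """Index of (approximately) the vector with the largest inner product with
    `query`, found by revealing coordinates in batches and eliminating arms.
    Assumes |query[j] * v[j]| <= bound for every coordinate j and vector v."""
    d, n = len(query), len(vectors)
    order = list(range(d))
    random.Random(seed).shuffle(order)   # shared random coordinate order
    sums = [0.0] * n                     # running sums of query[j] * v[j]
    alive = list(range(n))
    seen = 0
    max_rounds = math.ceil(d / batch)
    while len(alive) > 1 and seen < d:
        new = order[seen:seen + batch]
        seen += len(new)
        for i in alive:
            sums[i] += sum(query[j] * vectors[i][j] for j in new)
        # d * (sample mean of products) is an unbiased inner-product estimate.
        est = {i: d * sums[i] / seen for i in alive}
        if seen == d:
            radius = 0.0                 # every coordinate seen: estimates are exact
        else:
            # Hoeffding radius for samples in [-bound, bound] (also valid without
            # replacement), union-bounded over n arms and max_rounds rounds.
            radius = d * bound * math.sqrt(
                2.0 * math.log(2.0 * n * max_rounds / delta) / seen)
        best_lcb = max(est[i] - radius for i in alive)
        # Drop arms whose upper confidence bound falls below the best lower bound.
        alive = [i for i in alive if est[i] + radius >= best_lcb]
    return max(alive, key=lambda i: sums[i])
```

The `delta` parameter plays the role of the user-controlled failure probability: a smaller value widens the confidence radii, so fewer arms are eliminated early and more coordinates are read before answering.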