Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions
This study investigates the optimization aspects of imposing hard inequality
constraints on the outputs of CNNs. In the context of deep networks,
constraints are commonly handled with penalties because of their simplicity,
despite their well-known limitations. Lagrangian-dual optimization has been
largely avoided, except for a few recent works, mainly due to the computational
complexity and stability/convergence issues caused by alternating explicit dual
updates/projections and stochastic optimization. Several studies showed that,
surprisingly for deep CNNs, the theoretical and practical advantages of
Lagrangian optimization over penalties do not materialize in practice. We
propose log-barrier extensions, which approximate Lagrangian optimization of
constrained-CNN problems with a sequence of unconstrained losses. Unlike
standard interior-point and log-barrier methods, our formulation does not need
an initial feasible solution. Furthermore, we provide a new technical result,
which shows that the proposed extensions yield an upper bound on the duality
gap. This generalizes the duality-gap result of standard log-barriers, yielding
sub-optimality certificates for feasible solutions. While sub-optimality is not
guaranteed for non-convex problems, our result shows that log-barrier
extensions are a principled way to approximate Lagrangian optimization for
constrained CNNs via implicit dual variables. We report comprehensive weakly
supervised segmentation experiments, with various constraints, showing that our
formulation substantially outperforms existing constrained-CNN methods in
accuracy, constraint satisfaction, and training stability, more so when
dealing with a large number of constraints.
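As a rough illustration of the idea in the abstract above, the sketch below shows one smooth extension of the log-barrier that stays finite for infeasible points, which is why no feasible starting solution is needed. The abstract does not state the exact formula, so the functional form, the switch point at z = -1/t^2, and the names `log_barrier_extension` and `constrained_loss` are assumptions made for this example: the function uses the standard log-barrier where the constraint z <= 0 holds with some margin and continues it with its tangent line elsewhere.

```python
import math


def log_barrier_extension(z: float, t: float) -> float:
    """Smooth penalty for a constraint written as z <= 0 (assumed form).

    For z <= -1/t**2 this is the standard log-barrier -(1/t) * log(-z).
    Beyond that point it continues linearly with slope t, so the value
    and the first derivative match at the switch point and the penalty
    stays finite even when the constraint is violated (z > 0).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    threshold = -1.0 / t**2
    if z <= threshold:
        return -math.log(-z) / t
    # Tangent-line continuation of the barrier at z = -1/t**2.
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t


def constrained_loss(base_loss: float, constraint_values: list[float], t: float) -> float:
    """Unconstrained surrogate: base loss plus one penalty per constraint."""
    return base_loss + sum(log_barrier_extension(z, t) for z in constraint_values)
```

In a training loop one would typically evaluate such a surrogate on differentiable constraint values and increase `t` over a sequence of stages, so that the penalty approaches a hard barrier; the schedule for `t` is left open here.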
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including a challenging
simulated car-driving problem subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.