4 research outputs found
Adaptive detrending to accelerate convolutional gated recurrent unit training for contextual video recognition
Video image recognition has been extensively studied with rapid progress recently. However, most methods focus on short-term rather than long-term (contextual) video recognition. Convolutional recurrent neural networks (ConvRNNs) provide robust spatio-temporal information processing capabilities for contextual video recognition, but require extensive computation that slows down training. Inspired by normalization and detrending methods, in this paper we propose "adaptive detrending" (AD) for temporal normalization in order to accelerate the training of ConvRNNs, especially of the convolutional gated recurrent unit (ConvGRU). For each neuron in a recurrent neural network (RNN), AD identifies the trending change within a sequence and subtracts it, removing the internal covariate shift. In experiments on contextual video recognition with ConvGRU, results show that (1) ConvGRU clearly outperforms feed-forward neural networks, (2) AD consistently and significantly accelerates training and improves generalization, (3) performance is further improved when AD is coupled with other normalization methods, and, most importantly, (4) the more long-term contextual information is required, the more AD outperforms existing methods.
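The core operation described in the abstract — estimating each neuron's trend over the sequence and subtracting it — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the exponential-moving-average trend estimate and the smoothing factor `alpha` are assumptions for the sake of a runnable example.

```python
import numpy as np

def adaptive_detrend(h, alpha=0.9):
    """Subtract a running trend estimate from each neuron's activations.

    h: array of shape (T, N) -- T time steps, N neurons.
    alpha: hypothetical smoothing factor for the trend estimate;
           the paper's actual trend estimator may differ.
    """
    trend = np.zeros(h.shape[1])
    out = np.empty_like(h, dtype=float)
    for t in range(h.shape[0]):
        # Update the per-neuron trend, then remove it from the activation.
        trend = alpha * trend + (1 - alpha) * h[t]
        out[t] = h[t] - trend
    return out
```

For a constant input sequence the detrended output decays toward zero, which is the intended effect: slowly varying (trending) components are removed while fast, informative fluctuations pass through.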
Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy needs to be deterministic due to domain requirements, such as prescription of treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for a deterministic target policy with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric in order to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on an analysis of bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work takes a step further, handling vector action spaces and full metric optimization. We show that our estimator is consistent and significantly reduces the MSE compared to baseline OPE methods through experiments on various domains.
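The relaxation described above — replacing the deterministic target policy's point mass with a kernel around the target action, then reweighting by the behavior density — can be sketched as below. This is a simplified illustration with a fixed isotropic Gaussian kernel and a hypothetical bandwidth `h`; the paper instead learns a full kernel metric to minimize MSE.

```python
import numpy as np

def kernel_ope(rewards, actions, contexts, target_policy, behavior_density, h=0.5):
    """Kernel-relaxed IS estimate of a deterministic target policy's value.

    rewards: (n,) observed rewards; actions: (n, d) logged actions;
    contexts: length-n sequence of contexts.
    target_policy(x) -> target action (d,); behavior_density(a, x) -> float.
    h: fixed bandwidth (hypothetical; the paper learns the metric).
    """
    n, d = actions.shape
    # Distance between each logged action and the target action for its context.
    diffs = actions - np.array([target_policy(x) for x in contexts])
    # Isotropic Gaussian kernel K_h(u) = (2*pi*h^2)^(-d/2) * exp(-||u||^2 / (2*h^2)).
    k = (2 * np.pi * h**2) ** (-d / 2) * np.exp(-np.sum(diffs**2, axis=1) / (2 * h**2))
    # Behavior-policy density of each logged action, for importance weighting.
    mu = np.array([behavior_density(a, x) for a, x in zip(actions, contexts)])
    return np.mean(k / mu * rewards)
```

With a vanishing bandwidth this recovers the (ill-posed) deterministic IS target in the limit; a finite bandwidth trades bias for variance, which is exactly the trade-off the learned metric optimizes.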