Reducing Estimation Bias via Weighted Delayed Deep Deterministic Policy Gradient
The overestimation phenomenon caused by function approximation is a
well-known issue in value-based reinforcement learning algorithms such as deep
Q-networks and DDPG, which could lead to suboptimal policies. To address this
issue, TD3 takes the minimum value between a pair of critics, which introduces
underestimation bias. By unifying these two opposites, we propose a novel
Weighted Delayed Deep Deterministic Policy Gradient algorithm, which can reduce
the estimation error and further improve the performance by weighting a pair of
critics. We compare the learning process of the value function under DDPG, TD3,
and our proposed algorithm, which verifies that our algorithm indeed reduces
the estimation error of the value function. We evaluate our algorithm on the
OpenAI Gym continuous control tasks, where it outperforms the state-of-the-art
algorithms on every environment tested.
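The core idea above, weighting the two critics rather than always taking their minimum, can be sketched as follows. This is a minimal illustration, not the paper's exact update rule; the weight `beta` and the use of `max` for the second term are assumptions for clarity (with `beta = 1` the target reduces to TD3's min, and with `beta = 0` to the optimistic max):

```python
import numpy as np

def wd3_target(q1, q2, beta=0.75):
    """Weighted target value over a pair of critic estimates.

    beta interpolates between the pessimistic TD3-style minimum
    (beta = 1) and the optimistic maximum (beta = 0), trading off
    under- and overestimation bias.
    """
    return beta * np.minimum(q1, q2) + (1.0 - beta) * np.maximum(q1, q2)
```

With `beta = 0.5` the target is simply the average of the two critics; intermediate values let the algorithm tune how much underestimation it accepts.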
Learning boosted asymmetric classifiers for object detection
Object detection can be posed as a classification task in which rare positive patterns must be distinguished from an enormous number of negative patterns. To avoid missing positive patterns, more attention should be paid to them. There should therefore be different requirements for the False Reject Rate (FRR) and the False Accept Rate (FAR), and learning a classifier should use an asymmetric factor to balance FRR against FAR. In this paper, a normalized asymmetric classification error is proposed for the task of rejecting negative patterns. Minimizing it not only controls the ratio of FRR to FAR but, more importantly, limits the upper bound of FRR. The latter characteristic is advantageous for tasks that require a low FRR. Based on this normalized asymmetric classification error, we develop an asymmetric AdaBoost algorithm with a variable asymmetric factor and apply it to the learning of cascade classifiers for face detection. Experiments demonstrate that the proposed method achieves less complex classifiers and better performance than some previous AdaBoost methods.
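An asymmetric, normalized error of the kind described can be sketched as below. This is an illustrative form, not the paper's exact definition: the asymmetric factor `k` up-weights false rejects (missed positives), and the normalization by the maximum attainable weighted error keeps the value in [0, 1]:

```python
import numpy as np

def asymmetric_error(y_true, y_pred, w, k=2.0):
    """Normalized asymmetric classification error for labels in {+1, -1}.

    k > 1 penalizes false rejects (positives classified as negative)
    more heavily than false accepts, so minimizing this error pushes
    the FRR down harder than the FAR.
    """
    fr = (y_true == 1) & (y_pred == -1)   # false rejects
    fa = (y_true == -1) & (y_pred == 1)   # false accepts
    weighted = k * np.sum(w[fr]) + np.sum(w[fa])
    # Normalize by the worst case: every positive rejected, every negative accepted.
    worst = k * np.sum(w[y_true == 1]) + np.sum(w[y_true == -1])
    return weighted / worst
```

In an AdaBoost-style loop, each weak learner would be chosen to minimize this quantity under the current sample weights `w`, with `k` varied per stage as the abstract suggests.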
Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning
Deep reinforcement learning (DRL) gives the promise that an agent learns good
policy from high-dimensional information, whereas representation learning
removes irrelevant and redundant information and retains pertinent information.
In this work, we demonstrate that the learned representations of the Q-network
and its target network should, in theory, satisfy a favorable
distinguishable representation property. Specifically, there exists an upper
bound on the representation similarity of the value functions of two adjacent
time steps in a typical DRL setting. However, through illustrative experiments,
we show that the learned DRL agent may violate this property and lead to a
sub-optimal policy. Therefore, we propose a simple yet effective regularizer
called Policy Evaluation with Easy Regularization on Representation (PEER),
which aims to maintain the distinguishable representation property via explicit
regularization on internal representations. We also provide a convergence-rate
guarantee for PEER. Implementing PEER requires only one line of code. Our
experiments demonstrate that incorporating PEER into DRL can significantly
improve performance and sample efficiency. Comprehensive experiments show that
PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9
out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of
our knowledge, PEER is the first work to study the inherent representation
property of Q-network and its target. Our code is available at
https://sites.google.com/view/peer-cvpr2023/.
Comment: Accepted to CVPR 2023.
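The "one line of code" claim can be illustrated with a minimal sketch. This is an assumption-laden rendering, not PEER's published loss: it treats the regularizer as an inner-product similarity between the Q-network's internal representation `z` and the target network's representation `z_target`, scaled by a hypothetical coefficient `beta`, added on top of the usual TD loss:

```python
import numpy as np

def peer_loss(td_loss, z, z_target, beta=5e-4):
    """TD loss plus a penalty on the similarity between the Q-network's
    representation and the target network's representation, keeping the
    two distinguishable. The single added term mirrors PEER's
    one-line change to a standard DRL objective.
    """
    similarity = float(np.dot(z, z_target))  # inner-product similarity
    return td_loss + beta * similarity
```

In a real training loop, `z` and `z_target` would be the penultimate-layer activations of the online and target networks, and the penalty would be backpropagated only through `z`.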
Recover Triggered States: Protect Model Against Backdoor Attack in Reinforcement Learning
A backdoor attack allows a malicious user to manipulate the environment or
corrupt the training data, thus inserting a backdoor into the trained agent.
Such attacks compromise the RL system's reliability, leading to potentially
catastrophic results in various key fields. In contrast, relatively limited
research has investigated effective defenses against backdoor attacks in RL.
This paper proposes the Recovery Triggered States (RTS) method, a novel
approach that effectively protects the victim agents from backdoor attacks. RTS
involves building a surrogate network to approximate the dynamics model.
Developers can then recover the environment from the triggered state to a clean
state, thereby preventing attackers from activating backdoors hidden in the
agent by presenting the trigger. When training the surrogate to predict states,
we incorporate agent action information to reduce the discrepancy between the
actions taken by the agent on predicted states and the actions taken on real
states. RTS is the first approach to defend against backdoor attacks in a
single-agent setting. Our results show that with RTS, the cumulative reward
decreased by only 1.41% under the backdoor attack.
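The recovery step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the surrogate dynamics model is a callable `surrogate(state, action)`, and a hypothetical distance threshold decides when an observed state is treated as triggered and replaced by the surrogate's clean prediction (the paper's actual detection and recovery mechanism may differ):

```python
import numpy as np

def recover_state(surrogate, prev_state, prev_action, observed_state,
                  threshold=1.0):
    """Recover a potentially triggered state using a surrogate dynamics model.

    The surrogate predicts the next state from the previous state and
    action; if the observed state deviates too far from that prediction,
    it is replaced by the prediction, preventing the trigger from ever
    reaching the agent.
    """
    predicted = surrogate(prev_state, prev_action)
    if np.linalg.norm(observed_state - predicted) > threshold:
        return predicted        # observation looks triggered: recover
    return observed_state       # observation looks clean: keep it
```

Conditioning the surrogate on the agent's action, as the abstract notes, is what keeps the predicted states consistent with the actions the agent would actually take.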
Traffic sign detection using a cascade method with fast feature extraction and saliency test
Automatic traffic sign detection is challenging due to the complexity of scene images, and fast detection is required in real applications such as driver assistance systems. In this paper, we propose a fast traffic sign detection method based on a cascade with a saliency test and neighboring-scale awareness. In the cascade, feature maps of several channels are extracted efficiently using approximation techniques. Sliding windows are pruned hierarchically using coarse-to-fine classifiers and the correlation between neighboring scales. The cascade system has only one free parameter, while the multiple thresholds are selected by a data-driven approach. To further increase speed, we also use a novel saliency test based on mid-level features to pre-prune background windows. Experiments on two public traffic sign data sets show that the proposed method achieves competitive performance and runs 27 times as fast as most of the state-of-the-art methods.
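The hierarchical pruning described above can be sketched generically. This is a schematic, not the paper's system: `stages` stands in for the coarse-to-fine classifiers (cheapest first) and `thresholds` for the data-driven rejection thresholds, so that most background windows are discarded by the early, cheap stages:

```python
def cascade_detect(window, stages, thresholds):
    """Run a window through a coarse-to-fine classifier cascade.

    Each stage scores the window; the window is rejected as soon as any
    stage's score falls below its threshold, so only promising windows
    ever reach the expensive later stages.
    """
    for stage, threshold in zip(stages, thresholds):
        if stage(window) < threshold:
            return False   # pruned early
    return True            # survived every stage: candidate detection
```

The saliency pre-test in the paper plays the same role as an extra, extremely cheap first stage placed before this loop.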
Unsupervised Domain Adaptation GAN Inversion for Image Editing
Existing GAN inversion methods work brilliantly for high-quality image
reconstruction and editing while struggling with finding the corresponding
high-quality images for low-quality inputs. Therefore, recent works are
directed toward leveraging the supervision of paired high-quality and
low-quality images for inversion. However, such paired supervision is often
unavailable in real-world scenarios, which hinders further performance
improvement. In this paper,
we resolve this problem by introducing Unsupervised Domain Adaptation (UDA)
into the Inversion process, namely UDA-Inversion, for both high-quality and
low-quality image inversion and editing. Particularly, UDA-Inversion first
regards the high-quality and low-quality images as the source domain and
unlabeled target domain, respectively. Then, a discrepancy function is
presented to measure the difference between two domains, after which we
minimize the source error and the discrepancy between the distributions of two
domains in the latent space to obtain accurate latent codes for low-quality
images. Without direct supervision, constructive representations of
high-quality images can be spontaneously learned and transferred to
low-quality images through unsupervised domain adaptation. Experimental
results indicate that UDA-Inversion is the first unsupervised method to
achieve performance comparable to that of supervised methods on low-quality
images across datasets from multiple domains. We hope this work provides a
unique inspiration for latent embedding distributions in image processing
tasks.
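The discrepancy-minimization step above can be sketched with a simple mean-embedding distance. This is an assumed, illustrative choice of discrepancy function, not necessarily the one used in UDA-Inversion; it measures how far the batch statistics of source (high-quality) and target (low-quality) latent codes are from each other:

```python
import numpy as np

def latent_discrepancy(z_src, z_tgt):
    """Squared distance between the mean latent codes of the source
    (high-quality) and target (low-quality) domains. Minimizing a
    discrepancy of this kind alongside the source reconstruction error
    aligns the two latent distributions without paired supervision.
    """
    return float(np.linalg.norm(z_src.mean(axis=0) - z_tgt.mean(axis=0)) ** 2)
```

In training, this term would be added to the source-domain inversion loss so that latent codes inferred for low-quality images land in the same region of latent space as those of high-quality images.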