6,863 research outputs found
Learning to Refine Human Pose Estimation
Multi-person pose estimation in images and videos is an important yet
challenging task with many applications. Despite the large improvements in
human pose estimation enabled by the development of convolutional neural
networks, there still exist a lot of difficult cases where even the
state-of-the-art models fail to correctly localize all body joints. This
motivates the need for an additional refinement step that addresses these
challenging cases and can be easily applied on top of any existing method. In
this work, we introduce a pose refinement network (PoseRefiner) which takes as
input both the image and a given pose estimate and learns to directly predict a
refined pose by jointly reasoning about the input-output space. In order for
the network to learn to refine incorrect body joint predictions, we employ a
novel data augmentation scheme for training, where we model "hard" human pose
cases. We evaluate our approach on four popular large-scale pose estimation
benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack
Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement
over the state of the art.Comment: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans
in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP
Msb r‐cnn: A multi‐stage balanced defect detection network
Deep learning networks are applied for defect detection, among which Cascade R‐CNN is a multi‐stage object detection network and is state of the art in terms of accuracy and efficiency. However, it is still a challenge for Cascade R‐CNN to deal with complex and diverse defects, as the widely varied shapes of defects lead to inefficiency for the traditional convolution filter to extract features. Additionally, the imbalance in features, losses and samples cause lower accuracy. To address the above challenges, this paper proposes a multi‐stage balanced R‐CNN (MSB R‐CNN) for defect detection based on Cascade R‐CNN. Firstly, deformable convolution is adopted in different stages of the backbone network to improve its adaptability to the varying shapes of the defect. Then, the features obtained by the backbone network are refined and enhanced by the balanced feature pyramid. To overcome the imbalance of classification and regression loss, the balanced L1 loss is applied at different stages to correct it. Finally, for the sample selection, the interaction of union (IoU) balanced sampler and the online hard example mining (OHEM) sampler are combined at different stages to make the sampling more reasonable, which can bring a better accuracy and convergence effect to the model. The results of our experiments on the DAGM2007 dataset has shown that our network (MSB R‐CNN) can achieve a mean average precision (mAP) of 67.5%, an increase of 1.5% mAP, compared to Cascade R‐CNN
- …