9,381 research outputs found
Online Video Deblurring via Dynamic Temporal Blending Network
State-of-the-art video deblurring methods are capable of removing non-uniform
blur caused by unwanted camera shake and/or object motion in dynamic scenes.
However, most existing methods are based on batch processing and thus need
access to all recorded frames, rendering them computationally demanding and
time consuming and thus limiting their practical use. In contrast, we propose
an online (sequential) video deblurring method based on a spatio-temporal
recurrent network that allows for real-time performance. In particular, we
introduce a novel architecture which extends the receptive field while keeping
the overall size of the network small to enable fast execution. In doing so,
our network is able to remove even large blur caused by strong camera shake
and/or fast moving objects. Furthermore, we propose a novel network layer that
enforces temporal consistency between consecutive frames by dynamic temporal
blending which compares and adaptively (at test time) shares features obtained
at different time steps. We show the superiority of the proposed method in an
extensive experimental evaluation.Comment: 10 page
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Most of the existing deep learning-based methods for 3D hand and human pose
estimation from a single depth map are based on a common framework that takes a
2D depth map and directly regresses the 3D coordinates of keypoints, such as
hand or human body joints, via 2D convolutional neural networks (CNNs). The
first weakness of this approach is the presence of perspective distortion in
the 2D depth map. While the depth map is intrinsically 3D data, many previous
methods treat depth maps as 2D images that can distort the shape of the actual
object through projection from 3D to 2D space. This compels the network to
perform perspective distortion-invariant estimation. The second weakness of the
conventional approach is that directly regressing 3D coordinates from a 2D
image is a highly non-linear mapping, which causes difficulty in the learning
procedure. To overcome these weaknesses, we firstly cast the 3D hand and human
pose estimation problem from a single depth map into a voxel-to-voxel
prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood
for each keypoint. We design our model as a 3D CNN that provides accurate
estimates while running in real-time. Our system outperforms previous methods
in almost all publicly available 3D hand and human pose estimation datasets and
placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
The code is available in https://github.com/mks0601/V2V-PoseNet_RELEASE.Comment: HANDS 2017 Challenge Frame-based 3D Hand Pose Estimation Winner (ICCV
2017), Published at CVPR 201
Multi-Person Tracking Based on Faster R-CNN and Deep Appearance Features
Mostly computer vision problems related to crowd analytics are highly dependent upon multi-object tracking (MOT) systems. There are two major steps involved in the design of MOT system: object detection and association. In the first step, desired objects are detected in every frame of video stream. Detection quality directly influences the performance of tracking. The second step involves the correspondence of detected objects in current frame with the previous to obtain their trajectories. High accuracy in object detection system results in less number of missing detection and finally produces less fragmented tracks. Better object association increases the affinity between objects in different frames. This paper presents a novel algorithm for improved object detection followed by enhanced object tracking. Object detection accuracy has been increased by employing deep learning-based Faster region convolutional neural network (Faster R-CNN) algorithm. Object association is carried out by using appearance and improved motion features. Evaluation results show that we have enhanced the performance of current state-of-the-art work by reducing identity switches and fragmentation
- …