PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume
We present a compact but effective CNN model for optical flow, called
PWC-Net. PWC-Net has been designed according to simple and well-established
principles: pyramidal processing, warping, and the use of a cost volume. Cast
in a learnable feature pyramid, PWC-Net uses the current optical flow
estimate to warp the CNN features of the second image. It then uses the warped
features and features of the first image to construct a cost volume, which is
processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in
size and easier to train than the recent FlowNet2 model. Moreover, it
outperforms all published optical flow methods on the MPI Sintel final pass and
KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436)
images. Our models are available at https://github.com/NVlabs/PWC-Net.
Comment: CVPR 2018 camera-ready version (with GitHub link to Caffe and PyTorch code).
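The warp-then-correlate step described above can be sketched as follows. This is a deliberately simplified stand-in, not the paper's implementation: `warp` uses nearest-neighbour sampling instead of PWC-Net's bilinear warping, and the cost volume uses wrap-around shifts instead of a proper correlation layer; both function names are hypothetical.

```python
import numpy as np

def warp(feat, flow):
    """Backward-warp a feature map feat (C, H, W) by flow (2, H, W).
    Nearest-neighbour sampling as a simplification of bilinear warping."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    return feat[:, src_y, src_x]

def cost_volume(feat1, feat2_warped, max_disp=1):
    """Correlation between first-image features and warped second-image
    features over a (2*max_disp+1)^2 search window; np.roll wraps at the
    border, which a real implementation would handle with padding."""
    vol = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            shifted = np.roll(feat2_warped, (dy, dx), axis=(1, 2))
            vol.append((feat1 * shifted).mean(axis=0))
    return np.stack(vol)  # ((2*max_disp+1)^2, H, W)
```

In the full network this cost volume, together with the first-image features, is fed to a CNN that regresses the flow at each pyramid level.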
STV-based Video Feature Processing for Action Recognition
In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events occurring over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made on image processing in the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and Region Intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering that reduce the number of voxels (volumetric pixels) processed in each operational cycle of the implemented system. The encouraging features and the improvements in operational performance registered in the experiments are discussed at the end.
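The core region-intersection idea can be illustrated on binary spatio-temporal volumes. This is only a minimal sketch: it computes a plain overlap score between two boolean (T, H, W) voxel arrays, omitting the coefficient-factor boosting that the paper's matching mechanism adds, and the function name is hypothetical.

```python
import numpy as np

def stv_overlap(stv_a, stv_b):
    """Intersection-over-union between two binary spatio-temporal volumes
    (boolean arrays of shape (T, H, W)); the paper's coefficient-weighted
    boosting is omitted in this sketch."""
    inter = np.logical_and(stv_a, stv_b).sum()
    union = np.logical_or(stv_a, stv_b).sum()
    return inter / union if union else 0.0
```

Filtering the volumes first (e.g. discarding empty or low-motion voxels) shrinks the arrays this score is computed over, which is the motivation for the STV data-filtering techniques the abstract mentions.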
Colour Constancy: Biologically-inspired Contrast Variant Pooling Mechanism
Pooling is a ubiquitous operation in image processing algorithms that allows
for higher-level processes to collect relevant low-level features from a region
of interest. Currently, max-pooling is one of the most commonly used operators
in the computational literature. However, it can lack robustness to outliers
due to the fact that it relies merely on the peak of a function. Pooling
mechanisms are also present in the primate visual cortex where neurons of
higher cortical areas pool signals from lower ones. The receptive fields of
these neurons have been shown to vary according to the contrast by aggregating
signals over a larger region in the presence of low contrast stimuli. We
hypothesise that this contrast-variant-pooling mechanism can address some of
the shortcomings of max-pooling. We modelled this contrast variation through a
histogram clipping in which the percentage of pooled signal is inversely
proportional to the local contrast of an image. We tested our hypothesis by
applying it to the phenomenon of colour constancy where a number of popular
algorithms utilise a max-pooling step (e.g. White-Patch, Grey-Edge and
Double-Opponency). For each of these methods, we investigated the consequences
of replacing their original max-pooling by the proposed
contrast-variant-pooling. Our experiments on three colour constancy benchmark
datasets suggest that previous results can be significantly improved by adopting a
contrast-variant-pooling mechanism.
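The histogram-clipping idea described above can be sketched as follows. This is a hypothetical one-dimensional illustration, not the authors' implementation: it pools the top-k fraction of a signal, with k varying inversely with a contrast value assumed to be normalised to [0, 1], and the parameter names and bounds are assumptions.

```python
import numpy as np

def contrast_variant_pool(signal, contrast, k_min=0.01, k_max=0.5):
    """Pool the mean of the top-k fraction of values, where k is inversely
    proportional to local contrast: high contrast approaches max-pooling
    (small k), low contrast averages over a larger portion of the histogram."""
    c = np.clip(contrast, 0.0, 1.0)
    k = k_max - c * (k_max - k_min)          # inverse relation
    n = max(1, int(round(k * signal.size)))  # at least one sample
    top = np.sort(signal, axis=None)[-n:]    # clip histogram to its top-k tail
    return top.mean()
```

At `contrast=1` this reduces (up to `k_min`) to ordinary max-pooling, which is why it can drop into methods such as White-Patch, Grey-Edge and Double-Opponency as a replacement for their max-pooling step.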
A factorization approach to inertial affine structure from motion
We consider the problem of reconstructing a 3-D scene from a moving camera with a high frame rate using the affine projection model. This problem is traditionally known as Affine Structure from Motion (Affine SfM) and can be solved using an elegant low-rank factorization formulation. In this paper, we assume that an accelerometer and gyro are rigidly mounted with the camera, so that synchronized linear acceleration and angular velocity measurements are available together with the image measurements. We extend the standard Affine SfM algorithm to integrate these measurements through the use of image derivatives.
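The low-rank factorization formulation referred to above can be sketched, without the inertial measurements, in the classic Tomasi-Kanade style: stack the 2D tracks into a 2F x P measurement matrix, subtract the per-row centroid, and take a rank-3 SVD. This is a minimal sketch of the standard Affine SfM step only, recovering motion and structure up to an affine ambiguity; the function name is assumed.

```python
import numpy as np

def affine_factorize(W):
    """Factor a 2F x P measurement matrix (F frames, P tracked points) into
    motion M (2F x 3) and structure S (3 x P), up to an affine ambiguity.
    The centered matrix is (ideally) rank 3 under the affine camera model."""
    t = W.mean(axis=1, keepdims=True)   # per-row translation (point centroid)
    Wc = W - t
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    M = U[:, :3] * s[:3]                # affine motion
    S = Vt[:3]                          # affine structure
    return M, S, t
```

Resolving the remaining affine ambiguity (and, in the paper, fusing the accelerometer and gyro measurements via image derivatives) requires additional constraints beyond this sketch.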