32,229 research outputs found
LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning
We present a novel procedural framework to generate an arbitrary number of
labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to
design accurate algorithms or training models for crowded scene understanding.
Our overall approach is composed of two components: a procedural simulation
framework for generating crowd movements and behaviors, and a procedural
rendering framework to generate different videos or images. Each video or image
is automatically labeled based on the environment, number of pedestrians,
density, behavior, flow, lighting conditions, viewpoint, noise, etc.
Furthermore, we can increase the realism by combining synthetically-generated
behaviors with real-world background videos. We demonstrate the benefits of
LCrowdV over prior lableled crowd datasets by improving the accuracy of
pedestrian detection and crowd behavior classification algorithms. LCrowdV
would be released on the WWW
Learning to Personalize in Appearance-Based Gaze Tracking
Personal variations severely limit the performance of appearance-based gaze
tracking. Adapting to these variations using standard neural network model
adaptation methods is difficult. The problems range from overfitting, due to
small amounts of training data, to underfitting, due to restrictive model
architectures. We tackle these problems by introducing the SPatial Adaptive
GaZe Estimator (SPAZE). By modeling personal variations as a low-dimensional
latent parameter space, SPAZE provides just enough adaptability to capture the
range of personal variations without being prone to overfitting. Calibrating
SPAZE for a new person reduces to solving a small optimization problem. SPAZE
achieves an error of 2.70 degrees with 9 calibration samples on MPIIGaze,
improving on the state-of-the-art by 14 %. We contribute to gaze tracking
research by empirically showing that personal variations are well-modeled as a
3-dimensional latent parameter space for each eye. We show that this
low-dimensionality is expected by examining model-based approaches to gaze
tracking. We also show that accurate head pose-free gaze tracking is possible
A LiDAR Point Cloud Generator: from a Virtual World to Autonomous Driving
3D LiDAR scanners are playing an increasingly important role in autonomous
driving as they can generate depth information of the environment. However,
creating large 3D LiDAR point cloud datasets with point-level labels requires a
significant amount of manual annotation. This jeopardizes the efficient
development of supervised deep learning algorithms which are often data-hungry.
We present a framework to rapidly create point clouds with accurate point-level
labels from a computer game. The framework supports data collection from both
auto-driving scenes and user-configured scenes. Point clouds from auto-driving
scenes can be used as training data for deep learning algorithms, while point
clouds from user-configured scenes can be used to systematically test the
vulnerability of a neural network, and use the falsifying examples to make the
neural network more robust through retraining. In addition, the scene images
can be captured simultaneously in order for sensor fusion tasks, with a method
proposed to do automatic calibration between the point clouds and captured
scene images. We show a significant improvement in accuracy (+9%) in point
cloud segmentation by augmenting the training dataset with the generated
synthesized data. Our experiments also show by testing and retraining the
network using point clouds from user-configured scenes, the weakness/blind
spots of the neural network can be fixed
Multi-view passive 3D face acquisition device
Approaches to acquisition of 3D facial data include laser scanners, structured
light devices and (passive) stereo vision. The laser scanner and structured light
methods allow accurate reconstruction of the 3D surface but strong light is projected
on the faces of subjects. Passive stereo vision based approaches do not require strong
light to be projected, however, it is hard to obtain comparable accuracy and robustness
of the surface reconstruction. In this paper a passive multiple view approach using
5 cameras in a ’+’ configuration is proposed that significantly increases robustness
and accuracy relative to traditional stereo vision approaches. The normalised cross
correlations of all 5 views are combined using direct projection of points instead of
the traditionally used rectified images. Also, errors caused by different perspective
deformation of the surface in the different views are reduced by using an iterative reconstruction
technique where the depth estimation of the previous iteration is used to
warp the windows of the normalised cross correlation for the different views
Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108
- …