Driver Gaze Region Estimation Without Using Eye Movement
Automated estimation of the allocation of a driver's visual attention may be
a critical component of future Advanced Driver Assistance Systems. In theory,
vision-based tracking of the eye can provide a good estimate of gaze location.
In practice, eye tracking from video is challenging because of sunglasses,
eyeglass reflections, lighting conditions, occlusions, motion blur, and other
factors. Estimation of head pose, on the other hand, is robust to many of these
effects, but cannot provide as fine-grained a resolution in localizing the
gaze. However, for the purpose of keeping the driver safe, it is sufficient to
partition gaze into regions. In this effort, we propose a system that extracts
facial features and classifies their spatial configuration into six regions in
real-time. Our proposed method achieves an average accuracy of 91.4% at an
average decision rate of 11 Hz on a dataset of 50 drivers from an on-road
study.
Comment: Accepted for Publication in IEEE Intelligent Systems
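As a rough sketch of the kind of pipeline this abstract describes (the landmark source, the feature encoding, the region labels and the choice of classifier below are assumptions, not the authors' method), the classification stage over pre-extracted facial landmarks could look like this:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed region labels; the paper partitions gaze into six regions but the
# abstract does not name them.
REGIONS = ["road", "left", "right", "rear-view mirror",
           "instrument cluster", "center stack"]

def landmark_features(landmarks):
    """Turn 2D facial-landmark coordinates (N, 2) into a translation- and
    scale-normalized feature vector describing their spatial configuration."""
    centered = landmarks - landmarks.mean(axis=0)
    return (centered / (np.linalg.norm(centered) + 1e-8)).ravel()

def train_gaze_region_classifier(landmark_sets, region_labels):
    """landmark_sets: list of (N, 2) arrays; region_labels: ints in 0..5."""
    X = np.stack([landmark_features(lm) for lm in landmark_sets])
    clf = RandomForestClassifier(n_estimators=200)
    clf.fit(X, region_labels)
    return clf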
Learnable Triangulation of Human Pose
We present two novel solutions for multi-view 3D human pose estimation based
on new learnable triangulation methods that combine 3D information from
multiple 2D views. The first (baseline) solution is a basic differentiable
algebraic triangulation with an addition of confidence weights estimated from
the input images. The second solution is based on a novel method of volumetric
aggregation from intermediate 2D backbone feature maps. The aggregated volume
is then refined via 3D convolutions that produce final 3D joint heatmaps and
allow modelling a human pose prior. Crucially, both approaches are end-to-end
differentiable, which allows us to directly optimize the target metric. We
demonstrate transferability of the solutions across datasets and considerably
improve the multi-view state of the art on the Human3.6M dataset. Video
demonstration, annotations and additional materials will be posted on our
project page (https://saic-violet.github.io/learnable-triangulation).
Comment: Project page: https://saic-violet.github.io/learnable-triangulation
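To make the baseline idea concrete, a minimal differentiable weighted DLT triangulation can be written in a few lines of PyTorch; this is a generic sketch of algebraic triangulation with confidence weights, not the authors' released code:

import torch

def weighted_triangulate(proj_mats, points_2d, confidences):
    """Differentiable algebraic (DLT) triangulation with per-view weights.

    proj_mats:   (V, 3, 4) camera projection matrices
    points_2d:   (V, 2) detected 2D joint locations, one per view
    confidences: (V,) non-negative weights estimated from the images
    Returns the triangulated 3D point in homogeneous coordinates, shape (4,).
    """
    rows = []
    for P, (u, v), w in zip(proj_mats, points_2d, confidences):
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = torch.stack(rows)                     # (2V, 4) weighted DLT system
    # The solution is the right singular vector with the smallest singular
    # value; torch.linalg.svd is differentiable, so gradients reach the weights.
    _, _, Vh = torch.linalg.svd(A)
    X = Vh[-1]
    return X / X[-1]

The volumetric second solution described above replaces this algebraic step with aggregation of 2D feature maps into a 3D volume refined by 3D convolutions.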
Anomaly-Sensitive Dictionary Learning for Unsupervised Diagnostics of Solid Media
This paper proposes a strategy for the detection and triangulation of
structural anomalies in solid media. The method revolves around the
construction of sparse representations of the medium's dynamic response,
obtained by learning instructive dictionaries which form a suitable basis for
the response data. The resulting sparse coding problem is recast as a modified
dictionary learning task with additional spatial sparsity constraints enforced
on the atoms of the learned dictionaries, which provides them with a prescribed
spatial topology that is designed to unveil anomalous regions in the physical
domain. The proposed methodology is model-agnostic, i.e., it dispenses with
the need for a physical model and requires virtually no a priori knowledge of the
structure's material properties, as all the inferences are exclusively informed
by the data through the layers of information that are available in the
intrinsic salient structure of the material's dynamic response. This
characteristic makes the approach powerful for anomaly identification in
systems with unknown or heterogeneous property distribution, for which a model
is unsuitable or unreliable. The method is validated using both synthetically…
Comment: Submitted to the Proceedings of the Royal Society
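For orientation, a plain dictionary-learning baseline on response snapshots (without the paper's additional spatial sparsity constraint on the atoms, which is its key modification) can be set up with scikit-learn:

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_dictionary(responses, n_atoms=32, sparsity=1.0):
    """responses: (n_snapshots, n_sensors) matrix of the medium's dynamic
    response. Returns sparse codes and the learned dictionary atoms.
    Note: the spatial-sparsity penalty on the atoms used in the paper is
    deliberately not included in this baseline sketch."""
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=sparsity,
                                       transform_algorithm="omp")
    codes = dico.fit_transform(responses)    # (n_snapshots, n_atoms)
    atoms = dico.components_                 # (n_atoms, n_sensors)
    return codes, atoms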
Pixel-Level Alignment of Facial Images for High Accuracy Recognition Using Ensemble of Patches
Variation in pose, illumination and expression still makes face recognition
a challenging problem. As a pre-processing step in holistic approaches, faces
are usually aligned by the eyes. The proposed method instead performs pixel
alignment rather than eye-alignment by mapping the geometry of faces to a
reference face while keeping their own textures. The proposed geometry
alignment not only creates a meaningful correspondence among every pixel of all
faces, but also removes expression and pose variations effectively. The
geometry alignment is performed pixel-wise, i.e., every pixel of the face is
mapped to a pixel of the reference face. In the proposed method, the intensity
and geometry information of faces are separated, trained by separate
classifiers, and finally fused to recognize human faces.
Experimental results show a great improvement using the proposed method in
comparison to eye-aligned recognition. For instance, at a false acceptance
rate of 0.001, the recognition rates improve by 24% and 33% on the Yale and
AT&T datasets, respectively. On the large and challenging LFW dataset, the
improvement is 20% at an FAR of 0.1.
Comment: 11 pages, 16 figures, 1 table; keywords: face recognition, pixel
alignment, geometrical transformation, pose and expression variation,
ensemble of patches, fusion of texture and geometry
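A minimal sketch of the pixel-wise geometry alignment, assuming (x, y) facial landmarks are already available for both the input face and the reference face (landmark detection and the later classifier fusion are outside this snippet):

import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def align_to_reference(image, landmarks, ref_landmarks, out_shape):
    """Warp `image` so that its landmarks move onto `ref_landmarks`, giving
    every pixel a correspondence to the reference-face geometry. The warped
    image keeps the subject's own texture; the landmark displacement field
    can serve as the separate geometry cue."""
    tform = PiecewiseAffineTransform()
    # warp() expects the mapping from output coordinates to input coordinates.
    tform.estimate(ref_landmarks, landmarks)
    texture = warp(image, tform, output_shape=out_shape)
    geometry = (landmarks - ref_landmarks).ravel()
    return texture, geometry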
A Geometric View of Optimal Transportation and Generative Model
In this work, we show the intrinsic relations between optimal transportation
and convex geometry, especially the variational approach to solve Alexandrov
problem: constructing a convex polytope with prescribed face normals and
volumes. This leads to a geometric interpretation of generative models, and to
a novel framework for generative models. Using the optimal transportation view
of the GAN model, we show that the discriminator computes the Kantorovich
potential, while the generator calculates the transportation map. For a large
class of transportation costs, the Kantorovich potential gives the optimal
transportation map via a closed-form formula. Therefore, it is sufficient to
optimize only the discriminator. This shows that the adversarial competition
can be avoided, and the computational architecture can be simplified.
Preliminary experimental results show that the geometric method outperforms
WGAN for approximating probability measures with multiple clusters in
low-dimensional space.
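For context, the closed-form relation alluded to here is the classical duality result; in generic notation (not necessarily the paper's), for a transportation cost c(x, y) = h(x - y) with h strictly convex and Kantorovich potential \varphi,

    T(x) = x - (\nabla h)^{-1}\bigl(\nabla\varphi(x)\bigr),
    \qquad \text{and for } h(z) = \tfrac{1}{2}\lVert z \rVert^{2}, \quad
    T(x) = x - \nabla\varphi(x),

so once the potential is known, the transportation map, and hence the generator, follows without further adversarial optimization.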
Multi-camera Realtime 3D Tracking of Multiple Flying Animals
Automated tracking of animal movement, by providing large quantities of data,
allows analyses that would not otherwise be possible. The additional
capability of tracking in realtime - with minimal latency - opens up the
experimental possibility of manipulating sensory feedback, thus allowing
detailed explorations of the neural basis for control of behavior. Here we
describe a new system capable of tracking the position and body orientation of
animals such as flies and birds. The system operates with less than 40 msec
latency and can track multiple animals simultaneously. To achieve these
results, a multi-target tracking algorithm was developed based on the Extended
Kalman Filter and the Nearest Neighbor Standard Filter data association
algorithm. In one implementation, an eleven-camera system is capable of
tracking three flies simultaneously at 60 frames per second using a gigabit
network of nine standard Intel Pentium 4 and Core 2 Duo computers. This
manuscript presents the rationale and details of the algorithms employed and
shows three implementations of the system. An experiment was performed using
the tracking system to measure the effect of visual contrast on the flight
speed of Drosophila melanogaster. At low contrasts, speed is more variable and
faster on average than at high contrasts. Thus, the system is already a useful
tool to study the neurobiology and behavior of freely flying animals. If
combined with other techniques, such as `virtual reality'-type computer
graphics or genetic manipulation, the tracking system would offer a powerful
new way to investigate the biology of flying animals.
Comment: 18 pages with 9 figures
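The tracking core described above reduces, per target, to a predict/update recursion plus gated nearest-neighbour association. The simplified Python sketch below assumes a linear observation of 3D position (the real system needs an Extended Kalman Filter because the camera projections are nonlinear), and the noise levels and gate are placeholder assumptions:

import numpy as np

DT = 1.0 / 60.0                                   # 60 fps, as in the paper
F = np.eye(6); F[:3, 3:] = DT * np.eye(3)         # state: [x y z vx vy vz]
H = np.hstack([np.eye(3), np.zeros((3, 3))])      # observe position only
Q = 1e-3 * np.eye(6)                              # process noise (assumed)
R = 1e-2 * np.eye(3)                              # measurement noise (assumed)

def predict(x, P):
    """Constant-velocity prediction step."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Standard Kalman measurement update with a 3D position observation z."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

def nearest_neighbour(x_pred, detections, gate=0.05):
    """Associate the closest detection within a distance gate (metres,
    assumed value); return None if no detection falls inside the gate."""
    if len(detections) == 0:
        return None
    d = np.linalg.norm(detections - (H @ x_pred), axis=1)
    i = int(np.argmin(d))
    return detections[i] if d[i] < gate else None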
A Review on Facial Micro-Expressions Analysis: Datasets, Features and Metrics
Facial micro-expressions are very brief, spontaneous facial expressions that
appear on the face of humans when they either deliberately or unconsciously
conceal an emotion. Micro-expressions have a shorter duration than
macro-expressions, which makes them more challenging for both humans and
machines to recognize. Over
the past ten years, automatic micro-expressions recognition has attracted
increasing attention from researchers in psychology, computer science,
security, neuroscience and other related disciplines. The aim of this paper is
to provide insights into automatic micro-expression analysis and
recommendations for future research. Many datasets have been released over the
last decade, facilitating rapid growth in this field. However, comparison
across datasets is difficult due to inconsistencies in experimental protocols,
features used and evaluation methods. To address these issues, we
review the datasets, features and the performance metrics deployed in the
literature. Relevant challenges such as the spatio-temporal settings during
data collection, emotional classes versus objective classes in data labelling,
face regions in data analysis, standardisation of metrics and the requirements
for real-world implementation are discussed. We conclude by proposing
promising future directions to advance micro-expression research.
Comment: Preprint submitted to IEEE Transactions
Improving Aviation Safety using Synthetic Vision System integrated with Eye-tracking Devices
By collecting pilots' eye-movement data, it is possible to monitor a pilot's
operation in future flights in order to detect potential accidents. In this
paper, we design a novel Synthetic Vision System (SVS) that is integrated with
an eye-tracking device and achieves the following functions: 1) a novel method
that learns from the pilots' eye movements and preloads or renders the terrain
data at various resolutions, improving the quality of the terrain display by
identifying the pilot's regions of interest; 2) a warning mechanism that
detects risky operations by analyzing the aviation information from the SVS
and the eye movements from the eye-tracking device, in order to prevent
maloperation or human-factor accidents. A user study and experiments show that
the proposed SVS-Eyetracking system works efficiently and is capable of
avoiding potential risks caused by fatigue in flight simulation.
Accurate and Robust Neural Networks for Security Related Applications Exampled by Face Morphing Attacks
Artificial neural networks tend to learn only what they need for a task. A
manipulation of the training data can counter this phenomenon. In this paper,
we study the effect of different alterations of the training data, which limit
the amount and position of information that is available for the decision
making. We analyze the accuracy, and the robustness against semantic and
black-box attacks, of networks trained on different modifications of the
training data, using morphing attacks as the particular example. A morphing
attack is an attack on a biometric facial recognition system in which the
system is fooled into matching two different individuals to the same synthetic
face image.
Such a synthetic image can be created by aligning and blending images of the
two individuals that should be matched with this image.
Comment: 16 pages, 7 figures
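A toy version of the align-and-blend step just described, assuming corresponding (x, y) landmark arrays are already available for both faces (practical morphing attacks use far more careful warping and blending to avoid visible artifacts):

import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def simple_morph(img_a, lm_a, img_b, lm_b, alpha=0.5):
    """Warp both faces onto the averaged landmark geometry, then cross-dissolve."""
    lm_mean = alpha * lm_a + (1.0 - alpha) * lm_b

    def warp_to_mean(img, lm):
        tform = PiecewiseAffineTransform()
        tform.estimate(lm_mean, lm)          # output (mean) coords -> input coords
        return warp(img, tform, output_shape=img.shape[:2])

    return (alpha * warp_to_mean(img_a, lm_a)
            + (1.0 - alpha) * warp_to_mean(img_b, lm_b))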
Driver Gaze Zone Estimation using Convolutional Neural Networks: A General Framework and Ablative Analysis
Driver gaze has been shown to be an excellent surrogate for driver attention
in intelligent vehicles. With the recent surge of highly autonomous vehicles,
driver gaze can be useful for determining the handoff time to a human driver.
While there has been significant improvement in personalized driver gaze zone
estimation systems, a generalized system which is invariant to different
subjects, perspectives and scales is still lacking. We take a step towards this
generalized system using Convolutional Neural Networks (CNNs). We finetune 4
popular CNN architectures for this task, and provide extensive comparisons of
their outputs. We additionally experiment with different input image patches,
and also examine how image size affects performance. For training and testing
the networks, we collect a large naturalistic driving dataset comprising 11
long drives, driven by 10 subjects in two different cars. Our best performing
model achieves an accuracy of 95.18% during cross-subject testing,
outperforming current state-of-the-art techniques for this task. Finally, we
evaluate our best performing model on the publicly available Columbia Gaze
Dataset, which comprises images from 56 subjects with varying head poses and
gaze directions. Without any training, our model successfully encodes the
different gaze directions on this diverse dataset, demonstrating good
generalization capabilities.
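For reference, fine-tuning one of the popular architectures mentioned above for a fixed set of gaze zones typically follows the standard torchvision recipe sketched below; this is a generic sketch, not the authors' configuration, and the zone count is an assumption:

import torch
import torch.nn as nn
from torchvision import models

NUM_GAZE_ZONES = 7        # assumed; the paper's exact zone count may differ

def build_gaze_zone_model():
    # Start from ImageNet weights and replace the classification head.
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, NUM_GAZE_ZONES)
    return model

def finetune(model, loader, epochs=5, lr=1e-4, device="cuda"):
    """loader yields (image batch, integer zone labels); images are face or
    upper-body crops, as in typical gaze-zone setups."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, zones in loader:
            opt.zero_grad()
            loss = loss_fn(model(images.to(device)), zones.to(device))
            loss.backward()
            opt.step()
    return model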