Multi-modal Face Pose Estimation with Multi-task Manifold Deep Learning
Human face pose estimation aims to estimate gaze direction or head
posture from 2D images. It supplies important cues for tasks such as
recognizing communicative gestures and saliency detection, and has attracted
considerable attention recently. However, it is challenging because of complex
backgrounds, varied orientations and variable visibility of facial appearance.
Therefore, a descriptive representation of face images and a mapping from that
representation to poses are critical. In this
paper, we make use of multi-modal data and propose a novel face pose estimation
method that uses a novel deep learning framework named Multi-task Manifold Deep
Learning. It is based on feature extraction with improved deep neural
networks and multi-modal mapping relationship with multi-task learning. In the
proposed deep learning based framework, Manifold Regularized Convolutional
Layers (MRCL) improve traditional convolutional layers by learning the
relationship among outputs of neurons. Besides, in the proposed mapping
relationship learning method, different modalities of face representations are
naturally combined to learn the mapping function from face images to poses. In
this way, the computed mapping model with multiple tasks is improved.
Experimental results on three challenging benchmark datasets DPOSE, HPID and
BKHPD demonstrate the outstanding performance of
Improved Method for Individualization of Head-Related Transfer Functions on Horizontal Plane Using Reduced Number of Anthropometric Measurements
An important problem to be solved in modeling head-related impulse responses
(HRIRs) is how to individualize HRIRs so that they are suitable for a listener.
We modeled the entire magnitude head-related transfer functions (HRTFs), in
the frequency domain, for sound sources on the horizontal plane of 37 subjects
using principal components analysis (PCA). The individual magnitude HRTFs could be
modeled adequately well by a linear combination of only ten orthonormal basis
functions. The goal of this research was to establish multiple linear
regression (MLR) between weights of basis functions obtained from PCA and fewer
anthropometric measurements in order to individualize a given listener's HRTFs
with his or her own anthropometry. We propose here an improved
individualization method based on MLR of the weights of basis functions,
using 8 measurements chosen out of 27 anthropometric measurements. The results
of our objective experiments show performance superior to that of our previous
work on individualizing minimum-phase HRIRs and also better than that of
similar research. The proposed individualization method shows that the
individualized magnitude HRTFs could approximate the original ones well, with
small error. Moving sound employing the reconstructed HRIRs could be perceived
as if it was moving around the horizontal plane
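The PCA-plus-regression idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are random stand-ins, only a single source direction is modeled, and the names `hrtf_mag`, `anthro` and `individualize` are hypothetical. Only the counts (37 subjects, ten basis functions, 8 measurements) come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT data from the paper): 37 subjects,
# 8 anthropometric measurements, 64 frequency bins for one direction.
n_subjects, n_anthro, n_bins, n_basis = 37, 8, 64, 10
anthro = rng.normal(size=(n_subjects, n_anthro))
hrtf_mag = (anthro @ rng.normal(size=(n_anthro, n_bins))
            + 0.1 * rng.normal(size=(n_subjects, n_bins)))

# PCA via SVD of the mean-centred data; rows of `basis` are orthonormal.
mean_hrtf = hrtf_mag.mean(axis=0)
centred = hrtf_mag - mean_hrtf
_, _, vt = np.linalg.svd(centred, full_matrices=False)
basis = vt[:n_basis]             # shape (10, n_bins)
weights = centred @ basis.T      # shape (37, 10): PCA weights per subject

# Multiple linear regression (least squares) from [1, measurements]
# to the PCA weights.
design = np.hstack([np.ones((n_subjects, 1)), anthro])
coef, *_ = np.linalg.lstsq(design, weights, rcond=None)  # shape (9, 10)

def individualize(measurements):
    """Estimate a magnitude HRTF from one listener's measurements."""
    predicted_weights = np.concatenate([[1.0], measurements]) @ coef
    return mean_hrtf + predicted_weights @ basis

estimate = individualize(anthro[0])
```

In this sketch the regression is fitted and evaluated on the same synthetic subjects; a real evaluation would hold out listeners.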
Light-weight Head Pose Invariant Gaze Tracking
Unconstrained remote gaze tracking using off-the-shelf cameras is a
challenging problem. Recently, promising algorithms for appearance-based gaze
estimation using convolutional neural networks (CNN) have been proposed.
Improving their robustness to various confounding factors including variable
head pose, subject identity, illumination and image quality remains an open
problem. In this work, we study the effect of variable head pose on machine
learning regressors trained to estimate gaze direction. We propose a novel
branched CNN architecture that improves the robustness of gaze classifiers to
variable head pose, without increasing computational cost. We also present
various procedures to effectively train our gaze network including transfer
learning from the more closely related task of object viewpoint estimation and
from a large high-fidelity synthetic gaze dataset, which enable our ten times
faster gaze network to achieve competitive accuracy to its current
state-of-the-art direct competitor. Comment: 9 pages, IEEE Conference on Computer Vision and Pattern Recognition
Worksho
Dynamics of Driver's Gaze: Explorations in Behavior Modeling & Maneuver Prediction
The study and modeling of driver's gaze dynamics is important because whether
and how the driver is monitoring the driving environment is vital for driver
assistance in manual mode, for take-over requests in highly automated mode and
for semantic perception of the surround in fully autonomous mode. We developed
a machine vision based framework to classify driver's gaze into context rich
zones of interest and model driver's gaze behavior by representing gaze
dynamics over a time period using gaze accumulation, glance duration and glance
frequencies. As a use case, we explore the driver's gaze dynamic patterns
during maneuvers executed in freeway driving, namely, left lane change
maneuver, right lane change maneuver and lane keeping. It is shown that
condensing gaze dynamics into durations and frequencies leads to recurring
patterns based on driver activities. Furthermore, modeling these patterns shows
predictive power in maneuver detection up to a few hundred milliseconds a
priori
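As a rough illustration of the three descriptors named in the abstract above (gaze accumulation, glance duration and glance frequency), the sketch below summarizes a per-frame sequence of gaze-zone labels. The exact definitions used by the authors are not given in the abstract, so the ones here are assumptions: accumulation is the fraction of frames spent in a zone, a glance is an uninterrupted run of frames in one zone, and frequency is glances per second of the window. The function name and zone labels are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

def gaze_statistics(zones, frame_period_s):
    """Summarize a per-frame sequence of gaze-zone labels (assumed definitions)."""
    n_frames = len(zones)
    frames_in_zone = defaultdict(int)
    glance_count = defaultdict(int)
    # Each run of identical consecutive labels counts as one glance.
    for zone, run in groupby(zones):
        frames_in_zone[zone] += sum(1 for _ in run)
        glance_count[zone] += 1
    window_s = n_frames * frame_period_s
    stats = {}
    for zone, frames in frames_in_zone.items():
        stats[zone] = {
            "accumulation": frames / n_frames,
            "mean_glance_duration_s": frames * frame_period_s / glance_count[zone],
            "glance_frequency_hz": glance_count[zone] / window_s,
        }
    return stats

# One-second window sampled at 10 frames per second.
window = ["front"] * 6 + ["left_mirror"] * 2 + ["front"] * 2
stats = gaze_statistics(window, frame_period_s=0.1)
```

Feature vectors of this kind, computed over a sliding window, are the sort of input a maneuver classifier could consume.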
CNNs based Viewpoint Estimation for Volume Visualization
Viewpoint estimation from 2D rendered images is helpful in understanding how
users select viewpoints for volume visualization and guiding users to select
better viewpoints based on previous visualizations. In this paper, we propose a
viewpoint estimation method based on Convolutional Neural Networks (CNNs) for
volume visualization. We first design an overfit-resistant image rendering
pipeline to generate the training images with accurate viewpoint annotations,
and then train a category-specific viewpoint classification network to estimate
the viewpoint for the given rendered image. Our method can achieve good
performance on images rendered with different transfer functions and rendering
parameters in several categories. We apply our model to recover the viewpoints
of the rendered images in publications, and show how experts look at volumes.
We also introduce a CNN feature-based image similarity measure for similarity
voting based viewpoint selection, which can suggest semantically meaningful
optimal viewpoints for different volumes and transfer functions
EyeGAN: Gaze-Preserving, Mask-Mediated Eye Image Synthesis
Automatic synthesis of realistic eye images with prescribed gaze direction is important for multiple application domains. We introduce EyeGAN, an algorithm to generate eye images in the style of a desired target domain, that inherit annotations available in images from a source domain. EyeGAN takes ternary masks as input, which are used as domain-independent proxies for gaze direction. We evaluate EyeGAN against competing eye image synthesis algorithms by measuring a specific gaze consistency index. In addition, we present results from multiple experiments (involving eye region segmentation, pupil localization, and gaze direction estimation) showing that the use of EyeGAN-generated images with inherited annotations for network training leads to superior performance compared to other domain transfer algorithms
Interactive Sound Rendering on Mobile Devices using Ray-Parameterized Reverberation Filters
We present a new sound rendering pipeline that is able to generate plausible
sound propagation effects for interactive dynamic scenes. Our approach combines
ray-tracing-based sound propagation with reverberation filters using robust
automatic reverb parameter estimation that is driven by impulse responses
computed at a low sampling rate. We propose a unified spherical harmonic
representation of directional sound in both the propagation and auralization
modules and use this formulation to perform a constant number of convolution
operations for any number of sound sources while rendering spatial audio. In
comparison to previous geometric acoustic methods, we achieve a speedup of over
an order of magnitude while delivering similar audio to high-quality
convolution rendering algorithms. As a result, our approach is the first
capable of rendering plausible dynamic sound propagation effects on commodity
smartphones
Binaural LCMV Beamforming with Partial Noise Estimation
Besides reducing undesired sources (interfering sources and background
noise), another important objective of a binaural beamforming algorithm is to
preserve the spatial impression of the acoustic scene, which can be achieved by
preserving the binaural cues of all sound sources. While the binaural minimum
variance distortionless response (BMVDR) beamformer provides good noise
reduction performance and preserves the binaural cues of the desired source, it
does not allow controlling the reduction of the interfering sources and distorts
the binaural cues of the interfering sources and the background noise. Hence,
several extensions have been proposed. First, the binaural linearly constrained
minimum variance (BLCMV) beamformer uses additional constraints, which make it
possible to control the reduction of the interfering sources while preserving
their binaural cues. Second, the BMVDR with partial noise estimation (BMVDR-N)
mixes the output signals of the BMVDR with the noisy reference microphone
signals, which makes it possible to control the binaural cues of the background
noise. Merging the
advantages of both extensions, in this paper we propose the BLCMV with partial
noise estimation (BLCMV-N). We show that the output signals of the BLCMV-N can
be interpreted as a mixture of the noisy reference microphone signals and the
output signals of a BLCMV using an adjusted interference scaling parameter. We
provide a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and
the proposed BLCMV-N in terms of noise and interference reduction performance
and binaural cue preservation. Experimental results using recorded signals as
well as the results of a perceptual listening test show that the BLCMV-N is
able to preserve the binaural cues of an interfering source (like the BLCMV),
while allowing a trade-off between noise reduction performance and binaural
cue preservation of the background noise (like the BMVDR-N). Comment: submitted to IEEE/ACM Transactions on Audio, Speech, and Language
Processin
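The mixing step that the abstract above attributes to partial noise estimation can be sketched for a single ear as follows. This is an illustration only, not the proposed BLCMV-N: it uses a plain MVDR filter, synthetic signals, and a mixing parameter called `eta` here by assumption. A binaural version would apply the same step once per ear, each with its own reference microphone.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR filter w = R^{-1} a / (a^H R^{-1} a), which satisfies w^H a = 1."""
    r_inv_a = np.linalg.solve(noise_cov, steering)
    return r_inv_a / (steering.conj() @ r_inv_a)

def partial_noise_output(weights, mics, ref_index, eta):
    """Mix the beamformer output with the noisy reference microphone signal."""
    beamformed = weights.conj() @ mics            # shape (n_samples,)
    return (1.0 - eta) * beamformed + eta * mics[ref_index]

rng = np.random.default_rng(0)
n_mics, n_samples = 4, 1000

# Hypothetical relative transfer function, normalized to reference mic 0.
steering = np.concatenate(
    [[1.0], rng.normal(size=n_mics - 1) + 1j * rng.normal(size=n_mics - 1)]
)
source = rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)
noise = rng.normal(size=(n_mics, n_samples)) + 1j * rng.normal(size=(n_mics, n_samples))
noise_cov = noise @ noise.conj().T / n_samples

w = mvdr_weights(noise_cov, steering)
mics = steering[:, None] * source + noise
output = partial_noise_output(w, mics, ref_index=0, eta=0.3)
```

Because the filter is distortionless and the steering vector is normalized to the reference microphone, the desired source passes unchanged for any `eta`; only the residual noise, and with it the noise's spatial cues, is traded against noise reduction.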
A Two-Layer Local Constrained Sparse Coding Method for Fine-Grained Visual Categorization
Fine-grained categories are more difficult to distinguish than generic
categories because of inter-class similarity and intra-class diversity.
Therefore, fine-grained visual categorization (FGVC) has recently been
considered one of the challenging problems in computer vision. A new
feature learning framework, which is based on a two-layer local constrained
sparse coding architecture, is proposed in this paper. The two-layer
architecture is introduced for learning intermediate-level features, and the
local constrained term is applied to guarantee the local smoothness of the
coding coefficients. To extract more discriminative information, local
orientation histograms, rather than raw pixels, are used as the input to sparse
coding. Moreover, a quick dictionary updating process is derived to further
improve the training speed. Experimental results on two datasets show that our
method achieves 85.29% accuracy on the Oxford 102 flowers dataset and 67.8%
accuracy on the CUB-200-2011 bird dataset, and the performance of our framework
is highly competitive with the existing literature. Comment: 19 pages, 12 figures, 8 table
Pain Intensity Estimation by a Self--Taught Selection of Histograms of Topographical Features
Pain assessment through observational pain scales is necessary for special
categories of patients such as neonates, patients with dementia, critically ill
patients, etc. The recently introduced Prkachin-Solomon score allows pain
assessment directly from facial images, opening the path for multiple assistive
applications. In this paper, we introduce the Histograms of Topographical (HoT)
features, which are a generalization of the topographical primal sketch, for
the description of the face parts contributing to the mentioned score. We
propose a semi-supervised, clustering-oriented self-taught learning procedure
developed on the emotion-oriented Cohn-Kanade database. We use this procedure
to improve the discrimination between different pain intensity levels and the
generalization with respect to the monitored persons, while testing on the UNBC
McMaster Shoulder Pain database