Scale Invariant Interest Points with Shearlets
Shearlets are a relatively new directional multi-scale framework for signal
analysis, which has been shown to be effective at enhancing signal
discontinuities such as edges and corners at multiple scales. In this work we address the
problem of detecting and describing blob-like features in the shearlet
framework. We derive a measure which is very effective for blob detection and
closely related to the Laplacian of Gaussian. We demonstrate the measure
satisfies the perfect scale invariance property in the continuous case. In the
discrete setting, we derive algorithms for blob detection and keypoint
description. Finally, we provide qualitative justifications of our findings as
well as a quantitative evaluation on benchmark data. We also report
experimental evidence that our method is well suited to dealing with compressed
and noisy images, thanks to the sparsity property of shearlets.
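The scale-normalized Laplacian of Gaussian, to which the abstract relates its shearlet measure, can be sketched as a baseline blob detector. This is a minimal illustration of the classical LoG approach (scale values and threshold are our own choices), not the shearlet construction itself:

```python
# Sketch: scale-normalized Laplacian-of-Gaussian blob detection, the classical
# measure the shearlet detector is compared to. Scales and threshold are
# illustrative, not taken from the paper.
import numpy as np
from scipy.ndimage import gaussian_laplace, maximum_filter

def log_blobs(image, sigmas=(2, 4, 8), threshold=0.05):
    """Return (row, col, sigma) triples where the scale-normalized LoG
    response attains a local maximum across space and scale."""
    # sigma^2 * (-LoG) is the scale-normalized response for bright blobs;
    # its extrema are covariant under image rescalings.
    stack = np.stack([(s ** 2) * -gaussian_laplace(image.astype(float), s)
                      for s in sigmas])
    # Local maxima over the joint (scale, row, col) neighbourhood.
    peaks = (stack == maximum_filter(stack, size=3)) & (stack > threshold)
    return [(r, c, sigmas[k]) for k, r, c in zip(*np.nonzero(peaks))]
```

For a Gaussian blob of spatial scale t, the response peaks at sigma close to t, which is the scale-selection property the abstract's "perfect scale invariance" refers to in the continuous case.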
Analyzing Input and Output Representations for Speech-Driven Gesture Generation
This paper presents a novel framework for automatic speech-driven gesture
generation, applicable to human-agent interaction including both virtual agents
and robots. Specifically, we extend recent deep-learning-based, data-driven
methods for speech-driven gesture generation by incorporating representation
learning. Our model takes speech as input and produces gestures as output, in
the form of a sequence of 3D coordinates. Our approach consists of two steps.
First, we learn a lower-dimensional representation of human motion using a
denoising autoencoder neural network, consisting of a motion encoder MotionE
and a motion decoder MotionD. The learned representation preserves the most
important aspects of the human pose variation while removing less relevant
variation. Second, we train a novel encoder network SpeechE to map from speech
to a corresponding motion representation with reduced dimensionality. At test
time, the speech encoder and the motion decoder networks are combined: SpeechE
predicts motion representations based on a given speech signal and MotionD then
decodes these representations to produce motion sequences. We evaluate
different representation sizes in order to find the most effective
dimensionality for the representation. We also evaluate the effects of using
different speech features as input to the model. We find that mel-frequency
cepstral coefficients (MFCCs), alone or combined with prosodic features,
perform the best. The results of a subsequent user study confirm the benefits
of the representation learning.
Comment: Accepted at IVA '19. Shorter version published at AAMAS '19. The code
is available at
https://github.com/GestureGeneration/Speech_driven_gesture_generation_with_autoencode
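The test-time wiring of the two-step pipeline can be sketched as follows. The networks named in the abstract (SpeechE, MotionD; MotionE is used only for training) are replaced here by random linear maps purely to show how the pieces compose; all dimensions are illustrative, and no training is performed:

```python
# Toy sketch of the test-time path: SpeechE predicts motion representations
# from speech, and MotionD decodes them into pose sequences. The linear maps
# are stand-ins for the trained networks; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
pose_dim, latent_dim, speech_dim, T = 45, 8, 26, 100   # illustrative sizes

motion_d = rng.standard_normal((pose_dim, latent_dim))    # stands in for MotionD
speech_e = rng.standard_normal((latent_dim, speech_dim))  # stands in for SpeechE

def generate_gestures(speech_features):
    """Map a (T, speech_dim) feature sequence to a (T, pose_dim) pose sequence
    by composing the speech encoder with the motion decoder."""
    latent = speech_features @ speech_e.T   # (T, latent_dim) representations
    return latent @ motion_d.T              # (T, pose_dim) decoded poses

speech = rng.standard_normal((T, speech_dim))  # e.g. a sequence of MFCC frames
poses = generate_gestures(speech)
```

The reduced-dimensionality latent sequence is exactly the interface the paper evaluates when sweeping representation sizes.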
No-reference image quality assessment through the von Mises distribution
An innovative way of calculating the von Mises distribution (VMD) of image
entropy is introduced in this paper. The VMD's concentration parameter, and a
fitness parameter defined later in the paper, are analyzed in the experimental
part to determine their suitability as an image quality assessment measure
under particular distortions such as Gaussian blur or additive Gaussian noise.
To obtain such a measure, the local R\'{e}nyi entropy
is calculated in four equally spaced orientations and used to determine the
parameters of the von Mises distribution of the image entropy. Considering
contextual images, experimental results after applying this model show that the
best-in-focus noise-free images are associated with the highest values for the
von Mises distribution concentration parameter and the highest approximation of
image data to the von Mises distribution model. Our von Mises fitness
parameter also appears experimentally to be a suitable no-reference image
quality assessment indicator for non-contextual images.
Comment: 29 pages, 11 figures
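Given entropy values measured at four equally spaced orientations, the concentration parameter kappa of a von Mises fit can be estimated as sketched below. The kappa estimator is the standard piecewise approximation from directional statistics, not necessarily the paper's exact fitting procedure; treating the entropies as weights on doubled (axial) angles is likewise our own simplification:

```python
# Minimal sketch: estimate the von Mises concentration parameter kappa from
# entropies at four equally spaced orientations. Angles are doubled because
# orientation data is axial (theta and theta + pi are the same direction).
import numpy as np

def vonmises_kappa(entropies, angles=np.deg2rad([0, 45, 90, 135])):
    w = np.asarray(entropies, float)
    theta = 2 * np.asarray(angles)              # doubled angles: axial data
    C = np.sum(w * np.cos(theta)) / w.sum()
    S = np.sum(w * np.sin(theta)) / w.sum()
    r = np.hypot(C, S)                          # mean resultant length in [0, 1]
    # Standard piecewise approximation to the ML estimate of kappa.
    if r < 0.53:
        return 2 * r + r ** 3 + 5 * r ** 5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
```

Isotropic entropy across orientations gives kappa near zero, while a dominant orientation drives kappa up, matching the abstract's observation that best-in-focus images yield the highest concentration values.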
Multispectral Palmprint Encoding and Recognition
Palmprints are emerging as a new entity in multi-modal biometrics for human
identification and verification. Multispectral palmprint images captured in the
visible and infrared spectrum not only contain the wrinkles and ridge structure
of a palm, but also the underlying pattern of veins; making them a highly
discriminating biometric identifier. In this paper, we propose a feature
encoding scheme for robust and highly accurate representation and matching of
multispectral palmprints. To facilitate compact storage of the feature, we
design a binary hash table structure that allows for efficient matching in
large databases. Comprehensive experiments for both identification and
verification scenarios are performed on two public datasets -- one captured
with a contact-based sensor (PolyU dataset), and the other with a contact-free
sensor (CASIA dataset). Recognition results in various experimental setups show
that the proposed method consistently outperforms existing state-of-the-art
methods. Error rates achieved by our method (0.003% on PolyU and 0.2% on
CASIA) are the lowest reported in the literature on both datasets and clearly
indicate the viability of the palmprint as a reliable and promising biometric.
All source code is publicly available.
Comment: A preliminary version of this manuscript was published in ICCV 2011: Z.
Khan, A. Mian and Y. Hu, "Contour Code: Robust and Efficient Multispectral
Palmprint Encoding for Human Recognition", International Conference on
Computer Vision, 2011. MATLAB code available:
https://sites.google.com/site/zohaibnet/Home/code
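The kind of binary matching such an encoding enables can be sketched as follows: features packed into bit strings and compared by Hamming distance, so a large gallery can be scanned with cheap XOR/popcount operations. The paper's actual hash-table layout is not reproduced here; the gallery, probe, and code length are illustrative:

```python
# Sketch of compact binary feature matching: pack boolean feature vectors into
# bytes and rank gallery entries by Hamming distance. Data is illustrative.
import numpy as np

def pack(code_bits):
    """Pack a boolean feature vector into bytes for compact storage."""
    return np.packbits(np.asarray(code_bits, dtype=np.uint8))

def hamming(a, b):
    """Hamming distance between two packed binary codes."""
    return int(np.unpackbits(a ^ b).sum())

def identify(probe, gallery):
    """Return the gallery identity with the smallest Hamming distance."""
    return min(gallery, key=lambda name: hamming(probe, gallery[name]))

gallery = {"subject_a": pack([1, 0, 1, 1, 0, 0, 1, 0]),
           "subject_b": pack([0, 1, 0, 0, 1, 1, 0, 1])}
probe = pack([1, 0, 1, 1, 0, 1, 1, 0])  # one bit away from subject_a
```

Identification ranks the whole gallery; verification would instead threshold a single Hamming distance.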
Time-causal and time-recursive spatio-temporal receptive fields
We present an improved model and theory for time-causal and time-recursive
spatio-temporal receptive fields, based on a combination of Gaussian receptive
fields over the spatial domain and first-order integrators or equivalently
truncated exponential filters coupled in cascade over the temporal domain.
Compared to previous spatio-temporal scale-space formulations in terms of
non-enhancement of local extrema or scale invariance, these receptive fields
are based on different scale-space axiomatics over time by ensuring
non-creation of new local extrema or zero-crossings with increasing temporal
scale. Specifically, extensions are presented concerning (i) parameterizing the
intermediate temporal scale levels, (ii) analysing the resulting temporal
dynamics, (iii) transferring the theory to a discrete implementation, (iv)
computing scale-normalized spatio-temporal derivative expressions for
spatio-temporal feature detection and (v) computational modelling of receptive
fields in the lateral geniculate nucleus (LGN) and the primary visual cortex
(V1) in biological vision.
We show that by distributing the intermediate temporal scale levels according
to a logarithmic distribution, we obtain much faster temporal response
properties (shorter temporal delays) compared to a uniform distribution.
Specifically, these kernels converge very rapidly to a limit kernel possessing
true self-similar scale-invariant properties over temporal scales, thereby
allowing for true scale invariance over variations in the temporal scale,
although the underlying temporal scale-space representation is based on a
discretized temporal scale parameter.
We show how scale-normalized temporal derivatives can be defined for these
time-causal scale-space kernels and how the composed theory can be used for
computing basic types of scale-normalized spatio-temporal derivative
expressions in a computationally efficient manner.
Comment: 39 pages, 12 figures, 5 tables. In Journal of Mathematical Imaging and
Vision, published online Dec 201
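The cascade of first-order integrators described above can be sketched as a recursive filter. In this minimal illustration (parameter names and default values are our own, not the paper's), each discrete integrator with time constant mu contributes mu^2 of temporal variance, so mu_k is chosen to bridge successive logarithmically distributed scale levels tau_k:

```python
# Sketch: time-causal, time-recursive temporal smoothing by first-order
# integrators coupled in cascade, with intermediate temporal scales tau_k
# distributed logarithmically. Defaults are illustrative.
import numpy as np

def temporal_scale_space(signal, tau_max=16.0, K=6, c=2.0):
    """Smooth `signal` causally and recursively; return representations at
    K logarithmically distributed temporal scales."""
    taus = tau_max * c ** (2.0 * (np.arange(1, K + 1) - K))  # log-distributed
    outputs, y, tau_prev = [], np.array(signal, float), 0.0
    for tau in taus:
        mu = np.sqrt(tau - tau_prev)   # integrator bridging tau_prev -> tau
        gain = 1.0 / (1.0 + mu)
        # In-place causal recursion y[n] = y[n-1] + gain * (x[n] - y[n-1]),
        # where x is the previous cascade stage's output.
        for n in range(1, len(y)):
            y[n] = y[n - 1] + gain * (y[n] - y[n - 1])
        outputs.append(y.copy())
        tau_prev = tau
    return taus, outputs
```

Each stage has unit DC gain, so the representation only redistributes signal energy over time; the logarithmic spacing of tau_k is what gives the shorter temporal delays discussed in the abstract.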
Einstein equations in the null quasi-spherical gauge III: numerical algorithms
We describe the numerical techniques used in the construction of our 4th-order
evolution code for the full Einstein equations, and assess the accuracy of
representative solutions. The code is based on a null gauge with a
quasi-spherical radial coordinate, and simulates the interaction of a single
black hole with gravitational radiation. Techniques used include spherical
harmonic representations, convolution spline interpolation and filtering, and
an RK4 "method of lines" evolution. For sample initial data of "intermediate"
size (gravitational field with 19% of the black hole mass), the code is
accurate to 1 part in 10^5, until null time z=55 when the coordinate condition
breaks down.
Comment: LaTeX, 38 pages, 29 figures (360Kb compressed)
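An RK4 "method of lines" evolution of the kind mentioned can be sketched on a much simpler problem: the 1D wave equation discretized in space (here by finite differences rather than the paper's spherical-harmonic representation) and stepped in time with classical fourth-order Runge-Kutta. All grid and step sizes are illustrative:

```python
# Much-simplified sketch of an RK4 method-of-lines evolution for the 1D wave
# equation u_tt = u_xx with fixed (Dirichlet) boundaries.
import numpy as np

def rhs(state, dx):
    """Spatial discretization: state = (u, v) with u_t = v, v_t = u_xx."""
    u, v = state
    uxx = np.zeros_like(u)
    uxx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2  # interior points
    return np.array([v, uxx])                             # boundaries pinned

def rk4_step(state, dt, dx):
    """Classical fourth-order Runge-Kutta step for the semi-discrete system."""
    k1 = rhs(state, dx)
    k2 = rhs(state + 0.5 * dt * k1, dx)
    k3 = rhs(state + 0.5 * dt * k2, dx)
    k4 = rhs(state + dt * k3, dx)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.linspace(0.0, 1.0, 101)
dx, dt = x[1] - x[0], 0.005                              # dt chosen for stability
state = np.array([np.sin(np.pi * x), np.zeros_like(x)])  # standing-mode data
for _ in range(200):                                     # evolve to t = 1
    state = rk4_step(state, dt, dx)
```

At t = 1 the exact standing mode sin(pi x) cos(pi t) has flipped sign, which gives a simple accuracy check analogous to the convergence assessments described in the abstract.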
Acoustic waves: should they be propagated forward in time, or forward in space?
The evolution of acoustic waves can be evaluated in two ways: either as a
temporal, or a spatial propagation. Propagating in space provides the
considerable advantage of being able to handle dispersion and propagation
across interfaces with remarkable efficiency; but propagating in time is more
physical and gives correctly behaved reflections and scattering without effort.
Which should be chosen in a given situation, and what compromises might have to
be made? Here the natural behaviors of each choice of propagation are compared
and contrasted for an ordinary second order wave equation, the time-dependent
diffusion wave equation, an elastic rod wave equation, and the Stokes'/ van
Wijngaarden's equations, each case illuminating a characteristic feature of the
technique. Either choice of propagation axis enables a partitioning of the wave
equation that gives rise to a directional factorization based on a natural
"reference" dispersion relation. The resulting exact coupled bidirectional
equations then reduce to a single unidirectional first-order wave equation
using a simple "slow evolution" assumption that minimizes the effect of subsequent
approximations, while allowing a direct term-to-term comparison between exact
and approximate theories.
Comment: 12 pages; v2 corrected
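The directional factorization described above can be illustrated in its simplest case, the ordinary second-order wave equation with constant speed c, whose operator splits exactly into forward- and backward-propagating first-order factors:

```latex
% Simplest instance of the directional factorization: the 1D wave operator
% splits about the reference dispersion relation \omega = \pm c k.
\left(\partial_t^2 - c^2\,\partial_x^2\right) u
  = \left(\partial_t - c\,\partial_x\right)
    \left(\partial_t + c\,\partial_x\right) u = 0 .
```

Retaining a single factor yields the unidirectional first-order equation $\partial_t u + c\,\partial_x u = 0$; in the general dispersive cases treated in the paper, the "slow evolution" assumption plays the role of decoupling the two factors.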
3D weak lensing with spin wavelets on the ball
We construct the spin flaglet transform, a wavelet transform to analyze spin
signals in three dimensions. Spin flaglets can probe signal content localized
simultaneously in space and frequency and, moreover, are separable so that
their angular and radial properties can be controlled independently. They are
particularly suited to the analysis of cosmological observations such as the weak
gravitational lensing of galaxies. Such observations have a unique 3D
geometrical setting since they are natively made on the sky, have spin angular
symmetries, and are extended in the radial direction by additional distance or
redshift information. Flaglets are constructed in the harmonic space defined by
the Fourier-Laguerre transform, previously defined for scalar functions and
extended here to signals with spin symmetries. Thanks to various sampling
theorems, both the Fourier-Laguerre and flaglet transforms are theoretically
exact when applied to bandlimited signals. In other words, in numerical
computations the only loss of information is due to the finite representation
of floating point numbers. We develop a 3D framework relating the weak lensing
power spectrum to covariances of flaglet coefficients. We suggest that the
resulting novel flaglet weak lensing estimator offers a powerful alternative to
common 2D and 3D approaches to accurately capture cosmological information.
While standard weak lensing analyses focus on either real or harmonic space
representations (i.e., correlation functions or Fourier-Bessel power spectra,
respectively), a wavelet approach inherits the advantages of both techniques,
where both complicated sky coverage and uncertainties associated with the
physical modeling of small scales can be handled effectively. Our codes to
compute the Fourier-Laguerre and flaglet transforms are made publicly
available.
Comment: 24 pages, 4 figures; version accepted for publication in PR