Scale Invariant Interest Points with Shearlets
Shearlets are a relatively new directional multi-scale framework for signal
analysis, which has been shown to be effective at enhancing signal
discontinuities such as edges and corners at multiple scales. In this work we address the
problem of detecting and describing blob-like features in the shearlet
framework. We derive a measure which is very effective for blob detection and
closely related to the Laplacian of Gaussian. We demonstrate the measure
satisfies the perfect scale invariance property in the continuous case. In the
discrete setting, we derive algorithms for blob detection and keypoint
description. Finally, we provide qualitative justifications of our findings as
well as a quantitative evaluation on benchmark data. We also report
experimental evidence that our method is well suited to dealing with compressed
and noisy images, thanks to the sparsity property of shearlets.
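The scale-normalized Laplacian of Gaussian, to which the abstract relates its shearlet measure, can be sketched as a baseline blob detector. This is a minimal illustration of the classical LoG approach (scale values and threshold are our own choices), not the shearlet construction itself:

```python
# Sketch: scale-normalized Laplacian-of-Gaussian blob detection, the classical
# measure the shearlet detector is compared to. Scales and threshold are
# illustrative, not taken from the paper.
import numpy as np
from scipy.ndimage import gaussian_laplace, maximum_filter

def log_blobs(image, sigmas=(2, 4, 8), threshold=0.05):
    """Return (row, col, sigma) triples where the scale-normalized LoG
    response attains a local maximum across space and scale."""
    # sigma^2 * (-LoG) is the scale-normalized response for bright blobs;
    # its extrema are covariant under image rescalings.
    stack = np.stack([(s ** 2) * -gaussian_laplace(image.astype(float), s)
                      for s in sigmas])
    # Local maxima over the joint (scale, row, col) neighbourhood.
    peaks = (stack == maximum_filter(stack, size=3)) & (stack > threshold)
    return [(r, c, sigmas[k]) for k, r, c in zip(*np.nonzero(peaks))]
```

For a Gaussian blob of spatial scale t, the response peaks at sigma close to t, which is the scale-selection property the abstract's "perfect scale invariance" refers to in the continuous case.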
Analyzing Input and Output Representations for Speech-Driven Gesture Generation
This paper presents a novel framework for automatic speech-driven gesture
generation, applicable to human-agent interaction including both virtual agents
and robots. Specifically, we extend recent deep-learning-based, data-driven
methods for speech-driven gesture generation by incorporating representation
learning. Our model takes speech as input and produces gestures as output, in
the form of a sequence of 3D coordinates. Our approach consists of two steps.
First, we learn a lower-dimensional representation of human motion using a
denoising autoencoder neural network, consisting of a motion encoder MotionE
and a motion decoder MotionD. The learned representation preserves the most
important aspects of the human pose variation while removing less relevant
variation. Second, we train a novel encoder network SpeechE to map from speech
to a corresponding motion representation with reduced dimensionality. At test
time, the speech encoder and the motion decoder networks are combined: SpeechE
predicts motion representations based on a given speech signal and MotionD then
decodes these representations to produce motion sequences. We evaluate
different representation sizes in order to find the most effective
dimensionality for the representation. We also evaluate the effects of using
different speech features as input to the model. We find that mel-frequency
cepstral coefficients (MFCCs), alone or combined with prosodic features,
perform the best. The results of a subsequent user study confirm the benefits
of the representation learning.
Comment: Accepted at IVA '19. Shorter version published at AAMAS '19. The code
is available at
https://github.com/GestureGeneration/Speech_driven_gesture_generation_with_autoencode
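The test-time wiring of the two-step pipeline can be sketched as follows. The networks named in the abstract (SpeechE, MotionD; MotionE is used only for training) are replaced here by random linear maps purely to show how the pieces compose; all dimensions are illustrative, and no training is performed:

```python
# Toy sketch of the test-time path: SpeechE predicts motion representations
# from speech, and MotionD decodes them into pose sequences. The linear maps
# are stand-ins for the trained networks; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
pose_dim, latent_dim, speech_dim, T = 45, 8, 26, 100   # illustrative sizes

motion_d = rng.standard_normal((pose_dim, latent_dim))    # stands in for MotionD
speech_e = rng.standard_normal((latent_dim, speech_dim))  # stands in for SpeechE

def generate_gestures(speech_features):
    """Map a (T, speech_dim) feature sequence to a (T, pose_dim) pose sequence
    by composing the speech encoder with the motion decoder."""
    latent = speech_features @ speech_e.T   # (T, latent_dim) representations
    return latent @ motion_d.T              # (T, pose_dim) decoded poses

speech = rng.standard_normal((T, speech_dim))  # e.g. a sequence of MFCC frames
poses = generate_gestures(speech)
```

The reduced-dimensionality latent sequence is exactly the interface the paper evaluates when sweeping representation sizes.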
No-reference image quality assessment through the von Mises distribution
An innovative way of calculating the von Mises distribution (VMD) of image
entropy is introduced in this paper. The VMD's concentration parameter, and a
fitness parameter defined later in the paper, are analyzed in the experimental
part to determine their suitability as an image quality assessment measure
under particular distortions such as Gaussian blur or additive Gaussian noise.
To obtain such a measure, the local R\'{e}nyi entropy
is calculated in four equally spaced orientations and used to determine the
parameters of the von Mises distribution of the image entropy. Considering
contextual images, experimental results after applying this model show that the
best-in-focus noise-free images are associated with the highest values for the
von Mises distribution concentration parameter and the highest approximation of
image data to the von Mises distribution model. Our von Mises fitness
parameter also appears experimentally to be a suitable no-reference image
quality assessment indicator for non-contextual images.
Comment: 29 pages, 11 figures
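Given entropy values measured at four equally spaced orientations, the concentration parameter kappa of a von Mises fit can be estimated as sketched below. The kappa estimator is the standard piecewise approximation from directional statistics, not necessarily the paper's exact fitting procedure; treating the entropies as weights on doubled (axial) angles is likewise our own simplification:

```python
# Minimal sketch: estimate the von Mises concentration parameter kappa from
# entropies at four equally spaced orientations. Angles are doubled because
# orientation data is axial (theta and theta + pi are the same direction).
import numpy as np

def vonmises_kappa(entropies, angles=np.deg2rad([0, 45, 90, 135])):
    w = np.asarray(entropies, float)
    theta = 2 * np.asarray(angles)              # doubled angles: axial data
    C = np.sum(w * np.cos(theta)) / w.sum()
    S = np.sum(w * np.sin(theta)) / w.sum()
    r = np.hypot(C, S)                          # mean resultant length in [0, 1]
    # Standard piecewise approximation to the ML estimate of kappa.
    if r < 0.53:
        return 2 * r + r ** 3 + 5 * r ** 5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1 / (r ** 3 - 4 * r ** 2 + 3 * r)
```

Isotropic entropy across orientations gives kappa near zero, while a dominant orientation drives kappa up, matching the abstract's observation that best-in-focus images yield the highest concentration values.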
Multispectral Palmprint Encoding and Recognition
Palmprints are emerging as a new entity in multi-modal biometrics for human
identification and verification. Multispectral palmprint images captured in the
visible and infrared spectrum not only contain the wrinkles and ridge structure
of a palm, but also the underlying pattern of veins; making them a highly
discriminating biometric identifier. In this paper, we propose a feature
encoding scheme for robust and highly accurate representation and matching of
multispectral palmprints. To facilitate compact storage of the feature, we
design a binary hash table structure that allows for efficient matching in
large databases. Comprehensive experiments for both identification and
verification scenarios are performed on two public datasets -- one captured
with a contact-based sensor (PolyU dataset), and the other with a contact-free
sensor (CASIA dataset). Recognition results in various experimental setups show
that the proposed method consistently outperforms existing state-of-the-art
methods. Error rates achieved by our method (0.003% on PolyU and 0.2% on
CASIA) are the lowest reported in the literature on both datasets and clearly
indicate the viability of the palmprint as a reliable and promising biometric.
All source code is publicly available.
Comment: A preliminary version of this manuscript was published in ICCV 2011: Z.
Khan, A. Mian and Y. Hu, "Contour Code: Robust and Efficient Multispectral
Palmprint Encoding for Human Recognition", International Conference on
Computer Vision, 2011. MATLAB code available:
https://sites.google.com/site/zohaibnet/Home/code
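The kind of binary matching such an encoding enables can be sketched as follows: features packed into bit strings and compared by Hamming distance, so a large gallery can be scanned with cheap XOR/popcount operations. The paper's actual hash-table layout is not reproduced here; the gallery, probe, and code length are illustrative:

```python
# Sketch of compact binary feature matching: pack boolean feature vectors into
# bytes and rank gallery entries by Hamming distance. Data is illustrative.
import numpy as np

def pack(code_bits):
    """Pack a boolean feature vector into bytes for compact storage."""
    return np.packbits(np.asarray(code_bits, dtype=np.uint8))

def hamming(a, b):
    """Hamming distance between two packed binary codes."""
    return int(np.unpackbits(a ^ b).sum())

def identify(probe, gallery):
    """Return the gallery identity with the smallest Hamming distance."""
    return min(gallery, key=lambda name: hamming(probe, gallery[name]))

gallery = {"subject_a": pack([1, 0, 1, 1, 0, 0, 1, 0]),
           "subject_b": pack([0, 1, 0, 0, 1, 1, 0, 1])}
probe = pack([1, 0, 1, 1, 0, 1, 1, 0])  # one bit away from subject_a
```

Identification ranks the whole gallery; verification would instead threshold a single Hamming distance.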
Time-causal and time-recursive spatio-temporal receptive fields
We present an improved model and theory for time-causal and time-recursive
spatio-temporal receptive fields, based on a combination of Gaussian receptive
fields over the spatial domain and first-order integrators or equivalently
truncated exponential filters coupled in cascade over the temporal domain.
Compared to previous spatio-temporal scale-space formulations in terms of
non-enhancement of local extrema or scale invariance, these receptive fields
are based on different scale-space axiomatics over time by ensuring
non-creation of new local extrema or zero-crossings with increasing temporal
scale. Specifically, extensions are presented concerning (i) parameterizing the
intermediate temporal scale levels, (ii) analysing the resulting temporal
dynamics, (iii) transferring the theory to a discrete implementation, (iv)
computing scale-normalized spatio-temporal derivative expressions for
spatio-temporal feature detection and (v) computational modelling of receptive
fields in the lateral geniculate nucleus (LGN) and the primary visual cortex
(V1) in biological vision.
We show that by distributing the intermediate temporal scale levels according
to a logarithmic distribution, we obtain much faster temporal response
properties (shorter temporal delays) compared to a uniform distribution.
Specifically, these kernels converge very rapidly to a limit kernel possessing
true self-similar scale-invariant properties over temporal scales, thereby
allowing for true scale invariance over variations in the temporal scale,
although the underlying temporal scale-space representation is based on a
discretized temporal scale parameter.
We show how scale-normalized temporal derivatives can be defined for these
time-causal scale-space kernels and how the composed theory can be used for
computing basic types of scale-normalized spatio-temporal derivative
expressions in a computationally efficient manner.
Comment: 39 pages, 12 figures, 5 tables. In Journal of Mathematical Imaging and
Vision, published online Dec 201
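The cascade of first-order integrators described above can be sketched as a recursive filter. In this minimal illustration (parameter names and default values are our own, not the paper's), each discrete integrator with time constant mu contributes mu^2 of temporal variance, so mu_k is chosen to bridge successive logarithmically distributed scale levels tau_k:

```python
# Sketch: time-causal, time-recursive temporal smoothing by first-order
# integrators coupled in cascade, with intermediate temporal scales tau_k
# distributed logarithmically. Defaults are illustrative.
import numpy as np

def temporal_scale_space(signal, tau_max=16.0, K=6, c=2.0):
    """Smooth `signal` causally and recursively; return representations at
    K logarithmically distributed temporal scales."""
    taus = tau_max * c ** (2.0 * (np.arange(1, K + 1) - K))  # log-distributed
    outputs, y, tau_prev = [], np.array(signal, float), 0.0
    for tau in taus:
        mu = np.sqrt(tau - tau_prev)   # integrator bridging tau_prev -> tau
        gain = 1.0 / (1.0 + mu)
        # In-place causal recursion y[n] = y[n-1] + gain * (x[n] - y[n-1]),
        # where x is the previous cascade stage's output.
        for n in range(1, len(y)):
            y[n] = y[n - 1] + gain * (y[n] - y[n - 1])
        outputs.append(y.copy())
        tau_prev = tau
    return taus, outputs
```

Each stage has unit DC gain, so the representation only redistributes signal energy over time; the logarithmic spacing of tau_k is what gives the shorter temporal delays discussed in the abstract.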
Einstein equations in the null quasi-spherical gauge III: numerical algorithms
We describe the numerical techniques used in the construction of our 4th-order
evolution code for the full Einstein equations, and assess the accuracy of
representative solutions. The code is based on a null gauge with a
quasi-spherical radial coordinate, and simulates the interaction of a single
black hole with gravitational radiation. Techniques used include spherical
harmonic representations, convolution spline interpolation and filtering, and
an RK4 "method of lines" evolution. For sample initial data of "intermediate"
size (gravitational field with 19% of the black hole mass), the code is
accurate to 1 part in 10^5, until null time z=55 when the coordinate condition
breaks down.
Comment: LaTeX, 38 pages, 29 figures (360Kb compressed)
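An RK4 "method of lines" evolution of the kind mentioned can be sketched on a much simpler problem: the 1D wave equation discretized in space (here by finite differences rather than the paper's spherical-harmonic representation) and stepped in time with classical fourth-order Runge-Kutta. All grid and step sizes are illustrative:

```python
# Much-simplified sketch of an RK4 method-of-lines evolution for the 1D wave
# equation u_tt = u_xx with fixed (Dirichlet) boundaries.
import numpy as np

def rhs(state, dx):
    """Spatial discretization: state = (u, v) with u_t = v, v_t = u_xx."""
    u, v = state
    uxx = np.zeros_like(u)
    uxx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx ** 2  # interior points
    return np.array([v, uxx])                             # boundaries pinned

def rk4_step(state, dt, dx):
    """Classical fourth-order Runge-Kutta step for the semi-discrete system."""
    k1 = rhs(state, dx)
    k2 = rhs(state + 0.5 * dt * k1, dx)
    k3 = rhs(state + 0.5 * dt * k2, dx)
    k4 = rhs(state + dt * k3, dx)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.linspace(0.0, 1.0, 101)
dx, dt = x[1] - x[0], 0.005                              # dt chosen for stability
state = np.array([np.sin(np.pi * x), np.zeros_like(x)])  # standing-mode data
for _ in range(200):                                     # evolve to t = 1
    state = rk4_step(state, dt, dx)
```

At t = 1 the exact standing mode sin(pi x) cos(pi t) has flipped sign, which gives a simple accuracy check analogous to the convergence assessments described in the abstract.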
Acoustic waves: should they be propagated forward in time, or forward in space?
The evolution of acoustic waves can be evaluated in two ways: either as a
temporal, or a spatial propagation. Propagating in space provides the
considerable advantage of being able to handle dispersion and propagation
across interfaces with remarkable efficiency; but propagating in time is more
physical and gives correctly behaved reflections and scattering without effort.
Which should be chosen in a given situation, and what compromises might have to
be made? Here the natural behaviors of each choice of propagation are compared
and contrasted for an ordinary second order wave equation, the time-dependent
diffusion wave equation, an elastic rod wave equation, and the Stokes'/ van
Wijngaarden's equations, each case illuminating a characteristic feature of the
technique. Either choice of propagation axis enables a partitioning of the wave
equation that gives rise to a directional factorization based on a natural
"reference" dispersion relation. The resulting exact coupled bidirectional
equations then reduce to a single unidirectional first-order wave equation
using a simple "slow evolution" assumption that minimizes the effect of subsequent
approximations, while allowing a direct term-to-term comparison between exact
and approximate theories.
Comment: 12 pages; v2 corrected
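The directional factorization described above can be illustrated in its simplest case, the ordinary second-order wave equation with constant speed c, whose operator splits exactly into forward- and backward-propagating first-order factors:

```latex
% Simplest instance of the directional factorization: the 1D wave operator
% splits about the reference dispersion relation \omega = \pm c k.
\left(\partial_t^2 - c^2\,\partial_x^2\right) u
  = \left(\partial_t - c\,\partial_x\right)
    \left(\partial_t + c\,\partial_x\right) u = 0 .
```

Retaining a single factor yields the unidirectional first-order equation $\partial_t u + c\,\partial_x u = 0$; in the general dispersive cases treated in the paper, the "slow evolution" assumption plays the role of decoupling the two factors.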
3D weak lensing with spin wavelets on the ball
We construct the spin flaglet transform, a wavelet transform to analyze spin
signals in three dimensions. Spin flaglets can probe signal content localized
simultaneously in space and frequency and, moreover, are separable so that
their angular and radial properties can be controlled independently. They are
particularly suited to the analysis of cosmological observations such as the weak
gravitational lensing of galaxies. Such observations have a unique 3D
geometrical setting since they are natively made on the sky, have spin angular
symmetries, and are extended in the radial direction by additional distance or
redshift information. Flaglets are constructed in the harmonic space defined by
the Fourier-Laguerre transform, previously defined for scalar functions and
extended here to signals with spin symmetries. Thanks to various sampling
theorems, both the Fourier-Laguerre and flaglet transforms are theoretically
exact when applied to bandlimited signals. In other words, in numerical
computations the only loss of information is due to the finite representation
of floating point numbers. We develop a 3D framework relating the weak lensing
power spectrum to covariances of flaglet coefficients. We suggest that the
resulting novel flaglet weak lensing estimator offers a powerful alternative to
common 2D and 3D approaches to accurately capture cosmological information.
While standard weak lensing analyses focus on either real or harmonic space
representations (i.e., correlation functions or Fourier-Bessel power spectra,
respectively), a wavelet approach inherits the advantages of both techniques,
where both complicated sky coverage and uncertainties associated with the
physical modeling of small scales can be handled effectively. Our codes to
compute the Fourier-Laguerre and flaglet transforms are made publicly
available.
Comment: 24 pages, 4 figures; version accepted for publication in PR