
    Scale Invariant Interest Points with Shearlets

    Shearlets are a relatively new directional multi-scale framework for signal analysis, which has been shown to be effective at enhancing signal discontinuities such as edges and corners at multiple scales. In this work we address the problem of detecting and describing blob-like features in the shearlets framework. We derive a measure which is very effective for blob detection and closely related to the Laplacian of Gaussian. We demonstrate that the measure satisfies the perfect scale invariance property in the continuous case. In the discrete setting, we derive algorithms for blob detection and keypoint description. Finally, we provide qualitative justifications of our findings as well as a quantitative evaluation on benchmark data. We also report experimental evidence that our method is well suited to compressed and noisy images, thanks to the sparsity property of shearlets.
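    The abstract relates its blob measure to the Laplacian of Gaussian (LoG). As a point of comparison, here is a minimal sketch of classical scale-normalized LoG blob detection; it is not the shearlet-based measure of the paper. The function names, the sampled Gaussian kernel, the five-point Laplacian stencil, and the synthetic test image are all assumptions made for illustration, and the image is assumed to be larger than the widest smoothing kernel.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing with a sampled, normalized kernel."""
    radius = int(4 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    # np.convolve zero-pads at the borders; the image must exceed the kernel length.
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    out = np.apply_along_axis(smooth, 0, image)
    return np.apply_along_axis(smooth, 1, out)

def laplacian(image):
    """Five-point finite-difference Laplacian with replicated borders."""
    p = np.pad(image, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * image

def log_scale_space(image, sigmas):
    """Stack of scale-normalized LoG responses, sigma^2 * Laplacian(G_sigma * image)."""
    image = np.asarray(image, dtype=float)
    return np.stack([s**2 * laplacian(gaussian_smooth(image, s)) for s in sigmas])

def strongest_bright_blob(image, sigmas):
    """Return (sigma, row, col) of the most negative response (a bright blob)."""
    stack = log_scale_space(image, sigmas)
    k, row, col = np.unravel_index(np.argmin(stack), stack.shape)
    return sigmas[k], int(row), int(col)

# Hypothetical test image: one Gaussian blob of standard deviation 4 at (32, 32).
yy, xx = np.mgrid[0:65, 0:65]
blob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2.0 * 4.0**2))
scale, row, col = strongest_bright_blob(blob, [2.0, 3.0, 4.0, 5.0, 6.0])
```

    For an ideal Gaussian blob of standard deviation s0, the scale-normalized response at the blob center is -2 s0^2 sigma^2 / (s0^2 + sigma^2)^2, which is extremal at sigma = s0. This is why the extremum over scale can serve as a size estimate.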

    Analyzing Input and Output Representations for Speech-Driven Gesture Generation

    This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. Our approach consists of two steps. First, we learn a lower-dimensional representation of human motion using a denoising autoencoder neural network, consisting of a motion encoder MotionE and a motion decoder MotionD. The learned representation preserves the most important aspects of the human pose variation while removing less relevant variation. Second, we train a novel encoder network SpeechE to map from speech to a corresponding motion representation with reduced dimensionality. At test time, the speech encoder and the motion decoder networks are combined: SpeechE predicts motion representations based on a given speech signal and MotionD then decodes these representations to produce motion sequences. We evaluate different representation sizes in order to find the most effective dimensionality for the representation. We also evaluate the effects of using different speech features as input to the model. We find that mel-frequency cepstral coefficients (MFCCs), alone or combined with prosodic features, perform the best. The results of a subsequent user study confirm the benefits of the representation learning. Comment: Accepted at IVA '19. Shorter version published at AAMAS '19. The code is available at https://github.com/GestureGeneration/Speech_driven_gesture_generation_with_autoencode

    No-reference image quality assessment through the von Mises distribution

    An innovative way of calculating the von Mises distribution (VMD) of image entropy is introduced in this paper. The VMD's concentration parameter and a fitness parameter, defined later, are analyzed in the experimental part to determine their suitability as an image quality assessment measure for particular distortions such as Gaussian blur or additive Gaussian noise. To achieve such a measure, the local Rényi entropy is calculated in four equally spaced orientations and used to determine the parameters of the von Mises distribution of the image entropy. Considering contextual images, experimental results after applying this model show that the best-in-focus noise-free images are associated with the highest values for the von Mises distribution concentration parameter and the highest approximation of image data to the von Mises distribution model. Our defined von Mises fitness parameter experimentally appears also as a suitable no-reference image quality assessment indicator for non-contextual images. Comment: 29 pages, 11 figure

    Multispectral Palmprint Encoding and Recognition

    Palmprints are emerging as a new entity in multi-modal biometrics for human identification and verification. Multispectral palmprint images captured in the visible and infrared spectrum not only contain the wrinkles and ridge structure of a palm, but also the underlying pattern of veins, making them a highly discriminating biometric identifier. In this paper, we propose a feature encoding scheme for robust and highly accurate representation and matching of multispectral palmprints. To facilitate compact storage of the feature, we design a binary hash table structure that allows for efficient matching in large databases. Comprehensive experiments for both identification and verification scenarios are performed on two public datasets -- one captured with a contact-based sensor (PolyU dataset), and the other with a contact-free sensor (CASIA dataset). Recognition results in various experimental setups show that the proposed method consistently outperforms existing state-of-the-art methods. Error rates achieved by our method (0.003% on PolyU and 0.2% on CASIA) are the lowest reported in the literature on both datasets and clearly indicate the viability of palmprint as a reliable and promising biometric. All source code is publicly available. Comment: Preliminary version of this manuscript was published in ICCV 2011. Z. Khan, A. Mian and Y. Hu, "Contour Code: Robust and Efficient Multispectral Palmprint Encoding for Human Recognition", International Conference on Computer Vision, 2011. MATLAB Code available: https://sites.google.com/site/zohaibnet/Home/code

    Time-causal and time-recursive spatio-temporal receptive fields

    We present an improved model and theory for time-causal and time-recursive spatio-temporal receptive fields, based on a combination of Gaussian receptive fields over the spatial domain and first-order integrators or equivalently truncated exponential filters coupled in cascade over the temporal domain. Compared to previous spatio-temporal scale-space formulations in terms of non-enhancement of local extrema or scale invariance, these receptive fields are based on different scale-space axiomatics over time by ensuring non-creation of new local extrema or zero-crossings with increasing temporal scale. Specifically, extensions are presented about (i) parameterizing the intermediate temporal scale levels, (ii) analysing the resulting temporal dynamics, (iii) transferring the theory to a discrete implementation, (iv) computing scale-normalized spatio-temporal derivative expressions for spatio-temporal feature detection and (v) computational modelling of receptive fields in the lateral geniculate nucleus (LGN) and the primary visual cortex (V1) in biological vision. We show that by distributing the intermediate temporal scale levels according to a logarithmic distribution, we obtain much faster temporal response properties (shorter temporal delays) compared to a uniform distribution. Specifically, these kernels converge very rapidly to a limit kernel possessing true self-similar scale-invariant properties over temporal scales, thereby allowing for true scale invariance over variations in the temporal scale, although the underlying temporal scale-space representation is based on a discretized temporal scale parameter. 
    We show how scale-normalized temporal derivatives can be defined for these time-causal scale-space kernels and how the composed theory can be used for computing basic types of scale-normalized spatio-temporal derivative expressions in a computationally efficient manner. Comment: 39 pages, 12 figures, 5 tables in Journal of Mathematical Imaging and Vision, published online Dec 201
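    The temporal part of the model above (first-order integrators coupled in cascade, with logarithmically distributed intermediate scale levels) can be illustrated with a small sketch. The parameterization used here is an assumption for illustration, not taken from the abstract: temporal variances tau_k = c^(2(k-K)) * tau_max, and a discrete first-order recursive filter whose impulse response has mean mu and variance mu^2 + mu. The names `tau_max`, `K`, `c`, and both functions are hypothetical.

```python
import numpy as np

def log_time_constants(tau_max, K, c=2.0):
    """Time constants mu_k of K cascaded discrete first-order integrators.

    Assumes accumulated temporal variances tau_k = c**(2*(k-K)) * tau_max,
    so the final level reaches tau_max and earlier levels are spaced
    logarithmically below it.
    """
    k = np.arange(1, K + 1)
    tau = c ** (2.0 * (k - K)) * tau_max
    dtau = np.diff(np.concatenate(([0.0], tau)))  # variance added per stage
    # One stage with time constant mu adds variance mu**2 + mu; solve for mu.
    return (np.sqrt(1.0 + 4.0 * dtau) - 1.0) / 2.0

def cascade_filter(signal, mus):
    """Apply the recursive filters in cascade; returns one row per scale level."""
    current = np.asarray(signal, dtype=float)
    levels = []
    for mu in mus:
        out = np.empty_like(current)
        prev = 0.0
        for t, value in enumerate(current):
            # Time-causal and time-recursive: only the previous output is stored.
            prev = prev + (value - prev) / (1.0 + mu)
            out[t] = prev
        levels.append(out)
        current = out
    return np.stack(levels)

# Impulse response of a 4-level cascade with a final temporal variance of 16.
impulse = np.zeros(400)
impulse[0] = 1.0
levels = cascade_filter(impulse, log_time_constants(tau_max=16.0, K=4, c=2.0))
```

    Because variances add under convolution, the impulse response of the last level should have a temporal variance close to `tau_max`, while each update only needs the previous output sample of each stage.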

    Einstein equations in the null quasi-spherical gauge III: numerical algorithms

    We describe numerical techniques used in the construction of our 4th order evolution for the full Einstein equations, and assess the accuracy of representative solutions. The code is based on a null gauge with a quasi-spherical radial coordinate, and simulates the interaction of a single black hole with gravitational radiation. Techniques used include spherical harmonic representations, convolution spline interpolation and filtering, and an RK4 "method of lines" evolution. For sample initial data of "intermediate" size (gravitational field with 19% of the black hole mass), the code is accurate to 1 part in 10^5, until null time z=55 when the coordinate condition breaks down. Comment: LaTeX, 38 pages, 29 figures (360Kb compressed
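    To illustrate what an RK4 "method of lines" evolution looks like in general, the sketch below applies it to a toy problem, the 1D periodic advection equation u_t + a u_x = 0. This is not the Einstein-equation code described above. The grid size, step count, and the fourth-order centered difference stencil are assumptions chosen for the example.

```python
import numpy as np

def ddx_periodic(u, h):
    """Fourth-order centered first derivative on a periodic grid of spacing h."""
    return (-np.roll(u, -2) + 8.0 * np.roll(u, -1)
            - 8.0 * np.roll(u, 1) + np.roll(u, 2)) / (12.0 * h)

def rhs(u, h, speed):
    """Semi-discrete right-hand side: space is discretized, time stays continuous."""
    return -speed * ddx_periodic(u, h)

def rk4_step(u, dt, h, speed):
    """One classical fourth-order Runge-Kutta step for the resulting ODE system."""
    k1 = rhs(u, h, speed)
    k2 = rhs(u + 0.5 * dt * k1, h, speed)
    k3 = rhs(u + 0.5 * dt * k2, h, speed)
    k4 = rhs(u + dt * k3, h, speed)
    return u + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def advect(u0, speed, length, t_end, steps):
    """Evolve u0 on a periodic domain of the given length up to time t_end."""
    h = length / len(u0)
    dt = t_end / steps
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = rk4_step(u, dt, h, speed)
    return u

# After one full period the exact solution returns to the initial profile.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
u0 = np.sin(x)
u1 = advect(u0, speed=1.0, length=2.0 * np.pi, t_end=2.0 * np.pi, steps=128)
error = float(np.max(np.abs(u1 - u0)))
```

    Comparing `u1` with `u0` after one period gives a simple accuracy check, in the same spirit as the accuracy assessment mentioned in the abstract.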

    Acoustic waves: should they be propagated forward in time, or forward in space?

    The evolution of acoustic waves can be evaluated in two ways: either as a temporal, or a spatial propagation. Propagating in space provides the considerable advantage of being able to handle dispersion and propagation across interfaces with remarkable efficiency; but propagating in time is more physical and gives correctly behaved reflections and scattering without effort. Which should be chosen in a given situation, and what compromises might have to be made? Here the natural behaviors of each choice of propagation are compared and contrasted for an ordinary second order wave equation, the time-dependent diffusion wave equation, an elastic rod wave equation, and the Stokes'/van Wijngaarden's equations, each case illuminating a characteristic feature of the technique. Either choice of propagation axis enables a partitioning of the wave equation that gives rise to a directional factorization based on a natural "reference" dispersion relation. The resulting exact coupled bidirectional equations then reduce to a single unidirectional first-order wave equation using a simple "slow evolution" assumption that minimizes the effect of subsequent approximations, while allowing a direct term-to-term comparison between exact and approximate theories. Comment: 12 pages, v2 correcte
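    As a minimal illustration of a directional factorization, consider only the simplest case: the ordinary second-order wave equation with constant speed and no dispersion. The symbols c (wave speed) and u_+, u_- (forward and backward components) are notation assumed here, not taken from the abstract. In this special case the two first-order equations decouple exactly; the more general cases named above lead to coupled bidirectional equations.

```latex
\[
\left(\partial_t^2 - c^2\,\partial_x^2\right) u
  = \left(\partial_t - c\,\partial_x\right)\left(\partial_t + c\,\partial_x\right) u = 0,
\qquad
\left(\partial_t + c\,\partial_x\right) u_{+} = 0,
\quad
\left(\partial_t - c\,\partial_x\right) u_{-} = 0 .
\]
```

    Here u_+ = f(x - ct) travels in the +x direction and u_- = g(x + ct) in the -x direction, and any solution can be written as u = u_+ + u_-.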

    3D weak lensing with spin wavelets on the ball

    We construct the spin flaglet transform, a wavelet transform to analyze spin signals in three dimensions. Spin flaglets can probe signal content localized simultaneously in space and frequency and, moreover, are separable so that their angular and radial properties can be controlled independently. They are particularly suited to analyzing cosmological observations such as the weak gravitational lensing of galaxies. Such observations have a unique 3D geometrical setting since they are natively made on the sky, have spin angular symmetries, and are extended in the radial direction by additional distance or redshift information. Flaglets are constructed in the harmonic space defined by the Fourier-Laguerre transform, previously defined for scalar functions and extended here to signals with spin symmetries. Thanks to various sampling theorems, both the Fourier-Laguerre and flaglet transforms are theoretically exact when applied to bandlimited signals. In other words, in numerical computations the only loss of information is due to the finite representation of floating point numbers. We develop a 3D framework relating the weak lensing power spectrum to covariances of flaglet coefficients. We suggest that the resulting novel flaglet weak lensing estimator offers a powerful alternative to common 2D and 3D approaches to accurately capture cosmological information. While standard weak lensing analyses focus on either real or harmonic space representations (i.e., correlation functions or Fourier-Bessel power spectra, respectively), a wavelet approach inherits the advantages of both techniques, where both complicated sky coverage and uncertainties associated with the physical modeling of small scales can be handled effectively. Our codes to compute the Fourier-Laguerre and flaglet transforms are made publicly available. Comment: 24 pages, 4 figures, version accepted for publication in PR