BaNa: a noise resilient fundamental frequency detection algorithm for speech and music
Fundamental frequency (F0) is one of the essential features in many acoustic-related applications. Although numerous F0 detection algorithms have been developed, their detection accuracy in noisy environments still needs improvement. We present a hybrid noise-resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of the BaNa algorithm and several classic and state-of-the-art F0 detection algorithms. Results show that for almost all types of noise and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, for the 0 dB SNR scenarios, the BaNa algorithm is shown to achieve GPE rates of 20% to 35% for speech and 12% to 39% for music. We also describe implementation issues that must be addressed to run the BaNa algorithm as a real-time application on a smartphone platform. Peer Reviewed. Postprint (author's final draft).
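The abstract names two concrete mechanisms: Viterbi selection among per-frame F0 candidates, and the GPE rate used for evaluation. The sketch below illustrates both under stated assumptions. The abstract does not give BaNa's actual cost function, so the transition cost here is a hypothetical stand-in (the pitch jump between consecutive frames, measured in octaves). The GPE definition assumes the commonly used 20% deviation tolerance, counted over voiced reference frames only. The function names `viterbi_f0` and `gross_pitch_error` are illustrative, not from the paper.

```python
import math
from typing import List, Sequence


def viterbi_f0(candidates: List[List[float]]) -> List[float]:
    """Pick one F0 candidate per frame, minimizing the total transition cost.

    Hypothetical cost (not BaNa's): |log2(f_prev / f_cur)|, the pitch jump in octaves.
    """
    if not candidates:
        return []
    n = len(candidates)
    # cost[t][j]: cheapest accumulated cost of any path ending at candidate j of frame t
    cost: List[List[float]] = [[0.0] * len(candidates[0])]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, n):
        row_cost, row_back = [], []
        for f_cur in candidates[t]:
            best_i, best_c = 0, math.inf
            for i, f_prev in enumerate(candidates[t - 1]):
                c = cost[t - 1][i] + abs(math.log2(f_prev / f_cur))
                if c < best_c:
                    best_i, best_c = i, c
            row_cost.append(best_c)
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest candidate in the last frame
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = [0.0] * n
    for t in range(n - 1, -1, -1):
        path[t] = candidates[t][j]
        j = back[t][j]
    return path


def gross_pitch_error(est: Sequence[float], ref: Sequence[float], tol: float = 0.2) -> float:
    """Fraction of voiced frames (ref > 0) whose relative F0 error exceeds tol."""
    voiced = [(e, r) for e, r in zip(est, ref) if r > 0]
    if not voiced:
        return 0.0
    errors = sum(1 for e, r in voiced if abs(e - r) / r > tol)
    return errors / len(voiced)
```

For example, `viterbi_f0([[100, 200], [98, 300], [102, 51]])` follows the smooth track `[100, 98, 102]` rather than jumping between octave-related candidates, which is the kind of halving/doubling error a candidate-tracking stage is meant to suppress.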
Simultaneous Facial Landmark Detection, Pose and Deformation Estimation under Facial Occlusion
Facial landmark detection, head pose estimation, and facial deformation analysis are typical facial behavior analysis tasks in computer vision. Existing methods usually perform each task independently and sequentially, ignoring their interactions. To tackle this problem, we propose a unified framework for simultaneous facial landmark detection, head pose estimation, and facial deformation analysis, and the proposed model is robust to facial occlusion. Following a cascade procedure augmented with model-based head pose estimation, we iteratively update the facial landmark locations, facial occlusion, head pose and facial deformation until convergence. Experimental results on benchmark databases demonstrate the effectiveness of the proposed method for simultaneous facial landmark detection, head pose and facial deformation estimation, even when the faces in the images are occluded. Comment: International Conference on Computer Vision and Pattern Recognition, 201
Spoken affect classification : algorithms and experimental implementation : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand
Machine-based emotional intelligence is a requirement for natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns. These vocal patterns are perceived and understood by listeners during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two sources of emotional speech data: natural, spontaneous emotional speech and acted or portrayed emotional speech. This comparison demonstrates the advantages and disadvantages of both acquisition methods and how these methods affect the end application of vocal emotion recognition. Second, we examine two classification methods that have not previously been explored in this field: stacked generalisation and unweighted vote. We show how these techniques can yield an improvement over traditional classification methods.
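Of the two ensemble methods named above, the unweighted vote is the simpler to illustrate: every base classifier casts one equal vote per utterance and the most frequent emotion label wins. The sketch below is a minimal illustration, not the thesis's implementation; the function name `unweighted_vote` and the tie-breaking rule (ties go to the label first proposed, in classifier order) are assumptions. Stacked generalisation differs in that a second-level learner is trained on the base classifiers' outputs instead of counting them equally.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def unweighted_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine base classifiers by plurality vote with equal weights.

    predictions[m][i] is the label classifier m assigns to utterance i.
    """
    if not predictions:
        return []
    n_items = len(predictions[0])
    combined = []
    for i in range(n_items):
        votes = Counter(p[i] for p in predictions)
        # most_common() orders equal counts by first occurrence, so a tie
        # goes to the label proposed by the earliest classifier (assumed rule).
        combined.append(votes.most_common(1)[0][0])
    return combined
```

For instance, three classifiers labelling the same three utterances as `["angry", "happy", "sad"]`, `["angry", "sad", "sad"]` and `["neutral", "happy", "happy"]` are combined into `["angry", "happy", "sad"]`.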
Studying the Imaging Characteristics of Ultra Violet Imaging Telescope (UVIT) through Numerical Simulations
Ultra-Violet Imaging Telescope (UVIT) is one of the five payloads aboard the Indian Space Research Organization (ISRO)'s ASTROSAT space mission. The science objectives of UVIT are broad, ranging from individual hot stars and star-forming regions to active galactic nuclei. The imaging performance of UVIT would depend on several factors in addition to the optics, e.g. the resolution of the detectors, satellite drift and jitter, the image frame acquisition rate, sky background, and source intensity. The intensified CMOS-imager-based photon counting detectors used in UVIT add their own complexity to the reconstruction of the images. All these factors could lead to several systematic effects in the reconstructed images. We carried out numerical simulations with artificial point sources and an archival image of a galaxy from the GALEX data archive to explore the effects of all the above-mentioned parameters on the reconstructed images. In particular, the issues of angular resolution, photometric accuracy and photometric non-linearity associated with the intensified CMOS-imager-based photon counting detectors have been investigated. The photon events in the image frames are detected by three different centroid algorithms with several energy thresholds. Our results show that in the presence of bright sources, reconstructed images from UVIT would suffer from photometric distortion in a complex way, and overlapping photon events could lead to complex patterns near the bright sources. Furthermore, the angular resolution, photometric accuracy and distortion would depend on the values of the various thresholds chosen to detect photon events. Comment: Submitted to PASP, 16 pages, 9 figures
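The abstract does not say which three centroid algorithms were compared, so the sketch below shows only one generic variant to make the idea of "centroiding with an energy threshold" concrete: a pixel that exceeds the threshold and is a local maximum is treated as a photon event, and its position is the intensity-weighted centre of mass of the surrounding 3x3 window. The function name `detect_photon_events` and the 3x3 window are assumptions for illustration. Note that two overlapping events inside one window are merged into a single biased centroid, which is one way the complex patterns near bright sources described above can arise.

```python
from typing import List, Tuple


def detect_photon_events(frame: List[List[float]], threshold: float) -> List[Tuple[float, float]]:
    """Return (row, col) sub-pixel centroids of photon events in one image frame.

    Hypothetical 3x3 centre-of-mass centroider; edge pixels are skipped.
    """
    rows, cols = len(frame), len(frame[0]) if frame else 0
    events = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            v = frame[r][c]
            if v < threshold:  # energy threshold: ignore faint pixels
                continue
            window = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            # Only local maxima start an event (strictly brighter neighbours veto it)
            if any(frame[y][x] > v for y, x in window):
                continue
            total = sum(frame[y][x] for y, x in window)
            if total <= 0:
                continue
            cy = sum(y * frame[y][x] for y, x in window) / total
            cx = sum(x * frame[y][x] for y, x in window) / total
            events.append((cy, cx))
    return events
```

Raising `threshold` rejects more noise but also discards faint events, so, as the abstract notes, the recovered photometry depends on the threshold values chosen.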