
    BaNa: a noise resilient fundamental frequency detection algorithm for speech and music

    Fundamental frequency (F0) is one of the essential features in many acoustics-related applications. Although numerous F0 detection algorithms have been developed, their detection accuracy in noisy environments still needs improvement. We present a hybrid, noise-resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of BaNa and of several classic and state-of-the-art F0 detection algorithms. Results show that for almost all noise types and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, in the 0 dB SNR scenarios, BaNa achieves a GPE rate of 20% to 35% for speech and 12% to 39% for music. We also describe the implementation issues that must be addressed to run BaNa as a real-time application on a smartphone platform.
    Peer Reviewed. Postprint (author's final draft)
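    The abstract does not spell out BaNa's cost function, so the following is only a minimal Python sketch of how a Viterbi search can pick one F0 per frame from several candidates. The cost terms (one minus a per-candidate confidence, plus a penalty on the log-frequency jump between consecutive frames) and the names `viterbi_f0` and `jump_weight` are hypothetical, chosen for illustration.

```python
import math

def viterbi_f0(candidates, confidences, jump_weight=1.0):
    """Pick one F0 per frame from per-frame candidate lists.

    candidates[t][k]  : k-th F0 candidate (Hz) in frame t
    confidences[t][k] : confidence in [0, 1] for that candidate
    HYPOTHETICAL costs (not BaNa's published cost function):
      local cost      = 1 - confidence
      transition cost = jump_weight * |log2(f_t / f_{t-1})|
    """
    if not candidates:
        return []
    # cost[k]: cheapest accumulated cost of a path ending at candidate k
    cost = [1.0 - c for c in confidences[0]]
    back = []  # back[t-1][k]: best predecessor index for candidate k in frame t
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for f, conf in zip(candidates[t], confidences[t]):
            # Accumulated cost of reaching f from every candidate of frame t-1
            options = [cost[j] + jump_weight * abs(math.log2(f / fp))
                       for j, fp in enumerate(candidates[t - 1])]
            best_j = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best_j] + (1.0 - conf))
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final candidate
    k = min(range(len(cost)), key=cost.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]

candidates = [[200.0, 100.0], [205.0, 410.0], [210.0, 105.0]]
confidences = [[0.9, 0.6], [0.5, 0.7], [0.8, 0.8]]
print(viterbi_f0(candidates, confidences))  # [200.0, 205.0, 210.0]
```

    Here the 410 Hz candidate in the second frame, an octave error, is rejected despite its higher confidence, because the jump penalty from 200 Hz outweighs the confidence gain; with `jump_weight=0.0` the search simply takes the most confident candidate in each frame and returns [200.0, 410.0, 210.0].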

    Simultaneous Facial Landmark Detection, Pose and Deformation Estimation under Facial Occlusion

    Facial landmark detection, head pose estimation, and facial deformation analysis are typical facial behavior analysis tasks in computer vision. Existing methods usually perform these tasks independently and sequentially, ignoring their interactions. To tackle this problem, we propose a unified framework for simultaneous facial landmark detection, head pose estimation, and facial deformation analysis; the proposed model is also robust to facial occlusion. Following a cascade procedure augmented with model-based head pose estimation, we iteratively update the facial landmark locations, facial occlusion, head pose, and facial deformation until convergence. Experimental results on benchmark databases demonstrate the effectiveness of the proposed method for simultaneous facial landmark detection, head pose estimation, and facial deformation estimation, even when the faces in the images are occluded.
    Comment: International Conference on Computer Vision and Pattern Recognition, 201

    Spoken affect classification : algorithms and experimental implementation : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

    Machine-based emotional intelligence is a requirement for natural interaction between humans and computer interfaces, and a basic level of accurate emotion perception is needed for computer systems to respond adequately to human emotion. Humans convey emotional information both intentionally and unintentionally via speech patterns, and listeners perceive and understand these vocal patterns during conversation. This research aims to improve the automatic perception of vocal emotion in two ways. First, we compare two sources of emotional speech data: natural, spontaneous emotional speech and acted (portrayed) emotional speech. This comparison shows the advantages and disadvantages of both acquisition methods and how they affect the end application of vocal emotion recognition. Second, we examine two classification methods that have not previously been explored in this field: stacked generalisation and unweighted vote. We show how these techniques can improve on traditional classification methods.
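    The abstract names unweighted voting as one of the two ensemble methods but gives no details of its use. A minimal Python sketch of plain unweighted (plurality) voting follows; the function names `unweighted_vote` and `vote_batch`, the emotion labels, and the first-seen tie-breaking rule are assumptions for illustration, not taken from the thesis.

```python
from collections import Counter

def unweighted_vote(predictions):
    """Return the label predicted by the most classifiers, each counting equally.

    Ties go to the tied label that appears first in `predictions`
    (an ASSUMED rule; the thesis abstract does not specify one).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    return next(label for label in predictions if counts[label] == top)

def vote_batch(per_classifier):
    """per_classifier[c][i] is classifier c's label for utterance i."""
    # zip(*...) regroups the labels so each tuple holds all votes for one utterance
    return [unweighted_vote(list(labels)) for labels in zip(*per_classifier)]

# Hypothetical outputs of three emotion classifiers on two utterances
print(vote_batch([["anger", "neutral"], ["anger", "sadness"], ["neutral", "sadness"]]))
# ['anger', 'sadness']
```

    Unlike stacked generalisation, which trains a second-level model on the base classifiers' outputs, this rule has no trainable parameters.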

    Studying the Imaging Characteristics of Ultra Violet Imaging Telescope (UVIT) through Numerical Simulations

    The Ultra-Violet Imaging Telescope (UVIT) is one of the five payloads aboard the Indian Space Research Organization (ISRO)'s ASTROSAT space mission. The science objectives of UVIT are broad, ranging from individual hot stars and star-forming regions to active galactic nuclei. The imaging performance of UVIT would depend on several factors besides the optics, e.g., detector resolution, satellite drift and jitter, image frame acquisition rate, sky background, and source intensity. The photon-counting detectors in UVIT, based on intensified CMOS imagers, add their own complexity to image reconstruction. All these factors could lead to several systematic effects in the reconstructed images. A numerical simulation study, using artificial point sources and an archival image of a galaxy from the GALEX data archive, was carried out to explore the effects of all the above-mentioned parameters on the reconstructed images. In particular, the angular resolution, photometric accuracy, and photometric non-linearity associated with the intensified-CMOS-imager-based photon-counting detectors have been investigated. Photon events in the image frames are detected by three different centroid algorithms with a set of energy thresholds. Our results show that, in the presence of bright sources, images reconstructed from UVIT data would suffer from complex photometric distortion, and that overlapping photon events could produce complex patterns near the bright sources. Further, the angular resolution, photometric accuracy, and distortion would depend on the threshold values chosen to detect photon events.
    Comment: Submitted to PASP, 16 pages, 9 figures
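    The abstract does not name its three centroid algorithms, so the following is only a sketch of one generic approach: a 3 x 3 center-of-mass centroid computed at local maxima whose summed counts exceed an energy threshold. The function name `detect_photon_events` and its parameters are hypothetical, and the frame is assumed to hold non-negative counts.

```python
def detect_photon_events(frame, energy_threshold, half=1):
    """Detect photon events in one frame and return sub-pixel centroids.

    HYPOTHETICAL generic method (not necessarily one used in the study):
    a pixel is an event if it is the maximum of its (2*half+1)^2 window
    and the window's summed counts exceed energy_threshold. The centroid
    is the intensity-weighted mean position within the window.
    Returns a list of (row, col, energy) tuples.
    """
    if energy_threshold < 0:
        raise ValueError("energy_threshold must be non-negative")
    rows, cols = len(frame), len(frame[0])
    offsets = range(-half, half + 1)
    events = []
    # Pixels closer than `half` to the border are skipped (window would overflow)
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            patch = [(dr, dc, frame[r + dr][c + dc]) for dr in offsets for dc in offsets]
            # Keep only local maxima so each event is reported once
            # (a flat plateau can still yield more than one event)
            if frame[r][c] < max(v for _, _, v in patch):
                continue
            energy = sum(v for _, _, v in patch)
            if energy <= energy_threshold:
                continue
            # Intensity-weighted mean offset gives the sub-pixel centroid
            dy = sum(dr * v for dr, _, v in patch) / energy
            dx = sum(dc * v for _, dc, v in patch) / energy
            events.append((r + dy, c + dx, energy))
    return events

# A single simulated event split across two adjacent pixels
frame = [[0] * 7 for _ in range(7)]
frame[3][3], frame[3][4] = 8, 4
print(detect_photon_events(frame, energy_threshold=5.0))
```

    With these inputs the sketch reports one event at row 3.0, column about 3.33, with energy 12; raising the threshold to 20 suppresses it, which mirrors the abstract's point that the recovered events depend on the chosen thresholds.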