    Visually Indicated Sounds

    Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
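
    The abstract's predict-then-retrieve pipeline can be sketched in a few lines: a recurrent network maps per-frame visual features to sound features, and an example-based step reuses the training waveform whose features best match the prediction. All names, dimensions, and the nearest-neighbour retrieval below are illustrative assumptions, not the authors' released implementation.

    ```python
    # Minimal sketch of a predict-then-retrieve sound synthesis pipeline.
    # Dimensions and names are hypothetical.
    import numpy as np
    import torch
    import torch.nn as nn

    class SoundFeaturePredictor(nn.Module):
        """Recurrent net mapping per-frame visual features to sound features."""
        def __init__(self, vis_dim=4096, hid_dim=256, snd_dim=42):
            super().__init__()
            self.rnn = nn.LSTM(vis_dim, hid_dim, batch_first=True)
            self.head = nn.Linear(hid_dim, snd_dim)

        def forward(self, vis_feats):              # (batch, time, vis_dim)
            h, _ = self.rnn(vis_feats)
            return self.head(h)                    # (batch, time, snd_dim)

    def example_based_synthesis(pred_feats, bank_feats, bank_waves):
        """Stand-in for example-based synthesis: return the training clip's
        waveform whose sound features are closest to the prediction."""
        dists = [np.linalg.norm(pred_feats - f) for f in bank_feats]
        return bank_waves[int(np.argmin(dists))]
    ```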

    Acoustic Simulations of Cochlear Implants in Human and Machine Hearing Research

    FIR filter for makhraj recognition system

    Audio and speech processing systems have steadily risen in importance in the everyday lives of most people in developed countries. Speech recognition is the process of converting an acoustic signal, captured by a microphone, into a set of words. Recognition is generally more difficult when vocabularies are larger or contain many similar-sounding words. Several external parameters can affect speech recognition performance, including the characteristics of the environmental noise and the type and placement of the microphone. A particular objective of this work is to recognize correct makhraj pronunciation, using a pre-processing database in Matlab for the recognition analysis. In this project, speech processing for makhraj recognition is built using a Finite Impulse Response (FIR) filter. Speech data were collected from respondents, recording the speech wave simultaneously with as many parameters as possible. Correct pronunciation examples are then obtained for makhraj such as (alif), (ba), (ta), (tsa), (jim), (ha) and others. Finally, the system is built using Matlab software.
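
    The project itself uses Matlab, but the FIR pre-processing step it describes translates directly to a short Python sketch: design a linear-phase band-pass filter over the main speech band and apply it before feature extraction. The cut-off frequencies, tap count, and synthetic test signal below are illustrative assumptions.

    ```python
    # Hedged sketch of FIR band-pass pre-filtering of a speech recording,
    # analogous to the Matlab pre-processing described above.
    import numpy as np
    from scipy.signal import firwin, lfilter

    def fir_bandpass(speech, fs, lo=300.0, hi=3400.0, numtaps=101):
        """Linear-phase FIR band-pass filter (windowed design) covering
        the main speech band; cut-offs are assumptions."""
        taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)
        return lfilter(taps, 1.0, speech)

    # Usage: filter one second of synthetic "speech" sampled at 16 kHz.
    fs = 16000
    t = np.arange(fs) / fs
    speech = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
    filtered = fir_bandpass(speech, fs)
    ```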

    Computer Models for Musical Instrument Identification

    A particular aspect of the perception of sound is concerned with what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people are able to discern a piano tone from a violin tone or to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. Parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.
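
    The combination the abstract names, Line Spectrum Frequencies as envelope descriptors plus a generative classifier, can be sketched as follows: fit LPC coefficients per note, convert them to LSFs via the standard sum/difference polynomials, and model each instrument with a Gaussian mixture. The model order, mixture size, and data layout are assumptions, not the thesis' actual configuration.

    ```python
    # Hedged sketch: LPC -> Line Spectrum Frequencies, then one GMM per
    # instrument class. Orders and class handling are illustrative.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def lpc_to_lsf(a):
        """LSFs are the unit-circle root angles of the sum/difference
        polynomials P(z) = A(z) + z^-(p+1) A(1/z) and
        Q(z) = A(z) - z^-(p+1) A(1/z)."""
        a = np.asarray(a, dtype=float)          # a[0] == 1
        ext = np.concatenate([a, [0.0]])
        rev = ext[::-1]
        roots = np.concatenate([np.roots(ext + rev), np.roots(ext - rev)])
        angles = np.angle(roots)
        return np.sort(angles[(angles > 0) & (angles < np.pi)])

    def lsf_features(wave, order=16):
        return lpc_to_lsf(librosa.lpc(wave, order=order))

    # train_sets: dict mapping instrument name -> (n_samples, n_lsf) array
    # of LSF vectors (assumed to exist); classify by maximum likelihood.
    def fit_models(train_sets, n_mix=8):
        return {name: GaussianMixture(n_mix).fit(X)
                for name, X in train_sets.items()}

    def classify(models, x):
        return max(models, key=lambda name: models[name].score(x[None, :]))
    ```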

    Advances on the automatic estimation of the P-wave onset time.

    This work describes the automatic picking of the P-phase arrivals of the roughly 3×10^6 seismic records generated during the TOMO-ETNA experiment. Air-gun shots produced by the vessel “Sarmiento de Gamboa” and contemporary passive seismicity occurring on the island were recorded by a dense network of stations deployed for the experiment. In such a scenario, automatic processing is needed given: (i) the enormous amount of data, (ii) the low signal-to-noise ratio of many of the available records and (iii) the accuracy needed for the velocity tomography resulting from the experiment. Preliminary processing is performed on the records obtained from all stations: raw data formats from the different types of stations are unified, defective records are eliminated, and noise is reduced by filtering in the band of interest for phase picking. The advanced multiband picking algorithm (AMPA) is then used to process the large database obtained and determine the travel times of the seismic phases. The approach of AMPA, based on frequency multiband denoising and enhancement of expected arrivals through optimum detectors, is detailed together with its calibration and quality assessment procedure. Examples of its usage for active and passive seismic events are presented.
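
    A heavily simplified sketch in the spirit of the multiband idea (not the published AMPA algorithm): band-pass the trace in several frequency bands, build a normalised envelope per band, average them into one characteristic function, and pick the onset where that function rises fastest. The band edges, filter order, and picking rule are all assumptions.

    ```python
    # Simplified multiband onset picker; a stand-in for AMPA, not AMPA itself.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def multiband_pick(trace, fs, bands=((2, 4), (4, 8), (8, 16))):
        cfs = []
        for lo, hi in bands:
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            env = np.abs(hilbert(sosfiltfilt(sos, trace)))
            cfs.append(env / (env.max() + 1e-12))   # per-band normalisation
        cf = np.mean(cfs, axis=0)                   # characteristic function
        onset = int(np.argmax(np.gradient(cf)))     # steepest rise as the pick
        return onset / fs                           # pick time in seconds
    ```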

    A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

    Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate up to 16.3% (VoxCeleb) and 25.1% (SITW) relative decrease in equal error rate (EER) compared to the baseline.
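
    Two of the front-end families the abstract contrasts, the MFCC baseline and a spectral-centroid-style descriptor, can be computed with stock librosa calls. The frame settings and the synthetic test clip below are assumptions; the paper's own extractors and configurations are not reproduced here.

    ```python
    # Hedged sketch of two compared front-ends for speaker embedding input.
    import numpy as np
    import librosa

    def mfcc_features(wave, sr, n_mfcc=40):
        """Baseline front-end: mel-frequency cepstral coefficients."""
        return librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=n_mfcc)

    def centroid_features(wave, sr):
        """One alternative: per-frame spectral centroids, an example of the
        centroid-based descriptors the study re-assesses."""
        return librosa.feature.spectral_centroid(y=wave, sr=sr)

    # Usage on a synthetic one-second stand-in for a speech clip.
    sr = 16000
    wave = np.random.randn(sr).astype(np.float32)
    print(mfcc_features(wave, sr).shape, centroid_features(wave, sr).shape)
    ```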