    Visually Indicated Sounds

    Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
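
    The abstract's predict-then-retrieve pipeline can be sketched in a few lines: a recurrent network maps per-frame visual features to sound features, and an example-based step reuses the training waveform whose features best match the prediction. All names, dimensions, and the nearest-neighbour retrieval below are illustrative assumptions, not the authors' released implementation.

    ```python
    # Minimal sketch of a predict-then-retrieve sound synthesis pipeline.
    # Dimensions and names are hypothetical.
    import numpy as np
    import torch
    import torch.nn as nn

    class SoundFeaturePredictor(nn.Module):
        """Recurrent net mapping per-frame visual features to sound features."""
        def __init__(self, vis_dim=4096, hid_dim=256, snd_dim=42):
            super().__init__()
            self.rnn = nn.LSTM(vis_dim, hid_dim, batch_first=True)
            self.head = nn.Linear(hid_dim, snd_dim)

        def forward(self, vis_feats):              # (batch, time, vis_dim)
            h, _ = self.rnn(vis_feats)
            return self.head(h)                    # (batch, time, snd_dim)

    def example_based_synthesis(pred_feats, bank_feats, bank_waves):
        """Stand-in for example-based synthesis: return the training clip's
        waveform whose sound features are closest to the prediction."""
        dists = [np.linalg.norm(pred_feats - f) for f in bank_feats]
        return bank_waves[int(np.argmin(dists))]
    ```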

    Acoustic Simulations of Cochlear Implants in Human and Machine Hearing Research

    FIR filter for makhraj recognition system

    Audio and speech processing systems have steadily risen in importance in the everyday lives of most people in developed countries. Speech recognition is the process of converting an acoustic signal, captured by a microphone, into a set of words. Recognition is generally more difficult when vocabularies are larger or contain many similar-sounding words. Several external parameters can affect speech recognition performance, including the characteristics of the environmental noise and the type and placement of the microphone. A particular objective of this work is to recognize correct makhraj pronunciation, using a pre-processing database in Matlab for the recognition analysis. In this project, speech processing for makhraj recognition is built using a Finite Impulse Response (FIR) filter. Speech data were collected from respondents, recording the speech wave simultaneously with as many parameters as possible. Correct pronunciation examples are then obtained for makhraj such as (alif), (ba), (ta), (tsa), (jim), (ha) and others. Finally, the system is built using Matlab software.
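
    The project itself uses Matlab, but the FIR pre-processing step it describes translates directly to a short Python sketch: design a linear-phase band-pass filter over the main speech band and apply it before feature extraction. The cut-off frequencies, tap count, and synthetic test signal below are illustrative assumptions.

    ```python
    # Hedged sketch of FIR band-pass pre-filtering of a speech recording,
    # analogous to the Matlab pre-processing described above.
    import numpy as np
    from scipy.signal import firwin, lfilter

    def fir_bandpass(speech, fs, lo=300.0, hi=3400.0, numtaps=101):
        """Linear-phase FIR band-pass filter (windowed design) covering
        the main speech band; cut-offs are assumptions."""
        taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)
        return lfilter(taps, 1.0, speech)

    # Usage: filter one second of synthetic "speech" sampled at 16 kHz.
    fs = 16000
    t = np.arange(fs) / fs
    speech = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
    filtered = fir_bandpass(speech, fs)
    ```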

    Computer Models for Musical Instrument Identification

    A particular aspect of the perception of sound is concerned with what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people are able to discern a piano tone from a violin tone or to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. Parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.
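
    The combination the abstract names, Line Spectrum Frequencies as envelope descriptors plus a generative classifier, can be sketched as follows: fit LPC coefficients per note, convert them to LSFs via the standard sum/difference polynomials, and model each instrument with a Gaussian mixture. The model order, mixture size, and data layout are assumptions, not the thesis' actual configuration.

    ```python
    # Hedged sketch: LPC -> Line Spectrum Frequencies, then one GMM per
    # instrument class. Orders and class handling are illustrative.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def lpc_to_lsf(a):
        """LSFs are the unit-circle root angles of the sum/difference
        polynomials P(z) = A(z) + z^-(p+1) A(1/z) and
        Q(z) = A(z) - z^-(p+1) A(1/z)."""
        a = np.asarray(a, dtype=float)          # a[0] == 1
        ext = np.concatenate([a, [0.0]])
        rev = ext[::-1]
        roots = np.concatenate([np.roots(ext + rev), np.roots(ext - rev)])
        angles = np.angle(roots)
        return np.sort(angles[(angles > 0) & (angles < np.pi)])

    def lsf_features(wave, order=16):
        return lpc_to_lsf(librosa.lpc(wave, order=order))

    # train_sets: dict mapping instrument name -> (n_samples, n_lsf) array
    # of LSF vectors (assumed to exist); classify by maximum likelihood.
    def fit_models(train_sets, n_mix=8):
        return {name: GaussianMixture(n_mix).fit(X)
                for name, X in train_sets.items()}

    def classify(models, x):
        return max(models, key=lambda name: models[name].score(x[None, :]))
    ```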

    Advances on the automatic estimation of the P-wave onset time.

    This work describes the automatic picking of the P-phase arrivals of the roughly 3×10^6 seismic records generated during the TOMO-ETNA experiment. Air-gun shots produced by the vessel “Sarmiento de Gamboa” and contemporary passive seismicity occurring on the island were recorded by a dense network of stations deployed for the experiment. In such a scenario, automatic processing is needed given: (i) the enormous amount of data, (ii) the low signal-to-noise ratio of many of the available records and (iii) the accuracy needed for the velocity tomography resulting from the experiment. Preliminary processing is performed on the records obtained from all stations: raw data formats from the different types of stations are unified, defective records are eliminated, and noise is reduced by filtering in the band of interest for phase picking. The advanced multiband picking algorithm (AMPA) is then used to process the large database obtained and determine the travel times of the seismic phases. The approach of AMPA, based on frequency multiband denoising and enhancement of expected arrivals through optimum detectors, is detailed together with its calibration and quality assessment procedure. Examples of its usage for active and passive seismic events are presented.
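
    A heavily simplified sketch in the spirit of the multiband idea (not the published AMPA algorithm): band-pass the trace in several frequency bands, build a normalised envelope per band, average them into one characteristic function, and pick the onset where that function rises fastest. The band edges, filter order, and picking rule are all assumptions.

    ```python
    # Simplified multiband onset picker; a stand-in for AMPA, not AMPA itself.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def multiband_pick(trace, fs, bands=((2, 4), (4, 8), (8, 16))):
        cfs = []
        for lo, hi in bands:
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            env = np.abs(hilbert(sosfiltfilt(sos, trace)))
            cfs.append(env / (env.max() + 1e-12))   # per-band normalisation
        cf = np.mean(cfs, axis=0)                   # characteristic function
        onset = int(np.argmax(np.gradient(cf)))     # steepest rise as the pick
        return onset / fs                           # pick time in seconds
    ```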

    A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

    Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate up to 16.3% (VoxCeleb) and 25.1% (SITW) relative decrease in equal error rate (EER) compared to the baseline.
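
    Two of the front-end families the abstract contrasts, the MFCC baseline and a spectral-centroid-style descriptor, can be computed with stock librosa calls. The frame settings and the synthetic test clip below are assumptions; the paper's own extractors and configurations are not reproduced here.

    ```python
    # Hedged sketch of two compared front-ends for speaker embedding input.
    import numpy as np
    import librosa

    def mfcc_features(wave, sr, n_mfcc=40):
        """Baseline front-end: mel-frequency cepstral coefficients."""
        return librosa.feature.mfcc(y=wave, sr=sr, n_mfcc=n_mfcc)

    def centroid_features(wave, sr):
        """One alternative: per-frame spectral centroids, an example of the
        centroid-based descriptors the study re-assesses."""
        return librosa.feature.spectral_centroid(y=wave, sr=sr)

    # Usage on a synthetic one-second stand-in for a speech clip.
    sr = 16000
    wave = np.random.randn(sr).astype(np.float32)
    print(mfcc_features(wave, sr).shape, centroid_features(wave, sr).shape)
    ```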