Visually Indicated Sounds
Objects make distinctive sounds when they are hit or scratched. These sounds
reveal aspects of an object's material properties, as well as the actions that
produced them. In this paper, we propose the task of predicting what sound an
object makes when struck as a way of studying physical interactions within a
visual scene. We present an algorithm that synthesizes sound from silent videos
of people hitting and scratching objects with a drumstick. This algorithm uses
a recurrent neural network to predict sound features from videos and then
produces a waveform from these features with an example-based synthesis
procedure. We show that the sounds predicted by our model are realistic enough
to fool participants in a "real or fake" psychophysical experiment, and that
they convey significant information about material properties and physical
interactions.
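The abstract describes a two-stage pipeline: a recurrent network predicts sound features, and an example-based procedure turns them into a waveform. The retrieval step can be sketched, under stated assumptions, as a nearest-neighbour lookup. Each training clip is assumed to be summarised by one fixed-length feature vector, distances are Euclidean, and the waveform of the closest clip is returned. The names `Example` and `synthesize_by_example` are hypothetical, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One training clip (hypothetical structure): a sound-feature vector and its waveform."""
    features: list[float]
    waveform: list[float]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def synthesize_by_example(predicted: list[float], bank: list[Example]) -> list[float]:
    """Return a copy of the waveform whose stored features are closest to the
    predicted features (1-nearest-neighbour retrieval)."""
    if not bank:
        raise ValueError("example bank is empty")
    best = min(bank, key=lambda ex: euclidean(predicted, ex.features))
    return list(best.waveform)


# Usage with two made-up clips.
bank = [
    Example(features=[0.1, 0.9], waveform=[0.0, 0.5, -0.5]),
    Example(features=[0.8, 0.2], waveform=[0.0, 0.9, -0.9]),
]
print(synthesize_by_example([0.7, 0.3], bank))  # [0.0, 0.9, -0.9]
```

Here a prediction of [0.7, 0.3] lies closest to the second clip's features, so that clip's waveform is returned. Matching whole feature sequences over time, rather than single vectors, is left out of this sketch.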
FIR filter for makhraj recognition system
Audio and speech processing systems have steadily risen in importance in the everyday lives of most people in developed countries. Speech recognition is the process of converting an acoustic signal, captured by a microphone, into a set of words. Recognition is generally more difficult when vocabularies are larger or contain many similar-sounding words. Several external parameters can affect the performance of a speech recognition system, including the characteristics of the environmental noise and the type and placement of the microphone. A particular objective of this work is to recognize the correct makhraj pronunciation for recognition analysis, using a pre-processing database in MATLAB. In this project, speech processing for makhraj recognition is built using a Finite Impulse Response (FIR) filter. Speech data were collected from respondents, which requires recording the speech waveform simultaneously with as many parameters as possible. Examples of correct makhraj pronunciation are then obtained for letters such as alif, ba, ta, tsa, jim, ha, and others. After that, the project is built using MATLAB software.
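The abstract does not say how the FIR filter is designed. As a minimal sketch, assuming a linear-phase low-pass design by the windowed-sinc method with a Hamming window, the taps and the direct-form filtering y[n] = sum over k of h[k] x[n - k] can be written as follows. The function names, the 31-tap length, and the 1 kHz cutoff at an 8 kHz sampling rate are illustrative assumptions, not parameters from the project.

```python
import math


def fir_lowpass(num_taps: int, cutoff_hz: float, fs_hz: float) -> list[float]:
    """Windowed-sinc low-pass FIR design (Hamming window), normalised to unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric (type I) filter")
    fc = cutoff_hz / fs_hz        # normalised cutoff in cycles per sample
    mid = (num_taps - 1) / 2      # centre of symmetry
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response 2*fc*sinc(2*fc*k), with its limit at k = 0.
        h = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Hamming window tapers the truncated sinc to reduce stopband ripple.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(x: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR filtering y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, b in enumerate(taps):
            if n - k >= 0:
                acc += b * x[n - k]
        y.append(acc)
    return y


taps = fir_lowpass(31, cutoff_hz=1000.0, fs_hz=8000.0)
dc_out = fir_filter([1.0] * 100, taps)
nyquist_out = fir_filter([(-1.0) ** n for n in range(100)], taps)
print(round(dc_out[-1], 6), abs(nyquist_out[-1]) < 0.01)  # 1.0 True
```

A constant input passes with unit gain once the 31-sample delay line has filled, while an alternating +1/-1 sequence at the Nyquist frequency is strongly attenuated. In practice, library routines such as SciPy's `signal.firwin` and `signal.lfilter` would normally replace the hand-written loops.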
Computer Models for Musical Instrument Identification
PhD
A particular aspect of the perception of sound concerns what is commonly
termed texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed, most people
can tell a piano tone from a violin tone, or distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, we describe a complete supervised
system that takes into account both physical and perceptual aspects of timbre.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
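Line Spectrum Frequencies are derived from the linear prediction polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p through the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z). For a minimum-phase A(z), their roots lie on the unit circle and interlace, and the root angles in (0, π) are the LSFs. The sketch below assumes the LPC coefficients have already been estimated and locates the roots by a grid search followed by bisection. The function name and grid size are illustrative choices, not the thesis's implementation.

```python
import math


def lsf_from_lpc(a: list[float], grid: int = 2048) -> list[float]:
    """Line spectral frequencies (radians, sorted, in (0, pi)) of
    A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p with a[0] = 1, assumed minimum-phase."""
    p = len(a) - 1
    ext = list(a) + [0.0]                  # A(z) padded to degree p + 1
    rev = ext[::-1]                        # coefficients of z^-(p+1) A(1/z)
    P = [x + y for x, y in zip(ext, rev)]  # symmetric (palindromic) polynomial
    Q = [x - y for x, y in zip(ext, rev)]  # antisymmetric polynomial
    half = (p + 1) / 2

    # Removing the linear phase e^{-jw(p+1)/2} leaves a real cosine sum for P
    # and (j times) a real sine sum for Q on the unit circle.
    def p_real(w: float) -> float:
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(P))

    def q_real(w: float) -> float:
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(Q))

    # Grid that excludes w = 0 and w = pi, where the trivial roots sit.
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    roots = []
    for f in (p_real, q_real):
        for lo, hi in zip(ws, ws[1:]):
            if f(lo) * f(hi) < 0:          # a sign change brackets one root
                for _ in range(60):        # bisection refinement
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0:
                        hi = mid
                    else:
                        lo = mid
                roots.append(0.5 * (lo + hi))
    return sorted(roots)


lsf = lsf_from_lpc([1.0, -0.9, 0.64])
print([round(w, 3) for w in lsf])  # [0.889, 1.297]
```

For A(z) = 1 - 0.9 z^-1 + 0.64 z^-2 the two LSFs are arccos(0.63) and arccos(0.27), which the function recovers. A grid that is too coarse could miss two roots falling in the same cell; the default of 2048 points is a heuristic.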
Advances on the automatic estimation of the P-wave onset time.
This work describes the automatic picking of P-phase arrivals in the 3 × 10⁶ seismic records produced during the TOMO-ETNA experiment. Air-gun shots fired from the vessel “Sarmiento de Gamboa”, together with contemporaneous passive seismicity on the island, are recorded by a dense network of stations deployed for the experiment. In such a scenario, automatic processing is needed because of (i) the enormous amount of data, (ii) the low signal-to-noise ratio of many of the available records, and (iii) the accuracy required for the velocity tomography resulting from the experiment. Preliminary processing is applied to the records from all stations: raw data formats from the different station types are unified, defective records are eliminated, and noise is reduced by filtering in the frequency band of interest for phase picking. The advanced multiband picking algorithm (AMPA) is then used to process the resulting large database and determine the travel times of the seismic phases. The AMPA approach, based on frequency multiband denoising and enhancement of expected arrivals through optimum detectors, is detailed together with its calibration and quality-assessment procedure. Examples of its use for active and passive seismic events are presented.
Published; S04342; V. Unrest dynamics and pre-eruptive scenarios; JCR Journal; ope
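AMPA itself is not reproduced here, since the abstract only outlines it. To illustrate what automatic onset picking involves, the sketch below swaps in a different and much simpler technique, the classic short-term-average / long-term-average (STA/LTA) energy trigger: an onset is declared at the first sample where the mean energy in a short window exceeds a chosen multiple of the mean energy in a longer window. The function name, window lengths, and threshold are hypothetical tuning choices.

```python
from typing import Optional


def sta_lta_pick(x: list[float], n_sta: int, n_lta: int, threshold: float) -> Optional[int]:
    """Index of the first sample at which the STA/LTA energy ratio exceeds
    `threshold`, or None if it never does. Both windows end at that sample."""
    if not 0 < n_sta < n_lta:
        raise ValueError("require 0 < n_sta < n_lta")
    # Prefix sums of squared amplitude give each window mean in O(1).
    csum = [0.0]
    for v in x:
        csum.append(csum[-1] + v * v)
    for end in range(n_lta, len(x) + 1):  # `end` is an exclusive index
        sta = (csum[end] - csum[end - n_sta]) / n_sta
        lta = (csum[end] - csum[end - n_lta]) / n_lta
        if lta > 0 and sta / lta > threshold:
            return end - 1  # newest sample inside both windows
    return None


# Synthetic trace: low-level background, then a step-like arrival at index 200.
trace = [0.01] * 200 + [1.0] * 50
print(sta_lta_pick(trace, n_sta=5, n_lta=50, threshold=3.0))  # 200
```

Unlike AMPA as described in the abstract, this trigger applies no multiband denoising, so on low signal-to-noise records it would usually need band-pass filtering beforehand.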
A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings
Modern automatic speaker verification relies largely on deep neural networks
(DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While
there are alternative feature extraction methods based on phase, prosody and
long-term temporal operations, they have not been extensively studied with
DNN-based methods. We aim to fill this gap by providing an extensive re-assessment
of 14 feature extractors on VoxCeleb and SITW datasets. Our findings reveal
that features equipped with techniques such as spectral centroids, group delay
function, and integrated noise suppression provide promising alternatives to
MFCCs for deep speaker embedding extraction. Experimental results show relative
decreases in equal error rate (EER) of up to 16.3% (VoxCeleb) and 25.1% (SITW)
compared with the baseline.
Comment: Accepted to Interspeech 202
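The reported gains are relative decreases in equal error rate. As a minimal sketch, assuming higher scores mean "same speaker" and sweeping the decision threshold over the observed scores, the EER is the operating point where the false-rejection and false-acceptance rates meet, and the relative decrease is (EER_baseline - EER_system) / EER_baseline. The function names and example scores are hypothetical; this is not the evaluation code used in the paper.

```python
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by sweeping the accept threshold over every observed score.

    A trial is accepted when score >= threshold. Returns the mean of the
    false-rejection and false-acceptance rates where they are closest."""
    if not target or not nontarget:
        raise ValueError("need both target and non-target scores")
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)         # genuine trials rejected
        far = sum(s >= t for s in nontarget) / len(nontarget)  # impostor trials accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer


def relative_decrease(baseline_eer: float, system_eer: float) -> float:
    """Relative EER decrease of a system with respect to a baseline, as a fraction."""
    return (baseline_eer - system_eer) / baseline_eer


print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
print(relative_decrease(4.0, 3.0))  # 0.25
```

A hypothetical drop from 4.0% to 3.0% EER is thus a 25% relative decrease. Averaging FRR and FAR at the closest threshold is one common discrete approximation; interpolating the DET curve can give slightly different values on small trial lists.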