19,447 research outputs found
Learning sound representations using trainable COPE feature extractors
Sound analysis research has mainly been focused on speech and music
processing. The deployed methodologies are not suitable for analysis of sounds
with varying background noise, in many cases with very low signal-to-noise
ratio (SNR). In this paper, we present a method for the detection of patterns
of interest in audio signals. We propose novel trainable feature extractors,
which we call COPE (Combination of Peaks of Energy). The structure of a COPE
feature extractor is determined using a single prototype sound pattern in an
automatic configuration process, which is a type of representation learning. We
construct a set of COPE feature extractors, configured on a number of training
patterns. Then we take their responses to build feature vectors that we use in
combination with a classifier to detect and classify patterns of interest in
audio signals. We carried out experiments on four public data sets: MIVIA audio
events, MIVIA road events, ESC-10 and TU Dortmund data sets. The results that
we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on
the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund)
demonstrate the effectiveness of the proposed method and are higher than the
ones obtained by other existing approaches. The COPE feature extractors have
high robustness to variations of SNR. Real-time performance is achieved even
when the value of a large number of features is computed.Comment: Accepted for publication in Pattern Recognitio
Rhythm detection for speech-music discrimination in MPEG compressed domain
A novel approach to speech-music discrimination based on rhythm (or beat) detection is introduced. Rhythmic pulses are detected by applying a long-term autocorrelation method on band-passed signals. This approach is combined with another, in which the features describe the energy peaks of the signal. The discriminator uses just three features that are computed from data directly taken from an MPEG-1 bitstream. The discriminator was tested on more than 3 hours of audio data. Average recognition rate is 97.7%
UPM-UC3M system for music and speech segmentation
This paper describes the UPM-UC3M system for the AlbayzĂn evaluation 2010 on Audio Segmentation. This evaluation task consists of segmenting a broadcast news audio document into clean speech, music, speech with noise in background and speech with music in background. The UPM-UC3M system is based on Hidden Markov Models (HMMs), including a 3-state HMM for every acoustic class. The number of states and the number of Gaussian per state have been tuned for this evaluation. The main analysis during system development has been focused on feature selection. Also, two different architectures have been tested: the first one corresponds to an one-step system whereas the second one is a hierarchical system in which different features have been used for segmenting the different audio classes. For both systems, we have considered long term statistics of MFCC (Mel Frequency Ceptral Coefficients), spectral entropy and CHROMA coefficients. For the best configuration of the one-step system, we have obtained a 25.3% average error rate and 18.7% diarization error (using the NIST tool) and a 23.9% average error rate and 17.9% diarization error for the hierarchical one
Neonatal Seizure Detection using Convolutional Neural Networks
This study presents a novel end-to-end architecture that learns hierarchical
representations from raw EEG data using fully convolutional deep neural
networks for the task of neonatal seizure detection. The deep neural network
acts as both feature extractor and classifier, allowing for end-to-end
optimization of the seizure detector. The designed system is evaluated on a
large dataset of continuous unedited multi-channel neonatal EEG totaling 835
hours and comprising of 1389 seizures. The proposed deep architecture, with
sample-level filters, achieves an accuracy that is comparable to the
state-of-the-art SVM-based neonatal seizure detector, which operates on a set
of carefully designed hand-crafted features. The fully convolutional
architecture allows for the localization of EEG waveforms and patterns that
result in high seizure probabilities for further clinical examination.Comment: IEEE International Workshop on Machine Learning for Signal Processin
- âŠ