    Speech/Music Discrimination: Novel Features in Time Domain

    This research aimed to find novel features that can be used to discriminate between speech and music in the time domain for the purpose of data retrieval. The study used speech and music data recorded in standard anechoic chambers and sampled at 44.1 kHz. Two types of new features were found and thoroughly examined: the Ratio of Silent Frames (RSF) feature and the Time Series Events (TSE) set of features. Receiver Operating Characteristic (ROC) curves were used to assess each of the proposed features, as well as selected relevant features from the literature, for the purpose of comparison. The RSF feature provided up to 8% improvement compared with two relevant features from the literature. One of the TSE features provided close to 100% speech/music discrimination.
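
    A minimal sketch of how a Ratio of Silent Frames feature might be computed, assuming RSF is the fraction of short analysis frames whose level falls below a silence threshold; the frame length and threshold below are illustrative values, not the paper's settings:

        import numpy as np

        def ratio_of_silent_frames(x, frame_len=1024, silence_db=-40.0):
            # x: mono audio samples (e.g. sampled at 44.1 kHz) as a 1-D array.
            # Frame length and silence threshold are illustrative only.
            n_frames = len(x) // frame_len
            frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
            rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
            rms_db = 20.0 * np.log10(rms / rms.max())
            # RSF = fraction of frames below the silence threshold.
            return float(np.mean(rms_db < silence_db))

        # Speech contains pauses between words and phrases, so its RSF tends
        # to be higher than that of most music; thresholding RSF can then
        # serve as a simple time-domain discriminator.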

    Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

    We focus on the problem of audio classification into speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on the zero-crossing rate and Bayesian classification. It is very simple from a computational point of view and gives good results for pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains speech superimposed on the music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically a multi-layer perceptron). In this case we obtain better performance, at the expense of a limited growth in computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even on low-cost embedded systems.
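
    A minimal sketch of the zero-crossing-rate plus Bayesian idea, assuming a single one-dimensional Gaussian model per class; the actual feature set, class models and training data of the paper are not reproduced, and the statistics below are hypothetical:

        import numpy as np

        def zero_crossing_rate(frame):
            # Fraction of adjacent sample pairs whose signs differ.
            return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

        def gaussian_log_likelihood(x, mean, var):
            return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

        def classify_frame(frame, speech_stats, music_stats, prior_speech=0.5):
            # Bayesian decision between "speech" and "music" from the ZCR.
            # speech_stats / music_stats: (mean, variance) of the ZCR
            # estimated on labelled training frames.
            z = zero_crossing_rate(frame)
            ll_speech = gaussian_log_likelihood(z, *speech_stats) + np.log(prior_speech)
            ll_music = gaussian_log_likelihood(z, *music_stats) + np.log(1 - prior_speech)
            return "speech" if ll_speech > ll_music else "music"

        # Illustrative statistics only; real values come from training data.
        decision = classify_frame(np.random.randn(1024), (0.15, 0.004), (0.05, 0.001))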

    Speech and music discrimination: Human detection of differences between music and speech based on rhythm

    Rhythm is one of the basic acoustic components of both speech and singing. It is therefore interesting to investigate the ability of subjects to distinguish between speech and singing when only rhythm remains as an acoustic cue. For this study we developed a method to eliminate all linguistic components except rhythm from the speech and singing signals. The study was conducted online, and participants could listen to the stimuli via loudspeakers or headphones. The analysis of the survey shows that people are able to discriminate significantly between speech and singing after the signals have been altered in this way. Furthermore, our results reveal specific features that supported participants in their decision, such as differences in regularity and tempo between the singing and speech samples. The hypothesis that musically trained people perform better on the task was not confirmed. The results of the study are important for understanding the structure of, and differences between, speech and singing, for use in further studies and for future application in the field of speech recognition.

    Rhythm detection for speech-music discrimination in MPEG compressed domain

    A novel approach to speech-music discrimination based on rhythm (or beat) detection is introduced. Rhythmic pulses are detected by applying a long-term autocorrelation method to band-passed signals. This approach is combined with another in which the features describe the energy peaks of the signal. The discriminator uses just three features that are computed from data taken directly from an MPEG-1 bitstream. The discriminator was tested on more than 3 hours of audio data; the average recognition rate is 97.7%.
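
    A minimal sketch of rhythm detection by long-term autocorrelation of an energy envelope (standing in for the paper's band-passed MPEG subband data); the envelope rate, lag range and peak criterion are illustrative assumptions:

        import numpy as np

        def rhythm_strength(x, sr, min_bpm=60, max_bpm=200):
            # Strongest autocorrelation peak of the energy envelope within a
            # plausible beat-period range (illustrative criterion).
            hop = sr // 100                      # ~10 ms envelope resolution
            env = np.abs(x[:len(x) // hop * hop]).reshape(-1, hop).mean(axis=1)
            env = env - env.mean()
            # Long-term autocorrelation of the envelope, normalised at lag 0.
            ac = np.correlate(env, env, mode="full")[len(env) - 1:]
            ac /= ac[0] + 1e-12
            env_rate = sr / hop
            lo = int(env_rate * 60.0 / max_bpm)  # shortest beat period
            hi = int(env_rate * 60.0 / min_bpm)  # longest beat period
            return ac[lo:hi].max()

        # A simple discriminator could threshold rhythm_strength(): music with
        # a steady beat yields pronounced periodic peaks, speech usually does not.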

    Feature extraction for speech and music discrimination

    Driven by the demands of information retrieval, video editing and human-computer interfaces, in this paper we propose a novel spectral feature for music and speech discrimination. The scheme attempts to simulate a biological model using the averaged cepstrum, in which human perception tends to pick up areas of large cepstral change. Cepstrum values that lie far from the mean are exponentially reduced in magnitude. We conduct music/speech discrimination experiments comparing the performance of the proposed feature with that of previously proposed features. Classification based on dynamic time warping verifies that the proposed feature gives the best music/speech classification quality on the test database.
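
    A minimal sketch of a cepstral feature in this spirit, in which coefficients far from the frame-averaged cepstrum are exponentially attenuated; the exact weighting and perceptual model are not specified in the abstract, so the scheme below is purely illustrative:

        import numpy as np

        def attenuated_cepstrum(x, frame_len=1024, alpha=2.0):
            # Average the real cepstrum over frames, exponentially reducing
            # the magnitude of coefficients far from the mean (alpha is an
            # illustrative attenuation constant).
            n = len(x) // frame_len
            frames = x[:n * frame_len].reshape(n, frame_len) * np.hanning(frame_len)
            spectra = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
            cepstra = np.fft.irfft(np.log(spectra), axis=1)   # real cepstrum
            mean_cep = cepstra.mean(axis=0)                   # averaged cepstrum
            dev = np.abs(cepstra - mean_cep)
            weighted = cepstra * np.exp(-alpha * dev / (dev.std() + 1e-12))
            return weighted.mean(axis=0)                      # feature vector

        # The resulting vectors can then be compared across clips with dynamic
        # time warping, as done for classification in the paper.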

    UPM-UC3M system for music and speech segmentation

    This paper describes the UPM-UC3M system for the Albayzín 2010 evaluation on audio segmentation. The evaluation task consists of segmenting a broadcast news audio document into clean speech, music, speech with noise in the background and speech with music in the background. The UPM-UC3M system is based on Hidden Markov Models (HMMs), including a 3-state HMM for every acoustic class. The number of states and the number of Gaussians per state have been tuned for this evaluation. The main analysis during system development focused on feature selection. Two different architectures have also been tested: the first is a one-step system, whereas the second is a hierarchical system in which different features are used for segmenting the different audio classes. For both systems, we considered long-term statistics of MFCCs (Mel Frequency Cepstral Coefficients), spectral entropy and CHROMA coefficients. For the best configuration of the one-step system we obtained a 25.3% average error rate and an 18.7% diarization error (using the NIST tool); the hierarchical system gave a 23.9% average error rate and a 17.9% diarization error.
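
    A minimal sketch of long-term MFCC statistics of this kind, assuming the librosa library for MFCC extraction and mean/standard-deviation statistics over a one-second window; the evaluation system's actual front end, window length, feature set and HMM topology are not reproduced here:

        import numpy as np
        import librosa

        def long_term_mfcc_stats(y, sr, n_mfcc=13, win_s=1.0):
            # Mean and standard deviation of MFCCs over long-term windows.
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, T)
            hop_s = 512 / sr                     # librosa's default hop length
            frames_per_win = max(1, int(win_s / hop_s))
            feats = []
            for start in range(0, mfcc.shape[1] - frames_per_win + 1, frames_per_win):
                block = mfcc[:, start:start + frames_per_win]
                feats.append(np.concatenate([block.mean(axis=1), block.std(axis=1)]))
            return np.array(feats)               # one 2*n_mfcc vector per window

        # Vectors like these (together with spectral entropy and CHROMA
        # statistics) would feed the per-class HMMs used for segmentation.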

    An experiment in audio classification from compressed data

    In this paper we present an algorithm for the automatic classification of sound into speech, instrumental sound/music and silence. The method is based on thresholding features derived from the modulation envelope of the frequency-limited audio signal. Four characteristics are examined for discrimination: the occurrence of energy peaks, their duration, rhythmic content and the level of harmonic content. The proposed algorithm allows classification directly on MPEG-1 audio bitstreams. The performance of the classifier was evaluated on TRECVID test data. The test results are above average among all TREC participants. The approaches adopted by other research groups participating in TREC are also discussed.
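
    A minimal sketch of one such characteristic, counting energy peaks in a smoothed modulation envelope and thresholding the result; the actual algorithm works directly on MPEG-1 subband data, and the envelope rate and thresholds below are illustrative assumptions:

        import numpy as np

        def modulation_envelope(x, sr, env_rate=50):
            # Low-rate energy envelope of the signal (env_rate in Hz, illustrative).
            hop = sr // env_rate
            return np.abs(x[:len(x) // hop * hop]).reshape(-1, hop).mean(axis=1)

        def classify_clip(x, sr, silence_thr=1e-3, peak_rate_thr=3.0):
            # Very rough speech / music / silence decision from envelope peaks.
            env = modulation_envelope(x, sr)
            if env.mean() < silence_thr:
                return "silence"
            # Local maxima that rise clearly above the median envelope level.
            peaks = (env[1:-1] > env[:-2]) & (env[1:-1] > env[2:]) & \
                    (env[1:-1] > 2 * np.median(env))
            peak_rate = peaks.sum() / (len(x) / sr)     # peaks per second
            # Speech shows frequent pronounced syllabic energy peaks; sustained
            # music tends to have fewer of them.  Thresholds are illustrative.
            return "speech" if peak_rate > peak_rate_thr else "music/instrumental"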

    Frontal brain asymmetries as effective parameters to assess the quality of audiovisual stimuli perception in adult and young cochlear implant users

    How is music perceived by cochlear implant (CI) users? This question arises as "the next step", given the impressive performance obtained by these patients in language perception. Furthermore, how can music perception be evaluated beyond self-report ratings, in order to obtain measurable data? To address this question, estimation of the frontal electroencephalographic (EEG) alpha activity imbalance, acquired through a 19-channel EEG cap, appears to be a suitable instrument for measuring the approach/withdrawal (AW index) reaction to external stimuli. Specifically, a greater AW value indicates an increased propensity to approach the stimulus, whereas a lower one indicates a tendency to withdraw from it. Additionally, since children and adults typically acquire deafness prelingually and postlingually, respectively, the two groups would probably differ in music perception. The aim of the present study was to investigate child and adult CI users, in unilateral (UCI) and bilateral (BCI) implantation conditions, during three experimental situations of music exposure (normal, distorted and mute). A study of functional connectivity patterns within cerebral networks was also performed to investigate functioning patterns in the different experimental populations. As a general result, congruent patterns were observed between BCI patients and control (CTRL) subjects, characterised by the lowest values for the distorted condition (vs. the normal and mute conditions) in both the AW index and the connectivity analysis. Additionally, the normal and distorted conditions differed significantly in CI and CTRL adults and in CTRL children, but not in CI children. These results suggest a higher capacity for discrimination and approach motivation towards normal music in CTRL and BCI subjects, but not in UCI patients. Therefore, in terms of music perception, CTRL and BCI participants appear more similar to each other than to UCI subjects, as estimated by measurable rather than self-reported parameters.
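
    A minimal sketch of a frontal alpha-asymmetry index of this kind, computed here as the difference of log alpha-band power between one right and one left frontal channel; the channel pair, band limits and exact index definition used in the study may differ, so this is an assumption-laden illustration only:

        import numpy as np
        from scipy.signal import welch

        def alpha_power(sig, fs, band=(8.0, 12.0)):
            # Mean power spectral density in the alpha band (band limits assumed).
            f, psd = welch(sig, fs=fs, nperseg=int(2 * fs))
            mask = (f >= band[0]) & (f <= band[1])
            return psd[mask].mean()

        def aw_index(left_frontal, right_frontal, fs):
            # Approach/withdrawal-style index: log(right alpha) - log(left alpha).
            # Higher values are read as a stronger approach reaction, under the
            # usual assumption that alpha power is inversely related to
            # cortical activation.
            return np.log(alpha_power(right_frontal, fs)) - np.log(alpha_power(left_frontal, fs))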