Using the beat histogram for speech rhythm description and language identification
In this paper we present a novel approach for the description of speech rhythm and the extraction of rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm through the calculation of features based on salient elements of speech such as consonants, vowels and syllables. We present how an automatic rhythm extraction method borrowed from music information retrieval, the beat histogram, can be adapted for the analysis of speech rhythm by defining the most relevant novelty functions in the speech signal and extracting features describing their periodicities. We have evaluated those features in a rhythm-based LID task for two multilingual speech corpora using support vector machines, including feature selection methods to identify the most informative descriptors. Results suggest that the method is successful in describing speech rhythm and provides LID classification accuracy comparable to or better than that of other approaches, without the need for a preceding segmentation or annotation of the speech signal. Concerning rhythm typology, the rhythm class hypothesis in its original form seems to be only partly confirmed by our results.
Towards Music Structural Segmentation across Genres: Features, Structural Hypotheses, and Annotation Principles
This work is supported by the China Scholarship Council (CSC) and the EPSRC project (EP/L019981/1) Fusing Semantic and Audio Technologies for Intelligent Music Production and Consumption (FAST-IMPACt). Sandler acknowledges the support of the Royal Society as a recipient of a Wolfson Research Merit Award.
Beat histogram features for rhythm-based musical genre classification using multiple novelty functions
In this paper we present beat histogram features for multi-level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested for classification accuracy against a baseline feature set, using Support Vector Machines on five genre datasets. Results show that the presented features provide classification accuracy comparable to that of other genre classification approaches using periodicity histograms and display a performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames has provided an improvement for the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components on the general rhythmic content encourage the further use of the method in rhythm description tasks.
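As a rough illustration of the beat histogram idea mentioned in the abstract above, the following sketch derives a single amplitude-based novelty function and measures its periodicities by autocorrelation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the frame sizes, the RMS-difference novelty function and the 30 to 240 BPM range are all hypothetical choices made for this example, and the papers describe several novelty functions (tonal and spectral as well as amplitude) rather than just one.

```python
import numpy as np


def novelty_from_envelope(signal, frame_len=400, hop=200):
    """Amplitude novelty: half-wave rectified difference of frame-wise RMS.

    Hypothetical helper; one of many possible novelty functions.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    # Keep only energy increases (onset-like events).
    diff = np.diff(rms, prepend=rms[0])
    return np.maximum(diff, 0.0)


def beat_histogram(novelty, frame_rate, bpm_min=30, bpm_max=240):
    """Map the autocorrelation of a novelty function onto a BPM axis.

    Returns (bpms, strengths), where strengths sum to 1 when any
    positive periodicity is present.
    """
    x = novelty - novelty.mean()
    # Non-negative lags of the (unnormalised) autocorrelation.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(np.ceil(60.0 * frame_rate / bpm_max))
    lag_max = min(int(np.floor(60.0 * frame_rate / bpm_min)), len(acf) - 1)
    lags = np.arange(lag_min, lag_max + 1)
    strengths = np.maximum(acf[lags], 0.0)
    total = strengths.sum()
    if total > 0:
        strengths = strengths / total
    bpms = 60.0 * frame_rate / lags
    return bpms, strengths


# Usage on a synthetic click train: one 100-sample burst every 0.5 s.
sr, hop = 8000, 200
clicks = np.zeros(10 * sr)
for start in range(0, len(clicks), sr // 2):
    clicks[start:start + 100] = 1.0
bpms, strengths = beat_histogram(novelty_from_envelope(clicks, hop=hop), sr / hop)
dominant_bpm = bpms[np.argmax(strengths)]
```

Descriptors such as the position and height of the strongest peaks, or summary statistics of `strengths`, could then serve as rhythm features for a classifier; which descriptors the papers actually use is not stated in the abstracts.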
Low-frequency oscillatory correlates of auditory predictive processing in cortical-subcortical networks: a MEG-study
Emerging evidence supports the role of neural oscillations as a mechanism for predictive information processing across large-scale networks. However, the oscillatory signatures underlying auditory mismatch detection and information flow between brain regions remain unclear. To address this issue, we examined the contribution of oscillatory activity at theta/alpha-bands (4–8/8–13 Hz) and assessed directed connectivity in magnetoencephalographic data while 17 human participants were presented with sound sequences containing predictable repetitions and order manipulations that elicited prediction-error responses. We characterized the spectro-temporal properties of neural generators using a minimum-norm approach and assessed directed connectivity using Granger Causality analysis. Mismatching sequences elicited increased theta power and phase-locking in auditory, hippocampal and prefrontal cortices, suggesting that theta-band oscillations underlie prediction-error generation in cortical-subcortical networks. Furthermore, enhanced feedforward theta/alpha-band connectivity was observed in auditory-prefrontal networks during mismatching sequences, while increased feedback connectivity in the alpha-band was observed between hippocampus and auditory regions during predictable sounds. Our findings highlight the involvement of hippocampal theta/alpha-band oscillations in auditory prediction-error generation and suggest a spectral dissociation between inter-areal feedforward vs. feedback signalling, thus providing novel insights into the oscillatory mechanisms underlying auditory predictive processing.
Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation
Singing voice detection is the task of identifying which frames of a recording
contain singing vocals. It is one of the main components of music
information retrieval (MIR) and is applicable to melody extraction,
artist recognition, and music discovery in popular music. Although
several methods have been proposed, a more robust and more complete
system is desired to improve detection performance. In this paper, our
motivation is to provide an extensive comparison of the different stages of
singing voice detection. Based on this analysis, a novel method is proposed to
build a more efficient singing voice detection system. The proposed system has
three main parts. The first is a singing voice separation pre-process that
extracts the vocal without the music. The improvements offered by several
singing voice separation methods were compared to decide which one to integrate
into the singing voice detection system. The second is a deep neural network
based classifier that labels the given frames; different deep models for
classification were also compared. The last is a post-process that filters out
anomalous frames in the classifier's predictions; a median filter
and a Hidden Markov Model (HMM) based filter were compared for this step.
Through step-by-step module extension, the different methods were compared
and analyzed. Finally, classification performance on two public datasets
indicates that the proposed approach, which is based on the Long-term Recurrent
Convolutional Networks (LRCN) model, is a promising alternative.
Comment: 15 page
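The post-processing stage described in the abstract above can be illustrated with a median filter over frame-wise predictions. This is a minimal sketch under stated assumptions, not the paper's code: the function name `median_smooth`, the binary label encoding (1 = vocal, 0 = non-vocal), the edge padding and the kernel size are all hypothetical choices for this example.

```python
import numpy as np


def median_smooth(frame_labels, kernel=3):
    """Smooth binary frame predictions with a sliding median.

    Isolated frames that disagree with their neighbours are flipped.
    `kernel` must be odd so that each window has a well-defined centre.
    """
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    labels = np.asarray(frame_labels, dtype=int)
    half = kernel // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(labels, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel)
    return np.median(windows, axis=1).astype(int)


# Usage: two isolated frames (index 2 and index 8) are removed.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
smoothed = median_smooth(raw, kernel=3)
```

A larger kernel removes longer spurious runs but also blurs genuine short vocal segments, which is one reason a model-based alternative such as the HMM filter mentioned in the abstract may be worth comparing.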