
    Using the beat histogram for speech rhythm description and language identification

    In this paper we present a novel approach for the description of speech rhythm and the extraction of rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm by calculating features based on salient elements of speech such as consonants, vowels and syllables. We show how an automatic rhythm extraction method borrowed from music information retrieval, the beat histogram, can be adapted to the analysis of speech rhythm by defining the most relevant novelty functions in the speech signal and extracting features that describe their periodicities. We evaluated these features in a rhythm-based LID task on two multilingual speech corpora using support vector machines (SVMs), together with feature selection methods to identify the most informative descriptors. Results suggest that the method successfully describes speech rhythm and provides LID classification accuracy comparable to or better than that of other approaches, without requiring prior segmentation or annotation of the speech signal. Concerning rhythm typology, the rhythm class hypothesis in its original form seems to be only partly confirmed by our results.
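
    The core idea of a beat histogram, measuring how strongly a novelty function repeats at each candidate period, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses a single hypothetical novelty function (half-wave rectified change in frame RMS energy) and plain autocorrelation as the periodicity estimator, and it expresses periodicities in Hz rather than BPM. The function names, frame sizes and the 0.5 to 10 Hz range are illustrative choices.

```python
import numpy as np

def novelty_energy(signal, frame_len=1024, hop=512):
    """Hypothetical novelty function: half-wave rectified change in frame RMS energy."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    diff = np.diff(rms, prepend=rms[0])  # first frame has zero change
    return np.maximum(diff, 0.0)         # keep only increases in energy

def periodicity_histogram(novelty, frame_rate, min_hz=0.5, max_hz=10.0):
    """Autocorrelation-based beat histogram, indexed by periodicity in Hz.

    frame_rate is the sampling rate of the novelty function (frames per second).
    """
    x = novelty - novelty.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    ac = ac / (ac[0] + 1e-12)                          # lag 0 normalised to 1
    lags = np.arange(1, len(ac))
    freqs = frame_rate / lags                          # lag in frames -> Hz
    keep = (freqs >= min_hz) & (freqs <= max_hz)
    return freqs[keep], ac[1:][keep]
```

    For an impulse train that repeats every 25 frames at 100 frames per second, the strongest bin of `periodicity_histogram` falls at 4 Hz. Repeating this for several novelty functions, as the abstract describes, would yield one histogram per function.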

    Towards Music Structural Segmentation across Genres: Features, Structural Hypotheses, and Annotation Principles

    This work is supported by the China Scholarship Council (CSC) and the EPSRC project (EP/L019981/1) Fusing Semantic and Audio Technologies for Intelligent Music Production and Consumption (FAST-IMPACt). Sandler acknowledges the support of the Royal Society as a recipient of a Wolfson Research Merit Award.

    Beat histogram features for rhythm-based musical genre classification using multiple novelty functions

    In this paper we present beat histogram features for multi-level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories, and their related novelty functions, are extracted as a basis for creating beat histograms. The proposed features capture not only amplitude changes but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then evaluated for classification accuracy against a baseline feature set, using Support Vector Machines on five genre datasets. Results show that the presented features provide classification accuracy comparable to other genre classification approaches that use periodicity histograms, and perform close to much more elaborate state-of-the-art approaches for rhythm description. Using bar boundary annotations for the texture frames improved results on the dance-oriented Ballroom dataset. The comparatively small number of descriptors, and the possibility of evaluating the influence of specific signal components on the overall rhythmic content, encourage further use of the method in rhythm description tasks.
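
    Turning several beat histograms into one compact feature vector can be sketched as below. The descriptor set is an assumption borrowed from the classic Tzanetakis-style beat histogram features (relative amplitudes of the two highest peaks, their ratio, their periodicities and the histogram sum); the abstract does not list the paper's exact descriptors. The function names are hypothetical, and the input is any list of `(periodicities, strengths)` pairs, one per novelty function.

```python
import numpy as np

def histogram_descriptors(freqs, strengths):
    """Tzanetakis-style descriptors for one beat histogram (assumed set, not the paper's).

    Returns [A1, A2, A2/A1, P1, P2, SUM]: relative amplitudes of the two highest
    local peaks, their ratio, their periodicities and the histogram sum.
    """
    s = np.clip(np.asarray(strengths, dtype=float), 0.0, None)  # drop negative correlations
    total = s.sum() + 1e-12
    # Local maxima: strictly above the left neighbour, at least the right neighbour.
    interior = np.arange(1, len(s) - 1)
    is_peak = (s[interior] > s[interior - 1]) & (s[interior] >= s[interior + 1])
    peaks = interior[is_peak]
    order = peaks[np.argsort(s[peaks])[::-1]]  # peaks sorted by height, highest first
    p1 = order[0] if len(order) > 0 else int(np.argmax(s))
    p2 = order[1] if len(order) > 1 else p1
    return np.array([
        s[p1] / total,
        s[p2] / total,
        s[p2] / (s[p1] + 1e-12),
        freqs[p1],
        freqs[p2],
        total,
    ])

def multi_novelty_features(histograms):
    """Concatenate descriptors from histograms built on several novelty functions."""
    return np.concatenate([histogram_descriptors(f, s) for f, s in histograms])
```

    Keeping the descriptors grouped per novelty function makes it straightforward to see, after feature selection, which signal component (amplitude, tonal or spectral change) a retained feature came from. The resulting vectors could then be passed to any SVM implementation.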

    Low-frequency oscillatory correlates of auditory predictive processing in cortical-subcortical networks: a MEG-study

    Emerging evidence supports the role of neural oscillations as a mechanism for predictive information processing across large-scale networks. However, the oscillatory signatures underlying auditory mismatch detection and information flow between brain regions remain unclear. To address this issue, we examined the contribution of oscillatory activity in the theta and alpha bands (4–8 and 8–13 Hz) and assessed directed connectivity in magnetoencephalographic data while 17 human participants were presented with sound sequences containing predictable repetitions and order manipulations that elicited prediction-error responses. We characterized the spectro-temporal properties of neural generators using a minimum-norm approach and assessed directed connectivity using Granger Causality analysis. Mismatching sequences elicited increased theta power and phase-locking in auditory, hippocampal and prefrontal cortices, suggesting that theta-band oscillations underlie prediction-error generation in cortical-subcortical networks. Furthermore, enhanced feedforward theta/alpha-band connectivity was observed in auditory-prefrontal networks during mismatching sequences, while increased feedback connectivity in the alpha band was observed between hippocampus and auditory regions during predictable sounds. Our findings highlight the involvement of hippocampal theta/alpha-band oscillations in auditory prediction-error generation and suggest a spectral dissociation between inter-areal feedforward and feedback signalling, thus providing novel insights into the oscillatory mechanisms underlying auditory predictive processing.
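
    The logic behind Granger Causality, that x "Granger-causes" y if the past of x improves the prediction of y beyond what y's own past provides, can be illustrated with a minimal time-domain, bivariate sketch. This is not the study's pipeline: it omits source reconstruction, multivariate conditioning and the frequency-resolved (spectral) estimates needed to separate theta from alpha effects. The model order of 5 and the function names are illustrative assumptions.

```python
import numpy as np

def lagged(series, order):
    """Design matrix whose columns are series[t-1], ..., series[t-order]."""
    n = len(series)
    return np.column_stack([series[order - k:n - k] for k in range(1, order + 1)])

def residual_variance(design, target):
    """Variance of ordinary least-squares residuals, with an intercept column."""
    design = np.hstack([np.ones((len(target), 1)), design])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ beta)

def granger_index(x, y, order=5):
    """Time-domain Granger causality x -> y: ln(restricted / full residual variance).

    Values near 0 mean the past of x adds no predictive information about y.
    """
    target = y[order:]
    own_past = lagged(y, order)                          # restricted model: y's past only
    joint_past = np.hstack([own_past, lagged(x, order)])  # full model: y's and x's past
    return np.log(residual_variance(own_past, target) / residual_variance(joint_past, target))
```

    Because the restricted model is nested in the full one, the index is non-negative; directionality comes from comparing `granger_index(x, y)` with `granger_index(y, x)`, which is the sense in which feedforward and feedback connectivity can be distinguished.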

    Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation

    Singing voice detection is the task of identifying the frames that contain the singer's voice. It has been one of the main components of music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and more complete system is needed to improve detection performance. In this paper, our motivation is to provide an extensive comparison of the different stages of singing voice detection. Based on this analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a pre-processing step of singing voice separation that extracts the vocals from the accompaniment; the improvements brought by several singing voice separation methods were compared to select the best one for integration into the detection system. The second is a deep neural network based classifier that labels the given frames; different deep models for classification were also compared. The last is a post-processing step that filters out anomalous frames in the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared for this step. Through this step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach based on the Long-term Recurrent Convolutional Networks (LRCN) model is a promising alternative.
    Comment: 15 pages
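
    The median-filter post-processing step can be sketched as follows: frame-level vocal probabilities from a classifier are smoothed with a sliding median, which removes isolated spikes and dips shorter than half the window, and are then thresholded. The window width of 5 frames, the 0.5 threshold, edge padding and the function names are assumptions for illustration; the HMM-based alternative is not shown.

```python
import numpy as np

def median_smooth(frame_probs, width=5):
    """Sliding median over frame-level vocal probabilities; width must be odd."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    probs = np.asarray(frame_probs, dtype=float)
    half = width // 2
    padded = np.pad(probs, half, mode="edge")  # repeat edge values at both ends
    return np.array([np.median(padded[i:i + width]) for i in range(len(probs))])

def detect_vocal_frames(frame_probs, width=5, threshold=0.5):
    """Smooth classifier outputs, then mark frames as vocal (True) or not (False)."""
    return median_smooth(frame_probs, width) >= threshold
```

    With `width=3`, a single low-probability frame inside a vocal passage, or a single high-probability frame inside an instrumental passage, is overwritten by its neighbours, which is the kind of anomalous-frame correction described above.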