
    Deep-Rhythm for Tempo Estimation and Rhythm Pattern Recognition

    It has been shown that the harmonic series at the tempo frequency of the onset-strength function of an audio signal accurately describes its rhythm pattern and can be used to perform tempo or rhythm-pattern estimation. Recently, in the case of multi-pitch estimation, the depth of the input layer of a convolutional network has been used to represent the harmonic series of pitch candidates. We use a similar idea here to represent the harmonic series of tempo candidates. We propose the Harmonic-Constant-Q-Modulation, which represents, as a 4D tensor, the harmonic series of modulation frequencies (considered as tempo frequencies) in several acoustic frequency bands over time. This representation is used as input to a convolutional network trained to estimate tempo or rhythm-pattern classes. Using a large number of datasets, we evaluate the performance of our approach and compare it with previous approaches. We show that it slightly increases Accuracy-1 for tempo estimation but not the average mean Recall for rhythm pattern recognition.
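
    As a rough illustration of the 4D-tensor idea (a sketch, not the authors' code: the function names, the 100 Hz envelope rate, the harmonic set 1-4, and the 30-285 BPM tempo grid are all assumptions), one can compute a modulation spectrogram per acoustic band and then sample it at integer multiples of each tempo candidate:

        # Hypothetical sketch of an HCQM-like 4D input tensor.
        import numpy as np

        def modulation_spectrogram(onset_env, frame_len=256, hop=128, sr_env=100.0):
            """Short-time FFT of per-band onset-strength envelopes.
            onset_env: (bands, frames). Returns (bands, freqs, time) and the
            modulation-frequency axis in Hz."""
            bands, n = onset_env.shape
            window = np.hanning(frame_len)
            starts = range(0, n - frame_len + 1, hop)
            spec = np.stack([
                np.abs(np.fft.rfft(onset_env[:, s:s + frame_len] * window, axis=1))
                for s in starts
            ], axis=-1)                                  # (bands, bins, time)
            freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr_env)
            return spec, freqs

        def hcqm_tensor(onset_env, tempi_bpm, harmonics=(1, 2, 3, 4), sr_env=100.0):
            """Stack the modulation spectrum at h * tempo for each harmonic h:
            output shape (harmonics, bands, tempi, time)."""
            spec, freqs = modulation_spectrogram(onset_env, sr_env=sr_env)
            out = np.zeros((len(harmonics), spec.shape[0], len(tempi_bpm), spec.shape[2]))
            for hi, h in enumerate(harmonics):
                for ti, bpm in enumerate(tempi_bpm):
                    f = h * bpm / 60.0                   # tempo frequency in Hz
                    k = np.argmin(np.abs(freqs - f))     # nearest modulation bin
                    out[hi, :, ti, :] = spec[:, k, :]
            return out

        # Toy usage: 8 acoustic bands, 1000 envelope frames at 100 Hz.
        env = np.abs(np.random.randn(8, 1000))
        x = hcqm_tensor(env, tempi_bpm=np.arange(30, 286, 1.0))
        print(x.shape)   # (harmonic, band, tempo, time)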

    Extending Deep Rhythm for Tempo and Genre Estimation Using Complex Convolutions, Multitask Learning and Multi-input Network

    Tempo and genre are two interleaved aspects of music: genres are often associated with rhythm patterns that are played in specific tempo ranges. In this paper, we focus on the Deep Rhythm system, which is based on a harmonic representation of rhythm used as input to a convolutional neural network. To consider the relationships between frequency bands, we process complex-valued inputs through complex convolutions. We also study the joint estimation of tempo and genre using a multitask learning approach. Finally, we study the addition of a second convolutional input branch applied to a mel-spectrogram input dedicated to timbre. This multi-input approach improves performance for both tempo and genre estimation.
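
    A minimal sketch of the two architectural ideas, assuming PyTorch and illustrative layer sizes (none of this is the paper's code): a complex convolution realised as two real convolutions, and a shared trunk with one classification head per task. Summing the per-task cross-entropies would be one simple joint-loss choice.

        import torch
        import torch.nn as nn

        class ComplexConv2d(nn.Module):
            """(a+ib)*(w+iv) = (aw - bv) + i(av + bw), via two real convs."""
            def __init__(self, in_ch, out_ch, k=3):
                super().__init__()
                self.re = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
                self.im = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            def forward(self, x_re, x_im):
                return (self.re(x_re) - self.im(x_im),
                        self.re(x_im) + self.im(x_re))

        class MultitaskDeepRhythm(nn.Module):
            """Shared complex trunk, one head per task (tempo class, genre)."""
            def __init__(self, n_tempo=256, n_genre=10):
                super().__init__()
                self.conv = ComplexConv2d(1, 16)
                self.pool = nn.AdaptiveAvgPool2d(1)
                self.tempo_head = nn.Linear(32, n_tempo)   # 16 re + 16 im channels
                self.genre_head = nn.Linear(32, n_genre)
            def forward(self, x_re, x_im):
                h_re, h_im = self.conv(x_re, x_im)
                h = torch.cat([self.pool(torch.relu(h_re)),
                               self.pool(torch.relu(h_im))], dim=1).flatten(1)
                return self.tempo_head(h), self.genre_head(h)

        # Toy usage with a real-valued input (imaginary part set to zero).
        model = MultitaskDeepRhythm()
        x = torch.randn(2, 1, 8, 128)                      # (batch, ch, band, time)
        tempo_logits, genre_logits = model(x, torch.zeros_like(x))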

    Automatic music genre classification

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfillment of the requirements for the degree of Master of Science, 2014. No abstract provided.

    Content-based music structure analysis

    Doctor of Philosophy (Ph.D.) thesis.

    Feature extraction of musical content for automatic music transcription

    The purpose of this thesis is to develop new methods for automatic transcription of the melodic and harmonic parts of real-life music signals. Music transcription is here defined as the act of analyzing a piece of music and writing down parameter representations that indicate the pitch, onset time, and duration of each note, its loudness, and the instrument used in the analyzed signal. The proposed algorithms and methods aim at resolving two key sub-problems in automatic music transcription: music onset detection and polyphonic pitch estimation. There are three original contributions in this thesis. The first is an original frequency-dependent time-frequency analysis tool called the Resonator Time-Frequency Image (RTFI). By simply defining a parameterized function mapping frequency to the exponential decay factor of a complex resonator filter bank, the RTFI can easily and flexibly implement time-frequency analysis with different resolutions, such as ear-like (similar to the human ear's frequency analysis), constant-Q, or uniform (evenly spaced) resolutions. A corresponding multi-resolution fast implementation of the RTFI has also been developed. The second original contribution consists of two new music onset detection algorithms: an energy-based detection algorithm and a pitch-based detection algorithm. The energy-based algorithm performs well on the detection of hard onsets. The pitch-based algorithm is the first to successfully exploit the pitch-change cue for onset detection in real polyphonic music, and it achieves much better performance than existing detection algorithms on soft onsets. The third contribution is the development of two new polyphonic pitch estimation methods, both based on RTFI analysis. The first estimation method mainly exploits the harmonic relation and the spectral-smoothing principle, and consequently achieves excellent performance on real polyphonic music signals. The second method is based on a combination of signal processing and machine learning; the basic idea is to recast polyphonic pitch estimation as a pattern recognition problem, with the method composed of a signal processing block followed by a learning machine. Multi-resolution fast RTFI analysis is used as the signal processing component, and a support vector machine (SVM) is selected as the learning machine. Experimental results for the first approach show clear improvement over other state-of-the-art methods.
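
    The RTFI itself is not given as code here, but its core ingredient, a bank of complex one-pole resonators whose decay factor is a function of frequency, can be sketched as follows (the constant-Q mapping r(f) = 2*pi*f/Q and the Q value are assumed examples of the frequency-to-decay function, not the thesis's exact settings):

        import numpy as np

        def resonator_bank(x, freqs, sr, Q=17.0):
            """Filter x with one complex resonator per frequency.
            Pole: p_f = exp((-r(f) + 2j*pi*f) / sr), with r(f) = 2*pi*f/Q so
            that bandwidth scales with frequency (constant-Q-like resolution).
            Returns |y| with shape (len(freqs), len(x))."""
            out = np.zeros((len(freqs), len(x)))
            for i, f in enumerate(freqs):
                r = 2 * np.pi * f / Q                    # frequency-dependent decay
                pole = np.exp((-r + 2j * np.pi * f) / sr)
                y = 0j
                for n, sample in enumerate(x):
                    y = sample + pole * y                # first-order complex IIR
                    out[i, n] = abs(y)
            return out

        # Toy check: a 440 Hz tone should light up the 440 Hz channel.
        sr = 8000
        t = np.arange(sr) / sr
        tone = np.sin(2 * np.pi * 440 * t)
        tfi = resonator_bank(tone, freqs=[220, 440, 880], sr=sr)
        print(tfi[:, -1])   # largest response expected in the 440 Hz channel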

    Accurate telemonitoring of Parkinson's disease symptom severity using nonlinear speech signal processing and statistical machine learning

    This study focuses on the development of an objective, automated method to extract clinically useful information from sustained vowel phonations in the context of Parkinson’s disease (PD). The aim is twofold: (a) to differentiate PD subjects from healthy controls, and (b) to replicate the Unified Parkinson’s Disease Rating Scale (UPDRS) metric, which provides a clinical impression of PD symptom severity. This metric spans the range 0 to 176, where 0 denotes a healthy person and 176 total disability. Currently, UPDRS assessment requires the physical presence of the subject in the clinic, is subjective because it relies on the clinical rater’s expertise, and is logistically costly for national health systems. Hence, the practical frequency of symptom tracking is typically confined to once every several months, hindering recruitment for large-scale clinical trials and under-representing the true time scale of PD fluctuations. We develop a comprehensive framework to analyze speech signals by (1) extracting novel, distinctive signal features, (2) using robust feature selection techniques to obtain a parsimonious subset of those features, and (3a) differentiating PD subjects from healthy controls, or (3b) determining UPDRS, using powerful statistical machine learning tools. Towards this aim, we also investigate 10 existing fundamental frequency (F_0) estimation algorithms to determine the most useful for this application, and propose a novel ensemble F_0 estimation algorithm which leads to a 10% improvement in accuracy over the best individual approach. Moreover, we propose novel feature selection schemes which are shown to be very competitive against widely used, more complex schemes. We demonstrate that we can successfully differentiate PD subjects from healthy controls with 98.5% overall accuracy, and also provide rapid, objective, and remote replication of UPDRS assessment with clinically useful accuracy (approximately 2 UPDRS points from the clinicians’ estimates), using only simple, self-administered, and non-invasive speech tests. The findings of this study strongly support the use of speech signal analysis as an objective basis for practical clinical decision support tools in the context of PD assessment.
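
    The abstract does not state the ensemble's exact fusion rule; a minimal sketch of the general idea, assuming a per-frame median over the individual trackers' outputs (a common robust choice, not necessarily the thesis's):

        import numpy as np

        def ensemble_f0(tracks, voiced_threshold=0.0):
            """tracks: (n_algorithms, n_frames) F0 estimates in Hz, with 0
            marking unvoiced frames. Returns one fused track."""
            t = np.where(np.asarray(tracks, float) > voiced_threshold,
                         tracks, np.nan)
            fused = np.nanmedian(t, axis=0)      # robust to isolated octave errors
            return np.nan_to_num(fused)          # unvoiced frames back to 0 Hz

        # Toy example: three estimators, one makes an octave error on frame 1.
        tracks = np.array([[120.0, 121.0, 119.0],
                           [119.5, 242.0, 120.5],   # octave jump, middle frame
                           [120.5, 120.0, 120.0]])
        print(ensemble_f0(tracks))                  # -> [120. 121. 120.]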

    Singing voice resynthesis using concatenative-based techniques

    Doctoral thesis in Informatics Engineering, Faculdade de Engenharia, Universidade do Porto, 201

    Automatic annotation of musical audio for interactive applications

    As machines become more and more portable, and part of our everyday life, it becomes apparent that developing interactive and ubiquitous systems is an important aspect of new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction, and note modelling. We present a framework to extract these annotations and evaluate the performance of different algorithms. The first task is to detect onsets and offsets in audio streams within short latencies. The segmentation of audio streams into temporal objects enables various manipulations and analyses of metrical structure. The evaluation of different algorithms and their adaptation to real time are described. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and tested on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments; the estimation of higher-level descriptions is approached. Techniques for modelling note objects and localising beats are implemented and discussed. Applications of our framework include live and interactive music installations and, more generally, tools for composers and sound engineers. Speed optimisations may bring a significant improvement to various automated tasks, such as automatic classification and recommendation systems. We describe the design of our software solution, for our research purposes and in view of its integration within other systems. (Funding: EU-FP6-IST-507142 project SIMAC, Semantic Interaction with Music Audio Contents; EPSRC grants GR/R54620 and GR/S75802/01.)
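
    As an illustration of the short-latency onset detection task, here is a generic spectral-flux detector with causal peak-picking; this is a sketch of the standard technique, not the thesis's exact algorithm, and all parameter values are assumptions:

        import numpy as np

        def spectral_flux_onsets(x, sr, n_fft=512, hop=256, delta=0.1):
            """Half-wave-rectified spectral flux, peak-picked against a running
            median so the detector can run causally with roughly one frame of
            latency. Returns onset times in seconds."""
            win = np.hanning(n_fft)
            n_frames = 1 + (len(x) - n_fft) // hop
            prev = np.zeros(n_fft // 2 + 1)
            flux = np.zeros(n_frames)
            for i in range(n_frames):
                mag = np.abs(np.fft.rfft(x[i * hop:i * hop + n_fft] * win))
                flux[i] = np.sum(np.maximum(mag - prev, 0.0))  # energy rises only
                prev = mag
            onsets = []
            for i in range(1, n_frames - 1):
                local = flux[max(0, i - 8):i + 1]              # causal context
                if flux[i] > np.median(local) + delta * np.max(local) \
                   and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]:
                    onsets.append(i * hop / sr)                # frame -> seconds
            return onsets

        # Toy signal: two noise bursts; expect detections near 0.25 s and 0.63 s.
        sr = 16000
        x = np.zeros(sr)
        x[4000:4400] += np.random.randn(400)
        x[10000:10400] += np.random.randn(400)
        print(spectral_flux_onsets(x, sr))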