    Melody extraction on vocal segments using multi-column deep neural networks

    Singing melody extraction is the task of tracking the pitch contour of the singing voice in polyphonic music. While the majority of melody extraction algorithms are based on computing a saliency function of pitch candidates or on separating the melody source from the mixture, data-driven approaches based on classification have rarely been explored. In this paper, we present a classification-based approach for melody extraction on vocal segments using multi-column deep neural networks. In the proposed model, each of the neural networks is trained to predict a pitch label of the singing voice from the spectrogram, but their outputs have different pitch resolutions. The final melody contour is inferred by combining the outputs of the networks and post-processing them with a hidden Markov model. To take advantage of the data-driven approach, we also augment the training data by pitch-shifting the audio content and modifying the pitch labels accordingly. We use the RWC dataset and the vocal tracks of the MedleyDB dataset for training the model, and evaluate it on the ADC 2004, MIREX 2005 and MIR-1K datasets. Through several experimental settings, we show incremental improvements in melody prediction. Lastly, we compare our best result to those of previous state-of-the-art methods.
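    A minimal sketch of the pitch-shifting augmentation step described above, assuming frame-wise MIDI pitch labels and the librosa library; the helper name and the semitone range in the usage comment are illustrative assumptions, not the authors' code.

        import numpy as np
        import librosa

        def augment_pitch_shift(y, sr, pitch_labels_midi, semitones):
            """Shift the audio by `semitones` and transpose the voiced pitch labels to match."""
            y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
            labels = np.asarray(pitch_labels_midi, dtype=float).copy()
            voiced = labels > 0                  # keep unvoiced frames at 0
            labels[voiced] += semitones          # transpose only the voiced frames
            return y_shifted, labels

        # Example: generate +/-1 and +/-2 semitone variants of one training excerpt.
        # for shift in (-2, -1, 1, 2):
        #     y_aug, labels_aug = augment_pitch_shift(y, sr, labels, shift)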

    From heuristics-based to data-driven audio melody extraction

    The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation, and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
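    A minimal sketch of pitch contour characterisation in the spirit of the thesis above: each contour (a sequence of per-frame F0 and salience values) is reduced to a small feature vector. The feature names, the hop size and the 55 Hz reference pitch are illustrative assumptions, not the exact feature set used in the work.

        import numpy as np

        def contour_features(f0_hz, salience, hop_seconds=0.0058):
            """Summarise one pitch contour as a dictionary of simple features."""
            f0 = np.asarray(f0_hz, dtype=float)
            sal = np.asarray(salience, dtype=float)
            cents = 1200.0 * np.log2(f0 / 55.0)          # pitch in cents above A1 (55 Hz)
            return {
                "duration_s": len(f0) * hop_seconds,
                "pitch_mean_cents": float(cents.mean()),
                "pitch_std_cents": float(cents.std()),   # rough vibrato/ornamentation cue
                "salience_mean": float(sal.mean()),
                "salience_total": float(sal.sum()),
            }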

    Automatic transcription of the melody from polyphonic music

    This dissertation addresses the problem of melody detection in polyphonic musical audio. The proposed algorithm uses a bottom-up design, in which each module leads to a more abstract representation of the audio data, allowing a very efficient computation of the melody. Nonetheless, the dataflow is not strictly unidirectional: on several occasions, feedback from higher processing modules controls the processing of low-level modules. The spectral analysis is based on a technique for the efficient computation of short-time Fourier spectra at different time-frequency resolutions. The pitch determination algorithm (PDA) is based on the pair-wise analysis of spectral peaks. Although melody detection implies a strong focus on the predominant voice, the proposed tone processing module aims at extracting multiple fundamental frequencies (F0). In order to identify the melody, the best succession of tones has to be chosen. This thesis describes an efficient computational method for auditory stream segregation that processes a variable number of simultaneous voices. The presented melody extraction algorithm has been evaluated in the MIREX audio melody extraction task. The MIREX results show that the proposed algorithm is among the state-of-the-art algorithms, reaching the best overall accuracy in MIREX 2014.
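    A minimal sketch of the multi-resolution spectral analysis described above: short-time Fourier spectra are computed with several window lengths over a common hop so that the frames stay aligned across resolutions. The window sizes and hop are illustrative assumptions, not those used in the dissertation.

        import numpy as np
        import librosa

        def multi_resolution_stft(y, win_lengths=(512, 1024, 2048, 4096), hop=256):
            """Return a dict mapping window length -> magnitude spectrogram."""
            spectra = {}
            for win in win_lengths:
                S = np.abs(librosa.stft(y, n_fft=win, win_length=win, hop_length=hop))
                spectra[win] = S      # same hop for every resolution, so frames align
            return spectra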

    An integrative computational modelling of music structure apprehension


    Classification-based melody transcription

    The melody of a musical piece – informally, the part you would hum along with – is a useful and compact summary of a full audio recording. The extraction of melodic content has practical applications ranging from content-based audio retrieval to the analysis of musical structure. Whereas previous systems generate transcriptions based on a model of the harmonic (or periodic) structure of musical pitches, we present a classification-based system for performing automatic melody transcription that makes no assumptions beyond what is learned from its training data. We evaluate the success of our algorithm by predicting the melody of the ADC 2004 Melody Competition evaluation set, and we show that a simple frame-level note classifier, temporally smoothed by post-processing with a hidden Markov model, produces results comparable to state-of-the-art model-based transcription systems.
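    A minimal sketch of the temporal smoothing step described above, assuming the classifier outputs per-frame note posteriors: a Viterbi decode over a simple HMM whose transition matrix favours staying on the same note. The self-transition probability and uniform switching cost are illustrative assumptions, not the paper's trained transition model.

        import numpy as np

        def viterbi_smooth(posteriors, self_transition=0.9):
            """posteriors: (n_frames, n_notes) classifier outputs; returns the smoothed note path."""
            n_frames, n_notes = posteriors.shape
            log_post = np.log(posteriors + 1e-12)
            switch = (1.0 - self_transition) / (n_notes - 1)
            log_trans = np.full((n_notes, n_notes), np.log(switch))
            np.fill_diagonal(log_trans, np.log(self_transition))

            delta = np.zeros((n_frames, n_notes))           # best log-score ending in each note
            backpointer = np.zeros((n_frames, n_notes), dtype=int)
            delta[0] = log_post[0]
            for t in range(1, n_frames):
                scores = delta[t - 1][:, None] + log_trans  # (from_note, to_note)
                backpointer[t] = scores.argmax(axis=0)
                delta[t] = scores.max(axis=0) + log_post[t]

            path = np.zeros(n_frames, dtype=int)            # backtrace the best state sequence
            path[-1] = delta[-1].argmax()
            for t in range(n_frames - 2, -1, -1):
                path[t] = backpointer[t + 1, path[t + 1]]
            return path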
