80 research outputs found

    The importance of F0 tracking in query-by-singing-humming

    In this paper, we present a comparative study of several state-of-the-art F0 trackers applied in the context of query-by-singing-humming (QBSH). The study has been carried out using the well-known, freely available MIR-QBSH dataset under different conditions of added pub-style noise and smartphone-style distortion. For audio-to-MIDI melodic matching, we have used two state-of-the-art systems and a simple, easily reproducible baseline method. For the evaluation, we measured QBSH performance for 189 different combinations of F0 tracker, noise/distortion condition and matcher. Additionally, the overall accuracy of the F0 transcriptions (as defined in MIREX) was also measured. We found that F0 tracking overall accuracy correlates with QBSH performance, but it does not fully capture the suitability of a pitch vector for QBSH. We also found clear differences between matchers in their robustness to F0 transcription errors.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R and by the Junta de Andalucía under Project No. P11-TIC-7154.
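
    The MIREX melody metrics mentioned above (overall accuracy, raw pitch accuracy) are implemented in the open-source mir_eval library; the following is a minimal sketch of computing them for one F0 track. The frame times and frequency values are made-up illustrative data, not from the paper:

        import numpy as np
        import mir_eval

        # Reference and estimated melody: frame times (s) and F0 values (Hz).
        # 0 Hz denotes an unvoiced frame; these arrays are illustrative only.
        ref_time = np.arange(0, 1.0, 0.01)
        ref_freq = np.where(ref_time < 0.5, 220.0, 0.0)   # voiced, then silent
        est_time = np.arange(0, 1.0, 0.01)
        est_freq = np.where(est_time < 0.5, 222.0, 0.0)   # ~16 cents sharp

        scores = mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq)
        # 'Overall Accuracy' counts frames whose voicing decision is correct and
        # whose pitch (when voiced) is within 50 cents of the reference.
        print(scores['Overall Accuracy'], scores['Raw Pitch Accuracy'])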

    Singing information processing: techniques and applications

    The singing voice is an essential component of music in every culture of the world, as it is an incredibly natural form of musical expression. Consequently, automatic singing voice processing has a great impact from industrial, cultural and scientific perspectives. In this context, this thesis contributes a varied set of techniques and applications related to singing voice processing, together with a review of the associated state of the art in each case.

    First, several of the best-known pitch estimators are compared in the query-by-humming use case. The results show that \cite{Boersma1993} (with a non-obvious parameter setting) and \cite{Mauch2014} perform very well in this use case, thanks to the smoothness of the pitch contours they extract. In addition, a novel singing transcription system based on a hysteresis process defined in time and frequency is proposed, together with a Matlab tool for singing voice evaluation. The interest of the proposed method is that it achieves error rates close to the state of the art with a very simple scheme. The proposed evaluation tool, in turn, is a useful resource for defining the problem more precisely and for better evaluating the solutions proposed by future researchers. The thesis also presents a method for automatic assessment of vocal performances. It uses dynamic time warping to align the user's performance with a reference, thereby producing intonation and rhythm accuracy scores. The evaluation of the system shows a high correlation between the scores given by the system and those annotated by a group of expert musicians.

    Furthermore, a method for realistic intensity transformation of the singing voice is presented. The transformation is based on a parametric model of the spectral envelope, and it substantially improves perceived realism when compared with commercial software such as Melodyne or Vocaloid. The drawback of the proposed approach is that it requires manual intervention, but the results obtained provide important insights towards automatic intensity modification with realistic results. Finally, a method for correcting dissonances in isolated chords is proposed. It is based on a multiple-F0 analysis and a frequency shift of the corresponding sinusoidal components. The evaluation, carried out by a group of trained musicians, shows a clear increase in perceived consonance after the proposed transformation.
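
    The performance-rating method above is only described at a high level; the exact scoring is defined in the thesis. As a hedged illustration of the core idea, the sketch below aligns a user's F0 contour to a reference with plain DTW and derives a rough intonation score from the pitch deviations along the warping path. The arrays and the score mapping are illustrative assumptions, not the thesis's formulas:

        import numpy as np
        import librosa

        def intonation_score(ref_f0_hz, user_f0_hz):
            """Align two voiced F0 contours with DTW and score pitch accuracy."""
            # Work in cents so deviations are perceptually meaningful.
            ref_cents = 1200 * np.log2(np.asarray(ref_f0_hz) / 440.0)
            user_cents = 1200 * np.log2(np.asarray(user_f0_hz) / 440.0)

            # librosa's DTW expects feature matrices of shape (d, n).
            D, wp = librosa.sequence.dtw(X=ref_cents[np.newaxis, :],
                                         Y=user_cents[np.newaxis, :])
            wp = wp[::-1]  # warping path in start-to-end order

            # Mean absolute pitch deviation (cents) along the alignment.
            dev = np.mean([abs(ref_cents[i] - user_cents[j]) for i, j in wp])

            # Illustrative mapping: 0 cents -> 100 points, 100 cents -> 0 points.
            return max(0.0, 100.0 * (1.0 - dev / 100.0))

        # Example: the user holds A4 but sings 30 cents flat throughout.
        ref = np.full(50, 440.0)
        user = ref * 2 ** (-30 / 1200)
        print(round(intonation_score(ref, user), 1))  # 70.0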

    Evaluation framework for automatic singing transcription

    In this paper, we analyse the evaluation strategies used in previous work on automatic singing transcription, and we present a novel, comprehensive and freely available evaluation framework for automatic singing transcription. The framework consists of a cross-annotated dataset and a set of extended evaluation measures, which are integrated into a Matlab toolbox. The presented evaluation measures are based on the standard MIREX note-tracking measures, but they provide extra information about the types of errors made by the singing transcriber. Finally, a practical use case is presented, in which the evaluation framework is used to perform a detailed comparison of several state-of-the-art singing transcribers.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work has been funded by the Ministerio de Economía y Competitividad of the Spanish Government under Project No. TIN2013-47276-C6-2-R and by the Junta de Andalucía under Project No. P11-TIC-7154.
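
    The framework's extended measures live in its Matlab toolbox; for orientation, the standard MIREX-style note-tracking measures they extend are also available in Python via mir_eval. A minimal sketch with made-up note data:

        import numpy as np
        import mir_eval

        # Notes as (onset, offset) intervals in seconds plus pitches in Hz.
        ref_intervals = np.array([[0.0, 0.5], [0.5, 1.0], [1.2, 1.8]])
        ref_pitches = np.array([220.0, 246.9, 261.6])
        est_intervals = np.array([[0.02, 0.48], [0.5, 0.9], [1.3, 1.8]])
        est_pitches = np.array([220.0, 246.9, 277.2])  # last note a semitone sharp

        # A note counts as correct if its onset is within 50 ms, its pitch is
        # within 50 cents, and (by default) its offset is within 20% of the
        # reference note duration.
        p, r, f, overlap = mir_eval.transcription.precision_recall_f1_overlap(
            ref_intervals, ref_pitches, est_intervals, est_pitches,
            onset_tolerance=0.05)
        print(f, overlap)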

    Singing voice correction using canonical time warping

    Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm that synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated from the alignment information, and the pitch-corrected singing is synthesized back through a vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
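
    The paper's CTW formulation (DTW alternated with learned linear projections of the two sequences) is not reproduced here. As a hedged sketch of the surrounding pipeline only, the code below uses plain DTW as a stand-in to map the professional pitch contour onto the amateur recording's time axis, which is the "new pitch contour" step before vocoder resynthesis; the averaging rule is an illustrative assumption:

        import numpy as np
        import librosa

        def corrected_pitch(amateur_f0, pro_f0):
            """Map the professional F0 contour onto the amateur time axis.

            CTW would additionally learn linear projections of the two feature
            sequences before alignment; plain DTW stands in for it here.
            """
            amateur_f0 = np.asarray(amateur_f0, dtype=float)
            pro_f0 = np.asarray(pro_f0, dtype=float)
            D, wp = librosa.sequence.dtw(X=amateur_f0[np.newaxis, :],
                                         Y=pro_f0[np.newaxis, :])
            # For each amateur frame, average the professional frames aligned to it.
            out = np.zeros_like(amateur_f0)
            counts = np.zeros_like(amateur_f0)
            for i, j in wp:
                out[i] += pro_f0[j]
                counts[i] += 1
            return out / np.maximum(counts, 1)

        # The corrected contour would then drive a vocoder (e.g. WORLD) to
        # resynthesize the amateur take with the professional's intonation.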

    Computer-aided Melody Note Transcription Using the Tony Software: Accuracy and Efficiency

    We present Tony, a software tool for the interactive annotation of melodies from monophonic audio recordings, and evaluate its usability and the accuracy of its note extraction method. The scientific study of acoustic performances of melodies, whether sung or played, requires the accurate transcription of notes and pitches. To achieve the desired transcription accuracy for a particular application, researchers manually correct results obtained by automatic methods. Tony is an interactive tool aimed directly at making this correction task efficient. It provides (a) state-of-the-art algorithms for pitch and note estimation, (b) visual and auditory feedback for easy error-spotting, (c) an intelligent graphical user interface through which the user can rapidly correct estimation errors, and (d) extensive export functions enabling further processing in other applications. We show that Tony's built-in automatic note transcription method compares favourably with existing tools. We report annotation times on a set of 96 solo vocal recordings and study the effects of the piece, the number of edits made, and the annotator's increasing mastery of the software. Tony is Open Source software, with source code and compiled binaries for Windows, Mac OS X and Linux available from https://code.soundsoftware.ac.uk/projects/tony/
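
    Tony's built-in pitch estimator is the pYIN algorithm, and librosa ships an independent pYIN implementation, so a minimal sketch of producing the kind of frame-wise pitch track Tony starts from looks like the following (the file name and frequency range are illustrative assumptions):

        import librosa

        # Load a monophonic vocal recording (illustrative file name).
        y, sr = librosa.load('solo_vocal.wav', sr=None, mono=True)

        # Probabilistic YIN: frame-wise F0 (NaN where unvoiced) plus voicing
        # decisions -- the starting point for note estimation and correction.
        f0, voiced_flag, voiced_prob = librosa.pyin(
            y, sr=sr,
            fmin=librosa.note_to_hz('C2'),
            fmax=librosa.note_to_hz('C6'))

        times = librosa.times_like(f0, sr=sr)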

    Automatic music lead sheet transcription and melody similarity assessment based on deep neural networks

    Doctoral dissertation, Department of Industrial Engineering, College of Engineering, Seoul National University Graduate School, February 2023 (advisor: 이경식).

    Since the composition, arrangement, and distribution of music became convenient thanks to the digitization of the music industry, the number of newly supplied music recordings is increasing. Recently, with platform environments in place whereby anyone can become a creator, user-created music such as original songs, cover songs, and remixes is being distributed through YouTube and TikTok. Given such a large volume of musical recordings, the demand to transcribe music into sheet music has always existed among musicians; however, transcription requires musical knowledge and is time-consuming. This thesis studies automatic lead sheet transcription using deep neural networks. The development of transcription artificial intelligence (AI) can greatly reduce the time and cost for people in the music industry to find or transcribe sheet music. In addition, since audio recordings can be converted into digital music notation, the applications can be expanded to areas such as music plagiarism detection and music composition AI.

    The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval, since chords are highly abstract and descriptive features of music. We utilize a self-attention mechanism for chord recognition to focus on certain regions of chords. Through an attention map analysis, we visualize how attention is performed; it turns out that the model is able to divide segments of chords by utilizing the adaptive receptive field of the attention mechanism. The thesis then proposes a note-level singing melody transcription model using sequence-to-sequence Transformers. Overlapping decoding is introduced to solve the problem of the context between segments being broken. Pitch augmentation and the addition of a noisy dataset with data cleansing prove effective in preventing overfitting and in generalizing model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription performance on the MIR-ST500 dataset for all the metrics considered. Finally, subjective human evaluation demonstrates that the results of the proposed model are perceived as more accurate than those of a previous study.

    Building on the above results, we introduce the entire process of automatic music lead sheet transcription. By combining the various kinds of music information recognized from audio signals, we show that it is possible to transcribe lead sheets that express the core of popular music, and we compare the results with lead sheets transcribed by musicians. Finally, we propose a melody similarity assessment method based on self-supervised learning that applies the automatic lead sheet transcription. We present convolutional neural networks that embed the melodies of lead sheet transcription results in an embedding space. To apply self-supervised learning, we introduce methods of generating training data by musical data augmentation techniques, and we present a deep metric learning loss function that utilizes these training data. Experimental results demonstrate that the proposed model is able to detect similar melodies in popular music across plagiarism and cover song cases.

    Contents:
    Chapter 1 Introduction: 1.1 Background and Motivation; 1.2 Objectives; 1.3 Thesis Outline
    Chapter 2 Literature Review: 2.1 Attention Mechanism and Transformers (2.1.1 Attention-based Models; 2.1.2 Transformers with Musical Event Sequence); 2.2 Chord Recognition; 2.3 Note-level Singing Melody Transcription; 2.4 Musical Key Estimation; 2.5 Beat Tracking; 2.6 Music Plagiarism Detection and Cover Song Identification; 2.7 Deep Metric Learning and Triplet Loss
    Chapter 3 Problem Definition: 3.1 Lead Sheet Transcription (3.1.1 Chord Recognition; 3.1.2 Singing Melody Transcription; 3.1.3 Post-processing for Lead Sheet Representation); 3.2 Melody Similarity Assessment
    Chapter 4 A Bi-directional Transformer for Musical Chord Recognition: 4.1 Methodology (4.1.1 Model Architecture; 4.1.2 Self-attention in Chord Recognition); 4.2 Experiments (4.2.1 Datasets; 4.2.2 Preprocessing; 4.2.3 Evaluation Metrics; 4.2.4 Training); 4.3 Results (4.3.1 Quantitative Evaluation; 4.3.2 Attention Map Analysis)
    Chapter 5 Note-level Singing Melody Transcription: 5.1 Methodology (5.1.1 Monophonic Note Event Sequence; 5.1.2 Audio Features; 5.1.3 Model Architecture; 5.1.4 Autoregressive Decoding and Monophonic Masking; 5.1.5 Overlapping Decoding; 5.1.6 Pitch Augmentation; 5.1.7 Adding Noisy Dataset with Data Cleansing); 5.2 Experiments (5.2.1 Dataset; 5.2.2 Experiment Configurations; 5.2.3 Evaluation Metrics; 5.2.4 Comparison Models; 5.2.5 Human Evaluation); 5.3 Results (5.3.1 Ablation Study; 5.3.2 Note-level Transcription Model Comparison; 5.3.3 Transcription Performance Distribution Analysis; 5.3.4 Fundamental Frequency (F0) Metric Evaluation); 5.4 Qualitative Analysis (5.4.1 Visualization of Ablation Study; 5.4.2 Spectrogram Analysis; 5.4.3 Human Evaluation)
    Chapter 6 Automatic Music Lead Sheet Transcription: 6.1 Post-processing for Lead Sheet Representation; 6.2 Lead Sheet Transcription Results
    Chapter 7 Melody Similarity Assessment with Self-supervised Convolutional Neural Networks: 7.1 Methodology (7.1.1 Input Data Representation; 7.1.2 Data Augmentation; 7.1.3 Model Architecture; 7.1.4 Loss Function; 7.1.5 Definition of Distance between Songs); 7.2 Experiments (7.2.1 Dataset; 7.2.2 Training; 7.2.3 Evaluation Metrics); 7.3 Results (7.3.1 Quantitative Evaluation; 7.3.2 Qualitative Evaluation)
    Chapter 8 Conclusion: 8.1 Summary and Contributions; 8.2 Limitations and Future Research
    Bibliography; abstract in Korean (국문초록)
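
    The thesis's CNN architecture and loss are specified in Chapter 7 and are not reproduced here. Purely as a hedged sketch of the self-supervised deep-metric-learning recipe it describes (an augmented copy of a melody as the positive, an unrelated melody as the negative), with a toy encoder and piano-roll shapes that are invented for illustration:

        import torch
        import torch.nn as nn

        # Toy encoder: embeds a piano-roll melody (1 x 128 pitches x T frames)
        # into a unit-norm vector. The thesis's actual CNN differs.
        class MelodyEncoder(nn.Module):
            def __init__(self, dim=128):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d((8, 8)))
                self.fc = nn.Linear(32 * 8 * 8, dim)

            def forward(self, x):
                z = self.fc(self.conv(x).flatten(1))
                return nn.functional.normalize(z, dim=1)

        encoder = MelodyEncoder()
        loss_fn = nn.TripletMarginLoss(margin=0.5)

        anchor = torch.rand(16, 1, 128, 64)              # batch of melodies
        positive = torch.roll(anchor, shifts=2, dims=2)  # crude stand-in for the
                                                         # pitch-shift augmentation
        negative = torch.rand(16, 1, 128, 64)            # unrelated melodies

        loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
        loss.backward()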

    Automatic transcription of traditional Turkish art music recordings: A computational ethnomusicology approach

    Thesis (Doctoral), Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2012. Includes bibliographical references (leaves 96-109). Text in English; abstracts in Turkish and English. xi, 131 leaves.

    Music Information Retrieval (MIR) is a recent research field that emerged from the revolutionary change in the distribution of, and access to, music recordings. Although MIR research already covers a wide range of applications, MIR methods have primarily been developed for western music. Since the most important dimensions of music are fundamentally different in western and non-western musics, developing MIR methods for non-western musics is a challenging task. On the other hand, the discipline of ethnomusicology supplies useful insights for computational studies of non-western musics. This thesis therefore tackles the task within the framework of computational ethnomusicology, an emerging interdisciplinary research domain. The main contribution of this study is the development of an automatic transcription system for traditional Turkish art music (Turkish music), the first in the literature. In order to develop such a system, several subjects are also studied for the first time in the literature, constituting further contributions of the thesis: the automatic music transcription problem is considered from the perspective of ethnomusicology, an automatic makam recognition system is developed, and the scale theory of Turkish music is evaluated computationally for nine makamlar to determine whether it can be used for makam detection. Furthermore, a wide geographical region including the Middle East, North Africa and Asia shares similarities with Turkish music, so this study also provides techniques and methods more relevant to these non-western musics than those in the mainstream MIR literature.

    Content-based music structure analysis

    Ph.D. thesis (Doctor of Philosophy).

    From heuristics-based to data-driven audio melody extraction

    The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation, and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
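
    Pitch contour characterisation, mentioned above, assigns each candidate contour a feature vector that a classifier can use to decide whether the contour belongs to the melody. A hedged sketch of typical contour-level features follows; the feature set is a generic illustration, not the thesis's exact list:

        import numpy as np

        def contour_features(times_s, f0_hz):
            """Basic shape features for one pitch contour (voiced frames only)."""
            cents = 1200 * np.log2(np.asarray(f0_hz) / 55.0)
            return {
                'duration_s': float(times_s[-1] - times_s[0]),
                'pitch_mean_cents': float(np.mean(cents)),
                'pitch_std_cents': float(np.std(cents)),   # spread; crude proxy
                                                           # for vibrato/instability
                'pitch_range_cents': float(np.ptp(cents)),
                'total_variation': float(np.sum(np.abs(np.diff(cents)))),
            }

        # Example: a 0.5 s contour gliding up one semitone from A3.
        t = np.linspace(0, 0.5, 50)
        f0 = 220.0 * 2 ** (np.linspace(0, 1, 50) / 12)
        print(contour_features(t, f0))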