
    Electroglottography based real-time voice-to-MIDI controller

Voice-to-MIDI real-time conversion is a challenging problem that comes with a series of obstacles and complications. The main issue is tracking the pitch of the human voice: extracting the voice fundamental frequency can be inaccurate and computationally demanding due to the spectral complexity of voice signals. In addition, when microphones are used, environmental noise can further affect voice processing. An analysis of current research and of the market shows a plethora of voice-to-MIDI implementations revolving around the processing of audio signals from microphones. This paper addresses the above-mentioned issues with a novel experimental method in which electroglottography is employed instead of microphones as the source for pitch tracking. In the proposed system, the signal is processed and converted by an embedded hardware device. The use of electroglottography improves both the accuracy of pitch evaluation and the ease of voice information processing: firstly, it provides a direct measurement of the vocal folds' activity and, secondly, it bypasses the interference caused by external sound sources. This allows the extraction of a simpler and cleaner signal that yields a more effective evaluation of the fundamental frequency during phonation. The proposed method delivers a faster and less computationally demanding conversion, in turn allowing for an efficacious real-time voice-to-MIDI conversion.
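    As a rough illustration of the conversion stage described above (not the paper's actual implementation), the sketch below estimates F0 from a short quasi-periodic frame, such as a band-limited EGG excerpt, via autocorrelation and maps it to a MIDI note number; the frame length, sample rate and frequency range are illustrative assumptions.

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=60.0, fmax=1000.0):
    """Estimate F0 of a quasi-periodic frame (e.g. an EGG excerpt) by
    picking the autocorrelation peak inside the expected lag range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def f0_to_midi(f0, a4=440.0):
    """Map a frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(f0 / a4)))

# Illustrative use on a synthetic 220 Hz frame (hypothetical data):
sr = 44100
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220.0 * t)
print(f0_to_midi(estimate_f0_autocorr(frame, sr)))  # -> 57 (A3)
```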

Automatic transcription of singing signals (노래 신호의 자동 전사)

    Doctoral dissertation, Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology, Seoul National University, August 2017. Advisor: Kyogu Lee. Automatic music transcription refers to the automatic extraction of musical attributes, such as notes, from an audio signal to a symbolic level. The symbolized music data are applicable for various purposes, such as music education and production, by providing higher-level information to both consumers and creators. Although the singing voice is the easiest to listen to and perform among the various music signals, traditional transcription methods for musical instruments are not suitable for it due to the acoustic complexity of the human voice. The main goal of this thesis is to develop a fully automatic singing transcription system that exceeds existing methods. We first review typical approaches to pitch tracking and onset detection, the two fundamental tasks of music transcription, and then propose several methods for each task. For pitch tracking, we examine the effect of data sampling on the performance of periodicity analysis of music signals. For onset detection, the local homogeneity of the harmonic structure is exploited through cepstral analysis and unsupervised classification. The final transcription system combines feature extraction, a probabilistic model of the harmonic structure, and note transitions based on a hidden Markov model. It achieved the best performance (an F-measure of 82%) in a note-level evaluation that included state-of-the-art systems. Contents: Chapter 1, Introduction; Chapter 2, Background; Chapter 3, Periodicity Analysis by Sampling in the Time/Frequency Domain for Pitch Tracking; Chapter 4, Note Onset Detection based on Harmonic Cepstrum Regularity; Chapter 5, Robust Singing Transcription System using Local Homogeneity in the Harmonic Structure; Chapter 6, Conclusion and Future Work.
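    The onset detector above builds on cepstral analysis. As a minimal sketch of that underlying tool (not the proposed harmonic-cepstrum-regularity method itself), the real cepstrum of a voiced frame is the inverse FFT of the log magnitude spectrum, and its dominant quefrency peak sits at the lag 1/F0; the frequency bounds below are illustrative.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    A voiced frame shows a peak at the quefrency (lag) 1/F0."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    return np.fft.irfft(np.log(np.abs(spectrum) + eps))

def cepstral_f0(frame, sr, fmin=80.0, fmax=800.0):
    """Pick the dominant quefrency peak and convert it to F0 in Hz."""
    ceps = real_cepstrum(frame)
    lo, hi = int(sr / fmax), int(sr / fmin)
    return sr / (lo + np.argmax(ceps[lo:hi]))
```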

    An efficient phonation-driven control system using laryngeal bioimpedance and machine learning

The extraction and conversion of human voice information are crucial in several applications across multiple subject areas, such as medicine, music technology and human-computer interaction. The presented research employs the variation of laryngeal bioimpedance, measured during phonation, to extract and process voice information. Compared to sound recordings and microphones, bioimpedance readings deliver a much simpler signal, allowing fast and computationally non-taxing processing. In the first stage of this research, a novel system for measuring laryngeal bioimpedance was designed and built. The circuit design was implemented with a multiplexed sensor system based on multiple electrode pairs, allowing self-calibration of the sensors and increasing usability and applicability. In the following stage, the resulting device was used to generate a novel dataset of laryngeal bioimpedance measurements for distinguishing speech from singing. This dataset was then used to train and deploy an Artificial Neural Network using the Mel Frequency Cepstrum Coefficients of the recorded bioimpedance measurements. A real-time system for converting voice into digital control messages was developed and presented as the third stage of this research; the system was implemented using the MIDI protocol so that voice can control hardware and software electronic instruments. The thesis then concludes with the integration of the complete system. The conducted research results in a self-calibrating device for the measurement of laryngeal bioimpedance which delivers a fast and efficacious real-time voice-to-MIDI conversion. In addition, the creation of a unique dataset for the distinction of singing and speech allowed the deployment of a real-time classification system. Collectively, the proposed system improves the applicability and usability of laryngeal bioimpedance and expands the existing knowledge in the distinction of speech and singing.
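    A hedged sketch of the classification stage described above, assuming per-recording MFCC features and a small feed-forward network; librosa and scikit-learn stand in for whatever toolchain the thesis actually used, and the variable names, layer sizes and data are hypothetical.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_features(signal, sr, n_mfcc=13):
    """Summarize one recording as its mean MFCC vector."""
    mfcc = librosa.feature.mfcc(y=np.asarray(signal, dtype=float),
                                sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Hypothetical data: `recordings` is a list of 1-D bioimpedance signals
# sampled at `sr`, and `labels` marks each as 0 = speech or 1 = singing.
# features = np.vstack([mfcc_features(r, sr) for r in recordings])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
# clf.fit(features, labels)
# clf.predict(features[:1])
```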

Automatic transcription of traditional Turkish art music recordings: A computational ethnomusicology approach

    Thesis (Doctoral), Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2012. Includes bibliographical references (leaves 96-109). Text in English; abstract in Turkish and English; xi, 131 leaves. Music Information Retrieval (MIR) is a recent research field that emerged from the revolutionary change in the distribution of, and access to, music recordings. Although MIR research already covers a wide range of applications, MIR methods have primarily been developed for Western music. Since the most important dimensions of music differ fundamentally between Western and non-Western musics, developing MIR methods for non-Western musics is a challenging task. On the other hand, the discipline of ethnomusicology supplies useful insights for computational studies of non-Western musics. This thesis therefore addresses the task within the framework of computational ethnomusicology, a newly emerging interdisciplinary research domain. The main contribution of this study is the development of an automatic transcription system for traditional Turkish art music (Turkish music), the first in the literature. In order to develop such a system, several subjects are also studied for the first time in the literature, constituting further contributions of the thesis: the automatic music transcription problem is considered from the perspective of ethnomusicology, an automatic makam recognition system is developed, and the scale theory of Turkish music is evaluated computationally for nine makamlar in order to understand whether it can be used for makam detection. Furthermore, a wide geographical region spanning the Middle East, North Africa and Asia shares similarities with Turkish music, so this study also provides techniques and methods more relevant to these non-Western musics than the existing MIR literature.
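    A common feature in the makam-recognition literature is a fine-resolution, tonic-relative pitch histogram; whether this thesis uses exactly this representation is not stated in the abstract, so the sketch below only illustrates the general technique. The 53-bins-per-octave grid matches the Holdrian-comma division used in Turkish music theory, again as an illustrative choice.

```python
import numpy as np

def pitch_histogram(f0_track, tonic_hz, bins_per_octave=53):
    """Fold an F0 track (Hz) into a tonic-relative pitch-class histogram.
    53 bins per octave corresponds to the comma grid of Turkish theory."""
    f0 = f0_track[f0_track > 0]              # keep voiced frames only
    cents = 1200.0 * np.log2(f0 / tonic_hz)  # distance from the tonic
    classes = np.mod(cents, 1200.0)          # fold into one octave
    edges = np.linspace(0.0, 1200.0, bins_per_octave + 1)
    hist, _ = np.histogram(classes, bins=edges)
    return hist / max(hist.sum(), 1)         # normalize to a distribution
```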

    Modelling Professional Singers: A Bayesian Machine Learning Approach with Enhanced Real-time Pitch Contour Extraction and Onset Processing from an Extended Dataset.

    Singing signals are one of the input data types that computer systems need to analyse, and singing is part of every culture in the world. However, although audio signal processing has been studied for over three decades, it remains an active research area because most of the available algorithms require improvement due to the complexity of audio/music signals. More effort is needed for analysing sound/music in a real-time environment, since such algorithms can work only on past data, whereas an offline system has all the required data available. In addition, the complexity of the data increases when the audio signals come from singing: the unique features of singing (such as the vocal system, vibrato, pitch drift, and tuning approach) make the signals different from, and more complicated than, those of an instrument. This thesis focuses on analysing singing signals and better understanding how trained professional singers produce the pitch frequency and duration of notes according to their position in a piece of music and the singing technique applied. It is shown that by incorporating singing features such as gender and BPM, a real-time pitch detection algorithm can be adapted to estimate fundamental frequencies with fewer errors. In addition, two novel algorithms are proposed, one for smoothing pitch contours and another for estimating onsets, offsets, and the transitions between notes; both showed better results than several state-of-the-art algorithms. Moreover, a new vocal dataset including several annotations for 2688 singing files was published. Finally, the thesis presents two models for calculating the pitch and duration of notes according to their position in a piece of music. In conclusion, optimizing results for pitch-oriented Music Information Retrieval (MIR) algorithms requires adapting or selecting them based on the unique characteristics of the signals; a universal algorithm that performs exceptionally well on all data types remains a formidable challenge given the current state of technology.
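    The abstract does not detail the proposed smoothing algorithm, so the sketch below shows only a generic median-filter baseline of the kind such a method would be compared against, operating on an F0 contour in which 0 Hz marks unvoiced frames.

```python
import numpy as np
from scipy.signal import medfilt

def smooth_contour(f0, kernel=5):
    """Median-smooth the voiced part of an F0 contour (Hz).
    Unvoiced frames, marked 0 Hz, are left untouched so that note
    boundaries are not smeared into silence."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    smoothed = f0.copy()
    smoothed[voiced] = medfilt(f0[voiced], kernel_size=kernel)
    return smoothed
```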

    SMC 2009


A query-by-humming identification study for flamenco singing (Un estudio de identificación por tarareo para cante flamenco)

    Flamenco as a musical entity is based on the singing voice, called "cante" in flamenco argot, in which a baroque, usually improvised ornamentation predominates. This musical aesthetic poses a series of technological challenges when automating its study. This work deals with the design and implementation of a query-by-humming method for flamenco singing, using a database of recordings obtained in fieldwork carried out with singers in Seville and Jerez. A strategy using a new melodic similarity algorithm based on the Needleman-Wunsch method is proposed. We discuss the results obtained with our approach, the first to address the query-by-humming problem for flamenco singing. Universidad de Sevilla. Grado en Ingeniería de Tecnologías Industriales.
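    The melodic similarity measure is based on Needleman-Wunsch global alignment; a minimal sketch over symbolic pitch sequences is shown below, with illustrative scoring values (the work's actual scoring scheme is not given in the abstract). A query can then be matched by ranking database recordings by their alignment score against the hummed melody.

```python
import numpy as np

def needleman_wunsch(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Global alignment score between two note sequences (e.g. MIDI pitches)."""
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1))
    dp[:, 0] = gap * np.arange(n + 1)   # leading gaps in b
    dp[0, :] = gap * np.arange(m + 1)   # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i, j] = max(dp[i - 1, j - 1] + s,  # match/substitution
                           dp[i - 1, j] + gap,    # gap in b
                           dp[i, j - 1] + gap)    # gap in a
    return dp[n, m]

# Rank candidates by needleman_wunsch(query_pitches, candidate_pitches).
```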

    Interpersonal synchronization in ensemble singing: the roles of visual contact and leadership, and evolution across rehearsals

    Interpersonal synchronization between musicians in Western ensembles is a fundamental performance parameter, contributing to the expressiveness of ensemble performances. Synchronization might be affected by the visual contact between musicians, leadership, and rehearsals, although the nature of these relationships has not been fully investigated. This thesis centres on the synchronization between singers in a cappella singing ensembles, in relation to the roles of visual cues and leadership instruction in 12 duos, and the evolution of synchronization and leader-follower relationships emerging spontaneously across five rehearsals in a newly formed quintet. In addition, the developmental aspects of synchronization are investigated in parallel with tuning and verbal interactions, to contextualize synchronization within the wider scope of expressive performance behaviours. Three empirical investigations were conducted to study synchronization in singing ensembles, through a novel algorithm developed for this research, based on the application of electrolaryngography and acoustic analysis. Findings indicate that synchronization is a complex issue in terms of both performance and perception. In singing duos, synchronization was better with visual contact between singers than without, and in the quintet it improved across rehearsals depending on the piece performed. Leadership instruction did not affect the precision or consistency of synchronization in singing duos; however, when the upper voice was instructed to lead, the designated leader preceded the co-performer. Leadership changed across rehearsals, becoming equally distributed in the last rehearsal. Differences in the precision of synchronization related to altered visual contact were reflected in the perception of synchronization irrespective of the listeners' music expertise, but the smaller asynchrony patterns measured across rehearsals were not. Synchronization in the quintet was not the result of rehearsal strategies targeted at synchronization during rehearsal, but was paired with a tendency to tune horizontally towards equal temperament (ET), and towards ET and just intonation in the vertical tuning of third intervals.
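    As a minimal illustration of how onset asynchrony between two singers can be quantified (the thesis's electrolaryngography-based algorithm is more involved), the sketch below assumes note onsets have already been matched between the two parts; the signed mean shows who tends to lead, while the standard deviation reflects consistency.

```python
import numpy as np

def asynchrony_stats(onsets_a, onsets_b):
    """Pairwise onset asynchronies between two matched onset lists (seconds)."""
    d = np.asarray(onsets_a) - np.asarray(onsets_b)
    return {
        "mean": d.mean(),              # signed asynchrony (a leads if < 0)
        "sd": d.std(ddof=1),           # consistency of synchronization
        "mean_abs": np.abs(d).mean(),  # precision of synchronization
    }
```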