
    Analysis on Using Synthesized Singing Techniques in Assistive Interfaces for Visually Impaired to Study Music

    Tactile and auditory senses are the primary means by which visually impaired people perceive the world, and their interaction with assistive technologies therefore focuses mainly on tactile and auditory interfaces. This paper discusses the validity of using the most appropriate singing synthesis techniques as a mediator in assistive technologies built specifically to address their music learning needs involving music scores and lyrics. Music scores with notation and lyrics are the main mediators in the musical communication channel between a composer and a performer. Visually impaired music lovers have little opportunity to access this mediator, since most scores exist only in visual formats. In a music score, the vocal performer's melody is bound to all the pleasant sound producible in the form of singing, and singing is best suited to a format in the temporal domain rather than a tactile format in the spatial domain. Therefore, converting the existing visual format into a singing output is the most appropriate lossless transition, as demonstrated by the initial research on an adaptive music score trainer for the visually impaired [1]. To extend that initial work, this study surveys existing singing synthesis techniques and research on auditory interfaces
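    To make the score-to-singing idea above concrete, the sketch below shows one possible first step under stated assumptions: extracting pitch and lyric events from a simplified single-part MusicXML score so they can be handed to a singing synthesizer instead of a visual display. The file name and the downstream synthesizer call are hypothetical and are not described in the cited work.

    # Minimal sketch: pull (pitch, lyric) events out of a simplified MusicXML score
    # so a singing synthesizer, rather than a visual display, can present them.
    import xml.etree.ElementTree as ET

    def score_to_singing_events(path):
        """Return a list of (pitch, lyric) events from a single-part MusicXML score."""
        events = []
        root = ET.parse(path).getroot()
        for note in root.iter("note"):
            pitch = note.find("pitch")
            if pitch is None:          # skip rests
                continue
            step = pitch.findtext("step")
            octave = pitch.findtext("octave")
            lyric = note.findtext("lyric/text", default="")
            events.append((f"{step}{octave}", lyric))
        return events

    if __name__ == "__main__":
        # "score.musicxml" is a placeholder path; a real system would pass each
        # event to its singing synthesizer instead of printing it.
        for pitch, syllable in score_to_singing_events("score.musicxml"):
            print(pitch, syllable)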

    Songs Search Using Human Humming Voice

    The system is developed to find songs stored in a database using a human humming voice: a sample of the hummed melody is compared against the songs stored in the system. The main function of the system is to find a song simply by humming its melody. The scope of this project covers the human humming voice, voice capture in WAV format, a songs database, and a MIDI file comparison algorithm. The methodology follows a systems analysis and design approach comprising planning, analysis, design and implementation, and the system is built in the Java programming language. The system records the humming voice and runs algorithms that compare the hummed melody against the song files in the system to find the right song. The intended result is to display the titles of the matching songs together with the similarity percentage between the hummed melody and each song in the system
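    As a rough illustration of the comparison step (not the project's actual Java implementation), the sketch below assumes the hummed melody and each database song have already been reduced to MIDI note-number sequences, and ranks songs by a dynamic-time-warping distance converted to a percentage; the mapping to a percentage is an assumption.

    # Sketch: rank songs against a hummed melody using DTW over MIDI note numbers.
    def dtw_distance(a, b):
        """Dynamic time warping distance between two pitch sequences."""
        inf = float("inf")
        cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
        cost[0][0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        return cost[len(a)][len(b)]

    def similarity_percent(hum, song):
        """Map DTW distance to a rough 0-100% similarity score (illustrative formula)."""
        dist = dtw_distance(hum, song)
        return 100.0 / (1.0 + dist / max(len(hum), 1))

    songs = {"Song A": [60, 62, 64, 65, 67], "Song B": [67, 65, 64, 62, 60]}
    hum = [60, 62, 64, 64, 65, 67]
    for title, melody in sorted(songs.items(), key=lambda kv: -similarity_percent(hum, kv[1])):
        print(title, round(similarity_percent(hum, melody), 1), "%")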

    Lyrics-to-Audio Alignment and its Application

    Automatic lyrics-to-audio alignment techniques have drawn increasing attention in recent years, and various studies have been carried out in this field. The objective of lyrics-to-audio alignment is to estimate the temporal relationship between lyrics and a musical audio signal; the result can be applied to various applications such as karaoke-style lyrics display. In this contribution, we provide an overview of recent developments in this research topic, with a particular focus on the categorization of the various methods and on applications
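    As a small illustration of the karaoke-style application mentioned above, the sketch below assumes an aligner has already produced (start time, word) pairs and simply looks up which word should be highlighted at a given playback time; the alignment values are illustrative, not output from any real aligner.

    # Sketch: karaoke-style lookup over precomputed lyric timestamps.
    import bisect

    alignment = [(0.0, "Twinkle"), (0.6, "twinkle"), (1.3, "little"), (1.9, "star")]
    starts = [t for t, _ in alignment]

    def current_word(playback_time):
        """Return the lyric word whose start time most recently passed."""
        idx = bisect.bisect_right(starts, playback_time) - 1
        return alignment[idx][1] if idx >= 0 else ""

    for t in (0.1, 0.7, 2.0):
        print(f"t={t:.1f}s ->", current_word(t))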

Controllable Singing Voice Synthesis Using a Conditional Autoregressive Neural Network

    Doctoral dissertation -- Graduate School of Convergence Science and Technology, Department of Intelligence and Information Convergence, Seoul National University, August 2022. Advisor: Kyogu Lee. Singing voice synthesis aims at synthesizing a natural singing voice from given input information. A successful singing synthesis system is important not only because it can significantly reduce the cost of the music production process, but also because it helps creators reflect their intentions more easily and conveniently. However, there are three challenges in designing such a system: 1) the various elements that make up singing should be independently controllable; 2) it must be possible to generate high-quality sound; 3) it is difficult to secure sufficient training data. To deal with these problems, we first turned to source-filter theory, a representative model of speech production. We sought to secure training-data efficiency and controllability at the same time by modeling the singing voice as a convolution of a source, carrying pitch information, and a filter, carrying pronunciation information, and by designing a structure that can model each independently. In addition, we used a deep neural network based on a conditional autoregressive model to effectively model sequential data when conditional inputs such as pronunciation, pitch, and speaker are given. So that the entire framework generates high-quality sound whose distribution is closer to that of real singing, adversarial training was applied during training. Finally, we applied a self-supervised style modeling technique to model detailed, unlabeled musical expression. We confirmed that the proposed model can flexibly control various elements such as pronunciation, pitch, timbre, singing style, and musical expression, while synthesizing high-quality singing that is difficult to distinguish from ground-truth singing. Furthermore, we proposed a generation and modification framework oriented toward the actual music production process and confirmed that it can be applied to expand the limits of the creator's imagination, for example through new voice design and cross-generation.
    Contents:
    1 Introduction: Motivation; Problems in singing voice synthesis; Task of interest (single-singer, multi-singer, and expressive SVS); Contribution
    2 Background: Singing voice; Source-filter theory; Autoregressive model; Related works (speech synthesis, singing voice synthesis)
    3 Adversarially Trained End-to-end Korean Singing Voice Synthesis System: Input representation; Mel-synthesis network; Super-resolution network; Experiments (dataset, training, evaluation, analysis of generated spectrograms); Discussion (limitations of the input representation, advantages of the super-resolution network)
    4 Disentangling Timbre and Singing Style with a Multi-singer Singing Synthesis System: Singer identity encoder; Disentangling timbre and singing style; Experiments (dataset and preprocessing, training and inference, analysis of generated spectrograms, listening test, timbre and style classification test); Discussion (query audio selection strategy for the singer identity encoder, few-shot adaptation)
    5 Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder: Local style token module; Dual-path pitch encoder; Bandwidth extension vocoder; Experiments (dataset, training, qualitative evaluation, dual-path reconstruction analysis); Discussion (difference between MIDI pitch and F0, considerations for the actual music production process)
    6 Conclusion: Thesis summary; Limitations and future work (faster and more robust system, explainable and intuitive controllability, extensions to common speech synthesis tools, towards a collaborative and creative tool)
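    As a hypothetical sketch of the conditioning idea described in the abstract (not the thesis implementation), the snippet below shows an autoregressive decoder that predicts the next acoustic frame from the previous frame plus phoneme, pitch, and singer conditions, loosely mirroring the source (pitch) and filter (pronunciation) split; all layer types and sizes are assumptions.

    # Sketch: conditional autoregressive decoder for singing synthesis (PyTorch).
    import torch
    import torch.nn as nn

    class ConditionalARDecoder(nn.Module):
        def __init__(self, n_phonemes=50, n_singers=10, n_mels=80, hidden=256):
            super().__init__()
            self.phoneme_emb = nn.Embedding(n_phonemes, 64)   # "filter": pronunciation
            self.singer_emb = nn.Embedding(n_singers, 32)     # singer identity / timbre
            self.pitch_proj = nn.Linear(1, 32)                # "source": continuous F0
            self.rnn = nn.GRU(n_mels + 64 + 32 + 32, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_mels)

        def forward(self, prev_frames, phonemes, pitch, singer_id):
            cond = torch.cat(
                [self.phoneme_emb(phonemes),
                 self.pitch_proj(pitch.unsqueeze(-1)),
                 self.singer_emb(singer_id).unsqueeze(1).expand(-1, phonemes.size(1), -1)],
                dim=-1)
            x = torch.cat([prev_frames, cond], dim=-1)
            h, _ = self.rnn(x)
            return self.out(h)   # next mel frames; a GAN discriminator could score these

    # Shape check with dummy data (batch=2, 100 frames):
    model = ConditionalARDecoder()
    mel = model(torch.zeros(2, 100, 80), torch.zeros(2, 100, dtype=torch.long),
                torch.zeros(2, 100), torch.zeros(2, dtype=torch.long))
    print(mel.shape)  # torch.Size([2, 100, 80])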

    Singing information processing: techniques and applications

    The singing voice is an essential component of music in every culture of the world, as it is an extraordinarily natural form of musical expression. Consequently, the automatic processing of the singing voice has a great impact from an industrial, cultural and scientific perspective. In this context, this thesis contributes a varied set of techniques and applications related to singing voice processing, together with a review of the associated state of the art in each case. First, several of the best-known pitch estimators are compared for the query-by-humming use case. The results show that \cite{Boersma1993} (with a non-obvious parameter setting) and \cite{Mauch2014} perform very well in this use case owing to the smoothness of the extracted pitch contours. In addition, a novel singing transcription system based on a hysteresis process defined in time and frequency is proposed, together with a Matlab tool for singing evaluation. The interest of the proposed method is that it achieves error rates close to the state of the art with a very simple approach; the proposed evaluation tool, in turn, is a useful resource for better defining the problem and for better evaluating the solutions proposed by future researchers. This thesis also presents a method for automatic assessment of vocal performance: it uses dynamic time warping to align the user's performance with a reference, thereby providing scores for intonation accuracy and rhythm. The evaluation of the system shows a high correlation between the scores given by the system and the scores annotated by a group of expert musicians. Furthermore, a method for realistic intensity modification of the singing voice is presented. This transformation is based on a parametric model of the spectral envelope and substantially improves the perceived realism compared with commercial software such as Melodyne or Vocaloid. The drawback of the proposed approach is that it requires manual intervention, but the results obtained yield important conclusions towards automatic intensity modification with realistic results. Finally, a method for correcting dissonances in isolated chords is proposed. It is based on a multiple-F0 analysis and a shift of the frequency of the sinusoidal components. The evaluation was carried out by a group of trained musicians and shows a clear increase in the perceived consonance after the proposed transformation
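    As an illustration of the performance-assessment scoring step described above (not the thesis code), the sketch below assumes dynamic time warping has already paired each frame of the user's F0 contour with a reference frame, and reports the percentage of voiced frames whose pitch error lies within an assumed 50-cent tolerance.

    # Sketch: intonation score over DTW-aligned (user, reference) F0 pairs in Hz; 0 = unvoiced.
    import math

    def cents(f_user, f_ref):
        return 1200.0 * math.log2(f_user / f_ref)

    def intonation_score(aligned_pairs, tolerance_cents=50.0):
        """Percentage of voiced aligned frames whose pitch error is within tolerance."""
        voiced = [(u, r) for u, r in aligned_pairs if u > 0 and r > 0]
        if not voiced:
            return 0.0
        in_tune = sum(1 for u, r in voiced if abs(cents(u, r)) <= tolerance_cents)
        return 100.0 * in_tune / len(voiced)

    pairs = [(440.0, 440.0), (452.0, 440.0), (0.0, 440.0), (220.0, 220.0)]
    print(intonation_score(pairs))  # 452 Hz vs 440 Hz is ~46.6 cents sharp, still within tolerance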

    Vocal Synthetics: Designing for an Adaptable Singing Synthesizer

    Technological music tools such as digital audio workstations and electronic music instruments have enabled musicians without formal training to create music that is heard by millions of people. Automation by software and hardware can create compelling productions without being limited by performance ability. However, automating vocals is particularly difficult because, beyond pitch and timbre, the vocalization of language requires additional control parameters. Because producing a vocal synthesizer and its vocal palettes is complex, these difficulties are reflected in the current market through products that offer a limited set of voices and do not adapt to vocal trends. This project demonstrates a tool that lets producers use a simple typing interface to input words, with the output integrated into and controlled by modern digital audio workstations. Using a machine learning approach, the tool does not depend on large stores of audio data once a model is trained, and because it includes a simple method for creating new voices, it can keep up with evolving musical trends and vocal styles. The aim is to bring the human voice into the realm of digital music production, enabling a music maker to include a wide range of vocal styles within their production tool set. This paper outlines the design and development of the tool and culminates in a piece of music that illustrates the value of applying design thinking research strategies to an artistic and technical challenge
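    As a purely hypothetical sketch of the typing-interface idea, the snippet below splits typed lyrics into rough syllables and pairs them with MIDI notes into events that a synthesizer plugin might consume; the syllable splitter and the event format are illustrative assumptions, not the project's actual design.

    # Sketch: typed lyrics -> syllable/note events for a hypothetical singing synth plugin.
    import re

    def naive_syllables(word):
        """Very rough syllable split on vowel groups (placeholder for a real G2P step)."""
        return re.findall(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?", word.lower()) or [word]

    def lyrics_to_events(lyrics, midi_notes, beat_s=0.5):
        syllables = [s for word in lyrics.split() for s in naive_syllables(word)]
        return [{"syllable": syl, "midi": note, "start": i * beat_s, "dur": beat_s}
                for i, (syl, note) in enumerate(zip(syllables, midi_notes))]

    for ev in lyrics_to_events("hello world", [60, 62, 64]):
        print(ev)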
