Search CORE

2,293 research outputs found

Singing voice correction using canonical time warping

Author: Chen Ming-Tso
Chi Tai-Shih
Luo Yin-Jyun
Su Li
Publication venue
Publication date: 23/11/2017
Field of study

Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm which synchronizes two singing recordings can provide a promising solution. We thereby propose to address the problem by canonical time warping (CTW) which aligns amateur singing recordings to professional ones. A new pitch contour is generated given the alignment information, and a pitch-corrected singing is synthesized back through the vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW prevails the other methods including DTW and the commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario

arXiv.org e-Print Archive

Crossref

Polyphonic Sound Event Detection by using Capsule Neural Networks

Author: Gabrielli Leonardo
Principi Emanuele
Squartini Stefano
Vesperini Fabio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/01/2019
Field of study

Artificial sound event detection (SED) has the aim to mimic the human ability to perceive and understand what is happening in the surroundings. Nowadays, Deep Learning offers valuable techniques for this goal such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has been recently introduced in the image processing field with the intent to overcome some of the known limitations of CNNs, specifically regarding the scarce robustness to affine transformations (i.e., perspective, size, orientation) and the detection of overlapped images. This motivated the authors to employ CapsNets to deal with the polyphonic-SED task, in which multiple sound events occur simultaneously. Specifically, we propose to exploit the capsule units to represent a set of distinctive properties for each individual sound event. Capsule units are connected through a so-called "dynamic routing" that encourages learning part-whole relationships and improves the detection performance in a polyphonic context. This paper reports extensive evaluations carried out on three publicly available datasets, showing how the CapsNet-based algorithm not only outperforms standard CNNs but also allows to achieve the best results with respect to the state of the art algorithms

arXiv.org e-Print Archive

IRIS UniversitÃ Politecnica delle Marche

Singing voice resynthesis using concatenative-based techniques

Author: Fonseca Nuno Miguel da Costa Santos
Publication venue
Publication date: 01/01/2011
Field of study

Tese de Doutoramento. Engenharia Informática. Faculdade de Engenharia. Universidade do Porto. 201

Repositório Aberto da Universidade do Porto

Deep Learning for Audio Signal Processing

Author: Chang Shuo-yiin
Li Bo
Purwins Hendrik
Sainath Tara
Schlüter Jan
Virtanen Tuomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019
Field of study

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.Comment: 15 pages, 2 pdf figure

arXiv.org e-Print Archive

VBN

Singing information processing: techniques and applications

Author: Molina Martinez Emilio
Publication venue: UMA Editorial
Publication date: 01/01/2017
Field of study

Por otro lado, se presenta un método para el cambio realista de intensidad de voz cantada. Esta transformación se basa en un modelo paramétrico de la envolvente espectral, y mejora sustancialmente la percepción de realismo al compararlo con software comerciales como Melodyne o Vocaloid. El inconveniente del enfoque propuesto es que requiere intervención manual, pero los resultados conseguidos arrojan importantes conclusiones hacia la modificación automática de intensidad con resultados realistas. Por último, se propone un método para la corrección de disonancias en acordes aislados. Se basa en un análisis de múltiples F0, y un desplazamiento de la frecuencia de su componente sinusoidal. La evaluación la ha realizado un grupo de músicos entrenados, y muestra un claro incremento de la consonancia percibida después de la transformación propuesta.La voz cantada es una componente esencial de la música en todas las culturas del mundo, ya que se trata de una forma increíblemente natural de expresión musical. En consecuencia, el procesado automático de voz cantada tiene un gran impacto desde la perspectiva de la industria, la cultura y la ciencia. En este contexto, esta Tesis contribuye con un conjunto variado de técnicas y aplicaciones relacionadas con el procesado de voz cantada, así como con un repaso del estado del arte asociado en cada caso. En primer lugar, se han comparado varios de los mejores estimadores de tono conocidos para el caso de uso de recuperación por tarareo. Los resultados demuestran que \cite{Boersma1993} (con un ajuste no obvio de parámetros) y \cite{Mauch2014}, tienen un muy buen comportamiento en dicho caso de uso dada la suavidad de los contornos de tono extraídos. Además, se propone un novedoso sistema de transcripción de voz cantada basada en un proceso de histéresis definido en tiempo y frecuencia, así como una herramienta para evaluación de voz cantada en Matlab. El interés del método propuesto es que consigue tasas de error cercanas al estado del arte con un método muy sencillo. La herramienta de evaluación propuesta, por otro lado, es un recurso útil para definir mejor el problema, y para evaluar mejor las soluciones propuestas por futuros investigadores. En esta Tesis también se presenta un método para evaluación automática de la interpretación vocal. Usa alineamiento temporal dinámico para alinear la interpretación del usuario con una referencia, proporcionando de esta forma una puntuación de precisión de afinación y de ritmo. La evaluación del sistema muestra una alta correlación entre las puntuaciones dadas por el sistema, y las puntuaciones anotadas por un grupo de músicos expertos

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Recommended from our members

Roadmap for Music Information ReSearch

Author: Benetos E.
Chudy M.
Dixon S.
Flexer A.
Gomez E.
Gouyon F.
Herrera P.
Jorda S.
Magas M.
Paytuvi O.
Peeters G.
Schlüter J.
Serra X.
Vinet H.
Widmer G.
Publication venue: MIRES Consortium
Publication date: 01/01/2013
Field of study

City Research Online

UPF Digital Repository

Analysis on Using Synthesized Singing Techniques in Assistive Interfaces for Visually Impaired to Study Music

Author: Jayaratne Dr. Lakshman
Ranasinghe Kavindu
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 10/04/2016
Field of study

Tactile and auditory senses are the basic types of methods that visually impaired people sense the world. Their interaction with assistive technologies also focuses mainly on tactile and auditory interfaces. This research paper discuss about the validity of using most appropriate singing synthesizing techniques as a mediator in assistive technologies specifically built to address their music learning needs engaged with music scores and lyrics. Music scores with notations and lyrics are considered as the main mediators in musical communication channel which lies between a composer and a performer. Visually impaired music lovers have less opportunity to access this main mediator since most of them are in visual format. If we consider a music score, the vocal performer’s melody is married to all the pleasant sound producible in the form of singing. Singing best fits for a format in temporal domain compared to a tactile format in spatial domain. Therefore, conversion of existing visual format to a singing output will be the most appropriate nonlossy transition as proved by the initial research on adaptive music score trainer for visually impaired [1]. In order to extend the paths of this initial research, this study seek on existing singing synthesizing techniques and researches on auditory interfaces

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)

PLXTRM : prediction-led eXtended-guitar tool for real-time music applications and live performance

Author: Bernardini Bressan Federica
Degrave Jonas
Leman Marc
Nijs Luc
Vets Tim
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2017
Field of study

peer reviewedThis article presents PLXTRM, a system tracking picking-hand micro-gestures for real-time music applications and live performance. PLXTRM taps into the existing gesture vocabulary of the guitar player. On the first level, PLXTRM provides a continuous controller that doesn’t require the musician to learn and integrate extrinsic gestures, avoiding additional cognitive load. Beyond the possible musical applications using this continuous control, the second aim is to harness PLXTRM’s predictive power. Using a reservoir network, string onsets are predicted within a certain time frame, based on the spatial trajectory of the guitar pick. In this time frame, manipulations to the audio signal can be introduced, prior to the string actually sounding, ’prefacing’ note onsets. Thirdly, PLXTRM facilitates the distinction of playing features such as up-strokes vs. down-strokes, string selections and the continuous velocity of gestures, and thereby explores new expressive possibilities

Crossref

Ghent University Academic Bibliography

Open Repository and Bibliography - Luxembourg