1,064 research outputs found
A Tutorial on Speech Synthesis Models
An understanding of the physical and mathematical models of speech is essential for speech synthesis. Speech modeling is consequently a large field and is well documented in the literature. The aim of this paper is to provide a background review of several speech models used in speech synthesis, specifically the Source Filter Model, Linear Prediction Model, Sinusoidal Model, and Harmonic/Noise Model. The most important models of speech signals are described from the earliest to the most recent, in order to highlight the major improvements across these models. A desirable parametric model of speech is relatively simple, flexible, high quality, and robust in re-synthesis. Emphasis is given to the Harmonic/Noise Model, since it appears to be the more promising and robust model of speech. (C) 2015 The Authors. Published by Elsevier B.V.
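As a rough illustration of the harmonic-plus-noise idea named in this abstract, the sketch below builds one frame as a sum of sinusoids at integer multiples of a fundamental frequency plus scaled white noise. It is a toy under stated assumptions, not the formulation reviewed in the paper: the function name `synthesize_hnm`, its parameters, and the numeric values are all hypothetical, and a real Harmonic/Noise Model would typically use time-varying amplitudes and spectrally shaped noise.

```python
import math
import random


def synthesize_hnm(f0, harmonic_amps, noise_gain, sample_rate, num_samples, seed=0):
    """Return samples of a toy harmonic-plus-noise signal.

    The harmonic part is a sum of sinusoids at integer multiples of f0;
    the noise part is white Gaussian noise scaled by noise_gain.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        harmonic = sum(
            amp * math.sin(2.0 * math.pi * (k + 1) * f0 * t)
            for k, amp in enumerate(harmonic_amps)
        )
        noise = noise_gain * rng.gauss(0.0, 1.0)
        samples.append(harmonic + noise)
    return samples


# Hypothetical values: a 200 Hz fundamental with three decaying harmonics.
frame = synthesize_hnm(f0=200.0, harmonic_amps=[1.0, 0.5, 0.25],
                       noise_gain=0.01, sample_rate=8000, num_samples=400)
```

Setting `noise_gain` to zero leaves only the periodic part, which makes it easy to see how the two components are kept separate in this kind of model.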
Singing information processing: techniques and applications
Singing voice is an essential component of music in every culture of the world, since it is an incredibly natural form of musical expression. Consequently, automatic singing voice processing has a great impact from the perspectives of industry, culture, and science. In this context, this Thesis contributes a varied set of techniques and applications related to singing voice processing, together with a review of the associated state of the art in each case.
First, several of the best known pitch estimators have been compared for the query-by-humming use case. The results show that \cite{Boersma1993} (with a non-obvious parameter setting) and \cite{Mauch2014} perform very well in this use case, given the smoothness of the extracted pitch contours.
In addition, a novel singing voice transcription system is proposed, based on a hysteresis process defined in time and frequency, together with a Matlab tool for singing voice evaluation. The interest of the proposed method is that it achieves error rates close to the state of the art with a very simple approach. The proposed evaluation tool, in turn, is a useful resource for better defining the problem and for better evaluating the solutions proposed by future researchers.
This Thesis also presents a method for the automatic assessment of vocal performance. It uses dynamic time warping to align the user's performance with a reference, thereby providing a score for intonation accuracy and for rhythm accuracy. The evaluation of the system shows a high correlation between the scores given by the system and the scores annotated by a group of expert musicians.
Furthermore, a method for realistic modification of singing voice intensity is presented. This transformation is based on a parametric model of the spectral envelope, and it substantially improves the perceived realism when compared with commercial software such as Melodyne or Vocaloid. The drawback of the proposed approach is that it requires manual intervention, but the results obtained yield important conclusions towards automatic intensity modification with realistic results.
Finally, a method for correcting dissonances in isolated chords is proposed. It is based on a multiple-F0 analysis and a frequency shift of the corresponding sinusoidal components. The evaluation was carried out by a group of trained musicians and shows a clear increase in perceived consonance after the proposed transformation.
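The vocal performance assessment described in this abstract aligns a user's performance with a reference using dynamic time warping (DTW). The sketch below shows the textbook DTW recursion on two pitch contours. It is only an illustration under assumptions: the abstract does not give the local cost, the step pattern, or how alignment is turned into intonation and rhythm scores, so the absolute-difference cost, the function name `dtw_distance`, and the example contours are all hypothetical.

```python
def dtw_distance(reference, performance):
    """Classic dynamic time warping cost between two 1-D sequences.

    Local cost is the absolute difference; allowed steps are
    match, insertion and deletion. This is the textbook recursion,
    not necessarily the variant used in the thesis.
    """
    n, m = len(reference), len(performance)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i and first j items
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - performance[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],  # match
                                     cost[i - 1][j],      # deletion
                                     cost[i][j - 1])      # insertion
    return cost[n][m]


# Hypothetical pitch contours in MIDI note numbers (illustrative values only).
reference = [60, 60, 62, 64, 64]
slow_but_in_tune = [60, 60, 60, 62, 62, 64, 64]
sharp_by_a_semitone = [61, 61, 63, 65, 65]

timing_only_cost = dtw_distance(reference, slow_but_in_tune)
tuning_cost = dtw_distance(reference, sharp_by_a_semitone)
```

In this toy setup a performance that is merely slower warps onto the reference at no pitch cost, while a consistently sharp one accumulates cost along the whole path. That separation is one plausible reason DTW suits scoring intonation independently of timing.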
New time-frequency domain pitch estimation methods for speech signals under low levels of SNR
The major objective of this research is to develop novel pitch estimation methods capable of handling speech signals in practical situations where only noise-corrupted speech observations are available. With this objective in mind, the estimation task is carried out in two different approaches. In the first approach, the noisy speech observations are directly employed to develop two new time-frequency domain pitch estimation methods. These methods are based on extracting a pitch-harmonic and finding the corresponding harmonic number required for pitch estimation. Considering that voiced speech is the output of a vocal tract system driven by a sequence of pulses separated by the pitch period, in the second approach, instead of using the noisy speech directly for pitch estimation, an excitation-like signal (ELS) is first generated from the noisy speech or its noise-reduced version. In the first approach, a harmonic cosine autocorrelation (HCAC) model of clean speech in terms of its pitch-harmonics is first introduced. In order to extract a pitch-harmonic, we propose an optimization technique based on least-squares fitting of the autocorrelation function (ACF) of the noisy speech to the HCAC model. By exploiting the extracted pitch-harmonic along with the fast Fourier transform (FFT) based power spectrum of noisy speech, we then deduce a harmonic measure and a harmonic-to-noise-power ratio (HNPR) to determine the desired harmonic number of the extracted pitch-harmonic. In the proposed optimization, an initial estimate of the pitch-harmonic is obtained from the maximum peak of the smoothed FFT power spectrum. In addition to the HCAC model, where the cross-product terms of different harmonics are neglected, we derive a compact yet accurate harmonic sinusoidal autocorrelation (HSAC) model for the clean speech signal. The new HSAC model is then used in the least-squares model-fitting optimization technique to extract a pitch-harmonic.
In the second approach, we first develop a pitch estimation method by using an excitation-like signal (ELS) generated from the noisy speech. To this end, a technique based on the principle of homomorphic deconvolution is proposed for extracting the vocal-tract system (VTS) parameters from the noisy speech, which are utilized to perform an inverse-filtering of the noisy speech to produce a residual signal (RS). In order to reduce the effect of noise on the RS, a noise-compensation scheme is introduced in the autocorrelation domain. The noise-compensated ACF of the RS is then employed to generate a squared Hilbert envelope (SHE) as the ELS of the voiced speech. With a view to further overcoming the adverse effect of noise on the ELS, a new symmetric normalized magnitude difference function of the ELS is proposed for eventual pitch estimation. The cepstrum has been widely used in speech signal processing but has limited capability of handling noise. One potential solution could be the introduction of a noise reduction block prior to pitch estimation based on the conventional cepstrum, a framework already available in many practical applications, such as mobile communication and hearing aids. Motivated by the advantages of the existing framework and considering the superiority of our ELS to the speech itself in providing clues for pitch information, we develop a cepstrum-based pitch estimation method by using the ELS obtained from the noise-reduced speech. For this purpose, we propose a noise subtraction scheme in the frequency domain, which takes into account the possible cross-correlation between speech and noise and has the advantage that the noise estimate is updated with time and adjusted at each frame. The enhanced speech thus obtained is utilized to extract the vocal-tract system (VTS) parameters via the homomorphic deconvolution technique. A residual signal (RS) is then produced by inverse-filtering the enhanced speech with the extracted VTS parameters.
It is found that, unlike the previous ELS-based method, the squared Hilbert envelope (SHE) computed from the RS of the enhanced speech, without noise compensation, is sufficient to represent an ELS. Finally, in order to tackle the undesirable effect of noise on the ELS at a very low SNR and overcome the limitation of the conventional cepstrum in handling different types of noise, a time-frequency domain pseudo cepstrum of the ELS of the enhanced speech, incorporating information from both the magnitude and phase spectra of the ELS, is proposed for pitch estimation. (Abstract shortened by UMI.)
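For context, the classical baseline that the methods in this abstract build on is peak-picking on the autocorrelation function (ACF) of a frame. The sketch below shows that baseline only. It is not the HCAC or HSAC model fitting, the ELS generation, or the pseudo cepstrum proposed in the thesis, and the function names, search range, and synthetic test frame are assumptions made for illustration.

```python
import math
import random


def autocorrelation(signal, max_lag):
    """Biased autocorrelation estimate for lags 0..max_lag."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def estimate_pitch_acf(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Pick the ACF peak inside the lag range implied by [f_min, f_max].

    f_min and f_max are assumed search bounds, not values from the thesis.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    acf = autocorrelation(signal, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: acf[lag])
    return sample_rate / best_lag


# Synthetic "voiced" frame: three harmonics of 125 Hz plus mild white noise.
rng = random.Random(1)
sample_rate = 8000
frame = [
    sum(math.sin(2 * math.pi * 125.0 * k * n / sample_rate) / k for k in (1, 2, 3))
    + 0.1 * rng.gauss(0.0, 1.0)
    for n in range(800)
]
pitch_hz = estimate_pitch_acf(frame, sample_rate)
```

With mild noise the strongest ACF peak in the search range sits at or next to the true pitch period. As the noise level rises, spurious peaks and octave errors become more likely, which is the kind of failure the model-fitting and noise-compensation steps described above are meant to address.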
Contributions to automatic multiple F0 detection in polyphonic music signals
Multiple fundamental frequency estimation, or multi-pitch estimation (MPE), is a key problem in automatic music transcription (AMT) and many other related audio processing tasks. Applications of AMT are numerous, ranging from musical genre classification to automatic piano tutoring, and these form a significant part of music information retrieval tasks. Current AMT systems still perform considerably below human experts, and there is a consensus that the development of an automated system for full transcription of polyphonic music, regardless of its complexity, is still an open problem. The goal of this work is to propose contributions to the automatic detection of multiple fundamental frequencies in polyphonic music signals. A reference MPE method is first chosen to be studied and implemented, and a modification is proposed to improve the performance of the system. Lastly, three refinement strategies are proposed for incorporation into the modified method, in order to increase the quality of the results. Experimental tests reveal that such refinements improve the overall performance of the system on average, even if each one behaves differently according to the signal characteristics.
Recent Advances in Signal Processing
Signal processing is a critical issue in the majority of new technological inventions and challenges, across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories address, in order, image processing, speech processing, communication systems, time-series analysis, and educational packages. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.
Advanced signal processing techniques for pitch synchronous sinusoidal speech coders
Recent trends in commercial and consumer demand have led to the increasing use of multimedia applications in mobile and Internet telephony. Although audio, video and data communications are becoming more prevalent, a major application is and will remain the transmission of speech. Speech coding techniques suited to these new trends must be developed, not only to provide high quality speech communication but also to minimise the required bandwidth for speech, so as to maximise that available for the new audio, video and data services. The majority of current speech coders employed in mobile and Internet applications employ a Code Excited Linear Prediction (CELP) model. These coders attempt to reproduce the input speech signal and can produce high quality synthetic speech at bit rates above 8 kbps. Sinusoidal speech coders tend to dominate at rates below 6 kbps but, due to limitations in the sinusoidal speech coding model, their synthetic speech quality cannot be significantly improved even if their bit rate is increased. Recent developments have seen the emergence and application of Pitch Synchronous (PS) speech coding techniques to these coders in order to remove the limitations of the sinusoidal speech coding model. The aim of the research presented in this thesis is to investigate and eliminate the factors that limit the quality of the synthetic speech produced by PS sinusoidal coders. In order to achieve this, innovative signal processing techniques have been developed. New parameter analysis and quantisation techniques have been produced which overcome many of the problems associated with applying PS techniques to sinusoidal coders. In sinusoidal based coders, two of the most important elements are the correct formulation of pitch and voicing values from the input speech. The techniques introduced here have greatly improved these calculations, resulting in a higher quality PS sinusoidal speech coder than was previously available.
A new quantisation method which is able to reduce the distortion from quantising speech spectral information has also been developed. When these new techniques are utilised they effectively raise the synthetic speech quality of sinusoidal coders to a level comparable to that produced by CELP based schemes, making PS sinusoidal coders a promising alternative at low to medium bit rates.
EThOS - Electronic Theses Online Service, United Kingdom
Bayesian methods in music modelling
This thesis presents several hierarchical generative Bayesian models of musical signals designed to improve the accuracy of existing multiple pitch detection systems and other musical signal processing applications whilst remaining feasible for real-time computation. At the lowest level the signal is modelled as a set of overlapping sinusoidal basis functions. The parameters of these basis functions are built into a prior framework based on principles known from musical theory and the physics of musical instruments. The model of a musical note optionally includes phenomena such as frequency and amplitude modulations, damping, volume, timbre and inharmonicity. The occurrence of note onsets in a performance of a piece of music is controlled by an underlying tempo process and the alignment of the timings to the underlying score of the music.
A variety of applications are presented for these models under differing inference constraints. Where full Bayesian inference is possible, reversible-jump Markov Chain Monte Carlo is employed to estimate the number of notes and partial frequency components in each frame of music. We also use approximate techniques such as model selection criteria and variational Bayes methods for inference in situations where computation time is limited or the amount of data to be processed is large. For the higher level score parameters, greedy search and conditional modes algorithms are found to be sufficiently accurate.
We emphasize the links between the models and inference algorithms developed in this thesis and those in existing and parallel work, and demonstrate the effects of making modifications to these models both theoretically and by means of experimental results.
Implementation and optimization of the synthesis of musical instrument tones using frequency modulation
Frequency modulation (FM), an efficient method for synthesizing musical sounds, is of great importance in the area of computer music. In this thesis, methods for fundamental frequency estimation and for the FM synthesis of musical instrument tones are analysed, evaluated, and optimized. An FM analysis and synthesis environment was developed in which the methods presented in this thesis were implemented.
For the estimation of the fundamental frequency of music signals, a novel algorithm based on harmonic pattern match (HPM) was designed to achieve higher estimation accuracy than previously used methods. After a suitable subset of the spectral data is defined, the autocorrelation is analysed in both the time and the frequency domain to find candidates for the fundamental frequency, and an efficient mechanism evaluates the match between each candidate and the harmonic pattern of the musical signal. The proposed algorithm was analysed and evaluated against several other fundamental frequency estimation algorithms, and its practical applicability was demonstrated.
For the implementation of FM synthesis, a procedure for matching spectra using a genetic algorithm (GA) is described, including the definition of the task in the GA and the search for optimized FM parameters. For the optimization of FM synthesis, the requirements on the carrier and the modulator were analysed and the parameter space was examined; on this basis, a method for predetermining the parameter space was designed to achieve accurate synthesis results. For data reduction in FM synthesis, a piecewise linear approximation of the carrier amplitude envelope was designed.
A further optimization step incorporates formants into the spectral matching procedure: the formant harmonics are emphasized by weighting coefficients, which yields a considerably more accurate approximation of the timbre of the synthesized sound. To this end, spectral envelope estimation and formant extraction were analysed and implemented. The test environment developed in this work supports parameter estimation and the analysis and evaluation of the resulting FM synthesis.
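As a minimal sketch of the synthesis technique named in this abstract, simple two-operator FM can be written as y(t) = A sin(2π f_c t + I sin(2π f_m t)), where f_c is the carrier frequency, f_m the modulator frequency, and I the modulation index. The code below implements only that formula. The function name `fm_tone` and the parameter values are hypothetical, and the thesis's GA-based parameter search, envelope approximation, and formant weighting are not shown.

```python
import math


def fm_tone(carrier_hz, modulator_hz, mod_index, sample_rate, num_samples, amplitude=1.0):
    """Two-operator FM: a sine carrier phase-modulated by a sine modulator.

    A constant amplitude is assumed here for brevity; the abstract
    describes a piecewise linear carrier envelope instead.
    """
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        phase = (2.0 * math.pi * carrier_hz * t
                 + mod_index * math.sin(2.0 * math.pi * modulator_hz * t))
        samples.append(amplitude * math.sin(phase))
    return samples


# Hypothetical parameter values chosen only for illustration.
tone = fm_tone(carrier_hz=440.0, modulator_hz=440.0, mod_index=2.0,
               sample_rate=16000, num_samples=1600)
```

The modulation index controls how much energy moves from the carrier into sidebands at f_c ± k·f_m, so a parameter search such as the GA mentioned above would, under this formulation, be tuning `carrier_hz`, `modulator_hz`, and `mod_index` to match a target spectrum.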