479 research outputs found
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
Fractal based speech recognition and synthesis
Transmitting a linguistic message is most often the primary purpose of speech comÂmunication and the recognition of this message by machine that would be most useful.
This research consists of two major parts. The first part presents a novel and promisÂing approach for estimating the degree of recognition of speech phonemes and makes use of a new set of features based fractals. The main methods of computing the fracÂtal dimension of speech signals are reviewed and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. FiÂnally, a Least Square Method as well as a novel Neural Network algorithm is employed to derive the recognition performance of the speech data.
The second part of this work studies the synthesis of speech words, which is based mainly on the fractal dimension to create natural sounding speech. The work shows that by careful use of the fractal dimension together with the phase of the speech signal to ensure consistent intonation contours, natural-sounding speech synthesis is achievable with word level speech. In order to extend the flexibility of this framework, we focused on the filtering and the compression of the phase to maintain and produce natural sounding speech. A ‘naturalness level’ is achieved as a result of the fractal characteristic used in the synthesis process. Finally, a novel speech synthesis system based on fractals developed at De Montfort University is discussed.
Throughout our research simulation experiments were performed on continuous speech data available from the Texas Instrument Massachusetts institute of technology ( TIMIT) database, which is designed to provide the speech research community with a standarised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition system
Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy
The MAVEBA Workshop proceedings, held on a biannual basis, collect the scientific papers presented both as oral and poster contributions, during the conference. The main subjects are: development of theoretical and mechanical models as an aid to the study of main phonatory dysfunctions, as well as the biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and classification of vocal pathologies. The Workshop has the sponsorship of: Ente Cassa Risparmio di Firenze, COST Action 2103, Biomedical Signal Processing and Control Journal (Elsevier Eds.), IEEE Biomedical Engineering Soc. Special Issues of International Journals have been, and will be, published, collecting selected papers from the conference
Playing Technique and Violin Timbre: Detecting Bad Playing
For centuries, luthiers have committed to working towards better understanding and improving the sound characteristics and playability of violins. With advances in technology and signal processing, studies attempting to define a violin’s sound qualityvia physical characteristics and resonance patterns have ensued. Existing work has primarily focused on physical aspects reflecting an instrument’s sound quality. In the music information retrieval domain, advances have been made in areas suchas instrument identification tasks. Although much research has been completed on finding suitable features from which musical instruments can be represented, little work has focused on the violin’s complete timbre space and the effect a player has on the sound produced. This thesis specifically focuses on representing violin timbre such that a computer can detect the sound associated with a beginner from that of a professional standard player and detect typical beginner playing faults based on analysis of thewaveform signal only. Work has been limited to nine playing faults considered by professional musicians to be typical of beginner violinists. In order to achieve these goals, it was necessary to create a suitable dataset consisting of an equal number of beginner and professional standard legato notesamples. Feature extraction was then carried out by taking features from the time, spectral and cepstral domains. Selected features were then used to represent the samples in a classifier based on their efficacy at reflecting change within the violin’s timbrespace. The dataset underwent the scrutiny of professional standard stringed instrumentplayers via listening tests from which the target audience’s perception was captured. This information was verified and normalised before use as a priori labels in the classifier. Based on different feature representations, classification of violin notesreflecting perceived sound quality is presented in this thesis. The results show that it is possible to get a computer to determine between beginner and professional standard player legato notes and to detect playing faults. This thesis involves a thoroughunderstanding of violin playing, its perception, suitable analysis methods, feature extraction, representation and classification
Singing information processing: techniques and applications
Por otro lado, se presenta un método para el cambio realista de intensidad de voz cantada. Esta transformación se basa en un modelo paramétrico de la envolvente espectral, y mejora sustancialmente la percepción de realismo al compararlo con software comerciales como Melodyne o Vocaloid. El inconveniente del enfoque propuesto es que requiere intervención manual, pero los resultados conseguidos arrojan importantes conclusiones hacia la modificación automática de intensidad con resultados realistas.
Por último, se propone un método para la corrección de disonancias en acordes aislados. Se basa en un análisis de múltiples F0, y un desplazamiento de la frecuencia de su componente sinusoidal. La evaluación la ha realizado un grupo de músicos entrenados, y muestra un claro incremento de la consonancia percibida después de la transformación propuesta.La voz cantada es una componente esencial de la música en todas las culturas del mundo, ya que se trata de una forma increÃblemente natural de expresión musical. En consecuencia, el procesado automático de voz cantada tiene un gran impacto desde la perspectiva de la industria, la cultura y la ciencia. En este contexto, esta Tesis contribuye con un conjunto variado de técnicas y aplicaciones relacionadas con el procesado de voz cantada, asà como con un repaso del estado del arte asociado en cada caso.
En primer lugar, se han comparado varios de los mejores estimadores de tono conocidos para el caso de uso de recuperación por tarareo. Los resultados demuestran que \cite{Boersma1993} (con un ajuste no obvio de parámetros) y \cite{Mauch2014}, tienen un muy buen comportamiento en dicho caso de uso dada la suavidad de los contornos de tono extraÃdos.
Además, se propone un novedoso sistema de transcripción de voz cantada basada en un proceso de histéresis definido en tiempo y frecuencia, asà como una herramienta para evaluación de voz cantada en Matlab. El interés del método propuesto es que consigue tasas de error cercanas al estado del arte con un método muy sencillo. La herramienta de evaluación propuesta, por otro lado, es un recurso útil para definir mejor el problema, y para evaluar mejor las soluciones propuestas por futuros investigadores.
En esta Tesis también se presenta un método para evaluación automática de la interpretación vocal. Usa alineamiento temporal dinámico para alinear la interpretación del usuario con una referencia, proporcionando de esta forma una puntuación de precisión de afinación y de ritmo. La evaluación del sistema muestra una alta correlación entre las puntuaciones dadas por el sistema, y las puntuaciones anotadas por un grupo de músicos expertos
Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs
Search engine is the popular term for an information retrieval (IR) system. Typically, search engine can be based on full-text indexing. Changing the presentation from the text data to multimedia data types make an information retrieval process more complex such as a retrieval of image or sounds in large databases. This paper introduces the use of language and text independent speech as input queries in a large sound database by using Speaker identification algorithm. The method consists of 2 main processing first steps, we separate vocal and non-vocal identification after that vocal be used to speaker identification for audio query by speaker voice. For the speaker identification and audio query by process, we estimate the similarity of the example signal and the samples in the queried database by calculating the Euclidian distance between the Mel frequency cepstral coefficients (MFCC) and Energy spectrum of acoustic features. The simulations show that the good performance with a sustainable computational cost and obtained the average accuracy rate more than 90%
- …