Final Report to NSF of the Standards for Facial Animation Workshop
The human face is an important and complex communication channel. It is a very familiar and sensitive object of human perception. The facial animation field has grown greatly in the past few years as fast computer graphics workstations have made the modeling and real-time animation of hundreds of thousands of polygons affordable and almost commonplace. Many applications have been developed, such as teleconferencing, surgery, information assistance systems, games, and entertainment. To solve these different problems, different approaches for both animation control and modeling have been developed.
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
Subsidia: Tools and Resources for Speech Sciences
This book, the result of a collaboration among researchers who are experts in their respective fields, aims to help the scientific community by collecting and describing a set of highly useful materials for continuing to advance research
Modelling and Animation using Partial Differential Equations. Geometric modelling and computer animation of virtual characters using elliptic partial differential equations.
This work addresses various applications pertaining to the design, modelling and animation of parametric surfaces produced via the PDE method, which is based on elliptic Partial Differential Equations (PDEs). Compared with traditional surface generation techniques, the PDE method is an effective technique that can represent complex three-dimensional (3D) geometries in terms of a relatively small set of parameters. A PDE-based surface can be produced from a set of pre-configured curves that are used as the boundary conditions to solve a number of PDEs. An important advantage of using this method is that most of the information required to define a surface is contained at its boundary. Thus, complex surfaces can be computed using only a small set of design parameters.
In order to exploit the advantages of this methodology, various applications were developed, ranging from the interactive design of aircraft configurations to the animation of facial expressions in a computer-human interaction system that utilizes an artificial intelligence (AI) bot for real-time conversation. Additional applications are presented: generating cyclic motions for a PDE-based human character integrated in a Computer-Aided Design (CAD) package, and developing techniques to describe a given mesh geometry by the set of boundary conditions required to evaluate the PDE method. Each methodology presents a novel approach for interacting with parametric surfaces obtained by the PDE method, made possible by the several advantages this surface generation technique has to offer. Additionally, each application developed in this thesis focuses on a specific target and efficiently delivers various operations in the design, modelling and animation of such surfaces.
The project files will not be available online.
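The idea that a surface is determined mostly by its boundary can be illustrated with a minimal sketch. The abstract does not state which elliptic operator or solver the thesis uses, so this example assumes the simplest elliptic PDE (Laplace's equation) solved by Jacobi iteration on a regular grid; the function name `pde_surface` and the saddle-shaped test boundary are hypothetical and chosen only for illustration.

```python
import numpy as np

def pde_surface(bottom, top, left, right, iterations=2000):
    """Fill a surface patch X(u, v) from four boundary curves.

    Assumption: each coordinate of X satisfies Laplace's equation, used here
    as a stand-in for whatever elliptic PDE the thesis actually solves.
    bottom, top: arrays of shape (n_u, 3), the curves at v = 0 and v = 1
    left, right: arrays of shape (n_v, 3), the curves at u = 0 and u = 1
    """
    n_u, n_v = len(bottom), len(left)
    X = np.zeros((n_u, n_v, 3))
    X[:, 0], X[:, -1] = bottom, top
    X[0, :], X[-1, :] = left, right
    for _ in range(iterations):
        # Jacobi update: every interior point becomes the mean of its four
        # neighbours; the right-hand side is evaluated before assignment.
        X[1:-1, 1:-1] = 0.25 * (
            X[2:, 1:-1] + X[:-2, 1:-1] + X[1:-1, 2:] + X[1:-1, :-2]
        )
    return X

def saddle(u, v):
    """Hypothetical test shape (u, v, u^2 - v^2), harmonic in each coordinate."""
    return np.stack([u, v, u**2 - v**2], axis=-1)

n = 17
t = np.linspace(0.0, 1.0, n)
zeros, ones = np.zeros(n), np.ones(n)

# Only the four boundary curves are supplied; the interior is computed.
X = pde_surface(
    bottom=saddle(t, zeros), top=saddle(t, ones),
    left=saddle(zeros, t), right=saddle(ones, t),
)
U, V = np.meshgrid(t, t, indexing="ij")
max_error = np.abs(X - saddle(U, V)).max()
```

Here the whole 17 x 17 patch is recovered from 4 x 17 boundary points, which mirrors the abstract's claim that a small set of design parameters suffices. A production implementation would likely use a higher-order operator and a direct or spectral solver rather than Jacobi iteration.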
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) was founded in 1999 out of a strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread to other fields of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.
Multi-parametric source-filter separation of speech and prosodic voice restoration
In this thesis, methods and models are developed and presented that aim at the estimation, restoration and transformation of the characteristics of human speech. During a first period of the thesis, a concept was developed that allows restoring prosodic voice features and reconstructing more natural-sounding speech from pathological voices using a multi-resolution approach. Observations made with this approach revealed the need for a novel method for separating speech into voice source and articulation components in order to improve the perceptual quality of the restored speech signal. This separation method subsequently became the main part of this work and is therefore presented first in this thesis. The proposed method is evaluated on synthetic, physically modelled, healthy and pathological speech. A robust, separate representation of source and filter characteristics has applications in areas that go far beyond the reconstruction of alaryngeal speech. It is potentially useful for efficient speech coding, voice biometrics, emotional speech synthesis, remote and/or non-invasive voice disorder diagnosis, etc. A key aspect of the voice restoration method is the reliable separation of the speech signal into voice source and articulation, for it is mostly the voice source that requires replacement or enhancement in alaryngeal speech. Observations during the evaluation of the above method highlighted that this separation is insufficient with currently known methods. Therefore, the main part of this thesis is concerned with the modelling of voice and vocal tract and the estimation of the respective model parameters. Most methods for joint source-filter estimation known today represent a compromise between model complexity, estimation feasibility and estimation efficiency.
Typically, either single-parametric models are used to represent the source for the sake of tractable optimization, or multi-parametric models are estimated using inefficient grid searches over the entire parameter space. The novel method presented in this work proposes advances in the direction of efficiently estimating and fitting multi-parametric source and filter models to healthy and pathological speech signals, resulting in a more reliable estimation of voice source and especially vocal tract coefficients. In particular, the proposed method exhibits a largely reduced bias in the estimated formant frequencies and bandwidths over a large variety of experimental conditions such as environmental noise, glottal jitter, fundamental frequency, voice types and glottal noise. The method appears to be especially robust to environmental noise and improves the separation of deterministic voice source components from the articulation. Alaryngeal speakers often have great difficulty producing intelligible, not to mention prosodic, speech. Despite great efforts and advances in surgical and rehabilitative techniques, currently known methods, devices and modes of speech rehabilitation leave pathological speakers with limited ability to control key aspects of their voice. The multiresolution approach presented at the end of this thesis provides alaryngeal speakers with an intuitive way to increase prosodic features in their speech by reconstructing a more intelligible, more natural and more prosodic voice. The proposed method is entirely non-invasive. Key prosodic cues are reconstructed and enhanced at different temporal scales by inducing additional volatility estimated from other, still intact, speech features. The restored voice source is thus controllable in an intuitive way by the alaryngeal speaker. Despite the above-mentioned advantages, the proposed joint source-filter estimation method also has a weak point.
The proposed method exhibits a susceptibility to modelling errors of the glottal source. On the other hand, the proposed estimation framework appears to be well suited for future research on exactly this topic. A logical continuation of this work is to leverage the efficiency and reliability of the proposed method for the development of new, more accurate glottal source models.
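For readers unfamiliar with source-filter separation, the sketch below shows the classical baseline: linear prediction (LPC) with inverse filtering. It is not the multi-parametric method proposed in the thesis. The toy signal, the two-pole "vocal tract" filter, the white-noise "source" and the function names `lpc` and `inverse_filter` are all assumptions made for illustration.

```python
import numpy as np

def lpc(signal, order):
    """Estimate all-pole coefficients a[0..order] (a[0] = 1) with the
    autocorrelation method and the Levinson-Durbin recursion."""
    n = len(signal)
    r = np.array([np.dot(signal[: n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(signal, a):
    """Apply A(z) as an FIR filter to strip the all-pole envelope,
    leaving an estimate of the source (the prediction residual)."""
    return np.convolve(signal, a)[: len(signal)]

# Toy data (assumption): a white-noise "source" shaped by a known
# two-pole "vocal tract" filter 1 / A(z).
rng = np.random.default_rng(0)
excitation = rng.standard_normal(20000)
true_a = np.array([1.0, -1.3, 0.6])
speech = np.zeros_like(excitation)
for t in range(len(speech)):
    speech[t] = excitation[t]
    if t >= 1:
        speech[t] -= true_a[1] * speech[t - 1]
    if t >= 2:
        speech[t] -= true_a[2] * speech[t - 2]

a_est = lpc(speech, order=2)              # filter (articulation) estimate
residual = inverse_filter(speech, a_est)  # source estimate
```

In this baseline the source is simply whatever the all-pole filter cannot predict, so it carries no explicit glottal model. That limitation is the kind of compromise between model complexity and estimation feasibility that the abstract describes.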
Modeling of Polish Intonation for Statistical-Parametric Speech Synthesis
Faculty of Modern Languages (Wydział Neofilologii)
This work presents an attempt to build a neurobiologically inspired, Convolutional Neural Network-based model of the mappings from discrete high-level linguistic categories to a continuous fundamental frequency signal in Polish neutral read speech. After a brief introduction to the research problem in the context of intonation, speech synthesis and the phonetics-phonology gap, the work describes the training of the model on a special speech corpus and an evaluation of the naturalness of the F0 contour produced by the trained model through ABX and MOS perception experiments, conducted with the help of a specially built Neural Source Filter resynthesizer. Finally, an in-depth exploration of the phonology-to-phonetics mappings learned by the model is presented; the Layer-wise Relevance Propagation explainability method was used to perform an extensive quantitative analysis of the relevance of 1297 specially engineered linguistic input features, and of their groupings at various levels of abstraction, for the specific contours of the fundamental frequency. The work ends with an in-depth interpretation of these results, a discussion of the advantages and disadvantages of the current method, and a list of potential future improvements.
The research presented in this work was partially carried out under the Harmonia research grant no. UMO-2014/14/M/HS2/00631 awarded by the National Science Centre (Narodowe Centrum Nauki).