479 research outputs found

    A FACIAL ANIMATION FRAMEWORK WITH EMOTIVE/EXPRESSIVE CAPABILITIES

    LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of the FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optotracking movement analyzer. LUCIA can copy a real human by reproducing the movements of passive markers positioned on his face and recorded by the ELITE device, or it can be driven by an emotionally XML-tagged input text, thus realizing a true audio/visual emotive/expressive synthesis. Synchronization between visual and audio data is very important in order to create the correct WAV and FAP files needed for the animation. LUCIA's voice is based on the ISTC Italian version of the FESTIVAL-MBROLA packages, modified by means of an appropriate APML/VSML tagged language. LUCIA is available in two different versions: an open source framework and the "work in progress" WebGL

    LUCIA: An open source 3D expressive avatar for multimodal h.m.i.

    LUCIA is an MPEG-4 facial animation system developed at ISTC-CNR. It works on standard Facial Animation Parameters and speaks with the Italian version of the FESTIVAL TTS. To achieve an emotive/expressive talking head, LUCIA was built from real human data physically extracted by the ELITE optotracking movement analyzer. LUCIA can copy a real human by reproducing the movements of passive markers positioned on his face and recorded by the ELITE device, or it can be driven by an emotionally XML-tagged input text, thus realizing a true audio/visual emotive/expressive synthesis. Synchronization between visual and audio data is very important in order to create the correct WAV and FAP files needed for the animation. LUCIA's voice is based on the ISTC Italian version of the FESTIVAL-MBROLA packages, modified by means of an appropriate APML/VSML tagged language. LUCIA is available in two different versions: an open source framework and the "work in progress" WebGL
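
    The emotion-tagged XML input described in the abstract can be illustrated with a small parser. This is a hedged sketch only: the tag and attribute names used here (`utterance`, `affect`, `emotion`) are hypothetical placeholders, not LUCIA's actual APML/VSML schema, and the function merely shows the general idea of extracting (text, emotion) spans that a driver could map to FAP streams.

    ```python
    import xml.etree.ElementTree as ET

    def emotion_spans(xml_text):
        """Split an emotion-tagged utterance into (text, emotion) spans.

        Tag/attribute names are illustrative, not LUCIA's real markup.
        """
        root = ET.fromstring(xml_text)
        spans = []

        def walk(node, emotion):
            # Text directly inside this node inherits its emotion.
            if node.text and node.text.strip():
                spans.append((node.text.strip(), emotion))
            for child in node:
                # A child tag may override the emotion for its own text.
                walk(child, child.get("emotion", emotion))
                # Text after the child's closing tag reverts to the outer emotion.
                if child.tail and child.tail.strip():
                    spans.append((child.tail.strip(), emotion))

        walk(root, root.get("emotion", "neutral"))
        return spans

    spans = emotion_spans(
        '<utterance emotion="neutral">Ciao, '
        '<affect emotion="joy">che bello</affect> vederti.</utterance>'
    )
    # spans -> [("Ciao,", "neutral"), ("che bello", "joy"), ("vederti.", "neutral")]
    ```

    A real driver would then align each span with phoneme timings from the TTS so that the WAV and FAP streams stay synchronized, as the abstract emphasizes.
    
    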

    Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective

    Any account of “what is special about the human brain” (Passingham 2008) must specify the neural basis of our unique ability to produce speech and delineate how these remarkable motor capabilities could have emerged in our hominin ancestors. Clinical data suggest that the basal ganglia provide a platform for the integration of primate-general mechanisms of acoustic communication with the faculty of articulate speech in humans. Furthermore, neurobiological and paleoanthropological data point to a two-stage model of the phylogenetic evolution of this crucial prerequisite of spoken language: (i) monosynaptic refinement of the projections of motor cortex to the brainstem nuclei that steer laryngeal muscles, presumably as part of a “phylogenetic trend” associated with increasing brain size during hominin evolution; (ii) subsequent vocal-laryngeal elaboration of cortico-basal ganglia circuitries, driven by human-specific FOXP2 mutations. This concept implies vocal continuity of spoken language evolution at the motor level, elucidating the deep entrenchment of articulate speech into a “nonverbal matrix” (Ingold 1994), which is not accounted for by gestural-origin theories. Moreover, it provides a solution to the question of the adaptive value of the “first word” (Bickerton 2009), since even the earliest and most simple verbal utterances must have increased the versatility of vocal displays afforded by the preceding elaboration of monosynaptic corticobulbar tracts, giving rise to enhanced social cooperation and prestige. At the ontogenetic level, the proposed model assumes age-dependent interactions between the basal ganglia and their cortical targets, similar to vocal learning in some songbirds. In this view, the emergence of articulate speech builds on the “renaissance” of an ancient organizational principle and, hence, may represent an example of “evolutionary tinkering” (Jacob 1977)

    INTERFACE Toolkit: A New Tool for Building IVAs


    Sounding the body: the role of the Valsalva mechanism in the emergence of the linguistic sign

    The main aim of this study, conducted within STEELS, a gestural theory of the origins of speech, is to set out a proposal as to the possible role of the Valsalva mechanism in the emergence of the linguistic sign. STEELS posits that in the earliest forms of speech developed by Homo, vocomimetic laryngeal resonances of nonlinguistic origin were integrated into LV (laryngeal + vowel) protosyllables referring back to oro-naso-laryngeal (ONL) actions such as breathing, sneezing and coughing. It further posits that these protosyllables were conceptually mapped to non-ONL bodily actions making use of the Valsalva manoeuvre, such as lifting, birthing, and defecating. This claim, which stems from a submorphemic analysis of certain Proto-Indo-European “body-part” roots projected back, within a gestural framework, to the emergence of speech, suggests that the vocomimetic protosyllables posited would have become (self-)referential through a neurocognitive process of recurrent, somatotopically-driven pattern-extraction.

    A Neurocognitive Approach to the Study of Private Speech

    The paper presents the current state of the art of research identifying the neurophysiological and neuroanatomical substrates of private speech, both in typical and clinical (or atypical) populations. First, it briefly describes the evolution of private speech research, which ranges from classic traditions such as the naturalistic and referential paradigms to the neurocognitive approach. An overview of the neurophysiological (e.g., event-related potentials or ERPs) and neuroimaging techniques (e.g., functional magnetic resonance imaging or fMRI) is also presented. The next three sections review empirical works about the neurocognitive basis of private speech, across three groups of techniques: ERPs; fMRI/MRI; and other neuroimaging techniques (positron emission tomography [PET], magnetoencephalography [MEG], and repetitive transcranial magnetic stimulation [rTMS]). Such neurocognitive research analyzes the neural activity of individuals during a variety of task settings, including spontaneous and instructed overt and inner private speech use, subvocal verbalizations, and silent and overt reading. The fifth section focuses on electrophysiological and neuroimaging studies of private speech in atypical populations, for example: schizophrenia, pure alexia, hearing impairment, blindness, social phobia, alexithymia, Parkinson's disease, and multiple sclerosis. The neurocognitive study of the various forms of private speech appears to be very promising for the understanding of these pathologies. Lastly, the advances and new challenges in the field are discussed.

    Concatenative speech synthesis: a Framework for Reducing Perceived Distortion when using the TD-PSOLA Algorithm

    This thesis presents the design and evaluation of an approach to concatenative speech synthesis using the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) signal processing algorithm. Concatenative synthesis systems make use of pre-recorded speech segments stored in a speech corpus. At synthesis time, the 'best' segments available to synthesise the new utterances are chosen from the corpus using a process known as unit selection. During the synthesis process, the pitch and duration of these segments may be modified to generate the desired prosody. The TD-PSOLA algorithm provides an efficient and essentially successful solution to perform these modifications, although some perceptible distortion, in the form of 'buzzyness', may be introduced into the speech signal. Despite the popularity of the TD-PSOLA algorithm, little formal research has been undertaken to address this recognised problem of distortion. The approach in the thesis has been developed towards reducing the perceived distortion that is introduced when TD-PSOLA is applied to speech. To investigate the occurrence of this distortion, a psychoacoustic evaluation of the effect of pitch modification using the TD-PSOLA algorithm is presented. Subjective experiments in the form of a set of listening tests were undertaken using word-level stimuli that had been manipulated using TD-PSOLA. The data collected from these experiments were analysed for patterns of co-occurrence or correlations to investigate where this distortion may occur. From this, parameters were identified which may have contributed to increased distortion. These parameters were concerned with the relationship between the spectral content of individual phonemes, the extent of pitch manipulation, and aspects of the original recordings. Based on these results, a framework was designed for use in conjunction with TD-PSOLA to minimise the possible causes of distortion. 
The framework consisted of a novel speech corpus design, a signal processing distortion measure, and a selection process for especially problematic phonemes. Rather than phonetically balanced, the corpus is balanced to the needs of the signal processing algorithm, containing more of the adversely affected phonemes. The aim is to reduce the potential extent of pitch modification of such segments, and hence produce synthetic speech with less perceptible distortion. The signal processing distortion measure was developed to allow the prediction of perceptible distortion in pitch-modified speech. Different weightings were estimated for individual phonemes, trained using the experimental data collected during the listening tests. The potential benefit of such a measure for existing unit selection processes in a corpus-based system using TD-PSOLA is illustrated. Finally, the special-case selection process was developed for highly problematic voiced fricative phonemes to minimise the occurrence of perceived distortion in these segments. The success of the framework, in terms of generating synthetic speech with reduced distortion, was evaluated. A listening test showed that the TD-PSOLA balanced speech corpus may be capable of generating pitch-modified synthetic sentences with significantly less distortion than those generated using a typical phonetically balanced corpus. The voiced fricative selection process was also shown to produce pitch-modified versions of these phonemes with less perceived distortion than a standard selection process. The listening test then indicated that the signal processing distortion measure was able to predict the resulting amount of distortion at the sentence-level after the application of TD-PSOLA, suggesting that it may be beneficial to include such a measure in existing unit selection processes. 
The framework was found to be capable of producing speech with reduced perceptible distortion in certain situations, although the effects seen at the sentence-level were less than those seen in the previous investigative experiments that made use of word-level stimuli. This suggests that the effect of the TD-PSOLA algorithm cannot always be easily anticipated due to the highly dynamic nature of speech, and that the reduction of perceptible distortion in TD-PSOLA-modified speech remains a challenge to the speech community
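
The pitch-modification step at the heart of this thesis can be sketched in code. The following is a minimal illustrative TD-PSOLA implementation, not the author's system: it assumes the glottal pitch marks are already known, extracts a two-period Hann-windowed grain at each analysis mark, and overlap-adds the grains at synthesis marks whose spacing is the local period scaled by 1/factor.

```python
import numpy as np

def td_psola(signal, marks, factor):
    """Sketch of TD-PSOLA pitch modification.

    signal : 1-D float array of speech samples
    marks  : sorted int sample indices of pitch epochs (assumed given)
    factor : pitch-scale factor; >1 raises pitch, <1 lowers it
    """
    marks = np.asarray(marks, dtype=int)
    periods = np.diff(marks)
    out = np.zeros(len(signal))

    # Build synthesis marks by accumulating local periods scaled by 1/factor.
    syn_marks = []
    t = float(marks[0])
    while t < marks[-1]:
        syn_marks.append(t)
        i = min(max(int(np.searchsorted(marks, t)) - 1, 0), len(periods) - 1)
        t += periods[i] / factor

    # Overlap-add a two-period Hann-windowed grain at each synthesis mark,
    # taken from the nearest analysis epoch.
    for t in syn_marks:
        i = int(np.argmin(np.abs(marks - t)))
        p = int(periods[min(i, len(periods) - 1)])
        lo, hi = marks[i] - p, marks[i] + p
        s = int(round(t))
        if lo < 0 or hi > len(signal) or s - p < 0 or s + p > len(out):
            continue  # skip grains that would fall off either edge
        out[s - p : s + p] += signal[lo:hi] * np.hanning(hi - lo)
    return out

# Toy usage: a 100 Hz sinusoid at 8 kHz, epochs every 80 samples.
fs, f0, n = 8000, 100, 8000
sig = np.sin(2 * np.pi * f0 * np.arange(n) / fs)
marks = np.arange(40, n, 80)
same = td_psola(sig, marks, 1.0)   # factor 1.0 should roughly reproduce the input
higher = td_psola(sig, marks, 1.5) # denser grain placement raises the pitch
```

The 'buzzyness' the thesis studies arises precisely where this simple grain model breaks down, e.g. in voiced fricatives whose noise component does not repeat pitch-synchronously.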

    The Making of a Spanish Orchestra of Language Based on Robson's The Orchestra of Language

    A monograph presented to the faculty of the Department of English at Morehead State University in partial fulfillment of the requirements for the Degree of Master of Arts by Denise Munizaga Lagos in August of 1971