    Capture, Learning, and Synthesis of 3D Speaking Styles

    Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved, owing to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input (even speech in languages other than English) and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e., head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.
    Comment: To appear in CVPR 201

    Bilingual Frequency in a Favorable Context (BFFC) in the Italian dialectal area. Theoretical preliminaries to the analysis of geminate lateral retroflexion and voiceless plosives aspiration in Antona (MS)

    The paper builds a theoretical framework for applying the Bilingual Frequency in Favorable Context (BFFC) formula to the distinctive Italian linguistic setting. BFFC was first devised as a usage-based tool to weigh the frequency effect of non-varying cognate words against the probability of variation phenomena in bilingual settings. Since Italian dialects are sister languages of the standard variety, their speakers can be considered bilingual. However, no dialectal frequency corpora are available from which to extract the essential BFFC components. The paper proposes overcoming this hurdle by using subjective frequency estimates, and testing the effectiveness of BFFC through a picture-naming task and acceptability ratings. A critical overview of the phonetic features of interest is also presented, along with proposals for future analyses.

    Categorization of sounds

    The authors conducted four experiments to test the decision-bound, prototype, and distribution theories of sound categorization. As stimuli, they used sounds varying in either resonance frequency or duration. They created different experimental conditions by varying the variance and overlap of the two stimulus distributions used in a training phase, and by varying the size of the stimulus continuum used in the subsequent test phase. When resonance frequency was the stimulus dimension, the pattern of categorization-function slopes was consistent with the decision-bound theory. When duration was the stimulus dimension, however, the slope pattern gave partial support to both the decision-bound and the distribution theories. The authors introduce a new categorization model, combining aspects of the decision-bound and distribution theories, that gives a better account of the slope patterns across the two stimulus dimensions.

    The role of experience in processing foreign-accented speech

    The present study examines the perceptual accommodation of the bilabial stop-consonant voicing contrast (i.e., /b/ vs. /p/) in several English- and Spanish-accented contexts by native Spanish listeners with different degrees of experience with accented speech. In a series of four experiments, we tested three potential mechanisms for the perceptual accommodation of foreign-accented sounds. According to the first mechanism (phonetic relaxation), listeners accommodate foreign-accented sounds by relaxing the phonetic boundary between native speech-sound categories. According to the second mechanism (phonetic calibration), listeners accommodate foreign-accented sounds by adjusting the location of native perceptual boundaries according to the phonetic realization of native categories in the foreign-accented speech context. Finally, according to the third mechanism (phonetic switching), listeners accommodate foreign-accented speech sounds by switching to a non-native system of phonetic representations previously developed through long-term experience with the speech norm of the foreign accent. Experimental results indicate that Spanish listeners did not relax the phonetic boundary between /b/ and /p/ in an English-accented Spanish context (Experiments 1 and 3). However, they accommodated English-accented Spanish voicing differently depending on their degree of experience with the English-accented speech norm. When Spanish listeners had little or no experience with the English norm, they calibrated the location of the perceptual boundary between /b/ and /p/ according to the Spanish or English phonetic realization of these sounds in the speech context (Experiment 4). In contrast, when they had a high degree of experience with English-accented speech, they accommodated English-accented Spanish /b/ and /p/ by using an English-like system of phonetic representations that was not predictable from the phonetic realization of /b/ and /p/ in the speech context (Experiments 1 and 2). These results contribute to a better understanding of the role played by non-native experience in the perceptual accommodation of foreign accents. In particular, they indicate that native listeners may rely on previous long-term experience with the native language of a foreign-accented speaker to efficiently accommodate foreign-accented speech variability, in a different way from how they accommodate speech variability across native-accented speakers.

    Prosodic detail in Neapolitan Italian

    Recent findings on phonetic detail have been taken as support for exemplar-based approaches to prosody. Through four experiments on the production and perception of both melodic and temporal detail in Neapolitan Italian, we show that prosodic detail is compatible with abstractionist approaches as well. Specifically, we suggest that exploring prosodic detail leads to a refined understanding of the relationship between richly specified, continuously varying phonetic information on the one hand and coarse, phonologically structured contrasts on the other, thus offering insights into how prosody conveys pragmatic information.

    Categorization of Sounds

    This is the author's accepted manuscript. This article may not exactly replicate the final version published in the APA journal. It is not the copy of record. The original publication can be found at http://psycnet.apa.org/index.cfm?fa=search.displayrecord&uid=2006-08586-015.
    The authors conducted four experiments to test the decision-bound, prototype, and distribution theories of sound categorization. As stimuli, they used sounds varying in either resonance frequency or duration. They created different experimental conditions by varying the variance and overlap of the two stimulus distributions used in a training phase, and by varying the size of the stimulus continuum used in the subsequent test phase. When resonance frequency was the stimulus dimension, the pattern of categorization-function slopes was consistent with the decision-bound theory. When duration was the stimulus dimension, however, the slope pattern gave partial support to both the decision-bound and the distribution theories. The authors introduce a new categorization model, combining aspects of the decision-bound and distribution theories, that gives a better account of the slope patterns across the two stimulus dimensions.

    The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals

    This dissertation explores the potential benefit for listeners of hearing speech in their own first-language accent, as proposed by the Interlanguage Speech Intelligibility Benefit (ISIB) hypothesis. Previous studies have not consistently supported this hypothesis. According to major theories of second-language learning, a listener's second-language proficiency determines the extent to which they rely on first-language phonetics. Hence, this thesis takes a novel approach by focusing on the role of English proficiency in how Bulgarian-English bilinguals understand Bulgarian-accented English. The first experiment investigated whether evoking the listeners' L1 Bulgarian phonetics would speed up the processing of Bulgarian-accented English words compared to Standard British English (SBE) words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high-proficiency listeners tended to show an advantage for SBE over the Bulgarian accent. The second experiment measured accuracy and reaction times (RT) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high English proficiency responded more slowly and less accurately to Bulgarian-accented speech than to L1 English speech, and their responses to it were also slower and less accurate than those of lower-proficiency listeners. These accent preferences were also supported by the listeners' RT adaptation across the first experimental block. A follow-up investigation compared L1 UK English listeners with the bilingual listeners who had the highest English proficiency. The two groups processed both accents with similar speed, accuracy, and adaptation patterns, showing no advantage or disadvantage for the bilinguals. These studies support existing models of second-language phonetics: higher L2 proficiency is associated with less reliance on L1 phonetics during speech processing. In addition, contrary to ISIB, the listeners with the highest English proficiency had no advantage over L1 English listeners in understanding Bulgarian-accented English.
    Keywords: Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficiency