14 research outputs found

    Manipulasi Frekuensi Dasar Menggunakan Metode STRAIGHT Untuk Sintesis Suara Ucapan Ekspresif Dalam Bahasa Indonesia

    Fundamental frequency (F0) is one of the parameters of a speech signal that can affect the rise and fall of stress (intonation). Other parameters of the speech signal that affect intonation are power and the periodic and aperiodic components of the signal. In this study, only the F0 parameter of Indonesian speech signals was manipulated, while the other parameters were held constant. The manipulation was performed using the STRAIGHT method. The quality of the manipulated speech was evaluated using the MOS method

    The sound of trustworthiness: acoustic-based modulation of perceived voice personality

    When we hear a new voice we automatically form a "first impression" of the voice owner’s personality; a single word is sufficient to yield ratings highly consistent across listeners. Past studies have shown correlations between personality ratings and acoustical parameters of voice, suggesting a potential acoustical basis for voice personality impressions, but its nature and extent remain unclear. Here we used data-driven voice computational modelling to investigate the link between acoustics and perceived trustworthiness in the single word "hello". Two prototypical voice stimuli were generated based on the acoustical features of voices rated low or high in perceived trustworthiness, respectively, as well as a continuum of stimuli inter- and extrapolated between these two prototypes. Five hundred listeners provided trustworthiness ratings on the stimuli via an online interface. We observed an extremely tight relationship between trustworthiness ratings and position along the trustworthiness continuum (r = 0.99). Not only were trustworthiness ratings higher for the high- than the low-prototypes, but the difference could be modulated quasi-linearly by reducing or exaggerating the acoustical difference between the prototypes, resulting in a strong caricaturing effect. The f0 trajectory, or intonation, appeared a parameter of particular relevance: hellos rated high in trustworthiness were characterized by a high starting f0 then a marked decrease at mid-utterance to finish on a strong rise. These results demonstrate a strong acoustical basis for voice personality impressions, opening the door to multiple potential applications

    Similarities in face and voice cerebral processing

    In this short paper I illustrate by a few selected examples several compelling similarities in the functional organization of face and voice cerebral processing: (1) Presence of cortical areas selective to face or voice stimuli, also observed in non-human primates, and causally related to perception; (2) Coding of face or voice identity using a “norm-based” scheme; (3) Personality inferences from faces and voices in a same Trustworthiness–Dominance “social space”

    Perceptual Continuity and Naturalness of Expressive Strength in Singing Voices Based on Speech Morphing

    This paper experimentally shows the importance of perceptual continuity of the expressive strength in vocal timbre for natural change in vocal expression. In order to synthesize various and continuous expressive strengths with vocal timbre, we investigated gradually changing expressions by applying the STRAIGHT speech morphing algorithm to singing voices. Here, a singing voice without expression is used as the base of morphing, and singing voices with three different expressions are used as the target. Through statistical analyses of perceptual evaluations, we confirmed that the proposed morphing algorithm provides perceptual continuity of vocal timbre. Our results showed the following: (i) gradual strengths in absolute evaluations, and (ii) a perceptually linear strength provided by the calculation of corrected intervals of the morph ratio by the inverse (reciprocal) function of an equation that approximates the perceptual strength. Finally, we concluded that applying continuity was highly effective for achieving perceptual naturalness, judging from the results showing that (iii) our gradual transformation method can perform well for perceived naturalness

    Effects of emotional valence and arousal on the voice perception network

    Several theories conceptualise emotions along two main dimensions: valence (a continuum from negative to positive) and arousal (a continuum that varies from low to high). These dimensions are typically treated as independent in many neuroimaging experiments, yet recent behavioural findings suggest that they are actually interdependent. This result has impact on neuroimaging design, analysis and theoretical development. We were interested in determining the extent of this interdependence both behaviourally and neuroanatomically, as well as teasing apart any activation that is specific to each dimension. While we found extensive overlap in activation for each dimension in traditional emotion areas (bilateral insulae, orbitofrontal cortex, amygdalae), we also found activation specific to each dimension with characteristic relationships between modulations of these dimensions and BOLD signal change. Increases in arousal ratings were related to increased activations predominantly in voice-sensitive cortices after variance explained by valence had been removed. In contrast, emotions of extreme valence were related to increased activations in bilateral voice-sensitive cortices, hippocampi, anterior and midcingulum and medial orbito- and superior frontal regions after variance explained by arousal had been accounted for. Our results therefore do not support a complete segregation of brain structures underpinning the processing of affective dimensions

    On the Representation of Speaker Information in Human Voices: An Adaptation Approach

    Apart from being carriers of speech, human voices contain a wealth of social signals, for instance about a speaker’s gender, identity, or age, to name but a few. The present thesis is concerned with the way adaptation modifies the perception of gender and identity information from voices. Adaptation is a mechanism by which neural responses decrease after continuous or repetitive stimulation. It is revealed by transient perceptual aftereffects indicating contrastive coding of simple and complex stimulus properties. The three studies reported here investigate unimodal and crossmodal auditory voice aftereffects of adaptation to unfamiliar and personally familiar speakers. Specifically, study I (Exp. 1) shows that adaptation to unfamiliar voices of female or male speakers biases the perception of voice gender away from the adapting gender: test voices, as created by auditory morphing between male and female voices, are perceived as more male following adaptation to female voices and vice versa. The voice gender aftereffect (VGAE) survived at least a few minutes and suggests the existence of voice detectors tuned to female and male voice quality. The absence of voice aftereffects following adaptation to names (Exp. 2), faces (Exp. 3), or sinusoidal tones matched to F0 of adaptor voices (Exp. 4) further suggests that the VGAE is due to habituation of high-level auditory representations. Study II replicates behavioural findings of study I (Exp. 1) and further supports the notion of processing selectivity for female and male voices by providing electrophysiological evidence. Systematic adaptation-induced amplitude reductions in AEPs (N1, P2, and P3) were observed in response to otherwise identical test voices when test voices and adaptors had the same gender as opposed to different genders. This suggests that contrastive coding of voice gender is implemented by auditory cortex neurons and takes place within the first few hundred milliseconds from voice onset. 
    Similar to the VGAE, auditory aftereffects of adaptation to voices or faces of personally familiar speakers caused contrastive aftereffects in listeners’ perception of voice identity (study III). Unimodal voice-to-voice aftereffects (Exp. 1) were more pronounced and more persistent than crossmodal face-to-voice aftereffects (Exp. 2), pointing to at least two perceptual mechanisms of voice identity adaptation: one related to auditory coding of voice characteristics and one related to multimodal coding of speaker identity. These results complement findings in face perception (e.g. Leopold et al., 2001; Webster et al., 2004) and suggest that adaptation is a ubiquitous mechanism that routinely influences the perception of non-linguistic social information from both faces and voices