78 research outputs found

    Fluidity in the perception of auditory speech: Cross-modal recalibration of voice gender and vowel identity by a talking face

    Get PDF
    Article first published online: January 13, 2020
    Humans quickly adapt to variations in the speech signal. Adaptation may surface as recalibration, a learning effect driven by error-minimisation between a visual face and an ambiguous auditory speech signal, or as selective adaptation, a contrastive aftereffect driven by the acoustic clarity of the sound. Here, we examined whether these aftereffects occur for vowel identity and voice gender. Participants were exposed to male, female, or androgynous tokens of speakers pronouncing /e/ or /ø/ (embedded in words with a consonant-vowel-consonant structure), or an ambiguous vowel halfway between /e/ and /ø/ dubbed onto the video of a male or female speaker pronouncing /e/ or /ø/. For both voice gender and vowel identity, we found assimilative aftereffects after exposure to ambiguous auditory adapter sounds, and contrastive aftereffects after exposure to clear auditory adapter sounds. This demonstrates that similar principles of adaptation are at play in both dimensions. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Gravitation Grant 024.001.006 of the Language in Interaction Consortium from the Netherlands Organization for Scientific Research. The third author was supported by the Netherlands Organization for Scientific Research (NWO: VENI Grant 275-89-027).

    How visual cues to speech rate influence speech perception

    No full text
    Spoken words are highly variable, and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two ‘Go Fish’-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (mute videos of a talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of following visual target cues (e.g., duration of lip aperture), which, at an audiovisual integration stage, bias participants’ target categorization responses. These findings contribute to a better understanding of how what we see influences what we hear.
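    As a rough intuition for the rate effect described above, the same physical vowel duration counts as long relative to a fast context but as short relative to a slow one. The Python sketch below is a hypothetical illustration of that relative-duration idea only; the function name, criterion value, and durations are invented for the example and do not come from the authors' analysis.

```python
# Toy illustration of rate normalization in vowel perception (not the
# authors' model): the same target duration is judged against the average
# syllable duration of the preceding context sentence.

def categorize_vowel(target_dur_ms: float, context_syllable_dur_ms: float,
                     criterion: float = 0.55) -> str:
    """Classify an ambiguous Dutch vowel as short /ɑ/ or long /a:/.

    The decision uses duration *relative* to the context's mean syllable
    duration; the 0.55 criterion is an arbitrary assumption for the demo.
    """
    relative_duration = target_dur_ms / context_syllable_dur_ms
    return "/a:/" if relative_duration >= criterion else "/ɑ/"

ambiguous_vowel_ms = 110  # physically identical token in both contexts
print(categorize_vowel(ambiguous_vowel_ms, context_syllable_dur_ms=150))  # fast context -> /a:/
print(categorize_vowel(ambiguous_vowel_ms, context_syllable_dur_ms=250))  # slow context -> /ɑ/
```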

    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady-State Vowel Categorization

    Full text link
    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).
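    The abstract's strip-map/ART mechanism is not reproduced here, but the goal it targets, removing speaker-dependent scale while keeping vowel identity, can be illustrated with a much simpler and older textbook scheme, Lobanov-style z-scoring of formants within each speaker. The sketch below uses invented formant values and is only an illustration of that general goal, not of the model described in the abstract.

```python
import numpy as np

# Lobanov-style speaker normalization: z-score each speaker's formants so
# that overall vocal-tract scale differences drop out while the relative
# positions of vowels (their identities) are preserved.
# This is NOT the cortical strip-map / ART model described in the abstract.

def lobanov_normalize(formants):
    """Normalize an (n_vowels, 2) array of (F1, F2) values, in Hz, from one speaker."""
    formants = np.asarray(formants, dtype=float)
    return (formants - formants.mean(axis=0)) / formants.std(axis=0)

# The same three vowels from two speakers whose formants differ only in scale
speaker_a = [[300, 2300], [700, 1200], [500, 1500]]
speaker_b = [[360, 2760], [840, 1440], [600, 1800]]  # speaker_a scaled by 1.2

# After normalization the two speakers' vowel spaces coincide
print(np.allclose(lobanov_normalize(speaker_a), lobanov_normalize(speaker_b)))  # True
```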

    The integration of paralinguistic information from the face and the voice

    Get PDF
    We live in a world which bombards us with a huge amount of sensory information, even if we are not always aware of it. To successfully navigate, function and ultimately survive in our environment, we use all of the cues available to us. Furthermore, we actually combine this information: doing so not only allows us to construct a richer percept of the objects around us, but also increases the reliability of our decisions and sensory estimates. However, at odds with our naturally multisensory awareness of our surroundings, the literature addressing unisensory processes has always far exceeded that which examines the multimodal nature of perception. Arguably the most salient and relevant stimuli in our environment are other people. Our species is not designed to operate alone, and so we have evolved to be especially skilled in all those things which enable effective social interaction – this could be engaging in conversation, but equally recognising a family member, or understanding the current emotional state of a friend, and adjusting our behaviour appropriately. In particular, the face and the voice both provide us with a wealth of hugely relevant social information, both linguistic and non-linguistic. In line with work conducted in other fields of multisensory perception, research on face and voice perception has mainly concentrated on each of these modalities independently, particularly face perception. Furthermore, the work that has addressed the integration of these two sources has by and large concentrated on the audiovisual nature of speech perception. The work in this thesis is based on a theoretical model of voice perception which not only proposed a serial processing pathway of vocal information, but also emphasised the similarities between face and voice processing, suggesting that this information may interact. Significantly, these interactions were not just confined to speech processing, but rather encompassed all forms of information processing, whether this was linguistic or paralinguistic. Therefore, in this thesis, I concentrate on the interactions between, and integration of, face-voice paralinguistic information. In Chapter 3 we conducted a general investigation of neural face-voice integration. A number of studies have attempted to identify the cerebral regions in which information from the face and voice combines; however, in addition to a large number of regions being proposed as integration sites, it is not known whether these regions are selective in the binding of these socially relevant stimuli. We first identified regions in the bilateral superior temporal sulcus (STS) which showed an increased response to person-related information – whether this was faces, voices, or faces and voices combined – in comparison to information from objects. A subsection of this region in the right posterior superior temporal sulcus (pSTS) also produced a significantly stronger response to audiovisual as compared to unimodal information. We therefore propose this as a potential people-selective, integrative region. Furthermore, a large portion of the right pSTS was also observed to be people-selective and heteromodal: that is, both auditory and visual information provoked a significant response above baseline. These results underline the importance of the STS region in social communication. Chapter 4 moved on to study the audiovisual perception of gender.
Using a set of novel stimuli – which were not only dynamic but also morphed in both modalities – we investigated whether different combinations of gender information in the face and voice could affect participants’ perception of gender. We found that participants indeed combined both sources of information when categorising gender, with their decision being reflective of information contained in both modalities. However, this combination was not entirely equal: in this experiment, gender information from the voice appeared to dominate over that from the face, exerting a stronger modulating effect on categorisation. This result was supported by the findings from conditions which directed attention, where we observed that participants were able to ignore face but not voice information, and also by reaction time results, where latencies were generally a reflection of the voice morph. Overall, these results support interactions between face and voice in gender perception, but demonstrate that (due to a number of probable factors) one modality can exert more influence than another. Finally, in Chapter 5 we investigated the proposed interactions between affective content in the face and voice. Specifically, we used a ‘continuous carry-over’ design – again in conjunction with dynamic, morphed stimuli – which allowed us to investigate not only ‘direct’ effects of different sets of audiovisual stimuli (e.g., congruent, incongruent), but also adaptation effects (in particular, the effect of emotion expressed in one modality upon the response to emotion expressed in another modality). Parallel to behavioural results, which showed that the crossmodal context affected the time taken to categorise emotion, we observed a significant crossmodal effect in the right pSTS, which was independent of any within-modality adaptation. We propose that this result provides strong evidence that this region may be composed of multisensory neurons, as opposed to two sets of interdigitated neurons each responsive to information from one modality or the other. Furthermore, an analysis investigating stimulus congruence showed that the degree of incongruence modulated activity across the right STS, further suggesting that the neural response in this region can be altered depending on the particular combination of affective information contained within the face and voice. Overall, both behavioural and cerebral results from this study suggested that participants integrated emotion from the face and voice.

    FROM PHENOMENOLOGY OF LANGUAGE TO A THEORY OF SOCIOLOGICAL PRAXIS: PERCEPTION, IDEOLOGY, AND MEANING IN MULTIMODAL LINGUISTIC DISCOURSE

    Get PDF
    Linguistics has prioritized the auditory mode of transmission in language at the expense of written forms and their relevance to the social construction of meaning and identity. Due to the privilege of spoken language as the least-mediated form of symbolic expression, the significant role non-verbal linguistic communication plays in social life is often overlooked. Through the perspectives of cognitive and perceptual forms of epistemology, written forms of language can and do influence the reception of non-verbal utterance in a socially significant manner. Ideologies of language predispose linguistic and anthropological research against considerations of written linguistic artifacts and their roles in constituting ascribed social meaning. Signed forms of utterance are constrained by standardization and grammaticality, which in turn iconize and erase written language variation. When written variation is intentionally produced, it creates perceptually derived, ideologically charged responses that affect social attitudes and discourses. I address the methods and foci of sociolinguistic research for their pertinence to non-spoken language. I then analyze variation in written language in the domains of audiovisual animated media and African American dialect literature to show how socially significant responses to written variation create stratification by constructing fictive speech classes which are indexed to real speech communities. This investigation aims to clarify how modes of language transmission share properties assumed to be domain-specific, as well as to warrant a reexamination of the phonocentric concept of language in linguistic anthropology. As written forms of language are central to digital media, traditional sociolinguistic research must account for the written word just as it does the spoken.

    Authentic self, incongruent acoustics: a corpus-based sociophonetic analysis of nonbinary speech

    Get PDF
    This thesis examines the ways six nonbinary speakers in Christchurch, New Zealand, present their gender identity via speech. It examines their productions in reference both to established trends in the literature and to speech collected from ten binary speakers (5M, 5F) at the same time. It seeks to examine whether, in addition to encoding binary gender, speech also encodes nonbinary gender. Three hypotheses are proposed and tested across multiple linguistic variables. The first hypothesis regards acoustic incongruence, and posits that nonbinary speakers may assert their nonbinary identities via speech that utilises particular combinations of variables which create either ambiguity or dissonance with regard to established binary-gender norms. Ambiguous gender incongruence arises from the use of speech that is neither reliably perceived as female nor reliably perceived as male. Dissonant gender incongruence arises from the use of speech that is reliably perceived as both male and female. The second hypothesis predicts that nonbinary speakers will show greater variation in speech based on immediate contextual factors, compared to binary speakers. This difference is hypothesised to be due to nonbinary speakers paying greater attention to production, and to the greater degree of variation in their own speech over time compared to binary speakers. Hypothesis 3 predicts that nonbinary speakers are not a uniform population, and that their use of incongruence will be influenced extensively by their individual condition, including their professed speech goals, history, and gender identity. The hypotheses are tested quantitatively with regard to five linguistic variables: pitch, pitch range, monophthong production, Vowel Space Area (VSA), and intervocalic /t/ frication rates. The interaction between multiple variables is also considered. In-depth examinations of the variation utilised by a single speaker, in the form of "Spotlights", address the hypotheses from a qualitative perspective. Overall, the thesis finds some evidence for Hypothesis 1. In every linguistic variable examined, nonbinary speakers show some distinction from binary speakers that is not explained fully via speaker Assigned Sex at Birth (ASAB). Some binary speakers also seem to produce incongruence, particularly binary women and particularly within single variables. The small scale of the study presents a limitation in addressing Hypothesis 2, but avenues for future work are identified. The qualitative evidence provides strong support for Hypothesis 3, in the examination of individual nonbinary speakers and the way their measured productions support their professed speech goals and identities. Overall, this dissertation presents one of the first comparative analyses of nonbinary speech, and introduces a number of novel approaches to examining phonetic data from a statistical perspective that still accommodates an analysis of individual agency and goals in identity building.
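    One common way Vowel Space Area (VSA) is operationalised in sociophonetic work is as the area of the convex hull around a speaker's mean monophthong positions in F1/F2 space. The thesis's exact procedure is not given in this abstract, so the sketch below, with invented formant values, is only a generic illustration of that measure.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Generic Vowel Space Area sketch: the area of the convex hull around a
# speaker's mean (F2, F1) point for each monophthong. The values below are
# invented; the thesis's actual procedure and vowel set may differ.

mean_formants_hz = np.array([
    [2400, 300],  # high front vowel (F2, F1)
    [2000, 600],  # mid front vowel
    [1300, 750],  # low vowel
    [900, 450],   # mid back vowel
    [800, 320],   # high back vowel
])

hull = ConvexHull(mean_formants_hz)
print(f"VSA ≈ {hull.volume:.0f} Hz²")  # for 2-D input, ConvexHull.volume is the area
```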

    The Literariness of Media Art

    Get PDF
    “Language can be this incredibly forceful material—there’s something about it where if you can strip away its history, get to the materiality of it, it can rip into you like claws” (Hill in Vischer 1995, 11). This arresting image by media artist Gary Hill evokes the nearly physical force of language to hold recipients in its grip. That power seems to lie in the material of language itself, which, with a certain rawness, may captivate or touch, pounce on, or even harm its addressee. Hill’s choice of words is revealing: ‘rip into’ suggests not only a metaphorical emotional pull but also the literal physicality of linguistic attack. It is no coincidence that the statement comes from a media artist, since media artworks often use language to produce a strong sensorial stimulus. Media artworks not only manipulate language as a material in itself, but they also manipulate the viewer’s perceptual channels. The guises and effects of language as artistic material are the topic of this book, The Literariness of Media Art

    Out online: trans self-representation and community building on YouTube

    Get PDF

    Queering Translation, Translating the Queer

    Get PDF
    This groundbreaking work is the first full book-length publication to critically engage with the emerging field of research on the queer aspects of translation and interpreting studies. The volume presents a variety of theoretical and disciplinary perspectives through fifteen contributions from both established and up-and-coming scholars in the field to demonstrate the interconnectedness between translation and queer aspects of sex, gender, and identity. The book begins with the editors’ introduction to the state of the field, providing an overview of both current and developing lines of research, and builds on this foundation to look at this research more closely, grouped into three sections: Queer Theorizing of Translation; Case Studies of Queer Translations and Translators; and Queer Activism and Translation. This interdisciplinary approach seeks not only to shed light on this promising field of research but also to promote cross-fertilization between these disciplines, further exploring the intersections between queer studies and translation studies and making this volume key reading for students and scholars interested in translation studies, queer studies, politics and activism, and gender and sexuality studies.