4 research outputs found

    The importance of different facial areas for signalling visual prominence

    This article discusses the processing of facial markers of prominence in spoken utterances. In particular, it investigates which area of a speaker’s face contains the strongest cues to prominence, using stimuli with the entire face visible or versions in which participants could only see the upper or lower half, or the right or left part of the face. To compensate for potential ceiling effects, subjects were positioned at a distance of either 50 cm, 250 cm or 380 cm from the screen which displayed the film fragments. The task of the subjects was to indicate for each stimulus which word they perceived as the most prominent one. Results show that, while prominence detection becomes more difficult at longer distances, the upper facial area has stronger cue value for prominence detection than the bottom part, and that the left part of the face is more important than the right part. Results for mirror images of the original fragments show that this latter result is due to both a speaker and an observer effect. Index Terms: prominence, facial areas, audiovisual speech.
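
    A rough way to see what this design measures: identification accuracy can be tabulated per facial-area condition and viewing distance. The sketch below is a hypothetical Python illustration; the trial records and field layout are invented, not the authors' materials or analysis code.

        # Minimal sketch: aggregating prominence-identification accuracy per
        # facial-area condition and viewing distance. Trial records are toy data.
        from collections import defaultdict

        trials = [
            # (facial_area, distance_cm, correct_identification)
            ("upper", 50, True),
            ("lower", 380, False),
            ("left", 250, True),
            ("right", 250, False),
            ("full", 50, True),
        ]

        counts = defaultdict(lambda: [0, 0])  # (hits, total) per condition
        for area, distance, correct in trials:
            cell = counts[(area, distance)]
            cell[0] += int(correct)
            cell[1] += 1

        for (area, distance), (hits, total) in sorted(counts.items()):
            print(f"{area:>5} @ {distance:3d} cm: {hits}/{total} = {hits / total:.2f}")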

    Síntesis Audiovisual Realista Personalizable (Realistic, Personalizable Audiovisual Synthesis)

    A unified framework is presented for realistic, personalizable audiovisual synthesis and analysis of audiovisual sequences of talking heads and visual sequences of sign language in a domestic setting. In the first case, the animation is fully synchronized with a text or speech source; in the second, words are finger-spelled by hand. The personalization capabilities make it easy for non-expert users to create audiovisual sequences. Possible applications range from realistic virtual characters for natural interaction or video games, to very low-bandwidth videoconferencing and visual telephony for the hard of hearing, to support for pronunciation and communication for that same group. The system can process long sequences with very low resource consumption, particularly in terms of storage, thanks to a new incremental procedure for the singular value decomposition that keeps the data mean up to date. This procedure is complemented by three others: a decremental, a split and a composition procedure.
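
    The storage savings described above hinge on updating a low-rank singular value decomposition as new columns of data arrive, while keeping the data mean current rather than re-centring everything from scratch. The Python sketch below illustrates that general idea using the standard mean-corrected incremental update; it is not the thesis' own procedure, and all names and shapes are assumptions.

        # Sketch of a mean-corrected incremental SVD update (illustrative only).
        import numpy as np

        def incremental_svd_update(U, s, mean, n_old, B, rank):
            """Fold a new column block B into a rank-limited SVD (U, s) of
            mean-centred data, updating the running mean as well."""
            m = B.shape[1]
            mean_B = B.mean(axis=1, keepdims=True)
            new_mean = (n_old * mean + m * mean_B) / (n_old + m)

            # Centre the new block and append one column that corrects the
            # factorization for the shift of the global mean.
            correction = np.sqrt(n_old * m / (n_old + m)) * (mean_B - mean)
            B_hat = np.hstack([B - mean_B, correction])

            # Split B_hat into its component inside the current subspace and
            # an orthogonal residue.
            proj = U.T @ B_hat
            Q, R = np.linalg.qr(B_hat - U @ proj)

            # The SVD of a small core matrix yields the updated factors.
            k = s.size
            K = np.block([[np.diag(s), proj],
                          [np.zeros((R.shape[0], k)), R]])
            Uk, sk, _ = np.linalg.svd(K, full_matrices=False)

            U_new = np.hstack([U, Q]) @ Uk
            return U_new[:, :rank], sk[:rank], new_mean, n_old + m

        # Toy usage: initialize from one block, then fold in a second block.
        rng = np.random.default_rng(0)
        A = rng.normal(size=(64, 20))
        mean0 = A.mean(axis=1, keepdims=True)
        U0, s0, _ = np.linalg.svd(A - mean0, full_matrices=False)
        U, s, mean, n = incremental_svd_update(U0[:, :5], s0[:5], mean0, 20,
                                               rng.normal(size=(64, 10)), rank=5)

    The decremental, split and composition procedures mentioned above would, in the same spirit, operate on these compact factors rather than on the full data.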

    Effects of forensically-relevant facial concealment on acoustic and perceptual properties of consonants

    This thesis offers a thorough investigation into the effects of forensically-relevant facial concealment on speech acoustics and perception. Specifically, it explores the extent to which selected acoustic-phonetic and auditory-perceptual properties of consonants are affected when the talker is wearing ‘facewear’ while speaking. In this context, the term ‘facewear’ refers to the various types of face-concealing garments and headgear that are worn by people in common daily communication situations: for work and leisure, or as an expression of religious, social and cultural affiliation (e.g. surgical masks, motorcycle helmets, ski and cycling masks, or full-face veils such as the niqāb). It also denotes the face or head coverings that are typically used as deliberate (visual) disguises during the commission of crimes and in situations of public disorder (e.g. balaclavas, hooded sweatshirts, or scarves). The present research centres on the question: does facewear influence the way that consonants are produced, transmitted, and perceived? To examine the effects of facewear on the acoustic speech signal, various intensity, spectral, and temporal properties of spoken English consonants were measured. It was found that facewear can considerably alter the acoustic-phonetic characteristics of consonants. This was likely to be the result of both deliberate and involuntary changes to the talker’s speech productions, and of sound energy absorption by the facewear material. The perceptual consequences of the acoustic modifications to speech were assessed by way of a consonant identification study and a talker discrimination study. The results of these studies showed that auditory-only and auditory-visual consonant intelligibility, as well as the discrimination of unfamiliar talkers, may be greatly compromised when the observer’s judgements are based on ‘facewear speech’. The findings reported in this thesis contribute to our understanding of how auditory and visual information interact during natural speech processing. Furthermore, the results have important practical implications for legal cases in which speech produced through facewear is of pivotal importance. Forensic speech scientists are therefore advised to take the possible effects of facewear on speech into account when interpreting the outcome of their acoustic and auditory analyses of evidential speech recordings, and when evaluating the reliability of earwitness testimony.
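
    For orientation, the short Python sketch below computes three measures of the kind listed above (segment duration, RMS intensity and spectral centre of gravity). It is a hypothetical illustration on a synthetic noise burst standing in for a fricative, not the measurement scripts used in the thesis.

        # Toy acoustic measures for a consonant-like segment (illustrative only).
        import numpy as np

        fs = 16_000                                            # sampling rate, Hz
        segment = np.random.default_rng(1).normal(size=int(0.12 * fs))  # ~120 ms of noise

        duration_ms = 1000 * segment.size / fs
        rms_db = 20 * np.log10(np.sqrt(np.mean(segment ** 2)) + 1e-12)  # relative dB

        window = np.hanning(segment.size)
        spectrum = np.abs(np.fft.rfft(segment * window))
        freqs = np.fft.rfftfreq(segment.size, d=1 / fs)
        cog_hz = np.sum(freqs * spectrum) / np.sum(spectrum)   # spectral centre of gravity

        print(f"duration {duration_ms:.1f} ms, level {rms_db:.1f} dB, CoG {cog_hz:.0f} Hz")

    Comparing such measures for the same consonants produced with and without a face covering is the kind of acoustic comparison the abstract describes.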