78,016 research outputs found

    The role of negative maternal affective states and infant temperament in early interactions between infants with cleft lip and their mothers

    OBJECTIVES: The study examined the early interaction between mothers and their infants with cleft lip, assessing the role of maternal affective state and expressiveness and differences in infant temperament. METHODS: Mother-infant interactions were assessed in 25 2-month-old infants with cleft lip and 25 age-matched healthy infants. Self-report and behavioral observations were used to assess maternal depressive symptoms and expressions. Mothers rated infant temperament. RESULTS: Infants with cleft lip were less engaged, and their mothers showed more difficulty in interaction, than control group dyads. Mothers of infants with cleft lip displayed more negative affectivity but did not report more self-rated depressive symptoms than control group mothers. No group differences were found in infant temperament. CONCLUSIONS: In order to support the mother's experience and facilitate her ongoing parental role, the findings highlight the importance of identifying maternal negative affectivity during early interactions, even when mothers seem to have little awareness of their depressive symptoms.

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues, such as movement of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to text generation from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker are often available, they have not been exploited to handle different poses. To this end, this paper presents the first multi-view speechreading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system. The work further identifies the camera placement that leads to the greatest intelligibility of the reconstructed speech, and it lays out various applications for the proposed system, focusing on its potential impact not just in the security arena but in many other multimedia analytics problems. (Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Korea.)
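
    As a hedged illustration of the core idea only, the sketch below fuses lip-video features from several camera views before decoding a speech representation (here, a mel-spectrogram). The architecture, layer sizes, and tensor shapes are illustrative assumptions, not the paper's model.

```python
# Minimal multi-view fusion sketch (assumed architecture, not the authors'):
# encode each camera view with a shared 3D-CNN, concatenate per-frame
# features across views, and decode a mel-spectrogram with a GRU.
import torch
import torch.nn as nn

class MultiViewSpeechReconstructor(nn.Module):
    def __init__(self, n_views=3, feat_dim=256, n_mel=80):
        super().__init__()
        # Shared encoder applied to each view's clip (B, 3, T, H, W).
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep time, pool space
        )
        self.proj = nn.Linear(32, feat_dim)
        # Temporal model over the concatenated per-view features.
        self.rnn = nn.GRU(feat_dim * n_views, feat_dim, batch_first=True)
        # One mel-spectrogram frame per video frame.
        self.head = nn.Linear(feat_dim, n_mel)

    def forward(self, views):  # views: list of (B, 3, T, H, W) tensors
        feats = []
        for v in views:
            h = self.encoder(v)            # (B, 32, T, 1, 1)
            h = h.squeeze(-1).squeeze(-1)  # (B, 32, T)
            feats.append(self.proj(h.transpose(1, 2)))  # (B, T, feat_dim)
        fused = torch.cat(feats, dim=-1)   # concatenate views per frame
        out, _ = self.rnn(fused)
        return self.head(out)              # (B, T, n_mel)
```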

    The Role of Multiple Articulatory Channels of Sign-Supported Speech Revealed by Visual Processing

    Purpose: The use of sign-supported speech (SSS) in the education of deaf students has recently been discussed in relation to its usefulness for deaf children using cochlear implants. To clarify the benefits of SSS for comprehension, 2 eye-tracking experiments aimed to detect the extent to which signs are actively processed in this mode of communication. Method: Participants were 36 deaf adolescents, including cochlear implant users and native deaf signers. Experiment 1 attempted to shift observers' foveal attention to the linguistic source in SSS from which most information is extracted, lip movements or signs, by magnifying the face area, thus modifying the perceptual accessibility of lip movements (magnified condition), and by constraining the visual field to either the face or the sign through a moving-window paradigm (gaze-contingent condition). Experiment 2 aimed to explore the reliance on signs in SSS by occasionally producing a mismatch between sign and speech. Participants were required to concentrate on the orally transmitted message. Results: In Experiment 1, analyses revealed a greater number of fixations toward the signs and a reduction in accuracy in the gaze-contingent condition across all participants. Fixations toward signs were also increased in the magnified condition. In Experiment 2, results indicated less accuracy in the mismatching condition across all participants. Participants looked more at the sign when it was inconsistent with speech. Conclusions: All participants, even those with residual hearing, rely on signs when attending to SSS, either peripherally or through overt attention, depending on the perceptual conditions. (European Union, Grant Agreement 31674.)
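
    As a hedged sketch of the kind of fixation analysis such eye-tracking studies rest on, the snippet below assigns fixations to face and sign areas of interest (AOIs) and computes the share of fixation time spent on signs. The AOI coordinates and the fixation tuple format are illustrative assumptions, not the study's actual screen layout or data format.

```python
# Classify fixations into "face" vs "sign" AOIs and compute the
# duration-weighted proportion of fixations landing on the signs.

FACE_AOI = (300, 60, 500, 260)   # (x1, y1, x2, y2) in screen pixels (assumed)
SIGN_AOI = (80, 200, 300, 460)   # assumed sign-space rectangle

def in_aoi(x, y, aoi):
    x1, y1, x2, y2 = aoi
    return x1 <= x <= x2 and y1 <= y <= y2

def sign_fixation_rate(fixations):
    """fixations: list of (x, y, duration_ms) tuples from the eye tracker."""
    sign = sum(d for x, y, d in fixations if in_aoi(x, y, SIGN_AOI))
    face = sum(d for x, y, d in fixations if in_aoi(x, y, FACE_AOI))
    total = sign + face
    return sign / total if total else 0.0
```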

    Jaw Rotation in Dysarthria Measured With a Single Electromagnetic Articulography Sensor

    Purpose: This study evaluated a novel method for characterizing jaw rotation using orientation data from a single electromagnetic articulography sensor. This method was optimized for clinical application, and a preliminary examination of clinical feasibility and value was undertaken. Method: The computational adequacy of the single-sensor orientation method was evaluated through comparisons with jaw-rotation histories calculated from dual-sensor positional data for 16 typical talkers. The clinical feasibility and potential value of single-sensor jaw rotation were assessed through comparisons of 7 talkers with dysarthria and 19 typical talkers in connected speech. Results: The single-sensor orientation method allowed faster and safer participant preparation, required lower data-acquisition costs, and generated less high-frequency artifact than the dual-sensor positional approach. All talkers with dysarthria, regardless of severity, demonstrated jaw-rotation histories with more numerous changes in movement direction and reduced smoothness compared with typical talkers. Conclusions: Results suggest that the single-sensor orientation method for calculating jaw rotation during speech is clinically feasible. Given the preliminary nature of this study and the small participant pool, the clinical value of such measures remains an open question. Further work must address the potential confound of reduced speaking rate on movement smoothness.
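
    As a hedged sketch of the general idea, the snippet below derives a jaw-rotation history from one sensor's orientation quaternions and counts reversals in movement direction, one simple proxy for the smoothness measures described above. The sampling rate and quaternion format are assumptions, not the authors' pipeline.

```python
# Jaw-rotation history from single-sensor orientation data (illustrative).
import numpy as np

def rotation_angles(quats, q_ref):
    """Rotation angle (radians) between each orientation and a reference.

    quats: (N, 4) array of unit quaternions (w, x, y, z) sampled over time.
    q_ref: (4,) unit quaternion for the jaw's resting orientation (assumed).
    """
    # For unit quaternions, the relative rotation angle satisfies
    # cos(theta / 2) = |<q, q_ref>|.
    w = np.abs(quats @ q_ref)
    return 2.0 * np.arccos(np.clip(w, -1.0, 1.0))

def direction_changes(angle, fs=100.0):
    """Count sign changes in rotational velocity (movement reversals)."""
    vel = np.gradient(angle, 1.0 / fs)   # fs: sampling rate in Hz (assumed)
    signs = np.sign(vel)
    signs = signs[signs != 0]            # ignore zero-velocity samples
    return int(np.sum(signs[1:] != signs[:-1]))
```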

    A comparison of the development of audiovisual integration in children with autism spectrum disorders and typically developing children

    This study aimed to investigate the development of audiovisual integration in children with Autism Spectrum Disorder (ASD). Audiovisual integration was measured using the McGurk effect in children with ASD aged 7–16 years and typically developing children (control group) matched approximately for age, sex, nonverbal ability, and verbal ability. Results showed that the children with ASD were delayed in visual accuracy and audiovisual integration compared to the control group. However, on the audiovisual integration measure, children with ASD appeared to 'catch up' with their typically developing peers at the older age ranges. The suggestion that children with ASD show a deficit in audiovisual integration that diminishes with age has clinical implications for those assessing and treating these children.
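
    As a hedged sketch of how McGurk-based integration is typically quantified, the snippet below computes the proportion of incongruent trials on which a fused percept (neither the heard nor the seen syllable) is reported. The trial format and syllable labels are illustrative assumptions, not the study's materials.

```python
# McGurk susceptibility: fused responses on audio/visual-incongruent trials.

def mcgurk_rate(trials):
    """trials: list of (audio, visual, response) syllable strings, e.g.
    ('ba', 'ga', 'da') -- 'da' is the classic fused percept for /ba/+/ga/."""
    incongruent = [t for t in trials if t[0] != t[1]]
    fused = [t for t in incongruent
             if t[2] not in (t[0], t[1])]  # neither heard nor seen syllable
    return len(fused) / len(incongruent) if incongruent else 0.0
```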

    A survey on mouth modeling and analysis for Sign Language recognition

    © 2015 IEEE. Around 70 million Deaf people worldwide use Sign Languages (SLs) as their native languages. At the same time, they have limited reading/writing skills in the spoken language. This puts them at a severe disadvantage in many contexts, including education, work, and usage of computers and the Internet. Automatic Sign Language Recognition (ASLR) can support the Deaf in many ways, e.g., by enabling the development of systems for Human-Computer Interaction in SL and translation between sign and spoken language. Research in ASLR usually revolves around automatic understanding of manual signs. Recently, the ASLR research community has started to appreciate the importance of non-manuals, since they are related to the lexical meaning of a sign, the syntax, and the prosody. Non-manuals include body and head pose, movement of the eyebrows and the eyes, as well as blinks and squints. Arguably, the mouth is one of the most involved parts of the face in non-manuals. Mouth actions related to ASLR can be either mouthings, i.e., visual syllables produced with the mouth while signing, or non-verbal mouth gestures. Both are very important in ASLR. In this paper, we present the first survey on mouth non-manuals in ASLR. We start by showing why mouth motion is important in SL and the relevant techniques that exist within ASLR. Since limited research has been conducted regarding automatic analysis of mouth motion in the context of ASLR, we proceed by surveying relevant techniques from the areas of automatic mouth expression analysis and visual speech recognition which can be applied to the task. Finally, we conclude by presenting the challenges and potentials of automatic analysis of mouth motion in the context of ASLR.
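
    As a hedged sketch of a typical first step in the mouth-analysis pipelines such surveys cover, the snippet below crops the mouth region using the standard 68-point facial landmark scheme (points 48–67 cover the mouth). The landmark model file name is an assumption; any landmark detector would serve the same purpose.

```python
# Mouth ROI extraction for mouthing/mouth-gesture analysis (illustrative).
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed path to the standard dlib 68-landmark model file.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_roi(frame, margin=10):
    """Return the mouth crop from a BGR frame, or None if no face found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # Points 48-67 are the mouth in the 68-point iBUG annotation scheme.
    pts = np.array([(shape.part(i).x, shape.part(i).y)
                    for i in range(48, 68)], dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    return frame[max(0, y - margin):y + h + margin,
                 max(0, x - margin):x + w + margin]
```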

    Chimpanzee faces under the magnifying glass: emerging methods reveal cross-species similarities and individuality

    Independently, we created descriptive systems to characterize chimpanzee facial behavior, responding to a common need for an objective, standardized coding system with which to ask questions about primate facial behaviors. Even with slightly different systems, we arrive at similar outcomes, with convergent conclusions about chimpanzee facial mobility. This convergence is a validation of the importance of the approach and provides support for the future use of a facial action coding system for chimpanzees, ChimpFACS. Chimpanzees share many facial behaviors with those of humans. Therefore, processes and mechanisms that explain individual differences in facial activity can be compared with the use of standardized systems such as ChimpFACS and FACS. In this chapter we describe our independent methodological approaches, comparing how we arrived at our facial coding categories. We present some Action Descriptors (ADs) from Gaspar's initial studies, especially focusing on an ethogram of chimpanzee and bonobo facial behavior, based on studies conducted between 1997 and 2004 at three chimpanzee colonies (The Detroit Zoo, Cleveland Metroparks Zoo, and Burgers' Zoo) and two bonobo colonies (The Columbus Zoo and Aquarium, The Milwaukee County Zoo). We discuss the potential significance of arising issues, the minor qualitative species differences that were found, and the larger quantitative differences in particular facial behaviors observed between species; e.g., bonobos expressed more movements containing particular action units (Brow Lowerer, Lip Raiser, Lip Corner Puller) compared with chimpanzees. The substantial interindividual variation in facial behavior within each species was most striking. Considering individual differences and the impact of development, we highlight the flexibility in the facial activity of chimpanzees. We discuss the meaning of facial behaviors in nonhuman primates, addressing specifically individual attributes of Social Attraction, facial expressivity, and the connection of facial behavior to emotion. We do not rule out the communicative function of facial behavior, in which case an individual's properties of facial behavior are seen as influencing his or her social life, but we provide strong arguments in support of the role of facial behavior in the expression of internal states.

    Language Identification Using Visual Features

    Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments) or impossible (no audio signal is available). Research in this field also benefits the related field of automatic lip-reading. This paper introduces several methods for VLID. They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we obtain an error rate of less than 10% in discriminating between Arabic and English on 19 speakers, using about 30 s of visual speech.
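
    As a hedged sketch of the phonotactic idea borrowed from audio LID, the snippet below scores a recognized viseme sequence under per-language bigram models and picks the language with the highest log-likelihood. The viseme labels, training data, and smoothing constant are illustrative assumptions, not the paper's system.

```python
# Bigram phonotactic language identification over viseme sequences.
import math
from collections import Counter

def train_bigram(sequences, alpha=1.0):
    """sequences: list of viseme-label lists for one language.
    Returns a function scoring a sequence's Laplace-smoothed log-likelihood."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        vocab.update(seq)
        unigrams.update(seq[:-1])          # bigram contexts
        bigrams.update(zip(seq, seq[1:]))
    V = len(vocab)

    def logprob(seq):
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
            for a, b in zip(seq, seq[1:]))
    return logprob

def identify(seq, models):
    """models: dict mapping language name -> logprob function."""
    return max(models, key=lambda lang: models[lang](seq))

# Usage sketch (eng_seqs / ar_seqs are hypothetical training corpora):
# models = {"english": train_bigram(eng_seqs), "arabic": train_bigram(ar_seqs)}
# identify(test_seq, models)
```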

    Speaker-following Video Subtitles

    We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers, allowing the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. The placement of the subtitles is then determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and a previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.
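
    As a hedged sketch of what a global placement optimization could look like, the snippet below chooses, for each subtitle, one of a few candidate screen positions by minimizing proximity to the detected speaker plus a penalty for jumping far from the previous subtitle's position. A dynamic program gives the globally optimal sequence under this simplified cost; the cost terms and weights are assumptions, not the paper's formulation.

```python
# Dynamic-programming subtitle placement over a shared candidate grid.

def place_subtitles(speaker_positions, candidates, jump_weight=0.5):
    """speaker_positions: (x, y) of the detected speaker per subtitle.
    candidates: candidate (x, y) subtitle positions available on screen."""
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    n, m = len(speaker_positions), len(candidates)
    # cost[i][c]: best total cost with subtitle i placed at candidate c.
    cost = [[0.0] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    for c in range(m):
        cost[0][c] = dist(candidates[c], speaker_positions[0])
    for i in range(1, n):
        for c in range(m):
            local = dist(candidates[c], speaker_positions[i])
            best = min(range(m), key=lambda p: cost[i - 1][p]
                       + jump_weight * dist(candidates[p], candidates[c]))
            back[i][c] = best
            cost[i][c] = (local + cost[i - 1][best]
                          + jump_weight * dist(candidates[best], candidates[c]))
    # Backtrack the optimal placement sequence.
    end = min(range(m), key=lambda c: cost[n - 1][c])
    seq = [end]
    for i in range(n - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return [candidates[c] for c in reversed(seq)]
```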