    Language Identification Using Visual Features

    Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments) or impossible (no audio signal is available). Research in this field also benefits the related field of automatic lip-reading. This paper introduces several methods for VLID, based upon audio LID techniques that exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we obtain an error rate below 10% in discriminating between Arabic and English across 19 speakers, using about 30 s of visual speech.
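
    The phonotactic approach borrowed from audio LID can be illustrated with a small sketch: train an n-gram model over recognized speech-unit tokens for each language, then assign a test utterance to the language whose model scores it highest. The bigram model, viseme-like token inventory, and toy training sequences below are illustrative assumptions, not the paper's actual recognizer output or data.

```python
# Hedged sketch of phonotactic language ID: per-language bigram models over
# token sequences, with classification by log-likelihood. All data is toy.
import math
from collections import defaultdict

def train_bigram(sequences, vocab):
    """Add-one-smoothed bigram model over token sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    V = len(vocab)
    def logprob(seq):
        lp = 0.0
        for a, b in zip(seq, seq[1:]):
            total = sum(counts[a].values())
            lp += math.log((counts[a][b] + 1) / (total + V))
        return lp
    return logprob

# Toy token streams standing in for viseme recognizer output per language.
vocab = {"p", "f", "t", "k", "a", "i", "u"}
models = {
    "english": train_bigram([list("patika"), list("fitupa")], vocab),
    "arabic":  train_bigram([list("kutafi"), list("tiakup")], vocab),
}

def identify(token_seq):
    # Pick the language whose phonotactic model best explains the sequence.
    return max(models, key=lambda lang: models[lang](token_seq))

print(identify(list("patifa")))
```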

    XML-driven exploitation of combined scalability in scalable H.264/AVC bitstreams

    The heterogeneity of contemporary multimedia environments requires a format-agnostic adaptation framework for the consumption of digital video content. Scalable bitstreams can be used to satisfy as many usage contexts as possible. In this paper, the scalable extension of the H.264/AVC specification is used to obtain the parent bitstreams, and the adaptation along the combined scalability axis of these bitstreams is performed in a format-independent manner. This requires an abstraction layer for the bitstream: XML descriptions representing the high-level structure of the bitstreams, created by relying on the MPEG-21 Bitstream Syntax Description Language standard. The exploitation of the combined scalability is executed in the XML domain by implementing the adaptation process as a Streaming Transformations for XML (STX) stylesheet. The algorithm used in the transformation of the XML descriptions is discussed in detail. Performance measurements show that the STX transformation in the XML domain and the generation of the corresponding adapted bitstream can be realized in real time.
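
    As a rough illustration of adaptation in the XML domain, the sketch below filters a bitstream description, dropping the units that lie above a target point on the combined scalability axes. A real deployment would use an MPEG-21 BSDL description transformed by an STX stylesheet; here plain Python with ElementTree stands in, and the element and attribute names (nalUnit, did/tid/qid, offset, length) are assumptions for illustration rather than the actual BSDL schema.

```python
# Minimal sketch: adapt a scalable bitstream by filtering its XML description.
import xml.etree.ElementTree as ET

DESCRIPTION = """<bitstream>
  <nalUnit did="0" tid="0" qid="0" offset="0"   length="120"/>
  <nalUnit did="1" tid="0" qid="0" offset="120" length="340"/>
  <nalUnit did="0" tid="1" qid="1" offset="460" length="200"/>
</bitstream>"""

def adapt(description, max_did, max_tid, max_qid):
    """Drop NAL units above the target point on the combined scalability axes."""
    root = ET.fromstring(description)
    for nal in list(root):
        if (int(nal.get("did")) > max_did or
                int(nal.get("tid")) > max_tid or
                int(nal.get("qid")) > max_qid):
            root.remove(nal)
    return root

# Keep only the base layer along all three axes.
adapted = adapt(DESCRIPTION, max_did=0, max_tid=0, max_qid=0)
for nal in adapted:
    print(nal.get("offset"), nal.get("length"))
```

    The offset/length pairs that survive the filter are what a bitstream generator would then use to assemble the adapted bitstream from the parent file.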

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

    Capture, Learning, and Synthesis of 3D Speaking Styles

    Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation), takes any speech signal as input - even speech in languages other than English - and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de. Comment: To appear in CVPR 2019.
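
    The conditioning mechanism described above can be sketched as follows: a one-hot subject label is concatenated with per-frame audio features, and the network regresses per-vertex offsets that are added to a subject's template mesh. The layer sizes, the 29-dimensional audio feature, and the 5023-vertex template in this sketch are assumptions for illustration; the actual VOCA architecture differs in detail.

```python
# Hedged sketch of style conditioning via a one-hot subject label.
import torch
import torch.nn as nn

NUM_SUBJECTS, AUDIO_DIM, NUM_VERTICES = 12, 29, 5023  # assumed sizes

class SpeechToOffsets(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(AUDIO_DIM + NUM_SUBJECTS, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_VERTICES * 3),  # x/y/z offset per vertex
        )

    def forward(self, audio_feat, subject_onehot, template):
        # Conditioning on the subject label selects a speaking style.
        x = torch.cat([audio_feat, subject_onehot], dim=-1)
        offsets = self.net(x).view(-1, NUM_VERTICES, 3)
        return template + offsets  # animated vertices for this frame

model = SpeechToOffsets()
audio = torch.randn(1, AUDIO_DIM)            # one frame of audio features
style = torch.eye(NUM_SUBJECTS)[3:4]         # speaking style of subject 3
template = torch.zeros(1, NUM_VERTICES, 3)   # neutral template mesh
verts = model(audio, style, template)
print(verts.shape)  # torch.Size([1, 5023, 3])
```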

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the robustness of the system in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, and is thus consistent with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting, but in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems.
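
    The self-supervision idea, using the auditory modality to generate training targets for the visual domain, might look roughly like the sketch below: a crude audio energy gate plays the role of the "teacher" that labels frames as speaking or silent, and those labels train a visual classifier on face descriptors. The energy threshold, feature dimensions, network, and synthetic data are illustrative assumptions, not the paper's actual components.

```python
# Hedged sketch: audio-derived labels supervise a visual speaking classifier.
import torch
import torch.nn as nn

FACE_DIM = 128  # assumed dimensionality of a per-frame face descriptor

visual_net = nn.Sequential(
    nn.Linear(FACE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1),  # logit: is this face currently speaking?
)
opt = torch.optim.Adam(visual_net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def audio_label(audio_frame, threshold=0.1):
    """Crude voice-activity 'teacher': frame energy above a threshold."""
    return (audio_frame.pow(2).mean(dim=-1, keepdim=True) > threshold).float()

for step in range(100):
    # Stand-ins for synchronized face descriptors and audio frames.
    faces = torch.randn(32, FACE_DIM)
    audio = torch.randn(32, 400) * torch.randint(0, 2, (32, 1)).float()
    labels = audio_label(audio)          # labels come from audio, not humans
    loss = loss_fn(visual_net(faces), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```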

    Promoting independent learning skills using video on digital language laboratories

    This is the author's PDF version of an article published in Computer Assisted Language Learning ©2006. The definitive version is available at http://www.informaworld.com/ The article discusses the potential for developing independent learning skills using the digital language laboratory, with particular reference to exploiting the increasingly available resource of digital video. It investigates the potential for recording and editing video clips from online sources and digitising clips from analogue recordings, and reflects on the status quo regarding the complex copyright regulations in this area. It describes two pilot self-access programmes based on video clips which were undertaken with University College Chester undergraduates, and reflects on the value of the experience for students in developing a wide range of language skills as well as independent learning skills, drawing on their feedback on the experience.