
    An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education

    The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents to students and staff are also discussed and evaluated: providing captioning of speech online or in classrooms for deaf or hard of hearing students, and helping blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with natural recorded speech. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners who are unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.

    Multimedia Interfaces for BSL Using Lip Readers

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to text generation from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker may be available, they have not been used to deal with different poses. To this end, this paper presents the world's first ever multi-view speech reading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model which leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system. The paper further shows the optimal placement of cameras which would lead to the maximum intelligibility of speech. Finally, it lays out various innovative applications for the proposed system, focusing on its potential impact not just in the security arena but in many other multimedia analytics problems. Comment: 2018 ACM Multimedia Conference (MM '18), October 22-26, 2018, Seoul, Republic of Korea

    Synote: Multimedia Annotation 'Designed for all'

    This paper describes the development and evaluation of Synote, a freely available web-based application that makes multimedia web resources (e.g. podcasts) easier to access, search, manage, and exploit for all learners, teachers and other users through the creation of notes, bookmarks, tags, links, images and text captions synchronized to any part of the recording. Synote uniquely enables users to easily find, or associate their notes or resources with, any part of a podcast or video recording available on the web. The students surveyed would like to be able to access all their lectures through Synote.

    Synote: Designed for all Advanced Learning Technology for Disabled and Non-Disabled People

    This paper describes the development and evaluation of Synote, a freely available, accessible web-based application that makes multimedia web resources (e.g. podcasts) easier to access, search, manage, and exploit for all learners, teachers and other users through the creation of accessible notes, bookmarks, tags, links, images and text captions synchronized to any part of the recording.

    TwNC: a Multifaceted Dutch News Corpus

    This contribution describes the Twente News Corpus (TwNC), a multifaceted corpus for Dutch that is being deployed in a number of NLP research projects, including tracks within the Dutch national research programme MultimediaN, the NWO programme CATCH, and the Dutch-Flemish programme STEVIN. The development of the corpus started in 1998 within a predecessor project, DRUID, and the corpus currently has a size of 530M words. The text part has been built from texts of four different sources: Dutch national newspapers, television subtitles, teleprompter (auto-cue) files, and both manually and automatically generated broadcast news transcripts along with the broadcast news audio. TwNC plays a crucial role in the development and evaluation of a wide range of tools and applications for the domain of multimedia indexing, such as large vocabulary speech recognition, cross-media indexing, and cross-language information retrieval. Part of the corpus was fed into the Dutch written text corpus in the context of the Dutch-Belgian STEVIN project D-COI, which was completed in 2007. The sections below describe the rationale that was the starting point for the corpus development, outline the cross-media linking approach adopted within MultimediaN, and finally provide some facts and figures about the corpus.

    A model for hypermedia learning environments based on electronic books

    Designers of hypermedia learning environments could take advantage of a theoretical scheme which takes into account various kinds of learning activities and solves some of the problems associated with them. In this paper, we present a model which inherits a number of characteristics from hypermedia and electronic books. It can provide designers with the tools for creating hypermedia learning systems by allowing the elements and functions involved in the definition of a specific application to be formally represented. A practical example, CESAR, a hypermedia learning environment for hearing-impaired children, is presented, and some conclusions derived from the use of the model are also shown.

    Concurrent collaborative captioning

    Captioned text transcriptions of the spoken word can benefit hearing impaired people, non-native speakers, anyone when no audio is available (e.g. watching TV at an airport), and anyone who needs to review recordings of what has been said (e.g. at lectures, presentations, meetings etc.). In this paper, a tool is described that facilitates concurrent collaborative captioning by correction of speech recognition errors, providing a sustainable method of making videos accessible to people who find it difficult to understand speech through hearing alone. The tool stores all the edits of all the users and uses a matching algorithm to compare users’ edits to check if they are in agreement.

    BIBS: A Lecture Webcasting System

    The Berkeley Internet Broadcasting System (BIBS) is a lecture webcasting system developed and operated by the Berkeley Multimedia Research Center. The system offers live remote viewing and on-demand replay of course lectures using streaming audio and video over the Internet. During the Fall 2000 semester, 14 classes were webcast, including several large lower-division classes, with a total enrollment of over 4,000 students. Lectures were played over 15,000 times per month during the semester. The primary use of the webcasts is to study for examinations. Students report that they watch BIBS lectures because they did not understand material presented in lecture, because they wanted to review what the instructor said about selected topics, because they missed a lecture, and/or because they had difficulty understanding the speaker (e.g., non-native English speakers). Analysis of various survey data suggests that more than 50% of the students enrolled in some large classes view lectures and that as many as 75% of the lectures are played by members of the Berkeley community. Faculty attitudes about the virtues of lecture webcasting vary. Some question the use of this technology, while others believe it is a valuable aid to education. Further study is required to accurately assess the pedagogical impact that lecture webcasts have on student learning.