10,536 research outputs found
An exploration of the potential of Automatic Speech Recognition to assist and enable receptive communication in higher education
The potential of Automatic Speech Recognition (ASR) to assist receptive communication is explored. The paper discusses and evaluates the opportunities and challenges this technology presents to students and staff: providing captioning of speech, online or in classrooms, for deaf or hard-of-hearing students, and helping blind, visually impaired or dyslexic learners to read and search learning material more readily by augmenting synthetic speech with naturally recorded real speech. The automatic provision of online lecture notes, synchronised with speech, enables staff and students to focus on learning and teaching issues, while also benefiting learners who are unable to attend the lecture or who find it difficult or impossible to take notes at the same time as listening, watching and thinking.
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual features, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as in surveillance, Internet telephony, and as an aid to people with hearing impairments. However, most work in speechreading has been limited to text generation from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker: although multiple camera feeds of a speaker's speech may be available, these multiple video feeds have not been used to deal with the different poses. To this end, this paper presents the world's first multi-view speechreading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model that leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system, and further indicate the optimal placement of cameras for maximum intelligibility of the reconstructed speech. Finally, the paper lays out various innovative applications of the proposed system, focusing on its potentially prodigious impact not just in the security arena but in many other multimedia analytics problems.

Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Korea
Synote: Multimedia Annotation 'Designed for all'
This paper describes the development and evaluation of Synote, a freely available web-based application that makes multimedia web resources (e.g. podcasts) easier to access, search, manage, and exploit for all learners, teachers and other users through the creation of notes, bookmarks, tags, links, images and text captions synchronized to any part of the recording. Synote uniquely enables users to easily find, or associate their notes or resources with, any part of a podcast or video recording available on the web, and the students surveyed would like to be able to access all their lectures through Synote.
Synote: Designed for all Advanced Learning Technology for Disabled and Non-Disabled People
This paper describes the development and evaluation of Synote, a freely available accessible web-based application that makes multimedia web resources (e.g. podcasts) easier to access, search, manage, and exploit for all learners, teachers and other users through the creation of accessible notes, bookmarks, tags, links, images and text captions synchronized to any part of the recording.
TwNC: a Multifaceted Dutch News Corpus
This contribution describes the Twente News Corpus (TwNC), a multifaceted corpus for Dutch that is being deployed in a number of NLP research projects, among which are tracks within the Dutch national research programme MultimediaN, the NWO programme CATCH, and the Dutch-Flemish programme STEVIN.
The development of the corpus started in 1998 within a predecessor project, DRUID, and the corpus currently has a size of 530M words. The text part has been built from four different sources: Dutch national newspapers, television subtitles, teleprompter (auto-cue) files, and both manually and automatically generated broadcast news transcripts along with the broadcast news audio. TwNC plays a crucial role in the development and evaluation of a wide range of tools and applications for the domain of multimedia indexing, such as large-vocabulary speech recognition, cross-media indexing, and cross-language information retrieval. Part of the corpus was fed into the Dutch written text corpus in the context of the Dutch-Belgian STEVIN project D-COI, which was completed in 2007. The sections below describe the rationale that was the starting point for the corpus development, outline the cross-media linking approach adopted within MultimediaN, and finally provide some facts and figures about the corpus.
A model for hypermedia learning environments based on electronic books
Designers of hypermedia learning environments could take advantage of a theoretical scheme which takes into account various kinds of learning activities and solves some of the problems associated with them. In this paper, we present a model which inherits a number of characteristics from hypermedia and electronic books. It can provide designers with the tools for creating hypermedia learning systems, by allowing the elements and functions involved in the definition of a specific application to be formally represented. A practical example, CESAR, a hypermedia learning environment for hearing-impaired children, is presented, and some conclusions derived from the use of the model are also shown.
Concurrent collaborative captioning
Captioned text transcriptions of the spoken word can benefit hearing-impaired people, non-native speakers, anyone when no audio is available (e.g. watching TV at an airport), and anyone who needs to review recordings of what has been said (e.g. at lectures, presentations, meetings, etc.). In this paper, a tool is described that facilitates concurrent collaborative captioning through correction of speech recognition errors, providing a sustainable method of making videos accessible to people who find it difficult to understand speech through hearing alone. The tool stores all the edits of all the users and uses a matching algorithm to compare users' edits to check whether they are in agreement.
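The edit-agreement idea in this abstract can be illustrated with a minimal sketch: collect each user's correction for a caption segment, normalize away trivial differences, and accept a correction once enough independent users converge on the same text. All names and the agreement threshold here are illustrative assumptions; the paper's actual matching algorithm may differ.

```python
# Hypothetical sketch of comparing users' caption edits for agreement.
# Not the paper's algorithm; an illustration of the general idea.
from collections import Counter

def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences don't block agreement."""
    return " ".join(text.lower().split())

def agreed_correction(edits, min_agree=2):
    """Return the correction that at least `min_agree` users agree on, else None."""
    if not edits:
        return None
    counts = Counter(normalize(e) for e in edits)
    text, n = counts.most_common(1)[0]
    return text if n >= min_agree else None

# Three users submit corrections for the same caption segment; two agree
# once case and spacing are normalized, so that correction is accepted.
edits = ["speech recognition errors",
         "Speech  recognition errors",
         "speach recognition errors"]
print(agreed_correction(edits))  # -> speech recognition errors
```

A real system would also need to align edits that cover overlapping but non-identical time spans of the recording; this sketch assumes edits have already been grouped by segment.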
BIBS: A Lecture Webcasting System
The Berkeley Internet Broadcasting System (BIBS) is a lecture webcasting system developed and operated by the Berkeley Multimedia Research Center. The system offers live remote viewing and on-demand replay of course lectures using streaming audio and video over the Internet. During the Fall 2000 semester, 14 classes were webcast, including several large lower-division classes, with a total enrollment of over 4,000 students. Lectures were played over 15,000 times per month during the semester. The primary use of the webcasts is to study for examinations. Students report they watch BIBS lectures because they did not understand material presented in lecture, because they wanted to review what the instructor said about selected topics, because they missed a lecture, and/or because they had difficulty understanding the speaker (e.g., non-native English speakers). Analysis of various survey data suggests that more than 50% of the students enrolled in some large classes view lectures and that as many as 75% of the lectures are played by members of the Berkeley community. Faculty attitudes about the virtues of lecture webcasting vary: some question the use of this technology, while others believe it is a valuable aid to education. Further study is required to accurately assess the pedagogical impact that lecture webcasts have on student learning.