20,466 research outputs found
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a âpiece â of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 this chapter. In modern society, video is ver
Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems
In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture in fusion systems (JDL), and combines different techniques related to Artificial Intelligence, Natural Language Processing and User Modeling to provide an enhanced interaction with their users. Firstly, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track and make inferences on the positions of the targets. Secondly, enhanced conversational agents facilitate human-computer interaction by means of speech interaction. Thirdly, a statistical methodology allows modeling the user conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from the successive interactions. A technique is proposed to facilitate the multimodal fusion of these information sources and consider the result for the decision of the next system action.This work was supported in part by Projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS S2009/TIC-1485Publicad
A FRAMEWORK FOR INTELLIGENT VOICE-ENABLED E-EDUCATION SYSTEMS
Although the Internet has received significant attention in recent years, voice is still the most convenient and natural way of communicating between human to human or
human to computer. In voice applications, users may have different needs which will require the ability of the system to reason, make decisions, be flexible and adapt to
requests during interaction. These needs have placed new requirements in voice application development such as use of advanced models, techniques and methodologies which take into account the needs of different users and environments. The ability of a system to behave close to human reasoning is often mentioned as one of the major requirements for the development of voice applications.
In this paper, we present a framework for an intelligent voice-enabled e-Education application and an adaptation of the framework for the development of a prototype Course Registration and Examination (CourseRegExamOnline) module. This study is a preliminary report of an ongoing e-Education project containing the following modules: enrollment, course registration and examination, enquiries/information, messaging/collaboration, e-Learning and library.
The CourseRegExamOnline module was developed using VoiceXML for the voice user interface(VUI), PHP for the web user interface (WUI), Apache as the middle-ware and MySQL database as back-end. The system would offer dual access modes using the VUI and WUI.
The framework would serve as a reference model for developing voice-based e-Education applications. The e-Education system when fully developed would meet the
needs of students who are normal users and those with certain forms of disabilities such as visual impairment, repetitive strain injury (RSI), etc, that make reading and
writing difficult
A Review of Verbal and Non-Verbal Human-Robot Interactive Communication
In this paper, an overview of human-robot interactive communication is
presented, covering verbal as well as non-verbal aspects of human-robot
interaction. Following a historical introduction, and motivation towards fluid
human-robot communication, ten desiderata are proposed, which provide an
organizational axis both of recent as well as of future research on human-robot
communication. Then, the ten desiderata are examined in detail, culminating to
a unifying discussion, and a forward-looking conclusion
Object Referring in Visual Scene with Spoken Language
Object referring has important applications, especially for human-machine
interaction. While having received great attention, the task is mainly attacked
with written language (text) as input rather than spoken language (speech),
which is more natural. This paper investigates Object Referring with Spoken
Language (ORSpoken) by presenting two datasets and one novel approach. Objects
are annotated with their locations in images, text descriptions and speech
descriptions. This makes the datasets ideal for multi-modality learning. The
approach is developed by carefully taking down ORSpoken problem into three
sub-problems and introducing task-specific vision-language interactions at the
corresponding levels. Experiments show that our method outperforms competing
methods consistently and significantly. The approach is also evaluated in the
presence of audio noise, showing the efficacy of the proposed vision-language
interaction methods in counteracting background noise.Comment: 10 pages, Submitted to WACV 201
The FĂschlĂĄr-News-Stories system: personalised access to an archive of TV news
The âFĂschlĂĄrâ systems are a family of tools for capturing, analysis, indexing, browsing, searching and summarisation of digital video information. FĂschlĂĄr-News-Stories, described in this paper, is one of those systems, and provides access to a growing archive of broadcast TV news. FĂschlĂĄr-News-Stories has several notable features including the fact that it automatically records TV news and segments a broadcast news program into stories, eliminating advertisements and credits at the start/end of the broadcast. FĂschlĂĄr-News-Stories supports access to individual stories via calendar lookup, text search through closed captions, automatically-generated links between related stories, and personalised access using a personalisation and recommender system based on collaborative filtering. Access to individual news stories is supported either by browsing keyframes with synchronised closed captions, or by playback of the recorded video. One strength of the FĂschlĂĄr-News-Stories system is that it is actually used, in practice, daily, to access news. Several aspects of the FĂschlĂĄr systems have been published before, bit in this paper we give a summary of the FĂschlĂĄr-News-Stories system in operation by following a scenario in which it is used and also outlining how the underlying system realises the functions it offers
On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces
Multimodal systems have attained increased attention in recent years, which has made possible important
improvements in the technologies for recognition, processing, and generation of multimodal information.
However, there are still many issues related to multimodality which are not clear, for example, the
principles that make it possible to resemble human-human multimodal communication. This chapter
focuses on some of the most important challenges that researchers have recently envisioned for future
multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable
and affective multimodal interfaces
- âŠ