
    Using X+V to construct a non-proprietary speech browser for a public-domain SpeechWeb

    A SpeechWeb is a collection of hyperlinked speech applications distributed over the Internet. Users access the applications through remote browsers, which accept human voice input and return synthesized voice output. Previous research proposed a new architecture (LRRP) that is ideally suited to building a Public-Domain SpeechWeb; however, this architecture requires a non-proprietary speech browser. In this thesis, we solve several limitations of X+V, a programming language for developing multimodal applications, and use X+V to build a viable Public-Domain SpeechWeb browser. Our browser has the following properties: real-time human-machine speech interaction; ease of installation and use; acceptable speech-recognition accuracy in a suitable environment; no cost and non-proprietary licensing, easing distribution; use of a common communication protocol (CGI); ease of creating speech applications; and the possibility of deployment on mobile devices.

    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2006 .M31. Source: Masters Abstracts International, Volume: 45-01, page: 0360. Thesis (M.Sc.)--University of Windsor (Canada), 2006

    Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment

    VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided.

    The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search.

    The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the “Beeldenstorm” collection, containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity, and radical visual changes.

    The Linking Task, also called “Finding Related Resources Across Languages,” involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language “Beeldenstorm” collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation, and methods that targeted proper names.
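    The cross-language linking strategy described for the best runs (query the Dutch index with the anchor transcript, then hop to English via interlanguage links) can be sketched as below. This is a minimal illustration, not the participants' systems: `dutch_index` (a toy term-to-page index), `nl_to_en` (a stand-in for Wikipedia interlanguage links), and the bag-of-words scoring are all hypothetical simplifications of a real IR engine.

```python
from collections import Counter

def link_anchor(transcript_terms, dutch_index, nl_to_en):
    """Rank English Wikipedia targets for one multimedia anchor.

    transcript_terms: tokens from the anchor's speech transcript.
    dutch_index: hypothetical mapping term -> list of Dutch page titles.
    nl_to_en: hypothetical Dutch -> English interlanguage-link mapping.
    """
    # Build a bag-of-words query from the anchor's speech transcript.
    query = Counter(transcript_terms)
    # Score Dutch pages by weighted term overlap (stand-in for a real
    # retrieval model over a Dutch-language Wikipedia index).
    scores = Counter()
    for term, weight in query.items():
        for page in dutch_index.get(term, ()):
            scores[page] += weight
    # Map the best-ranked Dutch pages to English target pages.
    return [nl_to_en[p] for p, _ in scores.most_common() if p in nl_to_en]
```

    A real system would replace the overlap scoring with a ranked retrieval model (e.g. TF-IDF or BM25) and draw the interlanguage mapping from Wikipedia's langlinks data.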

    Object Referring in Visual Scene with Spoken Language

    Object referring has important applications, especially for human-machine interaction. While the task has received great attention, it is mainly attacked with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions, and speech descriptions, making the datasets ideal for multi-modality learning. The approach is developed by carefully decomposing the ORSpoken problem into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method consistently and significantly outperforms competing methods. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise.

    Comment: 10 pages, Submitted to WACV 201

    Technology-based rehabilitation to improve communication after acquired brain injury

    The utilization of technology has allowed for several advances in aphasia rehabilitation for individuals with acquired brain injury. Thirty-one previous studies that provide technology-based language, or language and cognitive, rehabilitation are examined in terms of the domains addressed, the types of treatments that were provided, and details about the methods and results, including which types of outcomes are reported. From this, we address questions about how different aspects of treatment delivery can influence rehabilitation outcomes, such as whether the treatment was standardized or tailored, whether participants were prescribed homework, and whether intensity was varied. Results differed by these aspects of treatment delivery, but ultimately the studies demonstrated consistent improvement on various outcome measures. With these aspects of technology-based treatment in mind, the ultimate goal of personalized rehabilitation is discussed.

    This project was funded by the Coulter Foundation for Translational Research.

    Technology Policy, Gender, and Cyberspace

    Event-based sampling occurs when the time instants are measured every time the amplitude passes certain pre-defined levels. This is in contrast with classical signal processing, where the amplitude is measured at regular time intervals. The signal processing problem is to separate the signal component from noise in both the amplitude and time domains. Event-based sampling occurs in a variety of applications. The purpose here is to explain the new types of signal processing problems that occur, and to identify the need for processing in both the time and event domains. We focus on rotating axles, where amplitude disturbances are caused by vibrations and time disturbances by measurement equipment. As one application, we examine tire pressure monitoring in cars, where suppression of time disturbances is of utmost importance.
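    The level-crossing idea in this abstract (record a timestamp each time the amplitude passes a pre-defined level, rather than sampling at a fixed rate) can be illustrated with a short sketch. The function name, the linear interpolation of crossing times, and the example levels are illustrative assumptions, not from the paper.

```python
def level_crossings(signal, times, levels):
    """Return (time, level) events where the signal crosses a level.

    signal: amplitude samples of the underlying waveform.
    times: sample timestamps, same length as signal.
    levels: pre-defined amplitude levels that trigger an event.
    Crossing times are linearly interpolated between samples.
    """
    events = []
    for i in range(1, len(signal)):
        a, b = signal[i - 1], signal[i]
        for level in levels:
            if (a - level) * (b - level) < 0:  # strict sign change => crossing
                # Interpolate where the segment a->b passes the level.
                frac = (level - a) / (b - a)
                t = times[i - 1] + frac * (times[i] - times[i - 1])
                events.append((t, level))
    events.sort()  # events in time order, regardless of which level fired
    return events
```

    The output is a stream of irregularly spaced (time, level) events; the abstract's point is that both coordinates carry noise, so subsequent processing must filter in the time domain as well as the amplitude domain.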