
    Activity detection in conversational sign language video for mobile telecommunication

    The goal of the MobileASL project is to increase accessibility by making the mobile telecommunications network available to the signing Deaf community. Video cell phones enable Deaf users to communicate in their native language, American Sign Language (ASL). However, encoding and transmission of real-time video over cell phones is a power-intensive task that can quickly drain the battery. By recognizing activity in the conversational video, we can drop the frame rate during less important segments without significantly harming intelligibility, thus reducing the computational burden. This recognition must take place from video in real time, on a cell phone processor, with users who wear no special clothing. In this work, we quantify the power savings from dropping the frame rate…
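
A minimal sketch of the idea described above, assuming OpenCV is available: estimate conversational activity from frame differences and pick a lower target frame rate when little motion is present. The threshold and frame-rate values are illustrative assumptions, not the MobileASL project's actual recognizer or parameters.

    import cv2

    ACTIVITY_THRESHOLD = 4.0             # mean per-pixel frame difference (assumed value)
    FPS_SIGNING, FPS_LISTENING = 15, 5   # assumed target frame rates

    def target_fps(prev_gray, curr_gray):
        """Pick a reduced frame rate when the scene shows little signing activity."""
        diff = cv2.absdiff(prev_gray, curr_gray)
        return FPS_SIGNING if float(diff.mean()) > ACTIVITY_THRESHOLD else FPS_LISTENING

    cap = cv2.VideoCapture(0)            # conversational video source
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        fps = target_fps(prev_gray, gray)
        # ...encode and transmit `frame` at `fps`; skipped frames save power...
        prev_gray = gray
    cap.release()

Running the encoder only 5 times per second during listening segments, instead of 15, is where the power savings in such a scheme would come from.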

    Evaluation of Motion Velocity as a Feature for Sign Language Detection

    Popular video sharing websites contain a large collection of videos in various sign languages. These websites have the potential to be a significant source of knowledge sharing and communication for members of the deaf and hard-of-hearing community. However, prior studies have shown that traditional keyword-based search does not do a good job of discovering these videos. Dr. Frank Shipman and others have been working towards building a distributed digital library by indexing the sign language videos available online. This system employs an automatic detector, based on visual features extracted from the video, for filtering out non-sign-language content. Features such as the amount and location of hand movements and the symmetry of motion have been experimented with for this purpose. Caio Monteiro and his team designed a classifier which uses face detection to identify the region of interest (ROI) in a frame, and foreground segmentation to estimate the amount of hand motion within the region. It was later improved upon by Karappa et al. by dividing the ROI using polar coordinates and estimating motion in each division to form a composite feature set.

    This thesis work examines another visual feature associated with signing activity: the speed of hand movements. Speed-based features performed better than the foreground-based features on a complex dataset of SL and non-SL videos; the F1 score rose from 0.73 to 0.78. However, for a second dataset consisting of videos with single signers and static backgrounds, the classification scores dipped. More consistent performance improvements were observed when features from the two feature sets were used in conjunction: an F1 score of 0.76 was observed for the complex dataset, and for the second dataset the F1 score changed from 0.85 to 0.86.

    Another associated problem is identifying the sign language in a video. The impact of speed of motion on the problem of classifying American Sign Language (ASL) versus British Sign Language (BSL) was found to be minimal; we concluded that it is the location of motion which influences this problem more than either the speed or the amount of motion. Non-speed-related analyses of sign language detection were also explored. Since the American Sign Language alphabet is one-handed, it was expected that videos with left-handed signing might be falsely identified as British Sign Language, which has a two-handed alphabet. We briefly studied this with respect to our corpus of ASL and BSL videos and found that our classifier design does not suffer from this problem. Apart from this, we explored speeding up the classification process by computing symmetry of motion in the ROI on selected keyframes as a single feature for classification. The resulting feature extraction was significantly faster, but precision and recall dropped to 59% and 62% respectively, for an F1 score of 0.61.
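
As an illustration of the speed-of-motion feature discussed above (a sketch under assumed heuristics, not the thesis implementation), one could anchor a region of interest on the detected face and take the mean dense optical-flow magnitude inside it as the hand-speed estimate; the ROI widening factors below are assumptions.

    import cv2
    import numpy as np

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def speed_feature(prev_gray, curr_gray):
        """Mean optical-flow magnitude in a face-anchored ROI (proxy for hand speed)."""
        faces = face_cascade.detectMultiScale(curr_gray, 1.1, 5)
        h_img, w_img = curr_gray.shape[:2]
        if len(faces):
            x, y, w, h = faces[0]
            # Assumed heuristic: widen the ROI around and below the face
            # to cover the likely signing space.
            x0, x1 = max(0, x - 2 * w), min(w_img, x + 3 * w)
            y0, y1 = y, h_img
        else:
            x0, y0, x1, y1 = 0, 0, w_img, h_img   # fall back to the whole frame
        flow = cv2.calcOpticalFlowFarneback(prev_gray[y0:y1, x0:x1],
                                            curr_gray[y0:y1, x0:x1],
                                            None, 0.5, 3, 15, 3, 5, 1.2, 0)
        return float(np.linalg.norm(flow, axis=2).mean())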

    Autonomous interactive intermediaries : social intelligence for mobile communication agents

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 151-167).

    Today's cellphones are passive communication portals. They are neither aware of our conversational settings, nor of the relationship between caller and callee, and often interrupt us at inappropriate times. This thesis is about adding elements of human-style social intelligence to our mobile communication devices in order to make them more socially acceptable to both the user and local others. I suggest the concept of an Autonomous Interactive Intermediary that assumes the role of an actively mediating party between caller, callee, and co-located people. In order to behave in a socially appropriate way, the Intermediary interrupts with non-verbal cues and attempts to harvest 'residual social intelligence' from the calling party, the called person, the people close by, and its current location. For example, the Intermediary obtains the user's conversational status from a decentralized network of autonomous body-worn sensor nodes. These nodes detect conversational groupings in real time and provide the Intermediary with the user's conversation size and talk-to-listen ratio. The Intermediary can 'poll' all participants of a face-to-face conversation about the appropriateness of a possible interruption by slightly vibrating their wirelessly actuated finger rings. Although the alerted people do not know whether it is their own cellphone that is about to interrupt, each of them can veto the interruption anonymously by touching his/her ring. If no one vetoes, the Intermediary may interrupt. A user study showed significantly more vetoes during a collaborative group-focused setting than during a less group-oriented setting.

    The Intermediary is implemented as both a conversational agent and an animatronic device. The animatronic device is a small wireless robotic stuffed animal in the form of a squirrel, bunny, or parrot. The purpose of the embodiment is to employ intuitive non-verbal cues such as gaze and gestures to attract attention, instead of ringing or vibration. Evidence suggests that such subtle yet public alerting by animatronics evokes significantly different reactions than ordinary telephones and is seen as less invasive by others present when we receive phone calls. The Intermediary is also a dual conversational agent that can whisper and listen to the user, and converse with a caller, mediating between them in real time. The Intermediary modifies its conversational script depending on caller identity, caller and user choices, and the conversational status of the user. It interrupts and communicates with the user when it is socially appropriate, and may break down a synchronous phone call into chunks of voice instant messages.

    by Stefan Johannes Walter Marti. Ph.D.
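
A hedged sketch of the anonymous veto poll described in the abstract; the Ring class is a hypothetical stand-in for the wirelessly actuated finger-ring hardware, and the five-second veto window is an assumed value.

    import time

    class Ring:
        """Hypothetical stand-in for a wirelessly actuated finger ring."""
        def __init__(self):
            self.touched = False
        def vibrate(self):
            pass                      # would actuate the physical ring
        def veto_received(self):
            return self.touched       # would read the ring's touch sensor

    def poll_for_interruption(rings, veto_window_s=5.0):
        """Vibrate every participant's ring; interrupt only if nobody vetoes."""
        for ring in rings:
            ring.vibrate()
        deadline = time.time() + veto_window_s
        while time.time() < deadline:
            if any(ring.veto_received() for ring in rings):
                return False          # anonymous veto: defer the phone call
            time.sleep(0.1)
        return True                   # no veto: the Intermediary may interrupt

Because only the poll's outcome is acted on, no bystander learns who vetoed or whose phone was about to ring, which matches the anonymity property described above.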

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is receiving renewed attention to overcome the remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance, and the multiple disciplines it involves, a reference book on mediasync has become necessary; this book fills that gap. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space from different perspectives. MediaSync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners who want to acquire strong knowledge about this research area and to approach the challenges of ensuring the best mediated experiences by providing adequate synchronization between the media elements that constitute these experiences.

    Developing a Sign Language Video Collection via Metadata and Video Classifiers

    Video sharing sites have become a central tool for the storage and dissemination of sign language content. Sign language videos have many purposes, including sharing experiences or opinions, teaching and practicing a sign language, etc. However, due to limitations of term-based search, these videos can be hard to locate. This results in a diminished value of these sites for the deaf or hard-of-hearing community. As a result, members of the community frequently engage in a push-style delivery of content, sharing direct links to sign language videos with other members of the sign language community. To address this problem, we propose the Sign Language Digital Library (SLaDL). SLaDL is composed of two main sub-systems, a crawler that collects potential videos for inclusion into the digital library corpus, and an automatic classification system that detects and identifies sign language presence in the crawled videos. These components attempt to filter out videos that do not include sign language from the collection and to organize sign language videos based on different languages. This dissertation explores individual and combined components of the classification system. The components form a cascade of multimodal classifiers aimed at achieving high accuracy when classifying potential videos while minimizing the computational effort. A web application coordinates the execution of these two subsystems and enables user interaction (browsing and searching) with the library corpus. Since the collection of the digital library is automatically curated by the cascading classifier, the number of irrelevant results is expected to be drastically lower when compared to general-purpose video sharing sites.
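
A minimal sketch of the cascading idea (assumed structure, not the dissertation's implementation): cheap stages run first and reject clear non-sign-language videos so that the expensive multimodal stages only process the remainder. The stage names in the example wiring are hypothetical.

    def cascade_classify(video_path, stages):
        """stages: list of (score_function, threshold) pairs, cheapest first.
        Returns True only if every stage's score clears its threshold."""
        for score_fn, threshold in stages:
            if score_fn(video_path) < threshold:
                return False          # rejected early; costlier stages are skipped
        return True

    # Hypothetical wiring, cheapest feature first:
    # stages = [(metadata_score, 0.4), (motion_score, 0.5), (multimodal_score, 0.6)]
    # is_sign_language = cascade_classify("video.mp4", stages)
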
The evaluation involved a series of experiments focused on specific components of the system and on analyzing how best to configure SLaDL. In the first set of experiments, we investigated three different crawling approaches, assessing how they compared in terms of both finding a large quantity of sign language videos and expanding the variety of videos in the collection. Secondly, we evaluated the performance of different approaches to multimodal classification in terms of precision, recall, F1 score, and computational cost. Lastly, we incorporated the best multimodal approach into cascading classifiers to reduce computation while preserving accuracy. We experimented with four different cascading configurations and analyzed their performance for the detection and identification of signed content. Given the findings of each experiment, we proposed a setup for an instantiation of SLaDL.
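
For reference, the precision, recall, and F1 metrics used in the evaluation can be computed from confusion counts as below; the counts in the example are illustrative only, not results from the dissertation.

    def precision_recall_f1(tp, fp, fn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Illustrative counts only:
    p, r, f1 = precision_recall_f1(tp=62, fp=43, fn=38)
    print(f"{p:.2f} {r:.2f} {f1:.2f}")   # 0.59 0.62 0.60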