65 research outputs found

    Comment: Equal Access Requires Full Captioning of Music and Song Lyrics for the Deaf and Hard of Hearing


    TechNews digests: Jan - Mar 2010

    TechNews is a technology news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends, and issues. It focuses on emerging technologies and other technology news. The TechNews service ran from September 2004 until May 2010, with combined analysis pieces and news digests published every two to three months.

    Live Television in a Digital Library

    Nowadays nearly everyone has access to digital television, with a growing number of channels available for free. However, due to the nature of broadcasting, this huge mass of information reaches us largely unorganised: it is principally a succession of images and sound transmitted in a flow of data. Compare this with digital libraries, which are powerful at organising a large but fixed set of documents. This project brings these two concepts together by concurrently capturing all the available live television channels and segmenting the broadcasts into files, which are then imported into a digital video library. The system leverages the information contained in the electronic program guide and the video recordings to generate metadata suitable for the digital library. By combining the two concepts in this way, the aim of this work is to look beyond what is currently available in the digital TV set-top boxes on the market today and to explore, unencumbered by commercial market constraints, the full potential of what the raw technology can provide.
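
    As a purely illustrative sketch of the metadata step described above, the Java snippet below maps a single electronic program guide entry to a flat set of Dublin Core-style fields. The EpgEntry record, the field names, and the toDublinCore method are assumptions made for this example, not details taken from the project itself.

        import java.time.Duration;
        import java.time.ZonedDateTime;
        import java.util.LinkedHashMap;
        import java.util.Map;

        public class EpgMetadata {
            // Hypothetical shape of one program-guide entry (requires Java 16+ for records).
            record EpgEntry(String channel, String title, String synopsis,
                            ZonedDateTime start, Duration length) {}

            // Map guide fields to simple Dublin Core-style metadata for the library import.
            static Map<String, String> toDublinCore(EpgEntry e) {
                Map<String, String> m = new LinkedHashMap<>();
                m.put("dc:title", e.title());
                m.put("dc:description", e.synopsis());
                m.put("dc:source", e.channel());
                m.put("dc:date", e.start().toString());
                m.put("dc:format", "video/mpeg; duration=" + e.length().toMinutes() + "min");
                return m;
            }

            public static void main(String[] args) {
                EpgEntry entry = new EpgEntry("Channel 1", "Evening News",
                        "National and international news.",
                        ZonedDateTime.now(), Duration.ofMinutes(30));
                toDublinCore(entry).forEach((k, v) -> System.out.println(k + " = " + v));
            }
        }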

    First Steps Into Late-Deafness: An Introductory Manual For Newly Deafened Adults

    Thesis (M.A.), University of Alaska Fairbanks, 2006. Late-deafened adults are individuals who lose their hearing in adolescence or adulthood. Whether the hearing loss is sudden or progressive, it forces immense psychosocial changes upon the individual, disrupting relationships and work and impacting every area of the person's life. This manual serves as a guidebook for the newly deafened adult, giving her understanding, empathy, and a road map to help make sense of the adjustment process. The first chapters detail what to expect during visits to the ear specialist and audiologist, and discuss the grieving process and the impact of deafness on identity formation. Information on how to develop new ways of communicating and how to build a support network is shared. An introduction to cochlear implantation, assistive technology, and legal rights for late-deafened adults follows. The manual closes with interviews with three late-deafened adults who share their journeys into late-deafness.

    Signal processing for improved MPEG-based communication systems


    Digital television applications

    Studying the development of interactive services for digital television is a leading-edge area of work, as there is minimal research or precedent to guide their design. Published research is limited, and this thesis therefore aims to establish a set of computing methods using Java and XML technology for future set-top box interactive services. The main issues include the middleware architecture, a Java user interface for digital television, content representation, and return channel communications.

    The middleware architecture used was made up of an Application Manager, an Application Programming Interface (API), a Java Virtual Machine, etc., arranged in a layered model to ensure interoperability. The application manager was designed to control the lifecycle of Xlets, to manage set-top box resources and remote control keys, and to adapt the graphical device environment. The architecture of the application manager and the Xlet together forms the basic framework for running multiple interactive services simultaneously in future set-top box designs; a sketch of the Xlet side follows this abstract.

    User interface development is more complex for this type of platform than for a desktop computer, as many constraints are set on the look and feel (e.g., a TV-like appearance and a limited set of buttons). Various aspects of Java user interfaces were studied, and my research in this area focused on creating a remote control event model and lightweight drawing components using the Java Abstract Window Toolkit (AWT) and the Java Media Framework (JMF) together with the Extensible Markup Language (XML).

    Applications were designed to study the data structure and the efficiency of XML for defining interactive content. Content parsing was designed as a lightweight software module based around two parsers (SAX and DOM), so that still content (text, images, and graphics) and dynamic content (hyperlinked text, animations, and forms) can be modeled and processed efficiently.

    This thesis also studies interactivity methods using Java APIs via a return channel. Various communication models that meet the interactivity requirements of different interactive services are also discussed. They include the URL, Socket, Datagram, and SOAP models, which an application can use to establish a connection with the service provider or broadcaster and transfer data.

    The thesis is presented in two parts: the first gives a general summary of the research and acts as a complement to the second, which contains a series of related publications.
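
    The Xlet lifecycle that the application manager controls follows the standard JavaTV javax.tv.xlet.Xlet interface. Below is a minimal sketch of a conforming Xlet, assuming a JavaTV stack on the receiver; the class name and the println messages are illustrative, and the thesis's actual application manager and resource-handling logic are not reproduced here.

        import javax.tv.xlet.Xlet;
        import javax.tv.xlet.XletContext;
        import javax.tv.xlet.XletStateChangeException;

        // Minimal illustrative Xlet; the application manager drives these callbacks.
        public class HelloXlet implements Xlet {
            private XletContext context;

            // Called once after loading; store the context, do not claim scarce resources yet.
            public void initXlet(XletContext ctx) throws XletStateChangeException {
                this.context = ctx;
            }

            // Called when the service moves to the foreground; acquire resources and draw.
            public void startXlet() throws XletStateChangeException {
                System.out.println("service started");
            }

            // Called when another service needs the screen or tuner; release resources.
            public void pauseXlet() {
                System.out.println("service paused");
            }

            // Called before the manager unloads the Xlet; clean up and drop references.
            public void destroyXlet(boolean unconditional) throws XletStateChangeException {
                context = null;
            }
        }

    An application manager moves an Xlet between the loaded, paused, active, and destroyed states by invoking these callbacks, which is what allows several interactive services to share a set-top box's limited resources.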

    Multi-modal Video Content Understanding

    Video is an important format of information. Humans use videos for a variety of purposes such as entertainment, education, communication, information sharing, and capturing memories. To date, humankind has accumulated a colossal amount of freely available video material online. Manual processing at this scale is simply impossible, so many research efforts have been dedicated to the automatic processing of video content. At the same time, human perception of the world is multi-modal: a human uses multiple senses to understand the environment, objects, and their interactions. When watching a video, we perceive the content via both the audio and visual modalities, and removing one of these modalities results in a less immersive experience. Similarly, if the information in the two modalities does not correspond, it may create a sense of dissonance. Joint modelling of multiple modalities (such as audio, visual, and text) within one model is therefore an active research area. In the last decade, the fields of automatic video understanding and multi-modal modelling have seen exceptional progress due to the ubiquitous success of deep learning models and, more recently, transformer-based architectures in particular. Our work draws on these advances and pushes the state of the art of multi-modal video understanding forward.

    Applications of automatic multi-modal video processing are broad and exciting. For instance, a content-based textual description of a video (video captioning) may allow a visually or hearing impaired person to understand the content and thus engage in richer social interactions. However, prior work in video content description relies on the visual input alone, missing vital information that is only available in the audio stream. To this end, we proposed two novel multi-modal transformer models that encode audio and visual interactions simultaneously. First, we introduced a late-fusion multi-modal transformer that is highly modular and allows the processing of an arbitrary set of modalities. Second, an efficient bi-modal transformer was presented that encodes audio-visual cues starting from the lower network layers, allowing richer audio-visual features and stronger performance as a result.

    Another application is automatic visually guided sound generation, which might help professional sound (foley) designers who spend hours searching a database for audio relevant to a movie scene. Previous approaches to automatic conditional audio generation support only one class (e.g., "dog barking"), while real-life applications may require generation for hundreds of data classes, and one would need to train a separate model for every class, which can be infeasible. To bridge this gap, we introduced a novel two-stage model that first efficiently encodes audio as a set of codebook vectors (i.e., it learns to make "building blocks") and then learns to sample these audio vectors given visual inputs, producing an audio track relevant to that visual input. Moreover, we studied the automatic evaluation of conditional audio generation models and proposed metrics that measure both the quality and the relevance of the generated samples.

    Finally, as video editing is becoming more common among non-professionals due to the increased popularity of services such as YouTube, automatic assistance during video editing, e.g., detecting an offset between the audio and visual tracks, grows in demand. Prior work in audio-visual synchronization was devoted to solving the task on lip-syncing datasets with "dense" signals, such as interviews and presentations. In such videos, synchronization cues occur densely across time, and it is enough to process just a few tenths of a second to synchronize the tracks. In contrast, open-domain videos mostly have only "sparse" cues that occur just once in a seconds-long video clip (e.g., "chopping wood"). To address this, we a) proposed a novel dataset with "sparse" sounds, and b) designed a model which can efficiently encode seconds-long audio-visual tracks into a small set of "learnable selectors" that is then used for synchronization. In addition, we explored the temporal artefacts that common audio and video compression algorithms leave in data streams; to prevent a model from learning to rely on these artefacts, we introduced a list of recommendations on how to mitigate them.

    This thesis provides the details of the proposed methodologies as well as a comprehensive overview of advances in the relevant fields of multi-modal video understanding. In addition, we discuss potential research directions that could bring significant contributions to the field.