Improving Deaf Accessibility to Web-based Multimedia
Internet technologies have expanded rapidly over the past two decades, making information of all sorts more readily available. Not only are these new media more cost-effective than traditional media, they have also improved quality and convenience. However, the proliferation of video and audio media on the Internet creates an inadvertent disadvantage for deaf Internet users. Despite technological and legislative milestones in recent decades in making television and movies more accessible, there has been little comparable progress with online access. A major obstacle to providing captions for Internet media is the high cost of captioning and transcription services.
A possible solution to this problem lies in automatic speech recognition (ASR). This research investigates possible approaches to Web accessibility through the use of ASR technologies. It surveys previous studies that employ visualization and ASR to determine their effectiveness in the context of deaf accessibility. Since no existing literature indicated the area of greatest need, a preliminary study identified an application to serve as a case study for applying and evaluating speech visualization technology. A total of 20 deaf and hard-of-hearing participants were interviewed via video phone, and their responses in American Sign Language were transcribed into English.
The most common theme was concern over the lack of accessibility of online news. A second study therefore evaluated different presentation strategies for making online news videos more accessible. A total of 95 participants viewed four different caption styles, each presented on different news stories with content level and delivery controlled. In addition to pre-test and post-test questionnaires, both performance and preference measures were collected.
Results from the study offer emphatic support for the hypothesis that captioning online videos makes the Internet more accessible to deaf users. Furthermore, the findings lend strong evidence to the idea of using automatic captions to make videos comprehensible to deaf viewers at a fraction of the cost. Color-coded captions that used highlighting to reflect accuracy ratings were found to be neither beneficial nor detrimental; however, when participants were asked directly about the benefit of color-coding, they supported the concept. Further development and research will be necessary to find the most appropriate solution.
Sound Visualization for Deaf Assistance Using Mobile Computing
This thesis presents a new approach to sound visualization for deaf assistance that simultaneously illustrates important dynamic sound properties and recognized sound icons in an easily readable view. To visualize general sounds efficiently, MFCC sound features were used to represent robust, discriminative properties of the sound. The problem of visualizing a 39-dimensional MFCC vector was simplified by visualizing a single one-dimensional value: the result of comparing the input MFCC vector with one reference MFCC vector. A new similarity measure for comparing MFCC feature vectors was proposed; it outperforms existing local similarity measures, whose one-to-one attribute-value comparisons lead to incorrect similarity decisions. Classification of the input sound was added to the visualization system to make it more useful. Each time frame of sound is passed to a K-NN classification algorithm to detect short sound events; in addition, every second the input sound is buffered and forwarded to a Dynamic Time Warping (DTW) classifier designed for dynamic time-series classification. Both classifiers run concurrently and deliver their results to the visualization model. The system was implemented in Java for smartphones running Android OS, so many considerations related to algorithmic complexity were taken into account, and it uses the smartphone's GPU to guarantee smooth, fast rendering. The system design was based on interviews with five deaf persons, taking into account their preferred visualization style; the same five persons then tested the system, and its evaluation is based on their interaction with it. Our approach yields illustrations of sound that are more accessible and better suited to casual, inexperienced users.
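The pipeline described above lends itself to a compact illustration. The following Python sketch is not the thesis implementation: librosa is assumed for feature extraction, cosine similarity stands in for the thesis's own MFCC similarity measure, and all names are illustrative. It shows the three stages: extracting 39-dimensional MFCC frames, collapsing each frame to one scalar by comparison with a reference vector, and classifying buffered windows with DTW-based nearest-neighbor matching.

```python
# A minimal sketch, assuming librosa; cosine similarity stands in for the
# thesis's proposed MFCC similarity measure.
import numpy as np
import librosa

def mfcc_frames(path):
    """Return 39-D MFCC frames (13 coefficients + deltas + delta-deltas)."""
    y, sr = librosa.load(path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    full = np.vstack([m, librosa.feature.delta(m),
                      librosa.feature.delta(m, order=2)])
    return full.T  # shape: (n_frames, 39)

def similarity_trace(frames, reference):
    """Collapse each 39-D frame to one scalar by comparing it with a single
    reference vector, giving a one-dimensional signal that is easy to draw."""
    ref = reference / np.linalg.norm(reference)
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    return f @ ref

def dtw_distance(a, b):
    """Plain dynamic-time-warping distance between two MFCC sequences."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)]

def classify_window(window, templates):
    """1-NN over labeled (label, sequence) templates, as in the buffered
    DTW path; the per-frame K-NN path would use single frames instead."""
    return min(templates, key=lambda lt: dtw_distance(window, lt[1]))[0]
```

Collapsing each frame to a single similarity value is what makes the visualization tractable on a phone screen: one curve over time instead of a 39-dimensional trajectory.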
Data and methods for a visual understanding of sign languages
Signed languages are complete and natural languages used as the first or preferred mode of communication by millions of people worldwide. Unfortunately, however, they continue to be marginalized languages. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary and collaborative efforts. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) have the power to enable better accessibility for sign language users and to narrow the existing communication barrier between the Deaf community and non-sign language users. However, recent AI-powered technologies still do not account for sign language in their pipelines. This is mainly because sign languages are visual languages that use manual and non-manual features to convey information and do not have a standard written form. The goal of this thesis is therefore to contribute to the development of new technologies that account for sign language, both by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models and by developing automatic systems for computer vision tasks that aim at a better visual understanding of sign languages.
In Part I, we introduce the How2Sign dataset, a large-scale collection of multimodal and multiview sign language videos in American Sign Language. In Part II, we contribute to the development of technologies that account for sign languages: in Chapter 4 we present Spot-Align, a framework based on sign spotting methods for automatically annotating sign instances in continuous sign language; we further present the benefits of this framework and establish a baseline for the sign language recognition task on the How2Sign dataset. In Chapter 5, we exploit the different annotations and modalities of How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Finally, in Chapter 6, we explore sign language video generation by applying Generative Adversarial Networks to the sign language domain, and we assess if and how well sign language users can understand automatically generated sign language videos by proposing an evaluation protocol based on How2Sign topics and English translation.
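To make the Chapter 5 retrieval setup concrete, here is a hedged Python sketch of cross-modal retrieval via a shared embedding space. It is not the thesis code: the encoders are untrained random projections standing in for learned video and text encoders, and all dimensions are invented; only the retrieval mechanics (nearest neighbors by cosine similarity in the joint space) are the point.

```python
# Hedged sketch: retrieval in a shared video-text embedding space.
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared embedding dimension (illustrative)

def embed_video(clip_features):
    """Stand-in video encoder: a fixed random projection to the joint space."""
    return clip_features @ rng.standard_normal((512, D))

def embed_text(sentence_features):
    """Stand-in text encoder; in the thesis these are trained jointly."""
    return sentence_features @ rng.standard_normal((300, D))

def retrieve(query_vec, video_vecs):
    """Indices of videos ranked by cosine similarity to the text query."""
    q = query_vec / np.linalg.norm(query_vec)
    v = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    return np.argsort(-(v @ q))

videos = embed_video(rng.standard_normal((100, 512)))  # 100 indexed clips
query = embed_text(rng.standard_normal(300))           # one text query
print(retrieve(query, videos)[:5])                     # top-5 candidates
```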
Enactivism and ethnomethodological conversation analysis as tools for expanding Universal Design for Learning: the case of visually impaired mathematics students
Blind and visually impaired mathematics students must rely on accessible materials such as tactile diagrams to learn mathematics. However, these compensatory materials frequently offer students inferior opportunities for engaging in mathematical practice and do not allow sensorily heterogeneous students to collaborate. Such prevailing problems of access and interaction are central concerns of Universal Design for Learning (UDL), an engineering paradigm for inclusive participation in cultural praxis like mathematics. Rather than directly adapting existing artifacts for broader usage, the UDL process begins by interrogating the praxis these artifacts serve and then radically re-imagining tools and ecologies to optimize usability for all learners. We argue for the utility of two additional frameworks to enhance UDL efforts: (a) enactivism, a cognitive-sciences view of learning, knowing, and reasoning as modal activity; and (b) ethnomethodological conversation analysis (EMCA), which investigates participants' multimodal methods for coordinating action and meaning. Combined, these approaches help frame the design and evaluation of opportunities for heterogeneous students to learn mathematics collaboratively in inclusive classrooms by coordinating perceptuo-motor solutions to joint manipulation problems. We contextualize the thesis with a proposal for a pluralist design for proportions, in which a pair of students jointly operate an interactive technological device.
VizWiz
The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative for answering visual questions in nearly real time: asking multiple people on the web. To support answering questions quickly, we introduce quikTurkit, a general approach for intelligently recruiting human workers in advance so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.
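The low latency hinges on having workers already recruited when a question arrives. Below is a hedged sketch of that pre-recruitment idea as a standby pool drained by a question queue; it illustrates the concept only and is not quikTurkit's actual Mechanical Turk logic (pool size, the filler-task comment, and the answering step are placeholders).

```python
# Illustrative standby-pool sketch of recruiting answerers in advance.
import queue
import threading

question_q = queue.Queue()
POOL_SIZE = 3  # illustrative; real sizing would follow traffic forecasts

def standby_worker(worker_id):
    # Workers are recruited *before* any question exists, then wait.
    while True:
        try:
            q = question_q.get(timeout=1.0)
        except queue.Empty:
            continue  # quikTurkit keeps idle workers busy with filler tasks
        print(f"worker {worker_id} answers: {q}")
        question_q.task_done()

for i in range(POOL_SIZE):
    threading.Thread(target=standby_worker, args=(i,), daemon=True).start()

# A new question is answered with low latency because a worker is already
# on standby rather than being recruited from scratch.
question_q.put("What denomination is this bill?")
question_q.join()
```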
Rafigh: A Living Media System for Motivating Target Application Use for Children
Digital living media systems combine living media such as plants, animals, and fungi with computational components. In this dissertation, I respond to the question of how digital living media systems can better motivate children to use target applications (i.e., learning and/or therapeutic applications). To address this question, I employed a participatory design approach, incorporating input from children, parents, speech-language pathologists, and teachers into the design of a new system. Rafigh is a digital embedded system that uses the growth of a living mushroom colony to provide positive reinforcement to children when they conduct target activities. The growth of the mushrooms is governed by the amount of water administered to them, which in turn corresponds to the time children spend on target applications.
I used an iterative design process to develop and evaluate three Rafigh prototypes. The evaluations showed that the system must be robust and customizable, and should include compelling engagement mechanisms to keep the children interested. I evaluated Rafigh in two case studies conducted in participants' homes. In each case study, two siblings and their parent interacted with Rafigh over two weeks, and the parents identified a series of target applications that Rafigh should motivate the children to use. The study showed that Rafigh motivated the children to spend significantly more time on target applications during the intervention phase, and that it successfully engaged one of the two child participants in each case study, who showed signs of responsibility, empathy, and curiosity towards the living media. The majority of participants described the relationship between using target applications and mushroom growth correctly. Further, Rafigh encouraged more communication and collaboration between the participants. Rafigh's slow responsivity did not impact the engagement of one of the two child participants in each case study and might even have contributed to their investment in the project. Finally, Rafigh's presence as an ambient physical object allowed users to interact with it freely and as part of their home environment.
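The core contingency, time spent on target applications converted into watering time, can be summarized in a few lines. This is a hedged sketch of the idea only; the constants, app names, and function are illustrative, not Rafigh's actual parameters.

```python
# Illustrative feedback loop: target-app usage time -> watering time,
# capped so the colony is not overwatered. All values are invented.
TARGET_APPS = {"math_tutor", "speech_practice"}
SECONDS_OF_WATER_PER_MINUTE_OF_USE = 2.0
DAILY_WATER_CAP_SECONDS = 60.0

def watering_seconds(usage_log):
    """usage_log: list of (app_name, minutes) pairs for one day."""
    minutes = sum(m for app, m in usage_log if app in TARGET_APPS)
    return min(minutes * SECONDS_OF_WATER_PER_MINUTE_OF_USE,
               DAILY_WATER_CAP_SECONDS)

print(watering_seconds([("math_tutor", 12), ("games", 30)]))  # -> 24.0
```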
Linguistically Motivated Sign Language Segmentation
Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks.
Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization.
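A toy example may help show why the move from IO to BIO matters for continuous signing: under IO tagging, two signs produced back to back with no rest between them collapse into one segment, whereas BIO's explicit B(egin) tag preserves the boundary. This is an illustrative sketch, not the paper's code; the frame labels are invented.

```python
# Toy frame sequence: sign A (3 frames), sign B (2 frames), rest (2 frames).
frames = ["sign_A"] * 3 + ["sign_B"] * 2 + ["rest"] * 2

# IO tagging: inside-a-sign (I) vs outside (O).
io_tags = ["I" if f != "rest" else "O" for f in frames]
# -> ['I', 'I', 'I', 'I', 'I', 'O', 'O']: one undivided segment.

# BIO tagging: a B tag marks where a new sign begins.
bio_tags, prev = [], None
for f in frames:
    if f == "rest":
        bio_tags.append("O")
    elif f != prev:
        bio_tags.append("B")  # boundary: a new sign starts here
    else:
        bio_tags.append("I")
    prev = f
# -> ['B', 'I', 'I', 'B', 'I', 'O', 'O']: the A/B boundary is preserved.
print(io_tags, bio_tags)
```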
We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody by optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves the segmentation quality.
We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.
Comment: Accepted at EMNLP 2023 (Findings).
Integrating Visual Thinking of Hearing Impaired Students in Inclusive Classroom
Education in inclusive classes is a national alternative program for fulfilling the right to education in a non-discriminatory spirit. In practice, however, students with disabilities struggle to find an inclusive school that meets their needs. Even schools considered inclusive can fulfill only 5% of the obligations the government requires for disabled students, lack adequate facilities and infrastructure, and can educate only students with moderate or mild disabilities. This study uses qualitative, basic research that analyzes principles in education; it is therefore not directly applicable in practice, and its results take time to appear. It draws on primary and secondary data sources, with data collected through observation, in-depth interviews, documentation, and research instruments, and analyzed through qualitative exploration, triangulation, and content analysis. The first finding shows that visual thinking is a method that can help hearing-impaired students learn Islamic education in full regular inclusive classes, as evidenced by their academic achievement, including the ability to remember, to think, to find solutions, and to express opinions. The second finding shows that integrating visual thinking methods that address students' needs, through a combination of collaborative visual thinking and visual thinking strategies, offers learning solutions that engage students actively as both subjects and objects of learning. Keywords: integrating visual thinking, hearing-impaired students, inclusive education
Paralinguistic vocal control of interactive media: how untapped elements of voice might enhance the role of non-speech voice input in the user's experience of multimedia.
Much interactive media development, especially commercial development, implies the dominance of the visual modality, with sound as a limited supporting channel. The development of multimedia technologies such as augmented reality and virtual reality has further revealed a distinct partiality to visual media. Sound, however, and particularly voice, have many aspects which have yet to be adequately investigated. Exploration of these aspects may show that sound can, in some respects, be superior to graphics in creating immersive and expressive interactive experiences. With this in mind, this thesis investigates the use of non-speech voice characteristics as a complementary input mechanism for controlling multimedia applications. It presents a number of projects that employ the paralinguistic elements of voice as input to interactive media, including both screen-based and physical systems. These projects serve as a means of exploring the factors that seem likely to affect users' preferences and interaction patterns during non-speech voice control. This exploration forms the basis for an examination of potential roles for paralinguistic voice input. The research includes the conceptual and practical development of the projects and a set of evaluative studies. The work submitted for the Ph.D. comprises practical projects (50 percent) and a written dissertation (50 percent). The thesis aims to advance understanding of how voice can be used, both on its own and in combination with other input mechanisms, to control multimedia applications. It offers a step forward in attempts to integrate the paralinguistic components of voice as a complementary input mode in speech input applications, creating a synergistic combination that might let the strengths of each mode overcome the weaknesses of the other.
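As a concrete illustration of the kind of input channel the thesis investigates, the following hedged sketch (assuming librosa; the bundled example clip stands in for live microphone input, and the control mapping is invented) extracts two paralinguistic signals, loudness and pitch, and normalizes them into continuous control values.

```python
# A minimal sketch, assuming librosa: loudness and pitch as control signals.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))  # stand-in for mic input
rms = librosa.feature.rms(y=y)[0]            # frame-level loudness
f0, voiced, _ = librosa.pyin(y, fmin=80, fmax=400, sr=sr)  # pitch track

# Invented mapping: loudness drives e.g. a character's speed, pitch its
# height; both are rescaled to the [0, 1] control range.
speed = np.interp(rms, (rms.min(), rms.max()), (0.0, 1.0))
height = np.where(voiced,
                  np.interp(np.nan_to_num(f0, nan=80.0),
                            (80.0, 400.0), (0.0, 1.0)),
                  0.0)  # unvoiced frames give no pitch control
print(speed[:5], height[:5])
```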