4,191 research outputs found

    YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus

    Full text link
    Machine learning for sign languages is bottlenecked by data. In this paper, we present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset. We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign, where we achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results
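    The abstract above reports a finetuned score of 12.39 BLEU on How2Sign. As a rough illustration of how such a corpus-level BLEU number is typically computed (not the authors' evaluation code; the example sentences are invented), here is a sketch using the sacrebleu package:

```python
# Hedged sketch: scoring English translations against How2Sign-style reference
# captions with corpus BLEU. Hypotheses and references below are toy placeholders.
import sacrebleu

# One predicted English sentence per test clip (hypothetical outputs).
hypotheses = [
    "the weather is nice today",
    "i am going to the store",
]
# One reference caption per clip, in the same order.
references = [
    "the weather is nice today",
    "i am going to the store later",
]

# sacrebleu expects a list of reference streams, each aligned with the hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"Corpus BLEU: {bleu.score:.2f}")
```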

    AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

    Full text link
    Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences. For movies, this presents notable challenges -- AD must occur only during existing pauses in dialogue, should refer to characters by name, and ought to aid understanding of the storyline as a whole. To this end, we develop a new model for automatically generating movie AD, given CLIP visual features of the frames, the cast list, and the temporal locations of the speech; addressing all three of the 'who', 'when', and 'what' questions: (i) who -- we introduce a character bank consisting of the character's name, the actor that played the part, and a CLIP feature of their face, for the principal cast of each movie, and demonstrate how this can be used to improve naming in the generated AD; (ii) when -- we investigate several models for determining whether an AD should be generated for a time interval or not, based on the visual content of the interval and its neighbours; and (iii) what -- we implement a new vision-language model for this task, that can ingest the proposals from the character bank, whilst conditioning on the visual features using cross-attention, and demonstrate how this improves over previous architectures for AD text generation in an apples-to-apples comparison. Comment: ICCV 2023. Project page: https://www.robots.ox.ac.uk/vgg/research/autoad
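    As a sketch of the 'who' and 'what' ingredients described above (not the released AutoAD II code; the feature width, the CharacterEntry layout, and the toy tensors are assumptions), a character bank can be concatenated with frame features and attended over by the text decoder via cross-attention:

```python
# Hedged sketch: conditioning an AD text decoder on frame features plus a
# character bank (name, actor, CLIP face feature) through cross-attention.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class CharacterEntry:
    name: str                # character name used for naming in the AD
    actor: str               # actor who played the part
    face_feat: torch.Tensor  # CLIP feature of an exemplar face crop, shape (d,)


d = 512  # assumed CLIP feature width

# Toy character bank for one movie (random stand-ins for real CLIP features).
bank = [
    CharacterEntry("Rick", "Humphrey Bogart", torch.randn(d)),
    CharacterEntry("Ilsa", "Ingrid Bergman", torch.randn(d)),
]

frame_feats = torch.randn(1, 16, d)                                 # 16 frames of one interval
char_feats = torch.stack([c.face_feat for c in bank]).unsqueeze(0)  # (1, 2, d)
memory = torch.cat([frame_feats, char_feats], dim=1)                # joint key/value memory

# Decoder hidden states for 10 partially generated AD tokens attend over the memory.
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)
decoder_states = torch.randn(1, 10, d)
contextualized, _ = cross_attn(decoder_states, memory, memory)
print(contextualized.shape)  # torch.Size([1, 10, 512])
```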

    Using Serious Games for Learning British Sign Language Combining Video, Enhanced Interactivity, and VR Technology

    Get PDF
    One in every six people in the UK suffers from hearing loss, either as a condition they were born with or as a disorder they acquired during their life. 900,000 people in the UK are severely or profoundly deaf, and based on a 2013 study by Action On Hearing Loss UK, only 17 percent of this population can use British Sign Language (BSL). That leaves a massive proportion of people with a hearing impediment who do not use sign language struggling in social interaction and suffering from emotional distress, and an even larger proportion of hearing people who cannot communicate with those in the deaf community. This paper presents a serious game (SG) that aims to close the communication gap between hearing people and people with a hearing impediment by providing a tool that facilitates BSL learning, targeting the adult population. The paper presents the theoretical framework supporting adult learning, on the basis of which an SG using Virtual Reality (VR) technology has been developed. It explains the experimental framework of the study and presents the creation of the research instruments used to facilitate it, comprising an SG that integrates video and conventional video-based educational material. It reports and analyses the study results, which demonstrate the advantage of the SG in effectively supporting users learning a set of BSL signs, and it presents qualitative outcomes that inform the further development of the game to serve learning needs. The paper closes with conclusions, directions for further development of this educational resource, and future studies

    Towards Student Engagement Analytics: Applying Machine Learning to Student Posts in Online Lecture Videos

    Get PDF
    The use of online learning environments in higher education is becoming ever more prevalent with the inception of MOOCs (Massive Open Online Courses) and the increase in online and flipped courses at universities. Although the online systems used to deliver course content make education more accessible, students often express frustration with the lack of assistance during online lecture videos. Instructors express concern that students are not engaging with the course material in online environments, and rely on affordances within these systems to figure out what students are doing. With many online learning environments storing log data about students' usage of these systems, research into learning analytics, the measurement, collection, analysis, and reporting of data about learning and its contexts, can help inform instructors about student learning in the online context. This thesis aims to lay the groundwork for learning analytics that provide instructors with high-level student engagement data in online learning environments. Recent research has shown that instructors using these systems are concerned about their lack of awareness of student engagement, and educational psychology has shown that engagement is necessary for student success. Specifically, this thesis explores the feasibility of applying machine learning to categorize student posts by their level of engagement. These engagement categories are derived from the ICAP framework, which categorizes overt student behaviors into four tiers of engagement: Interactive, Constructive, Active, and Passive. Contributions include showing which natural language features are most indicative of engagement, exploring whether this machine learning method can be generalized across many courses, and using previous research to develop mockups of what analytics using data from this machine learning method might look like
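    As an illustration of the classification step described above (a sketch only, not the thesis pipeline; the posts and labels are invented), student posts can be mapped to the four ICAP tiers with a simple bag-of-words classifier:

```python
# Hedged sketch: categorizing student posts into ICAP engagement tiers
# (Interactive, Constructive, Active, Passive) with TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: (post text, ICAP tier). Real data would come from
# posts made in the online lecture video environment.
posts = [
    "Replying to Sam: I think your proof breaks at step 3 because ...",
    "Here is my own summary of why the algorithm runs in O(n log n) ...",
    "I paused the video and re-read the slide on recursion.",
    "Watched the lecture.",
]
labels = ["Interactive", "Constructive", "Active", "Passive"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigram and bigram language features
    LogisticRegression(max_iter=1000),
)
clf.fit(posts, labels)

print(clf.predict(["I disagree with Ana's answer; here is a counterexample ..."]))
```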

    TPA-Net: Generate A Dataset for Text to Physics-based Animation

    Full text link
    Recent breakthroughs in Vision-Language (V&L) joint research have achieved remarkable results in various text-driven tasks. High-quality Text-to-Video (T2V), a task that has long been considered mission-impossible, was proven feasible with reasonably good results in the latest works. However, the resulting videos often have undesired artifacts, largely because the system is purely data-driven and agnostic to the physical laws. To tackle this issue and further push T2V towards high-level physical realism, we present an autonomous data generation technique and a dataset, which intend to narrow the gap with a large number of multi-modal, 3D Text-to-Video/Simulation (T2V/S) data. In the dataset, we provide high-resolution 3D physical simulations for both solids and fluids, along with textual descriptions of the physical phenomena. We take advantage of state-of-the-art physical simulation methods (i) Incremental Potential Contact (IPC) and (ii) Material Point Method (MPM) to simulate diverse scenarios, including elastic deformations, material fractures, collisions, turbulence, etc. Additionally, high-quality, multi-view rendering videos are supplied for the benefit of the T2V, Neural Radiance Fields (NeRF), and other communities. This work is the first step towards fully automated Text-to-Video/Simulation (T2V/S). Live examples and subsequent work are at https://sites.google.com/view/tpa-net
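    As a sketch of what one multi-modal T2V/S record in such a dataset might look like (the field names and layout below are assumptions for illustration, not the paper's schema):

```python
# Hedged sketch: a single text + simulation + multi-view rendering record.
from dataclasses import dataclass, field
from typing import List


@dataclass
class T2VSRecord:
    text: str                   # description of the physical phenomenon
    simulator: str              # e.g. "IPC" for solids or "MPM" for fluids
    sim_state_files: List[str]  # per-frame 3D simulation state (meshes/particles)
    render_videos: List[str]    # multi-view RGB renderings of the simulation
    camera_params: List[str] = field(default_factory=list)  # optional per-view calibration


record = T2VSRecord(
    text="An elastic bunny drops onto a plane and bounces twice.",
    simulator="IPC",
    sim_state_files=["frames/frame_0000.obj", "frames/frame_0001.obj"],
    render_videos=["views/cam0.mp4", "views/cam1.mp4"],
)
print(record.simulator, len(record.render_videos))
```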

    Generative Disco: Text-to-Video Generation for Music Visualization

    Full text link
    Visuals are a core part of our experience of music, owing to the way they can amplify the emotions and messages conveyed through the music. However, creating music visualizations is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps generate music visualizations with large language models and text-to-image models. Users select intervals of music to visualize and then parameterize that visualization by defining start and end prompts. These prompts are warped between and generated according to the beat of the music to produce audio-reactive video. We introduce design patterns for improving generated videos: "transitions", which express shifts in color, time, subject, or style, and "holds", which encourage visual emphasis and consistency. A study with professionals showed that the system was enjoyable, easy to explore, and highly expressive. We conclude with use cases of Generative Disco for professionals and how AI-generated content is changing the landscape of creative work
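    As a sketch of the "warp between start and end prompts according to the beat" idea (the embedding function, timings, and interpolation are placeholder assumptions, not Generative Disco's implementation):

```python
# Hedged sketch: linearly interpolating two prompt embeddings across the beat
# times of a selected music interval, one conditioning vector per beat frame.
import numpy as np


def embed_prompt(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for a real text encoder (e.g. a CLIP text embedding)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(dim)


start_emb = embed_prompt("a neon city at dusk")
end_emb = embed_prompt("an exploding supernova of color")

interval = (12.0, 20.0)                             # seconds of music to visualize
beats = [12.0, 12.5, 13.0, 14.0, 16.0, 18.0, 20.0]  # beat times inside the interval

for t in beats:
    alpha = (t - interval[0]) / (interval[1] - interval[0])
    frame_emb = (1 - alpha) * start_emb + alpha * end_emb  # condition one frame on this
    print(f"t={t:5.1f}s  alpha={alpha:.2f}")
```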

    Exploring a Culture of Learning with Technology: An Ethnographic Content Analysis of the Activity of Learning with Educational iPad Apps

    Get PDF
    This study explored the culture of learning with educational iPad apps using activity theory as a guiding framework. First, the top nine educational apps were tracked in the Top Charts section of Apple’s App Store for a duration of four months. The nine sampled apps, selected based on their frequency of appearance, included Toca Hair Salon 2, Stack the States, Endless Alphabet, Mickey Mouse Clubhouse: Wildlife Count Along, Wild Kratts Creature Power World Adventure, Wallykazam! Letter and Word Magic, Starfall Learn to Read, Dr. Panda’s Restaurant 2, and Bug Art. The descriptions, version updates, app content, and customer reviews for each app were digitized, coded, and analyzed in Dedoose using the Activity Checklist. Additionally, instructional analysis diagrams were developed to provide insight into the user interface and actions. Results of the study were presented in the form of nine portraits. The overview and relevant instructional characteristics were detailed for each app. The final chapter examined the broader implications of the app experience. The technology, the instruction, the adult guide, and the App Store were identified as mediating factors that contributed to the dynamic app culture

    EBook Exploration: How EBooks Support Emergent Literacy

    Get PDF
    This research study explores how eBooks support young children’s emergent literacy development. Specifically, it focuses on what kinds and modes are available in eBooks for young children, how eBooks motivate or engage students to read and write, and how they support students’ decoding and comprehension skills, through a home-based qualitative active inquiry. This study took place during hour-long tutoring sessions held twice per week with two elementary-aged siblings in an Upstate New York middle-class home. The collected data included informal notes and field notes, student artifacts, comprehension conversations, and student interviews. One student enjoyed reading the eBooks and was motivated by them, while the other preferred reading paper books and was not motivated by the eBooks. It was found that some features of eBooks supported students’ decoding and comprehension, while some modes of eBooks did not. Pre-teaching of eReader features and previewing the eBook helped the students comprehend the stories. Student comprehension was aided by the narration features of the eReaders; however, animations in TumbleBooks interfered with one student’s comprehension. Use of the Table of Contents and picture cues also contributed to their understanding of eBooks. Finding an eBook at Student One’s reading level was challenging. Both students lost track of the words on the page at times. Technological issues interfered with book reading several times. The Read to Me narration options helped both students with word decoding, especially the beginning reader. More research is needed on how eBooks support students’ decoding and on how beneficial the narration features of eBooks are to beginning readers