Language as the Medium: Multimodal Video Classification through text only
Despite an exciting new wave of multimodal machine learning models, current
approaches still struggle to interpret the complex contextual relationships
between the different modalities present in videos. Going beyond existing
methods that emphasize simple activities or objects, we propose a new
model-agnostic approach for generating detailed textual descriptions that
capture multimodal video information. Our method leverages the extensive
knowledge learnt by large language models, such as GPT-3.5 or Llama2, to reason
about textual descriptions of the visual and aural modalities, obtained from
BLIP-2, Whisper and ImageBind. Without needing additional finetuning of
video-text models or datasets, we demonstrate that available LLMs have the
ability to use these multimodal textual descriptions as proxies for ``sight''
or ``hearing'' and perform zero-shot multimodal classification of videos
in-context. Our evaluations on popular action recognition benchmarks, such as
UCF-101 or Kinetics, show these context-rich descriptions can be successfully
used in video understanding tasks. This method points towards a promising new
research direction in multimodal classification, demonstrating how an interplay
between textual, visual and auditory machine learning models can enable more
holistic video understanding.
Comment: Accepted at "What is Next in Multimodal Foundation Models?" (MMFM)
workshop at ICCV 202
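
The pipeline summarised in this abstract (per-modality text descriptions handed to an LLM for zero-shot classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the `VideoDescription` container, the prompt wording, and the `build_prompt` and `parse_label` helpers are all hypothetical, and the calls to the captioning, speech, audio, and language models are deliberately left out. The sketch only shows how text proxies for "sight" and "hearing" might be assembled into one in-context prompt and how a free-text answer might be mapped back to a label.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoDescription:
    """Text stand-ins for each modality (hypothetical structure)."""
    frame_captions: List[str]                 # assumed output of an image captioner
    speech_transcript: str = ""               # assumed output of a speech recognizer
    audio_tags: List[str] = field(default_factory=list)  # assumed audio descriptions


def build_prompt(desc: VideoDescription, labels: List[str]) -> str:
    """Assemble a zero-shot classification prompt from text descriptions only."""
    lines = ["You are given text descriptions of a video.", "Frame captions:"]
    lines += [f"- {caption}" for caption in desc.frame_captions]
    if desc.speech_transcript:
        lines.append(f"Speech transcript: {desc.speech_transcript}")
    if desc.audio_tags:
        lines.append("Audio tags: " + ", ".join(desc.audio_tags))
    lines.append("Candidate labels: " + ", ".join(labels))
    lines.append("Answer with exactly one label from the list.")
    return "\n".join(lines)


def parse_label(response: str, labels: List[str]) -> Optional[str]:
    """Return the longest candidate label mentioned in the response, if any."""
    text = response.lower()
    for label in sorted(labels, key=len, reverse=True):
        if label.lower() in text:
            return label
    return None
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is available, and the reply passed through `parse_label`. Checking longer labels first is a design choice that avoids matching a short label that happens to be a substring of a longer one.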
The virtual path to academic transition: enabling international students to begin their transition to university study before they arrive
Institutions receiving international students for postgraduate study are now committing time and energy to the development of online transition resources to enable students to prepare for the demands of a different academic culture before they arrive. Important questions underlying such initiatives are identifying what kind of digital resources will both engage international students and be of most use to them in preparing for this transition, and how to effectively reach students. Current institutional initiatives are taking several forms. A popular model is to offer browsable advice/tips or FAQs about life and study at a particular institution together with, for example, video clips of other international students describing their experiences there. These may be open and web-hosted or accessible through a password protected area on an institutional website or VLE. Less commonly found are video and other media embedded in learning resources developed in the form of ‘learning objects’ which have been designed to offer key information through structured interactive learning activities supported with answers and feedback. Importantly, these also offer opportunities for language improvement at the same time since they are supported by help, feedback and transcripts. This case study focuses on a project to develop and deliver a pre-arrival online course of interactive learning resources for all incoming international students to one UK institution. Building on five years of experience in delivering pre-arrival, tutored online courses to pre-sessional course international students, the project team developed institution-specific learning objects and incorporated open resources from the website, ‘Prepare for Success’, developed by the same institution. The project seeks to deliver a self-access online course with three strands to it to address students’ concerns and needs. 
These are to prepare international students for the location in which they will be living and studying (the city of Southampton - its key features and amenities); to introduce them to practical aspects of British life and culture (e.g. setting up a bank account, shopping in a UK supermarket) and to familiarise them with key study skills and other aspects of UK academic culture which may present challenges for them (e.g. academic writing conventions; dealing with course reading lists). This paper will be of value to institutions embarking on similar ventures. It will describe the rationale for the online course; refer to the pedagogic approach taken; showcase course content, and report on the first phase of its delivery which begins in late spring 2011
JuxtaLearn D3.2 Performance Framework
This deliverable, D3.2, for Work Package 3, incorporating the pedagogy from WP2 and the orchestration factors mapped in D3.1, reviews aspects of performance in the context of participative video making. It reviews literature on curiosity and engagement characteristics of interaction mechanisms for public displays and anticipates requirements for social network analysis of relevant public videos from WP6 task 6.3. Thus, to support JuxtaLearn performance, it proposes a reflective performance framework that encompasses the material environment and objects required, the participants, and the knowledge needed
Language Learning and Interactive TV
The integration of engaging TV-style content with the individualization and ‘intelligent’ content management offered by techniques from AI has the potential to provide learning environments that are both highly motivating and educationally sound. This paper describes why language learning would be a particularly appropriate domain for interactive educational television to focus on. It also indicates some of the criteria to be fulfilled in order to provide optimal language learning conditions and how these might be satisfied using TV/Film content and techniques from AIED