
    Language as the Medium: Multimodal Video Classification through text only

    Despite an exciting new wave of multimodal machine learning models, current approaches still struggle to interpret the complex contextual relationships between the different modalities present in videos. Going beyond existing methods that emphasize simple activities or objects, we propose a new model-agnostic approach for generating detailed textual descriptions that captures multimodal video information. Our method leverages the extensive knowledge learnt by large language models, such as GPT-3.5 or Llama2, to reason about textual descriptions of the visual and aural modalities, obtained from BLIP-2, Whisper and ImageBind. Without needing additional finetuning of video-text models or datasets, we demonstrate that available LLMs have the ability to use these multimodal textual descriptions as proxies for "sight" or "hearing" and perform zero-shot multimodal classification of videos in-context. Our evaluations on popular action recognition benchmarks, such as UCF-101 or Kinetics, show these context-rich descriptions can be successfully used in video understanding tasks. This method points towards a promising new research direction in multimodal classification, demonstrating how an interplay between textual, visual and auditory machine learning models can enable more holistic video understanding. Comment: Accepted at "What is Next in Multimodal Foundation Models?" (MMFM) workshop at ICCV 202

    The virtual path to academic transition: enabling international students to begin their transition to university study before they arrive

    Institutions receiving international students for postgraduate study are now committing time and energy to the development of online transition resources to enable students to prepare for the demands of a different academic culture before they arrive. Important questions underlying such initiatives are identifying what kind of digital resources will both engage international students and be of most use to them in preparing for this transition, and how to effectively reach students. Current institutional initiatives are taking several forms. A popular model is to offer browsable advice/tips or FAQs about life and study at a particular institution together with, for example, video clips of other international students describing their experiences there. These may be open and web-hosted or accessible through a password-protected area on an institutional website or VLE. Less commonly found are video and other media embedded in learning resources developed in the form of ‘learning objects’ which have been designed to offer key information through structured interactive learning activities supported with answers and feedback. Importantly, these also offer opportunities for language improvement at the same time since they are supported by help, feedback and transcripts. This case study focuses on a project to develop and deliver a pre-arrival online course of interactive learning resources for all incoming international students to one UK institution. Building on five years of experience in delivering pre-arrival, tutored online courses to pre-sessional course international students, the project team developed institution-specific learning objects and incorporated open resources from the website, ‘Prepare for Success’, developed by the same institution. The project seeks to deliver a self-access online course with three strands to it to address students’ concerns and needs. These are to prepare international students for the location in which they will be living and studying (the city of Southampton - its key features and amenities); to introduce them to practical aspects of British life and culture (e.g. setting up a bank account, shopping in a UK supermarket) and to familiarise them with key study skills and other aspects of UK academic culture which may present challenges for them (e.g. academic writing conventions; dealing with course reading lists). This paper will be of value to institutions embarking on similar ventures. It will describe the rationale for the online course; refer to the pedagogic approach taken; showcase course content, and report on the first phase of its delivery which begins in late spring 2011.

    Language Learning and Interactive TV

    The integration of engaging TV-style content with the individualization and ‘intelligent’ content management offered by techniques from AI has the potential to provide learning environments that are both highly motivating and educationally sound. This paper describes why the area of language learning would be a particularly appropriate domain for interactive educational television to focus on. It also indicates some of the criteria to be fulfilled in order to provide optimal language learning conditions and how these might be satisfied using TV/Film content and techniques from AIED.