In the modern era, individuals are flooded with information. Developments in storage technology have enabled vast amounts of information, from sources such as entertainment media and historical archives, to be preserved cheaply and at scale. Consequently, there is a pressing need for approaches that can search through this content efficiently.
In this thesis, we leverage developments in deep learning and various data sources to advance the capabilities of retrieval systems in understanding the interactions between text, audio and video content. We do this through three core contributions.
First, we focus on text-video retrieval with natural language queries, collecting and benchmarking a high-quality dataset with detailed, well-localised descriptions and corresponding long videos. We showcase the advantages of using multiple experts, e.g. object and audio classification models, to improve text-video retrieval performance, as illustrated in the sketch below.
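To make the multi-expert idea concrete, the following is a minimal sketch, assuming a PyTorch implementation, of fusing per-expert video features with a text query for retrieval. The expert names, feature dimensions and gating scheme are illustrative assumptions, not the exact architecture used in the thesis.

```python
# Hypothetical sketch: text-gated fusion of multiple pretrained "expert" features
# (e.g. object and audio classifiers) into a single text-video similarity score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExpertFusion(nn.Module):
    def __init__(self, expert_dims, text_dim=768, joint_dim=256):
        super().__init__()
        # One projection per expert into a shared joint space.
        self.video_proj = nn.ModuleDict(
            {name: nn.Linear(d, joint_dim) for name, d in expert_dims.items()}
        )
        # The text query produces one embedding per expert plus mixture weights.
        self.text_proj = nn.ModuleDict(
            {name: nn.Linear(text_dim, joint_dim) for name in expert_dims}
        )
        self.text_gate = nn.Linear(text_dim, len(expert_dims))

    def forward(self, video_feats, text_feat):
        # video_feats: dict of expert name -> (batch, dim) video features
        # text_feat:   (batch, text_dim) pooled text query embedding
        weights = F.softmax(self.text_gate(text_feat), dim=-1)   # (B, n_experts)
        sims = []
        for i, name in enumerate(self.video_proj):
            v = F.normalize(self.video_proj[name](video_feats[name]), dim=-1)
            t = F.normalize(self.text_proj[name](text_feat), dim=-1)
            sims.append(weights[:, i] * (v * t).sum(-1))         # per-expert similarity
        return torch.stack(sims, dim=-1).sum(-1)                 # weighted sum -> score

# Example usage with two illustrative experts.
model = MultiExpertFusion({"object": 2048, "audio": 512})
score = model(
    {"object": torch.randn(4, 2048), "audio": torch.randn(4, 512)},
    torch.randn(4, 768),
)
```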
Second, we propose new benchmarks for semantic text-audio retrieval using free-form text. We employ state-of-the-art multimodal text-video retrieval models for this task and investigate how useful visual support is for finding the correct audio file. Additionally, we propose a large free-form text-audio dataset to aid the training of large text-audio models. Lastly, we investigate whether text-audio retrieval models understand the temporal ordering of sound events. We then propose a new contrastive loss term to guide the model to focus on temporal cues (see the sketch following this paragraph).
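As an illustration of this kind of objective, the following is a hedged sketch of a text-audio contrastive loss with an additional term that discourages matching captions whose sound events are given in the wrong order. The exact formulation in the thesis may differ; this only illustrates the idea of treating temporally shuffled captions as hard negatives.

```python
# Assumption: inputs are L2-normalised embeddings; the margin, temperature and
# weighting are illustrative values, not the thesis's actual hyperparameters.
import torch
import torch.nn.functional as F

def contrastive_with_temporal_term(audio_emb, text_emb, shuffled_text_emb,
                                   temperature=0.07, lam=0.5, margin=0.2):
    # audio_emb, text_emb, shuffled_text_emb: (B, D) embeddings, where
    # shuffled_text_emb encodes the same captions with event order permuted.
    logits = audio_emb @ text_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)

    # Standard symmetric InfoNCE over audio->text and text->audio directions.
    loss_nce = 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Temporal term: the true caption should score higher than its
    # event-reordered counterpart (margin ranking on matched pairs).
    pos = (audio_emb * text_emb).sum(-1)                      # (B,)
    neg = (audio_emb * shuffled_text_emb).sum(-1)             # (B,)
    loss_temporal = F.relu(neg - pos + margin).mean()

    return loss_nce + lam * loss_temporal
```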
Finally, we employ the world knowledge of Large Language Models (LLMs) to leverage large text-video datasets for text-audio understanding. We show that LLMs are capable of proposing plausible descriptions for video soundtracks, starting from the visual-based descriptions of the video content. This is important, as it can be used to scale up current text-audio retrieval datasets.
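To illustrate the idea, the following is a minimal sketch of prompting an LLM to turn a visual video caption into a plausible soundtrack description. The prompt wording, the `query_llm` helper and the example output are hypothetical assumptions; any instruction-following LLM client could be substituted.

```python
# Hypothetical prompt template for converting visual captions into audio captions.
PROMPT_TEMPLATE = (
    "The following sentence describes what is visible in a video:\n"
    "\"{caption}\"\n"
    "Describe what the soundtrack of this video most plausibly sounds like, "
    "in one sentence, mentioning only audible events."
)

def caption_to_audio_description(caption: str, query_llm) -> str:
    # query_llm: a callable that sends a prompt string to an LLM and returns
    # its text response (implementation left to the chosen LLM client).
    return query_llm(PROMPT_TEMPLATE.format(caption=caption))

# Example (hypothetical):
#   visual caption: "A man chops vegetables on a wooden board in a busy kitchen."
#   audio caption:  "Rhythmic knife chops on wood with background kitchen chatter."
```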