Multimodal Vision-Audio-Language Dataset

Choksi, Bhavin; Roig, Gemma; Schaumlöffel, Timothy

Authors: Bhavin Choksi, Gemma Roig, Timothy Schaumlöffel
Publication date: 31 October 2023
Publisher: Zenodo

Abstract

The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills in the modalities that the individual source datasets lack. Details can be found in the attached report.

Annotation

The annotation files are provided as Parquet files. They can be read in Python using the pandas and pyarrow libraries. The split into train, validation, and test sets follows the splits of the original datasets.

Installation

    pip install pandas pyarrow

Example

    import pandas as pd
    df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
    print(df.iloc[0])

Output:

    dataset              AudioSet
    filename             train/---2_BBVHAA.mp3
    captions_visual      [a man in a black hat and glasses.]
    captions_auditory    [a man speaks and dishes clank.]
    tags                 [Speech]

Description

The annotation file consists of the following fields:

    filename: Name of the corresponding file (video or audio file).
    dataset: Source dataset associated with the data point.
    captions_visual: A list of captions related to the visual content of the video. Can be NaN if there is no visual content.
    captions_auditory: A list of captions related to the auditory content of the video.
    tags: A list of tags classifying the sound of a file. Can be NaN if no tags are provided.

Data files

The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, because some files may be missing at the source, we provide them on request. Please contact us at [email protected]
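Because captions_visual and tags can be missing, code that consumes the annotations usually needs an explicit check before treating them as lists. The following is a minimal sketch, not part of the dataset's tooling: the helper names (is_missing, count_modalities, load_records) are hypothetical, the second sample row is invented for illustration, and treating both None and NaN as "missing" is an assumption about how the Parquet reader represents empty cells.

```python
import math
from collections import Counter


def is_missing(value):
    """Return True for None or a float NaN, two common 'no value' markers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_modalities(records):
    """Count, per source dataset, rows with and without visual captions."""
    with_visual = Counter()
    audio_only = Counter()
    for row in records:
        if is_missing(row["captions_visual"]):
            audio_only[row["dataset"]] += 1
        else:
            with_visual[row["dataset"]] += 1
    return with_visual, audio_only


def load_records(path):
    """Read one annotation file into a list of dicts (needs pandas and pyarrow)."""
    import pandas as pd  # imported lazily so the helpers above work without it

    return pd.read_parquet(path, engine="pyarrow").to_dict("records")


# Tiny in-memory stand-in for annotation rows.
# The first row mirrors the example output; the second row is invented.
sample = [
    {
        "dataset": "AudioSet",
        "filename": "train/---2_BBVHAA.mp3",
        "captions_visual": ["a man in a black hat and glasses."],
        "captions_auditory": ["a man speaks and dishes clank."],
        "tags": ["Speech"],
    },
    {
        "dataset": "ExampleAudioOnly",  # hypothetical dataset name
        "filename": "train/example.mp3",  # hypothetical file name
        "captions_visual": float("nan"),
        "captions_auditory": ["rain falls on a roof."],
        "tags": float("nan"),
    },
]

with_visual, audio_only = count_modalities(sample)
print(dict(with_visual))
print(dict(audio_only))
```

With the real files, count_modalities(load_records('annotation_train.parquet')) would apply the same counting to the training annotations, assuming the file sits in the working directory as in the example above.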

Available Versions

ZENODO

oai:zenodo.org:10060785

Last time updated on 07/05/2024