Multimodal Vision-Audio-Language Dataset

Abstract

<p>The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips, each with its corresponding audio and textual descriptions of the visual and auditory content. The dataset combines existing datasets and fills in modalities that are missing from them.</p><p>Details can be found in the attached report.</p><h3><strong>Annotation</strong></h3><p>The annotation files are provided as Parquet files. They can be read in Python with the <i>pandas</i> and <i>pyarrow</i> libraries.</p><p>The split into training, validation and test sets follows the splits of the original datasets.</p><h4><strong>Installation</strong></h4><blockquote><p>pip install pandas pyarrow</p></blockquote><h4><strong>Example</strong></h4><blockquote><p>import pandas as pd<br>df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')<br>print(df.iloc[0])</p></blockquote><blockquote><p>dataset                  AudioSet</p><p>filename                train/---2_BBVHAA.mp3</p><p>captions_visual      [a man in a black hat and glasses.]</p><p>captions_auditory  [a man speaks and dishes clank.]</p><p>tags                       [Speech]</p></blockquote><h4><strong>Description</strong></h4><p>The annotation file consists of the following fields:<br><br><i>filename</i>: Name of the corresponding file (video or audio file).<br><i>dataset</i>: Source dataset of the data point.<br><i>captions_visual</i>: A list of captions describing the visual content of the video. Can be NaN if there is no visual content.<br><i>captions_auditory</i>: A list of captions describing the auditory content of the video.<br><i>tags</i>: A list of tags classifying the sound of the file. Can be NaN if no tags are provided.</p><h3><strong>Data files</strong></h3><p>The raw data files for most datasets are not released due to licensing issues and must be downloaded from the original sources. Because some files are no longer available there, we provide them on request.
Please contact us at [email protected].</p>
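<p>Because the caption and tag fields are lists that may be missing, a small normalization step helps when iterating over the annotations. The following is a minimal sketch, not part of the dataset release: the helper names <i>as_list</i>, <i>iter_captions</i> and <i>has_visual</i> are hypothetical, and it assumes each annotation row is available as a plain dictionary with the fields listed above. It treats both NaN and None as missing, since null list cells read through pyarrow may come back as None rather than NaN, and it converts array-valued cells (pyarrow may return NumPy arrays for list columns) to Python lists. The first sample row is the example row shown above; the second is a made-up audio-only row.</p>

```python
import math
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# The two caption columns described above; the modality name is the
# part of the column name after "captions_".
CAPTION_FIELDS = ("captions_visual", "captions_auditory")


def as_list(value: Any) -> List[str]:
    """Normalize a list-valued annotation cell to a plain Python list.

    None and float NaN (a missing value) become an empty list. A bare
    string is wrapped instead of being split into characters. Any other
    iterable, e.g. a list or a NumPy array, is converted with list().
    """
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def iter_captions(records: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (filename, modality, caption) for every caption of every record."""
    for rec in records:
        for field in CAPTION_FIELDS:
            modality = field.split("_", 1)[1]  # "visual" or "auditory"
            for caption in as_list(rec.get(field)):
                yield rec["filename"], modality, caption


def has_visual(rec: Dict[str, Any]) -> bool:
    """True if the record has at least one visual caption."""
    return len(as_list(rec.get("captions_visual"))) > 0


# First row: the example row from the annotation file shown above.
# Second row: HYPOTHETICAL audio-only row with missing (NaN) visual captions.
SAMPLE_RECORDS = [
    {
        "dataset": "AudioSet",
        "filename": "train/---2_BBVHAA.mp3",
        "captions_visual": ["a man in a black hat and glasses."],
        "captions_auditory": ["a man speaks and dishes clank."],
        "tags": ["Speech"],
    },
    {
        "dataset": "hypothetical",
        "filename": "train/hypothetical_audio_only.wav",
        "captions_visual": float("nan"),
        "captions_auditory": ["hypothetical caption."],
        "tags": None,
    },
]

pairs = list(iter_captions(SAMPLE_RECORDS))
with_video = [rec["filename"] for rec in SAMPLE_RECORDS if has_visual(rec)]

for filename, modality, caption in pairs:
    print(f"{filename}\t{modality}\t{caption}")
```

<p>With the real annotation files, the rows could be obtained with <i>pd.read_parquet('annotation_train.parquet', engine='pyarrow').to_dict(orient='records')</i>. Keeping the helpers free of pandas makes them easy to check on hand-written rows like the ones above.</p>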

    Last time updated on 07/05/2024