Enhancing Personalised Recommendations with the Use of Multimodal Information
Whenever we watch a TV show or movie, we process a substantial amount of information conveyed through several media, in particular visual, textual, and audio. These signals carry distinctive properties that together create a unique motion picture experience. In an effort not only to produce a more personalised recommender system, but also to tackle the problem of popularity bias, we develop a system that incorporates multimodal information. Specifically, we investigate the correlation between features extracted with state-of-the-art techniques and deep learning models from visual characteristics, audio patterns, and subtitles. The framework is evaluated on a dataset comprising 145 BBC TV programmes against genre and user baselines. We demonstrate that personalised recommendations can not only be improved with the use of multimodal information, but can also outperform genre- and user-based models in terms of diversity, whilst maintaining comparable levels of accuracy.
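As a rough illustration of the kind of pipeline this abstract describes, the following is a minimal sketch, assuming per-programme visual, audio, and subtitle feature vectors have already been extracted upstream; the function names, concatenation-based fusion, and cosine-similarity ranking are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: fusing per-programme visual, audio and textual feature
# vectors into one content representation and ranking unseen programmes for a
# user by cosine similarity to their viewing history. Feature extraction
# (frame-level CNN features, audio patterns, subtitle embeddings) is assumed
# to have been done elsewhere; names and fusion scheme are hypothetical.
import numpy as np

def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length so each modality contributes comparably."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

def fuse_modalities(visual: np.ndarray, audio: np.ndarray, text: np.ndarray) -> np.ndarray:
    """Fuse modalities by concatenating their normalised feature vectors."""
    return np.concatenate([l2_normalise(visual), l2_normalise(audio), l2_normalise(text)])

def recommend(user_history: list[np.ndarray], catalogue: dict[str, np.ndarray], k: int = 5):
    """Rank programmes by similarity to the mean vector of the user's watched items."""
    profile = l2_normalise(np.mean(user_history, axis=0))
    scores = {pid: float(profile @ l2_normalise(vec)) for pid, vec in catalogue.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```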
Audiovisual, Genre, Neural and Topical Textual Embeddings for TV Programme Content Representation
TV programmes have their contents described by multiple means: textual subtitles, audiovisual files, and metadata such as genres. In order to represent these contents, we develop vectorial representations of their low-level multimodal features, group them with simple clustering techniques, and combine them using middle and late fusion. For textual features, we use LSI and Doc2Vec neural embeddings; for audio, MFCCs and Bags of Audio Words; for visual, SIFT and Bags of Visual Words. We apply our model to a dataset of BBC TV programmes and use a standard recommender and pairwise similarity matrices of content vectors to estimate viewers' behaviours. The late fusion of genre, audio, and video vectors with both of the textual embeddings significantly increases the precision and diversity of the results.
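A minimal sketch of late fusion in the spirit described above, assuming per-modality feature matrices (e.g. Doc2Vec vectors, Bag of Audio Words and Bag of Visual Words histograms, genre indicators) are already available; the cosine-similarity matrices, equal weighting, and toy dimensionalities below are assumptions for illustration, not the paper's configuration.

```python
# Illustrative late fusion: build a pairwise similarity matrix per modality
# and combine them into a single programme-programme similarity matrix that a
# content-based recommender could consume. Modality vectors are random
# placeholders; real inputs would be the extracted features named above.
import numpy as np

def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of an (n_items, dim) matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def late_fuse(modality_features: dict[str, np.ndarray],
              weights: dict[str, float] | None = None) -> np.ndarray:
    """Weighted average of per-modality similarity matrices (late fusion)."""
    names = list(modality_features)
    weights = weights or {name: 1.0 / len(names) for name in names}
    return sum(weights[name] * cosine_similarity_matrix(modality_features[name])
               for name in names)

# Toy usage: a small catalogue with hypothetical feature dimensionalities.
rng = np.random.default_rng(0)
fused_similarity = late_fuse({
    "text":  rng.normal(size=(100, 300)),                     # e.g. Doc2Vec embeddings
    "audio": rng.normal(size=(100, 500)),                     # e.g. Bag of Audio Words
    "video": rng.normal(size=(100, 500)),                     # e.g. Bag of Visual Words
    "genre": rng.integers(0, 2, size=(100, 20)).astype(float) # e.g. genre indicators
})
```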