Leveraging Multimodal Shapley Values to Address Multimodal Collapse and Improve Fine-Grained E-Commerce Product Classification

Abstract

Multimodal models can experience multimodal collapse, leading to sub-optimal performance on tasks such as fine-grained e-commerce product classification. To address this, we introduce an approach that leverages multimodal Shapley values (MM-SHAP) to quantify the individual contribution of each modality to the model's predictions. By employing weighted stacked ensembles of unimodal and multimodal models, with weights derived from these Shapley values, we enhance overall performance and mitigate the effects of multimodal collapse. Using this approach, we improve the previous F1-score from 0.67 to 0.79.
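The abstract's core idea, using per-modality Shapley contributions as ensemble weights, can be sketched for the two-modality case (text and image), where exact Shapley values have a closed form. This is a minimal illustration, not the authors' implementation; the value function here is a hypothetical validation accuracy for each modality coalition, and all scores are made up.

```python
# Minimal sketch (not the authors' code) of Shapley-weighted ensembling
# for two modalities (text, image). All scores below are illustrative.
import numpy as np

def shapley_two_players(v_empty, v_t, v_i, v_ti):
    """Exact Shapley values for a 2-player cooperative game.

    v_* are the values of the coalitions {}, {text}, {image}, {text, image},
    e.g. validation accuracy of a model using those modalities.
    """
    phi_t = 0.5 * (v_t - v_empty) + 0.5 * (v_ti - v_i)
    phi_i = 0.5 * (v_i - v_empty) + 0.5 * (v_ti - v_t)
    return phi_t, phi_i

# Hypothetical coalition values (validation accuracies).
phi_t, phi_i = shapley_two_players(v_empty=0.0, v_t=0.62, v_i=0.48, v_ti=0.70)

# Turn contributions into non-negative, normalized ensemble weights.
w = np.maximum([phi_t, phi_i], 0.0)
w = w / w.sum()

# Weighted stacking of unimodal class-probability predictions (toy data).
p_text  = np.array([[0.7, 0.3], [0.2, 0.8]])   # text-model probabilities
p_image = np.array([[0.6, 0.4], [0.4, 0.6]])   # image-model probabilities
p_ens = w[0] * p_text + w[1] * p_image
pred = p_ens.argmax(axis=1)
```

By the efficiency property, the two Shapley values sum to the full multimodal score (here 0.70), so a modality that adds little beyond the other, a symptom of collapse, automatically receives a small ensemble weight.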

This paper was published in Royal Holloway - Pure.

Licence: info:eu-repo/semantics/openAccess