Internet memes have gained significant influence in communicating political,
psychological, and sociocultural ideas. While memes are often humorous, they
are increasingly used for trolling and cyberbullying. Although a wide variety
of effective deep learning-based models have been developed for detecting
offensive multimodal memes, only a few studies have addressed their
explainability. Recent regulations, such as the "right to explanation" under
the General Data Protection Regulation (GDPR), have spurred research on
interpretable models rather than a sole focus on performance. Motivated by
this, we introduce {\em
MultiBully-Ex}, the first benchmark dataset for the multimodal explanation of
code-mixed cyberbullying memes, in which both the visual and textual
modalities are
highlighted to explain why a given meme constitutes cyberbullying. We propose
a Contrastive Language-Image Pretraining (CLIP) projection-based multimodal
shared-private multitask approach for generating visual and textual
explanations of a meme. Experimental results demonstrate that training with
multimodal explanations yields reliable improvements both in generating
textual justifications and in identifying the visual evidence that supports a
decision.