9 research outputs found

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

Author: Elbayad Maha
Goswami Vedanuj
Maillard Jean
Murray Kenton
Xu Haoran
Publication date: 22/10/2023

Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient as the improvement in performance diminishes with an increasing number of experts. We hypothesize this parameter inefficiency is a result of all experts having equal capacity, which may not adequately meet the varying complexity requirements of different tokens or tasks. In light of this, we propose Stratified Mixture of Experts (SMoE) models, which feature a stratified structure and can assign dynamic capacity to different tokens. We demonstrate the effectiveness of SMoE on three multilingual machine translation benchmarks, containing 4, 15, and 94 language pairs, respectively. We show that SMoE outperforms multiple state-of-the-art MoE models with the same or fewer parameters.
Comment: Accepted at Findings of EMNLP 202

arXiv.org e-Print Archive
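The abstract's opening claim, that sparse activation lets an MoE model grow its parameter count while keeping per-token compute low, can be illustrated with a generic top-k gated layer. This is a minimal pure-Python sketch, not the SMoE architecture: the stratified experts with dynamic capacity are not modeled, and the names `top_k_gate`, `sparse_moe`, `toy_gate`, and the toy scaling experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def sparse_moe(x: Vector, gate: Callable[[Vector], Sequence[float]],
               experts: Sequence[Expert], k: int) -> Tuple[Vector, int]:
    """Run only the k routed experts on x; return (weighted output, experts evaluated)."""
    out = [0.0] * len(x)
    routed = top_k_gate(gate(x), k)
    for idx, weight in routed:
        y = experts[idx](x)  # the remaining len(experts) - k experts are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, len(routed)


# Hypothetical toy setup: 16 "experts", each scaling its input by a different constant.
experts: List[Expert] = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 17)]


def toy_gate(v: Vector) -> List[float]:
    """Fixed scores that ignore the input and favor higher-indexed experts."""
    return [float(i) for i in range(len(experts))]


y, used = sparse_moe([1.0, 2.0], toy_gate, experts, k=2)
print(used)  # 2: adding experts grows the parameter count, not the per-token expert calls
```

Every expert here has the same capacity, which is the setting the abstract identifies as parameter-inefficient; SMoE instead assigns dynamic capacity to different tokens.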

Unsupervised image-to-video clothing transfer

Author: De La Torre Fernando
Goswami Vedanuj
Moreno-Noguer Francesc
Pumarola Peris Albert
Vicente Francisco
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
We present a system to photo-realistically transfer the clothing of a person in a reference image onto another person in an unconstrained image or video. Our architecture is based on a GAN equipped with a physical memory that updates an initially incomplete texture map of the clothes, which is progressively completed with the newly inferred occluded parts. The system is trained in an unsupervised manner. The results are visually appealing and open the possibility of using the system in the future as a quick virtual clothing try-on tool.
Peer Reviewed

UPCommons. Portal del coneixement obert de la UPC
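The memory mechanism described in this abstract, an initially incomplete texture map progressively completed as occluded parts are inferred, can be sketched as a masked update. This is an illustration under stated assumptions, not the authors' GAN: a single float stands in for each texel's color, `None` marks texels not yet known, and the rule that already-filled texels are kept is assumed (the actual system may refine or blend values). All names (`update_texture_memory`, `completeness`) are hypothetical.

```python
from typing import List, Optional

Texel = Optional[float]  # None marks a texel not yet observed or inferred (assumption)
TextureMap = List[List[Texel]]


def update_texture_memory(memory: TextureMap, frame_texture: TextureMap) -> TextureMap:
    """Fill texels still missing in memory with values inferred from the current frame.

    Texels already present in memory are kept, so the map can only become more complete.
    """
    return [
        [old if old is not None else new for old, new in zip(mem_row, frame_row)]
        for mem_row, frame_row in zip(memory, frame_texture)
    ]


def completeness(texture: TextureMap) -> float:
    """Fraction of texels that have a value."""
    cells = [t for row in texture for t in row]
    return sum(t is not None for t in cells) / len(cells)


# Toy 2x3 texture: the reference image reveals only part of the garment.
memory: TextureMap = [[0.9, 0.8, None], [None, 0.7, None]]
frames: List[TextureMap] = [
    [[0.1, None, 0.5], [None, None, None]],  # frame 1 exposes one occluded texel
    [[None, None, 0.4], [0.6, None, 0.3]],   # frame 2 exposes the remaining ones
]
for frame in frames:
    memory = update_texture_memory(memory, frame)

print(completeness(memory))  # 1.0
```

Keeping earlier texels makes the memory monotone: each new frame can only add information, which matches the abstract's description of a map that is "progressively completed".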

The Hateful Memes Challenge: Competition Report

Author: Antoniou Georgios
Bull Peter
Chandra Shantanu
Escalante Hugo Jair
Firooz Hamed
Fitzpatrick Casey A.
Goswami Vedanuj
Hofmann Katja
Holla Nithin
Kiela Douwe
Lippe Phillip
Lipstein Greg
Mohan Aravind
Muennighoff Niklas
Nelli Tony
Ozertem Umut
Pantel Patrick
Parikh Devi
Rajamanickam Santhosh
Rose Jewgeni
Sandulescu Vlad
Shutova Ekaterina
Singh Amanpreet
Specia Lucia
Velioglu Riza
Yannakoudakis Helen
Zhu Ron
Publication venue: PMLR
Publication date: 01/01/2021

Kiela D, Firooz H, Mohan A, et al. The Hateful Memes Challenge: Competition Report. In: Escalante HJ, Hofmann K, eds. Proceedings of the NeurIPS 2020 Competition and Demonstration Track. Proceedings of Machine Learning Research. Vol 133. PMLR; 2021: 344-360.
Machine learning and artificial intelligence play an ever more crucial role in mitigating important societal problems, such as the prevalence of hate speech. We describe the Hateful Memes Challenge competition, held at NeurIPS 2020, focusing on multimodal hate speech. The aim of the challenge is to facilitate further research into multimodal reasoning and understanding.

Publications at Bielefeld University