Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization

Authors: Aßenmacher, Matthias; Deiseroth, Björn; Eichenberg, Constantin; Gritsch, Nikolas; Kersting, Kristian; Meuer, Max; Schramowski, Patrick
Publication date: 13/11/2023

Large Language Models (LLMs) have reshaped natural language processing with their impressive capabilities. Their ever-increasing size, however, has raised concerns about their effective deployment and the need for LLM compression. This study introduces the Divergent Token Metrics (DTMs), a novel approach for assessing compressed LLMs that addresses the limitations of traditional perplexity or accuracy measures, which fail to accurately reflect text generation quality. DTMs focus on token divergence, which allows deeper insights into the subtleties of model compression, in particular when evaluating the impact of individual components. Utilizing the First Divergent Token Metric (FDTM) in model sparsification reveals that a quarter of all attention components can be pruned beyond 90% on the Llama-2 model family while still keeping SOTA performance. For quantization, FDTM suggests that over 80% of parameters can naively be transformed to int8 without special outlier management. These evaluations indicate the necessity of choosing appropriate compressions for parameters individually, and that FDTM can identify those, while standard metrics result in deteriorated outcomes.
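
To illustrate the token-divergence idea described in the abstract above, here is a minimal sketch that finds the first position at which the output of a compressed model departs from that of the original model. The abstract does not give the exact definition of the metric, so the function name `first_divergent_token`, the use of plain token-ID lists, and the handling of sequences where one is a prefix of the other are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def first_divergent_token(reference: Sequence[int], candidate: Sequence[int]) -> int:
    """Return the index of the first position where two token sequences differ.

    Assumption (not from the paper): if no mismatch is found within the
    overlapping part, the length of the shorter sequence is returned, so
    identical sequences yield their full length.
    """
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    return min(len(reference), len(candidate))


# Hypothetical token IDs generated by an original and a compressed model
original_output = [12, 7, 93, 4, 18]
compressed_output = [12, 7, 93, 55, 18]
print(first_divergent_token(original_output, compressed_output))
```

Under this reading, a larger value means the compressed model follows the original for longer before diverging, which could be used to compare the effect of pruning or quantizing individual components.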

arXiv.org e-Print Archive

MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

Authors: Baldock, Robert; Bellagente, Marco; Brack, Manuel; Cruz-Salinas, Andres Felipe; Dai, Andrew; Deiseroth, Björn; Eichenberg, Constantin; Friedrich, Felix; Kersting, Kristian; Nanda, Souradeep; Oostermeijer, Koen; Schramowski, Patrick; Teufel, Hannah; Weinbach, Samuel
Publication date: 24/05/2023

The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.

arXiv.org e-Print Archive

Euroscepticism and Abstention

Crossref