4 research outputs found
Divergent Token Metrics: Measuring degradation to prune away LLM components -- and optimize quantization
Large Language Models (LLMs) have reshaped natural language processing with
their impressive capabilities. Their ever-increasing size, however, has raised
concerns about their effective deployment and the need for LLM compression.
This study introduces the Divergent Token metrics (DTMs), a novel approach for
assessing compressed LLMs, addressing the limitations of traditional perplexity
or accuracy measures that fail to accurately reflect text generation quality.
DTMs focus on token divergence, which allows deeper insights into the subtleties
of model compression, in particular when evaluating the impact of individual
components.
Utilizing the First Divergent Token metric (FDTM) in model sparsification
reveals that a quarter of all attention components can be pruned beyond 90% on
the Llama-2 model family, still keeping SOTA performance. For quantization, FDTM
suggests that over 80% of parameters can naively be transformed to int8 without
special outlier management. These evaluations indicate the necessity of
choosing appropriate compressions for parameters individually, and that FDTM can
identify those, while standard metrics result in deteriorated outcomes.
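
As an illustration of the token-divergence idea, the following is a minimal
sketch, not the authors' implementation. It assumes the metric compares the
token ID sequences generated by a reference model and a compressed model from
the same prompt, and reports the first position at which they differ. The
function name and the example token IDs are hypothetical.

```python
from typing import Sequence


def first_divergent_token(reference: Sequence[int], candidate: Sequence[int]) -> int:
    """Return the index of the first position where the two sequences differ.

    If one sequence is a prefix of the other (or both are identical),
    the length of the shorter sequence is returned.
    """
    for i, (ref_tok, cand_tok) in enumerate(zip(reference, candidate)):
        if ref_tok != cand_tok:
            return i
    return min(len(reference), len(candidate))


# Hypothetical token IDs from a reference model and a compressed model.
reference_ids = [5, 8, 13, 21, 34]
compressed_ids = [5, 8, 13, 2, 34]
divergence_index = first_divergent_token(reference_ids, compressed_ids)
```

Under this assumption, a larger index means the compressed model follows the
reference generation for longer before deviating.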
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
The recent popularity of text-to-image diffusion models (DM) can largely be
attributed to the intuitive interface they provide to users. The intended
generation can be expressed in natural language, with the model producing
faithful interpretations of text prompts. However, expressing complex or
nuanced ideas in text alone can be difficult. To ease image generation, we
propose MultiFusion, which allows one to express complex and nuanced concepts
with arbitrarily interleaved inputs of multiple modalities and languages.
MultiFusion leverages pre-trained models and aligns them for integration into a
cohesive system, thereby avoiding the need for extensive training from scratch.
Our experimental results demonstrate the efficient transfer of capabilities
from individual modules to the downstream model. Specifically, the fusion of
all independent components allows the image generation module to utilize
multilingual, interleaved multimodal inputs despite being trained solely on
monomodal data in a single language.