It is all about where you start: Text-to-image generation with seed selection
Text-to-image diffusion models can synthesize a large variety of concepts in
new compositions and scenarios. However, they still struggle with generating
uncommon concepts, rare or unusual combinations, or structured concepts like hand
palms. Their limitation is partly due to the long-tail nature of their training
data: web-crawled data sets are strongly unbalanced, causing models to
under-represent concepts from the tail of the distribution. Here we
characterize the effect of unbalanced training data on text-to-image models and
offer a remedy. We show that rare concepts can be correctly generated by
carefully selecting suitable generation seeds in the noise space, a technique
that we call SeedSelect. SeedSelect is efficient and does not require
retraining the diffusion model. We evaluate the benefit of SeedSelect on a
series of problems. First, few-shot semantic data augmentation, where we generate semantically correct images for few-shot and long-tail benchmarks; we show classification improvement on all classes, from both the head and the tail of the diffusion model's training data. We further evaluate SeedSelect on
correcting images of hands, a well-known pitfall of current diffusion models,
and show that it improves hand generation substantially.
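As a rough illustration of the seed-selection idea only (not the paper's exact optimization procedure), the sketch below generates a few candidates from different seeds with a Stable Diffusion pipeline and keeps the one with the highest CLIP image-text score; the checkpoints, the prompt, and the CLIP-based scoring are illustrative assumptions.

```python
# Sketch: pick the generation seed whose output best matches the prompt,
# scored with CLIP. This only illustrates seed selection in the noise space;
# SeedSelect itself optimizes the seed rather than enumerating candidates.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a photo of an open hand palm"  # illustrative rare/structured concept

def clip_score(image, text):
    inputs = proc(text=[text], images=image, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        out = clip(**inputs)
    return out.logits_per_image.item()

best_seed, best_score, best_image = None, float("-inf"), None
for seed in range(8):  # small candidate pool, for illustration only
    gen = torch.Generator(device=device).manual_seed(seed)
    image = pipe(prompt, generator=gen, num_inference_steps=30).images[0]
    score = clip_score(image, prompt)
    if score > best_score:
        best_seed, best_score, best_image = seed, score, image

print(f"selected seed {best_seed} with CLIP score {best_score:.2f}")
best_image.save("best_seed_sample.png")
```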
Clue: Cross-modal Coherence Modeling for Caption Generation
We use coherence relations inspired by computational models of discourse to
study the information needs and goals of image captioning. Using an annotation
protocol specifically devised for capturing image--caption coherence relations,
we annotate 10,000 instances from publicly-available image--caption pairs. We
introduce a new task for learning inferences in imagery and text, coherence
relation prediction, and show that these coherence annotations can be exploited
to learn relation classifiers as an intermediary step, and also train
coherence-aware, controllable image captioning models. The results show a
dramatic improvement in the consistency and quality of the generated captions
with respect to information needs specified via coherence relations.
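As a rough sketch of what a coherence-relation classifier over image--caption pairs might look like (the label set, features, and training loop here are illustrative assumptions, not the paper's setup), one could classify frozen CLIP embeddings of the two modalities with a small head:

```python
# Sketch: a coherence-relation classifier over image--caption pairs, using frozen
# CLIP embeddings and a small linear head. The relation label set and the training
# data are placeholders; the paper's annotation protocol defines the real labels.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

RELATIONS = ["Visible", "Subjective", "Action", "Story", "Meta"]  # illustrative

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
head = nn.Linear(clip.config.projection_dim * 2, len(RELATIONS))

def relation_logits(image, caption):
    inputs = proc(text=[caption], images=image, return_tensors="pt",
                  padding=True, truncation=True)
    with torch.no_grad():
        img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                         attention_mask=inputs["attention_mask"])
    # Concatenate the two modalities and score each coherence relation.
    return head(torch.cat([img_emb, txt_emb], dim=-1))

# Training would minimize cross-entropy against the annotated relations, e.g.:
# loss = nn.functional.cross_entropy(relation_logits(img, cap), label_tensor)
```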
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Over the past decade, Artificial Intelligence (AI) has achieved great success and is now used in a wide range of academic and industrial fields. More recently, large language models (LLMs) have made rapid advances that have propelled AI to a new level, bringing intelligence to an even wider range of applications and industrial domains, particularly in areas such as software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns and issues exhibited by LLMs have recently received much attention; without properly addressing them, the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large model scale, and autoregressive generation scheme, differ from classic AI software based on CNNs and RNNs and present new challenges for quality analysis. To date, universal and systematic analysis techniques for LLMs are still lacking despite urgent industrial demand. Towards bridging this gap, we initiate an early
exploratory study and propose a universal analysis framework for LLMs, LUNA,
designed to be general and extensible, to enable versatile analysis of LLMs
from multiple quality perspectives in a human-interpretable manner. In
particular, we first leverage the data from desired trustworthiness
perspectives to construct an abstract model as an auxiliary analysis asset,
which is empowered by various abstract model construction methods. To assess
the quality of the abstract model, we collect and define a number of evaluation metrics, targeting both the abstract-model level and the semantics level. Then the semantics, defined as the degree to which the LLM satisfies the trustworthiness perspective, is bound to the abstract model, enriching it and enabling more detailed analysis applications for diverse purposes.
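A minimal sketch of one way such an abstract model could be constructed, assuming hidden-state traces and per-trace trustworthiness scores are available; the clustering-based abstraction and the per-state semantics here are illustrative choices, not necessarily the construction methods used in LUNA:

```python
# Sketch: cluster per-token hidden states into abstract states, estimate a
# Markov transition matrix over them, and bind a per-state "semantics" score
# (how trustworthy the traces visiting that state were on average).
import numpy as np
from sklearn.cluster import KMeans

def build_abstract_model(traces, labels, n_states=16, seed=0):
    """traces: list of [T_i, d] hidden-state arrays; labels: per-trace score in [0, 1]."""
    all_states = np.concatenate(traces, axis=0)
    km = KMeans(n_clusters=n_states, random_state=seed, n_init=10).fit(all_states)

    transitions = np.zeros((n_states, n_states))
    state_score_sum = np.zeros(n_states)
    state_visits = np.zeros(n_states)

    for trace, label in zip(traces, labels):
        ids = km.predict(trace)
        for a, b in zip(ids[:-1], ids[1:]):
            transitions[a, b] += 1
        for s in ids:
            state_score_sum[s] += label
            state_visits[s] += 1

    # Row-normalize to get transition probabilities; guard empty rows.
    row_sums = transitions.sum(axis=1, keepdims=True)
    transitions = np.divide(transitions, row_sums,
                            out=np.zeros_like(transitions), where=row_sums > 0)
    # Per-state semantics: mean trustworthiness score over visits to that state.
    semantics = np.divide(state_score_sum, state_visits,
                          out=np.zeros_like(state_score_sum), where=state_visits > 0)
    return km, transitions, semantics

# Example with random data standing in for LLM hidden-state traces.
rng = np.random.default_rng(0)
traces = [rng.normal(size=(20, 64)) for _ in range(10)]
labels = rng.uniform(size=10)
km, P, sem = build_abstract_model(traces, labels)
print(P.shape, sem.shape)  # (16, 16) (16,)
```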
DiffAlign: Few-shot learning using diffusion-based synthesis and alignment
We address the problem of few-shot classification where the goal is to learn
a classifier from a limited set of samples. While data-driven learning is shown
to be effective in various applications, learning from less data still remains
challenging. To address this challenge, existing approaches consider various
data augmentation techniques for increasing the number of training samples.
Pseudo-labeling is commonly used in a few-shot setup, where approximate labels
are estimated for a large set of unlabeled images. We propose DiffAlign, which focuses on generating images from class labels. Specifically, we leverage the recent success of generative models (e.g., DALL-E and diffusion models)
that can generate realistic images from texts. However, naive learning on
synthetic images is not adequate due to the domain gap between real and
synthetic images. Thus, we employ a maximum mean discrepancy (MMD) loss to align the synthetic images with the real images, minimizing the domain gap. We
evaluate our method on the standard few-shot classification benchmarks:
CIFAR-FS, FC100, miniImageNet, tieredImageNet and a cross-domain few-shot
classification benchmark: miniImageNet to CUB. The proposed approach
significantly outperforms the state-of-the-art in both 5-shot and 1-shot setups on these benchmarks. Our approach is also shown to be effective in the zero-shot classification setup.
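A minimal sketch of an RBF-kernel MMD loss between real and synthetic feature batches, as one plausible form of the alignment objective described above; the kernel bandwidth and the random stand-in features are illustrative assumptions:

```python
# Sketch: an RBF-kernel maximum mean discrepancy (MMD) loss between features of
# real and synthetic images. Minimizing it pulls the two feature distributions
# together, which is the alignment idea; the bandwidth choice is illustrative.
import torch

def rbf_kernel(x, y, bandwidth=1.0):
    # x: [n, d], y: [m, d] -> [n, m] Gaussian kernel matrix.
    dists = torch.cdist(x, y) ** 2
    return torch.exp(-dists / (2.0 * bandwidth ** 2))

def mmd_loss(real_feats, synth_feats, bandwidth=1.0):
    k_rr = rbf_kernel(real_feats, real_feats, bandwidth).mean()
    k_ss = rbf_kernel(synth_feats, synth_feats, bandwidth).mean()
    k_rs = rbf_kernel(real_feats, synth_feats, bandwidth).mean()
    return k_rr + k_ss - 2.0 * k_rs

# Usage: features from a shared backbone for a batch of real and synthetic images
# (random tensors stand in for those features here).
real_feats = torch.randn(32, 512)
synth_feats = torch.randn(32, 512)
print(float(mmd_loss(real_feats, synth_feats)))
```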
A Survey of Language Model Confidence Estimation and Calibration
Language models (LMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, the reliability of their outputs remains a concern with respect to the demands of AI safety. Assessing the confidence of LM predictions and calibrating it across different tasks, with the aim of aligning LM confidence with accuracy, can help mitigate risks and enable LMs to make better decisions. There have been
various works in this respect, but there has been no comprehensive overview of
this important research area. The present survey aims to bridge this gap. In
particular, we discuss methods and techniques for LM confidence estimation and
calibration, encompassing different LMs and various tasks. We further outline
the challenges of estimating confidence for large language models and suggest some promising directions for future work.
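As a rough illustration of two building blocks that commonly appear in this literature (not any specific paper's method), the sketch below computes expected calibration error and fits a single temperature to classifier logits; the toy data and bin count are illustrative assumptions:

```python
# Sketch: expected calibration error (ECE) to measure miscalibration, and
# temperature scaling to correct overconfident predictions.
import torch
import torch.nn.functional as F

def expected_calibration_error(probs, labels, n_bins=10):
    conf, preds = probs.max(dim=-1)
    correct = (preds == labels).float()
    ece = torch.zeros(())
    bins = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Weight each bin's |confidence - accuracy| gap by its size.
            ece += mask.float().mean() * (conf[mask].mean() - correct[mask].mean()).abs()
    return ece

def fit_temperature(logits, labels, steps=200, lr=0.01):
    # A single scalar temperature tuned on held-out data by minimizing NLL.
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Toy example: overconfident logits on 3 classes.
logits = torch.randn(200, 3) * 5.0
labels = torch.randint(0, 3, (200,))
t = fit_temperature(logits, labels)
probs = F.softmax(logits / t, dim=-1)
print("temperature:", round(t, 3),
      "ECE:", float(expected_calibration_error(probs, labels)))
```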