Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
We propose an inference procedure for deep convolutional neural networks
(CNNs) when partial evidence is available. Our method consists of a general
feedback-based propagation approach (feedback-prop) that boosts the prediction
accuracy for an arbitrary set of unknown target labels when the values for a
non-overlapping arbitrary set of target labels are known. We show that existing
models trained in a multi-label or multi-task setting can readily take
advantage of feedback-prop without any retraining or fine-tuning. Our
feedback-prop inference procedure is general, simple, reliable, and works on
different challenging visual recognition tasks. We present two variants of
feedback-prop based on layer-wise and residual iterative updates. We experiment
using several multi-task models and show that feedback-prop is effective in all
of them. Our results unveil a previously unreported but interesting dynamic
property of deep CNNs. We also present an associated technical approach that
takes advantage of this property for inference under partial evidence in
general visual recognition tasks.
Comment: Accepted to CVPR 2018
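To make the idea concrete, here is a minimal sketch of the layer-wise variant, assuming a multi-label CNN split into a `backbone` (image to intermediate activation) and a `head` (activation to label logits); the split point, names, and hyperparameters are illustrative rather than the paper's exact procedure. The key point is that only the intermediate activation is updated while the model weights stay frozen:

```python
# Minimal sketch of layer-wise feedback-prop (assumed interface).
import torch
import torch.nn.functional as F

def feedback_prop(backbone, head, image, known_idx, known_labels,
                  steps=20, lr=0.1):
    """Refine predictions for unknown labels given known label values."""
    with torch.no_grad():
        act = backbone(image)            # initial intermediate activation
    act = act.clone().requires_grad_(True)
    optimizer = torch.optim.SGD([act], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = head(act)
        # Only the known labels contribute to the feedback loss.
        loss = F.binary_cross_entropy_with_logits(
            logits[:, known_idx], known_labels)
        loss.backward()                  # gradients flow into `act` only
        optimizer.step()                 # model weights stay frozen
    with torch.no_grad():
        return torch.sigmoid(head(act))  # updated scores for all labels
```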
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
Language is increasingly being used to define rich visual recognition
problems with supporting image collections sourced from the web. Structured
prediction models are used in these tasks to take advantage of correlations
between co-occurring labels and visual input but risk inadvertently encoding
social biases found in web corpora. In this work, we study data and models
associated with multilabel object classification and visual semantic role
labeling. We find that (a) datasets for these tasks contain significant gender
bias and (b) models trained on these datasets further amplify existing bias.
For example, the activity cooking is over 33% more likely to involve females
than males in a training set, and a trained model further amplifies the
disparity to 68% at test time. We propose to inject corpus-level constraints
for calibrating existing structured prediction models and design an algorithm
based on Lagrangian relaxation for collective inference. Our method results in
almost no performance loss for the underlying recognition task but decreases
the magnitude of bias amplification by 47.5% and 40.5% for multilabel
classification and visual semantic role labeling, respectively.
Comment: 11 pages, published in EMNLP 2017
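As a rough illustration of the calibration idea, the sketch below enforces a single corpus-level constraint (the test-time gender ratio for one activity must not exceed its training ratio by more than a margin) via dual ascent on a Lagrange multiplier. The paper handles many activities and two-sided constraints jointly; all names here are illustrative:

```python
# Hedged sketch: one-sided ratio constraint via Lagrangian relaxation.
import numpy as np

def calibrate(scores_f, scores_m, train_ratio, margin=0.05,
              steps=100, eta=0.1):
    """scores_f / scores_m: model scores for (activity, woman) and
    (activity, man) on each test instance; returns gender decisions."""
    lam = 0.0  # Lagrange multiplier for the ratio constraint
    for _ in range(steps):
        # Collective inference: each instance picks the higher adjusted score.
        pick_f = (scores_f - lam) > scores_m
        ratio = pick_f.mean()
        violation = ratio - (train_ratio + margin)
        if violation <= 0:
            break                   # constraint satisfied
        lam += eta * violation      # dual ascent on the violated constraint
    return pick_f
```

Raising the multiplier penalizes the over-predicted gender just enough to bring the prediction ratio back inside the margin, which is why the underlying recognition accuracy is barely affected.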
Variation of Gender Biases in Visual Recognition Models Before and After Finetuning
We introduce a framework to measure how biases change before and after
fine-tuning a large scale visual recognition model for a downstream task. Deep
learning models trained on increasing amounts of data are known to encode
societal biases. Many computer vision systems today rely on models typically
pretrained on large scale datasets. While bias mitigation techniques have been
developed for tuning models for downstream tasks, the effects of biases
already encoded in a pretrained model remain unclear. Our framework
incorporates sets of canonical images representing individual and pairs of
concepts to highlight changes in biases for an array of off-the-shelf
pretrained models across model sizes, dataset sizes, and training objectives.
Through our analyses, we find that (1) supervised models trained on datasets
such as ImageNet-21k are more likely to retain their pretraining biases
regardless of the target dataset compared to self-supervised models. We also
find that (2) models finetuned on larger scale datasets are more likely to
introduce new biased associations. Our results also suggest that (3) biases can
transfer to finetuned models and the finetuning objective and dataset can
impact the extent of transferred biases.
Comment: 10 pages, 3 figures
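A minimal sketch of what such a measurement could look like, assuming CLIP-style image encoders and using mean pairwise cosine similarity between two canonical image sets as an illustrative stand-in for the paper's actual bias measures:

```python
# Hedged sketch: compare a concept-pair association before/after finetuning.
import torch
import torch.nn.functional as F

def association(encoder, images_a, images_b):
    """Mean cosine similarity between embeddings of two canonical sets."""
    with torch.no_grad():
        za = F.normalize(encoder(images_a), dim=-1)
        zb = F.normalize(encoder(images_b), dim=-1)
    return (za @ zb.T).mean().item()

def bias_shift(pretrained, finetuned, images_a, images_b):
    """Positive value: finetuning strengthened the a-b association."""
    before = association(pretrained, images_a, images_b)
    after = association(finetuned, images_a, images_b)
    return after - before
```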
XVTP3D: Cross-view Trajectory Prediction Using Shared 3D Queries for Autonomous Driving
Trajectory prediction with uncertainty is a critical and challenging task for
autonomous driving. Nowadays, we can easily access sensor data represented in
multiple views. However, cross-view consistency is not enforced by existing
models, which can lead to divergences between the multimodal predictions from
different views. Such divergences indicate that the network has not
comprehended the 3D scene, and they can leave the downstream module in a
dilemma. Instead, we predict multimodal trajectories while maintaining
cross-view consistency. We present a cross-view trajectory prediction method
using shared 3D queries (XVTP3D). We employ a set of 3D queries shared across
views to generate multiple goals that are cross-view consistent. We also
propose a random mask method and coarse-to-fine cross-attention to capture
robust cross-view features. To the best of our knowledge, this is the first
work that introduces the top-down paradigm from the BEV detection field to
trajectory prediction. Experiments on two publicly available datasets show
that XVTP3D achieves state-of-the-art performance with consistent cross-view
predictions.
Comment: 11 pages, 6 figures, accepted by IJCAI 2023
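The following sketch illustrates the shared-query idea only, not the paper's exact architecture: one set of learnable 3D goal queries cross-attends to the features of each view, and per-view goals are decoded from the same queries, so hypothesis k refers to the same goal in every view. Dimensions and module names are assumptions:

```python
# Illustrative sketch of goal decoding from queries shared across views.
import torch
import torch.nn as nn

class SharedQueryDecoder(nn.Module):
    def __init__(self, num_goals=6, dim=128):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_goals, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.goal_head = nn.Linear(dim, 2)   # (x, y) goal position per view

    def forward(self, view_feats):
        """view_feats: list of (B, N_i, dim) feature sets, one per view."""
        goals = []
        for feats in view_feats:
            q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
            out, _ = self.attn(q, feats, feats)  # queries attend to one view
            goals.append(self.goal_head(out))    # same query -> same goal id
        return goals  # index k is the same hypothesis in every view
```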
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Model-based evaluation metrics (e.g., CLIPScore and GPTScore) have
demonstrated decent correlations with human judgments in various language
generation tasks. However, their impact on fairness remains largely unexplored.
It is widely recognized that pretrained models can inadvertently encode
societal biases, thus employing these models for evaluation purposes may
inadvertently perpetuate and amplify biases. For example, an evaluation metric
may favor the caption "a woman is calculating an account book" over "a man is
calculating an account book," even if the image only shows male accountants. In
this paper, we conduct a systematic study of gender biases in model-based
automatic evaluation metrics for image captioning tasks. We start by curating a
dataset comprising profession, activity, and object concepts associated with
stereotypical gender associations. Then, we demonstrate the negative
consequences of using these biased metrics, including the inability to
differentiate between biased and unbiased generations, as well as the
propagation of biases to generation models through reinforcement learning.
Finally, we present a simple and effective way to mitigate the metric bias
without hurting the correlations with human judgments. Our dataset and
framework lay the foundation for understanding the potential harm of
model-based evaluation metrics, and facilitate future works to develop more
inclusive evaluation metrics.
Comment: Accepted to EMNLP 2023
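To illustrate how such a bias can be probed, the sketch below scores a gender-swapped caption pair against the same image with Hugging Face's CLIP and records the gap; the model choice, image path, and captions are illustrative, and raw cosine similarity stands in for the full CLIPScore:

```python
# Hedged sketch: probe a CLIP-based metric with a gender-swapped pair.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(image, captions):
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).squeeze(0)  # cosine similarity per caption

image = Image.open("accountant.jpg")  # hypothetical test image
sims = clip_scores(image, ["a man is calculating an account book",
                           "a woman is calculating an account book"])
bias_gap = (sims[1] - sims[0]).item()  # > 0: metric favors the female caption
```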
Understanding In-Context Learning via Supportive Pretraining Data
In-context learning (ICL) improves language models' performance on a variety
of NLP tasks by simply demonstrating a handful of examples at inference time.
It is not well understood why ICL ability emerges, as the model has never been
specifically trained on such demonstrations. Unlike prior work that explores
implicit mechanisms behind ICL, we study ICL via investigating the pretraining
data. Specifically, we first adapt an iterative, gradient-based approach to
find a small subset of pretraining data that supports ICL. We observe that a
continued pretraining on this small subset significantly improves the model's
ICL ability, by up to 18%. We then compare the supportive subset contrastively
with random subsets of pretraining data and discover: (1) The supportive
pretraining data for ICL do not have a higher domain relevance to downstream
tasks. (2) The supportive pretraining data have a higher mass of rarely
occurring, long-tail tokens. (3) The supportive pretraining data are
challenging examples where the information gain from long-range context is
below average, indicating learning to incorporate difficult long-range context
encourages ICL. Our work takes a first step towards understanding ICL via
analyzing instance-level pretraining data. Our insights have a potential to
enhance the ICL ability of language models by actively guiding the construction
of pretraining data in the future.
Comment: ACL 2023
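A simplified sketch of gradient-based selection in this spirit: score each pretraining batch by the alignment between its language-modeling gradient and the gradient of an ICL task loss, then keep the top-scoring batches for continued pretraining. The paper's procedure is iterative; the function names and the Hugging Face-style `model(**batch).loss` interface here are assumptions:

```python
# Hedged sketch: rank pretraining data by gradient alignment with ICL loss.
import torch

def grad_vector(model, loss):
    """Flatten gradients of `loss` w.r.t. all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.flatten() for g in grads])

def supportive_scores(model, icl_loss_fn, pretrain_batches):
    """Cosine alignment of each batch's LM gradient with the ICL gradient."""
    g_task = grad_vector(model, icl_loss_fn(model))
    scores = []
    for batch in pretrain_batches:
        g = grad_vector(model, model(**batch).loss)  # LM loss on the batch
        scores.append(torch.dot(g, g_task) / (g.norm() * g_task.norm()))
    return torch.stack(scores)  # keep the top-scoring subset
```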