53 research outputs found
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
As it is empirically observed that Vision Transformers (ViTs) are quite
insensitive to the order of input tokens, the need for an appropriate
self-supervised pretext task that enhances the location awareness of ViTs is
becoming evident. To address this, we present DropPos, a novel pretext task
designed to reconstruct Dropped Positions. The formulation of DropPos is
simple: we first drop a large random subset of positional embeddings, and the
model then classifies the actual position of each non-overlapping patch among
all possible positions based solely on its visual appearance. To avoid
trivial solutions, we increase the difficulty of this task by keeping only a
subset of patches visible. Additionally, considering there may be different
patches with similar visual appearances, we propose position smoothing and
attentive reconstruction strategies to relax this classification problem, since
it is not necessary to reconstruct their exact positions in these cases.
Empirical evaluations of DropPos show strong capabilities. DropPos outperforms
supervised pre-training and achieves competitive results compared with
state-of-the-art self-supervised alternatives on a wide range of downstream
benchmarks. This suggests that explicitly encouraging spatial reasoning
abilities, as DropPos does, indeed contributes to the improved location
awareness of ViTs. The code is publicly available at
https://github.com/Haochen-Wang409/DropPos.
Comment: Accepted by NeurIPS 202
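The formulation above lends itself to a compact sketch. The following is a hedged, minimal sketch of the DropPos objective as described in the abstract, not the authors' implementation: the encoder is any ViT-style module mapping [B, N, D] token sequences to [B, N, D] (e.g. a torch.nn.TransformerEncoder with batch_first=True), the masking ratios are illustrative, the positional embeddings are reused as classification weights purely as a placeholder head, and attentive reconstruction is omitted.

    # Hedged sketch of the DropPos objective (illustrative, not the authors' code).
    import torch
    import torch.nn.functional as F

    def droppos_loss(encoder, patches, pos_embed, gamma=0.75, keep_ratio=0.25,
                     sigma=1.0, grid=14):
        """patches: [B, N, D] patch embeddings; pos_embed: [N, D] with N == grid * grid.
        keep_ratio: fraction of patches kept visible (the rest are discarded).
        gamma: fraction of visible patches whose positional embedding is dropped."""
        B, N, D = patches.shape
        device = patches.device
        n_vis = max(int(N * keep_ratio), 1)
        # Keep only a random subset of patches visible to avoid trivial solutions.
        idx = torch.rand(B, N, device=device).argsort(dim=1)[:, :n_vis]      # [B, n_vis]
        vis = torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        pos = pos_embed[idx]                                                  # [B, n_vis, D]
        # Drop positional embeddings for a large random subset of visible patches.
        drop = torch.rand(B, n_vis, device=device) < gamma
        pos = torch.where(drop.unsqueeze(-1), torch.zeros_like(pos), pos)
        tokens = encoder(vis + pos)                                           # [B, n_vis, D]
        # Classify the true position of each patch among all N possible positions.
        logits = tokens @ pos_embed.t()                                       # [B, n_vis, N]
        # Position smoothing: soft targets that also credit spatially nearby positions.
        coords = torch.stack(torch.meshgrid(torch.arange(grid), torch.arange(grid),
                                            indexing="ij"), dim=-1)
        coords = coords.reshape(N, 2).float().to(device)
        dist = torch.cdist(coords[idx.reshape(-1)], coords).reshape(B, n_vis, N)
        target = F.softmax(-dist / sigma, dim=-1)
        loss = -(target * F.log_softmax(logits, dim=-1)).sum(dim=-1)          # [B, n_vis]
        return loss[drop].mean()   # only patches with dropped positions are scored

As sigma shrinks toward zero, the soft target collapses to the exact one-hot position, recovering a plain cross-entropy classification over the N positions.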
Knowledge Editing for Large Language Models: A Survey
Large language models (LLMs) have recently transformed both the academic and
industrial landscapes due to their remarkable capacity to understand, analyze,
and generate texts based on their vast knowledge and reasoning ability.
Nevertheless, one major drawback of LLMs is their substantial computational
cost for pre-training due to their unprecedented amounts of parameters. The
disadvantage is exacerbated when new knowledge frequently needs to be
introduced into the pre-trained model. Therefore, it is imperative to develop
effective and efficient techniques to update pre-trained LLMs. Traditional
methods encode new knowledge in pre-trained LLMs through direct fine-tuning.
However, naively re-training LLMs can be computationally intensive and risks
degrading valuable pre-trained knowledge in the model that is irrelevant to the
update. Recently, Knowledge-based Model Editing (KME), which aims to precisely
modify LLMs to incorporate specific knowledge without negatively influencing
other irrelevant knowledge, has attracted increasing attention. In this
survey, we aim to provide a comprehensive and in-depth overview of recent
advances in the field of KME. We first introduce a general formulation of KME
to encompass different KME strategies. Afterward, we provide an innovative
taxonomy of KME techniques based on how the new knowledge is introduced into
pre-trained LLMs, and investigate existing KME strategies while analyzing key
insights, advantages, and limitations of methods from each category. Moreover,
representative metrics, datasets, and applications of KME are introduced
accordingly. Finally, we provide an in-depth analysis regarding the
practicality and remaining challenges of KME and suggest promising research
directions for further advancement in this field.
Comment: 33 page
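As a deliberately simplified illustration of the kind of formulation and metrics such a survey covers, the sketch below expresses an edit request together with the commonly reported reliability, generality, and locality checks; the names and structure here are assumptions for illustration, not the survey's notation.

    # Illustrative sketch of an edit request plus reliability/generality/locality
    # checks (names and structure are assumptions, not the survey's notation).
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class EditRequest:
        prompt: str                                   # e.g. "The capital of X is"
        target_new: str                               # the knowledge to inject
        paraphrases: List[str] = field(default_factory=list)        # generality probes
        unrelated_prompts: List[str] = field(default_factory=list)  # locality probes

    def evaluate_edit(generate: Callable[[str], str], req: EditRequest,
                      pre_edit_answers: Dict[str, str]) -> Dict[str, float]:
        """generate: the post-edit model's completion function;
        pre_edit_answers: answers of the unedited model on the unrelated prompts."""
        reliability = float(req.target_new in generate(req.prompt))
        generality = (sum(req.target_new in generate(p) for p in req.paraphrases)
                      / max(len(req.paraphrases), 1))
        # Locality: knowledge irrelevant to the update should be unchanged.
        locality = (sum(generate(p) == pre_edit_answers[p] for p in req.unrelated_prompts)
                    / max(len(req.unrelated_prompts), 1))
        return {"reliability": reliability, "generality": generality, "locality": locality}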
Learning Domain-Aware Detection Head with Prompt Tuning
Domain adaptive object detection (DAOD) aims to generalize detectors trained
on an annotated source domain to an unlabelled target domain. However, existing
methods focus on reducing the domain bias of the detection backbone by
inferring a discriminative visual encoder, while ignoring the domain bias in
the detection head. Inspired by the strong generalization of vision-language
models (VLMs), applying a VLM as a robust detection backbone followed by a
domain-aware detection head is a reasonable way to learn a discriminative
detector for each domain, rather than reducing the domain bias as traditional
methods do. To address this issue, we propose a novel DAOD framework named
Domain-Aware detection head with Prompt tuning (DA-Pro), which applies a
learnable domain-adaptive prompt to generate a dynamic detection head for
each domain. Formally, the domain-adaptive prompt consists of
domain-invariant tokens, domain-specific tokens, and a domain-related textual
description along with the class label. Furthermore, two constraints between
the source and target domains are applied to ensure that the domain-adaptive
prompt can capture the domain-shared and domain-specific knowledge. A prompt
ensemble strategy is also proposed to reduce the effect of prompt disturbance.
Comprehensive experiments over multiple cross-domain adaptation tasks
demonstrate that using the domain-adaptive prompt can produce an effective
domain-related detection head for boosting domain-adaptive object detection.
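As a rough illustration of the prompt structure described above (and only that; the token counts, dimensions, and the text encoder are placeholders, not DA-Pro's implementation), a CoOp-style domain-adaptive prompt can be assembled from shared and per-domain learnable tokens plus embedded description and class-name tokens:

    # Hedged sketch of assembling a domain-adaptive prompt; all sizes are placeholders.
    import torch
    import torch.nn as nn

    class DomainAdaptivePrompt(nn.Module):
        def __init__(self, embed_dim=512, n_shared=8, n_domain=4, n_domains=2):
            super().__init__()
            # Domain-invariant (shared) learnable context tokens.
            self.shared = nn.Parameter(torch.randn(n_shared, embed_dim) * 0.02)
            # Domain-specific learnable context tokens, one set per domain.
            self.domain = nn.Parameter(torch.randn(n_domains, n_domain, embed_dim) * 0.02)

        def forward(self, domain_id: int, desc_embed: torch.Tensor, class_embed: torch.Tensor):
            """desc_embed: [L_d, D] embedded domain description (e.g. "a foggy photo of");
            class_embed: [L_c, D] embedded class name. Returns one prompt sequence [L, D]."""
            parts = [self.shared, self.domain[domain_id], desc_embed, class_embed]
            return torch.cat(parts, dim=0)

The resulting per-class prompt would then be passed through the VLM's text encoder, and the encoded class prompts would serve as the classification weights of that domain's detection head.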
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph
Cognitive research indicates that abstraction ability is essential to human
intelligence, yet it remains under-explored in language models. In this paper,
we present AbsPyramid, a unified entailment graph of 221K textual descriptions
of abstraction knowledge. While existing resources only touch nouns or verbs
within simplified events or specific domains, AbsPyramid collects abstract
knowledge for three components of diverse events to comprehensively evaluate
the abstraction ability of language models in the open domain. Experimental
results demonstrate that current LLMs face challenges comprehending abstraction
knowledge in zero-shot and few-shot settings. By training on our rich
abstraction knowledge, we find LLMs can acquire basic abstraction abilities and
generalize to unseen events. Meanwhile, we empirically show that our benchmark
is comprehensive enough to enhance LLMs across two previous abstraction tasks.
Comment: Findings of NAACL202
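A minimal sketch of how such a zero-shot entailment probe over abstraction knowledge could be scored is given below; the prompt wording, the yes/no label set, and the example format are assumptions for illustration, not the AbsPyramid release format.

    # Hedged sketch of a zero-shot abstraction-entailment probe (illustrative only).
    def build_probe(event: str, component: str, abstraction: str) -> str:
        return (f'Event: "{event}"\n'
                f'Does the {component} of this event entail the more abstract concept '
                f'"{abstraction}"? Answer "yes" or "no".')

    def zero_shot_accuracy(ask, examples):
        """ask: maps a prompt string to the model's answer string.
        examples: iterable of (event, component, abstraction, label) with label in {0, 1}."""
        correct, total = 0, 0
        for event, component, abstraction, label in examples:
            answer = ask(build_probe(event, component, abstraction)).strip().lower()
            correct += int(answer.startswith("yes")) == label
            total += 1
        return correct / max(total, 1)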
TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining
A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike
previous AM datasets focusing only on text, the shared task at the 10th
Workshop on Argument Mining introduces a dataset including both text and
images. Importantly, these images contain both visual elements and optical
characters. Our new framework, TILFA (A Unified Framework for Text, Image, and
Layout Fusion in Argument Mining), is designed to handle this mixed data. It
excels at not only understanding text but also detecting optical characters and
recognizing layout details in images. Our model significantly outperforms
existing baselines, earning our team, KnowComp, 1st place on the leaderboard of
the Argumentative Stance Classification subtask in this shared task.
Comment: Accepted to the 10th Workshop on Argument Mining, co-located with
EMNLP 202
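The kind of text, OCR, and layout fusion described above can be sketched roughly as follows; the LayoutLM-style 2D position embeddings, the pooled image feature, the dimensions, and the single-transformer fusion are assumptions for illustration, not TILFA's architecture.

    # Hedged sketch of fusing argument text, OCR tokens, layout, and an image feature.
    import torch
    import torch.nn as nn

    class TextImageLayoutFusion(nn.Module):
        def __init__(self, vocab_size=30522, dim=256, n_classes=2, max_coord=1000):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, dim)
            self.x_emb = nn.Embedding(max_coord, dim)   # layout: x coordinate of an OCR token's box
            self.y_emb = nn.Embedding(max_coord, dim)   # layout: y coordinate of an OCR token's box
            self.img_proj = nn.Linear(2048, dim)        # pooled visual feature, e.g. from a CNN
            layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.cls = nn.Linear(dim, n_classes)        # stance classes

        def forward(self, text_ids, ocr_ids, ocr_xy, img_feat):
            # text_ids: [B, Lt]; ocr_ids: [B, Lo]; ocr_xy: [B, Lo, 2] integer coordinates
            # in [0, max_coord); img_feat: [B, 2048].
            text = self.tok(text_ids)
            ocr = self.tok(ocr_ids) + self.x_emb(ocr_xy[..., 0]) + self.y_emb(ocr_xy[..., 1])
            img = self.img_proj(img_feat).unsqueeze(1)
            h = self.encoder(torch.cat([img, text, ocr], dim=1))
            return self.cls(h[:, 0])                    # classify stance from the fused image token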
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization
Multi-armed bandit algorithms like Thompson Sampling (TS) can be used to
conduct adaptive experiments, in which maximizing reward means that data is
used to progressively assign participants to more effective arms. Such
assignment strategies increase the risk of statistical hypothesis tests
identifying a difference between arms when there is not one, and failing to
conclude there is a difference between arms when there truly is one. We tackle this
by introducing a novel heuristic algorithm, called TS-PostDiff (Posterior
Probability of Difference). TS-PostDiff takes a Bayesian approach to mixing TS
and Uniform Random (UR): the probability a participant is assigned using UR
allocation is the posterior probability that the difference between two arms is
'small' (below a certain threshold), allowing for more UR exploration when
there is little or no reward to be gained. We evaluate TS-PostDiff against
state-of-the-art strategies. The empirical and simulation results help
characterize the trade-offs of these approaches between reward, False Positive
Rate (FPR), and statistical power, as well as under which circumstances each is
effective. We quantify the advantage of TS-PostDiff in performing well across
multiple differences in arm means (effect sizes), showing the benefits of
adaptively changing randomization/exploration in TS in a "Statistically
Considerate" manner: reducing FPR and increasing statistical power when
differences are small or zero and there is less reward to be gained, while
exploiting more when differences may be large. This highlights important
considerations for future algorithm development and analysis to better balance
reward and statistical analysis.
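The mixing rule described above reads directly as code. Below is a hedged sketch for a two-arm Bernoulli bandit with Beta(1, 1) priors; the Monte Carlo estimate of the posterior probability that the arms differ by less than the threshold, and the per-participant arm choice, follow the abstract's description rather than the authors' implementation.

    # Hedged sketch of the TS-PostDiff mixing rule (illustrative, not the authors' code).
    import numpy as np

    def ts_postdiff_choose_arm(successes, failures, threshold=0.05, n_mc=10_000, rng=None):
        """successes/failures: length-2 per-arm counts of a Bernoulli reward. Returns 0 or 1."""
        rng = rng or np.random.default_rng()
        a = 1.0 + np.asarray(successes, dtype=float)   # Beta posterior parameters per arm
        b = 1.0 + np.asarray(failures, dtype=float)
        # Posterior probability that the difference between the two arms is 'small'.
        draws = rng.beta(a, b, size=(n_mc, 2))
        p_small_diff = np.mean(np.abs(draws[:, 0] - draws[:, 1]) < threshold)
        # With that probability, assign by Uniform Random; otherwise by Thompson Sampling.
        if rng.random() < p_small_diff:
            return int(rng.integers(2))
        return int(np.argmax(rng.beta(a, b)))

When the posterior mass on a small difference is high (little reward to be gained), most assignments are uniform, preserving statistical power; as evidence of a large difference accumulates, the rule reverts to reward-maximizing Thompson Sampling.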
- …