92 research outputs found
ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference
Aligning language models to human expectations, e.g., being helpful and
harmless, has become a pressing challenge for large language models. A typical
alignment procedure consists of supervised fine-tuning and preference learning.
Most preference learning methods, such as RLHF and DPO, depend on pairwise
preference data, which inadequately addresses scenarios where human feedback
is point-wise, leading to potential information loss and suboptimal performance.
Addressing this gap, we introduce Point-wise Direct Preference Optimization, a
novel preference learning method designed to harness point-wise feedback
effectively. Our work also uncovers a novel connection between supervised
fine-tuning and point-wise preference learning, culminating in Unified Language
Model Alignment, a single-step method that unifies the alignment with human
demonstrations and point-wise preferences. Extensive experiments on point-wise
preference datasets with binary or continuous labels validate the effectiveness
of our methods. Our code and a new dataset with high-quality demonstration
samples on harmlessness are released.
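To give a concrete flavor of what a point-wise preference objective can look
like, here is a minimal Python sketch; the loss form, the function name, and
the beta value are illustrative assumptions, not the paper's definition. A
single response with a binary quality label is scored by the scaled
policy-to-reference log-ratio under a Bernoulli likelihood, so no pairwise
comparison is needed.

import torch
import torch.nn.functional as F

def pointwise_preference_loss(logp_policy, logp_ref, label, beta=0.1):
    # logp_policy: summed token log-probs of the response under the tuned model
    # logp_ref:    the same under the frozen reference model
    # label:       1.0 for a good response, 0.0 for a bad one
    # The implicit reward is the scaled log-ratio, scored with
    # binary cross-entropy; an illustrative sketch only.
    reward = beta * (logp_policy - logp_ref)
    return F.binary_cross_entropy_with_logits(reward, label)

# toy usage with scalar tensors
loss = pointwise_preference_loss(torch.tensor(-12.3), torch.tensor(-14.0),
                                 torch.tensor(1.0))

With continuous labels in [0, 1], the same cross-entropy form still applies,
which is one way a single objective can cover both binary and graded
point-wise feedback.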
Towards Omni-supervised Referring Expression Segmentation
Referring Expression Segmentation (RES) is an emerging task in computer
vision, which segments the target instances in images based on text
descriptions. However, its development is plagued by expensive segmentation
labels. To address this issue, we propose a new learning task for RES called
Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to
make full use of unlabeled, fully labeled and weakly labeled data, e.g.,
referring points or grounding boxes, for efficient RES training. To accomplish
this task, we also propose a novel yet strong baseline method for Omni-RES
based on the recently popular teacher-student learning, where the weak labels
are not directly transformed into supervision signals but used as a yardstick
to select and refine high-quality pseudo-masks for teacher-student learning. To
validate the proposed Omni-RES method, we apply it to a set of state-of-the-art
RES models and conduct extensive experiments on multiple RES datasets. The
experimental results show the clear merits of Omni-RES over the
fully-supervised and semi-supervised training schemes. For instance, with only
10% fully labeled data, Omni-RES can help the base model reach 100% of the
fully supervised performance, and it also outperforms the semi-supervised
alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on
RefCOCO+. More importantly, Omni-RES also enables the use of large-scale
vision-language data like Visual Genome to facilitate low-cost RES training,
achieving new state-of-the-art RES performance, e.g., 80.66 on RefCOCO.
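As a rough illustration of using weak labels as a yardstick rather than as
direct supervision, consider the sketch below; the IoU threshold and helper
names are assumptions, not the paper's exact procedure. A teacher's
pseudo-mask is kept for student training only if its bounding box agrees with
the weak grounding box.

import numpy as np

def mask_to_box(mask):
    # Tight (x1, y1, x2, y2) bounding box of a non-empty binary mask.
    ys, xs = np.where(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def select_pseudo_masks(pseudo_masks, weak_boxes, thresh=0.5):
    # Keep a pseudo-mask only when it agrees with the weak box
    # (the "yardstick"); thresh is an arbitrary illustrative value.
    return [m for m, b in zip(pseudo_masks, weak_boxes)
            if box_iou(mask_to_box(m), b) >= thresh]

Filtering in this way lets noisy teacher predictions be vetted cheaply, which
is the intuition behind using boxes or points as quality checks instead of
converting them into weaker training signals.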
Towards Efficient Visual Adaption via Structural Re-parameterization
Parameter-efficient transfer learning (PETL) is an emerging research area
aimed at inexpensively adapting large-scale pre-trained models to downstream
tasks. Recent advances have achieved great success in saving storage costs for
various vision tasks by updating or injecting a small number of parameters
instead of full fine-tuning. However, we notice that most existing PETL methods
still incur non-negligible latency during inference. In this paper, we propose
a parameter-efficient and computationally friendly adapter for giant vision
models, called RepAdapter. Specifically, we prove that the adaptation modules,
even with a complex structure, can be seamlessly integrated into most giant
vision models via structural re-parameterization. This property makes
RepAdapter zero-cost during inference. In addition to computation efficiency,
RepAdapter is more effective and lightweight than existing PETL methods due to
its sparse structure and our careful deployment. To validate RepAdapter, we
conduct extensive experiments on 27 benchmark datasets of three vision tasks,
i.e., image and video classification and semantic segmentation. Experimental
results show the superior performance and efficiency of RepAdapter over the
state-of-the-art PETL methods. For instance, by updating only 0.6% of the
parameters, we can improve the performance of ViT from 38.8 to 55.1 on Sun397.
Its generalizability is also well validated on a range of vision models, i.e.,
ViT, CLIP, Swin-Transformer and ConvNeXt. Our source code is released at
https://github.com/luogen1996/RepAdapter
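The key property, that a linear adapter composed with a linear layer collapses
into a single weight matrix, can be verified in a few lines. The shapes and
the exact adapter form below are assumptions for illustration; the point is
only that the merge is exact, which is why the adapter costs nothing at
inference.

import torch

def merge_adapter(W, b, A, B):
    # Training-time forward:  y = W @ (x + B @ A @ x) + b
    # After merging:          y = W_merged @ x + b
    # Shapes (assumed): W (out, d), A (r, d), B (d, r), with r << d.
    W_merged = W + W @ B @ A  # exact, since everything is linear
    return W_merged, b

# check the merge is exact on random data
d, out, r = 16, 8, 2
W, b = torch.randn(out, d), torch.randn(out)
A, B = torch.randn(r, d), 0.01 * torch.randn(d, r)
x = torch.randn(d)
y_train = W @ (x + B @ (A @ x)) + b
W2, b2 = merge_adapter(W, b, A, B)
assert torch.allclose(y_train, W2 @ x + b2, atol=1e-5)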
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Prompt tuning is a parameter-efficient way to deploy large-scale pre-trained
models to downstream tasks by adding task-specific tokens. In terms of
vision-language pre-trained (VLP) models, prompt tuning often requires a large
number of learnable tokens to bridge the gap between the pre-training and
downstream tasks, which greatly exacerbates the already high computational
overhead. In this paper, we revisit the principle of prompt tuning for
Transformer-based VLP models and reveal that the impact of soft prompt tokens
can actually be approximated via independent information diffusion steps,
thereby avoiding the expensive global attention modeling and reducing the
computational complexity to a large extent. Based on this finding, we propose a
novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer
learning. To validate APT, we apply it to two representative VLP models, namely
ViLT and METER, and conduct extensive experiments on a variety of downstream
tasks. Meanwhile, the generalization of APT is also validated on CLIP for image
classification. The experimental results not only show the superior performance
gains and computational efficiency of APT over conventional prompt tuning
methods, e.g., +6.6% accuracy and -64.62% additional computation overhead on
METER, but also confirm its merits over other parameter-efficient transfer
learning approaches.
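One way to picture the approximation is to let prompts contribute through a
separate, cheap attention step instead of being concatenated into the global
attention; the additive form and the alpha weight below are illustrative
assumptions, not APT's exact formulation.

import torch
import torch.nn.functional as F

def approx_prompt_attention(Q, K, V, P_k, P_v, alpha=0.1):
    # Q, K, V: (n, d) input-token projections; P_k, P_v: (p, d) prompts.
    # Token-token attention stays untouched; the prompt contribution is
    # aggregated independently, avoiding a softmax over n + p keys.
    d = Q.shape[-1]
    token_out = F.softmax(Q @ K.T / d**0.5, dim=-1) @ V
    prompt_out = F.softmax(Q @ P_k.T / d**0.5, dim=-1) @ P_v
    return token_out + alpha * prompt_out

n, p, d = 6, 4, 8
out = approx_prompt_attention(torch.randn(n, d), torch.randn(n, d),
                              torch.randn(n, d), torch.randn(p, d),
                              torch.randn(p, d))

Because the prompt term is an (n x p) product rather than part of an
(n + p)-wide softmax, its cost grows only linearly with the number of prompt
tokens.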
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Text-driven 3D stylization is a complex and crucial task in the fields of
computer vision (CV) and computer graphics (CG), aimed at transforming a bare
mesh to fit a target text. Prior methods adopt text-independent multilayer
perceptrons (MLPs) to predict the attributes of the target mesh with the
supervision of CLIP loss. However, such a text-independent architecture lacks
textual guidance during attribute prediction, thus leading to unsatisfactory
stylization and slow convergence. To address these limitations, we present
X-Mesh, an innovative text-driven 3D stylization framework that incorporates a
novel Text-guided Dynamic Attention Module (TDAM). The TDAM dynamically
integrates the guidance of the target text by utilizing text-relevant spatial
and channel-wise attentions during vertex feature extraction, resulting in more
accurate attribute prediction and faster convergence speed. Furthermore,
existing works lack standard benchmarks and automated metrics for evaluation,
often relying on subjective and non-reproducible user studies to assess the
quality of stylized 3D assets. To overcome this limitation, we introduce a new
standard text-mesh benchmark, namely MIT-30, and two automated metrics, which
will enable future research to achieve fair and objective comparisons. Our
extensive qualitative and quantitative experiments demonstrate that X-Mesh
outperforms previous state-of-the-art methods. Comment: Technical report.
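To make the idea of text-conditioned attention concrete, here is a minimal
sketch; layer sizes, the gating functions, and class names are assumptions
rather than the paper's TDAM. Vertex features are modulated channel-wise by a
gate predicted from the text embedding and spatially by each vertex's
relevance to the text.

import torch
import torch.nn as nn

class TextGuidedAttention(nn.Module):
    def __init__(self, c_feat, c_text):
        super().__init__()
        self.channel_gate = nn.Linear(c_text, c_feat)  # text -> channel weights
        self.text_proj = nn.Linear(c_text, c_feat)     # text -> feature space

    def forward(self, v_feat, t_emb):
        # v_feat: (n_vertices, c_feat), t_emb: (c_text,)
        ch = torch.sigmoid(self.channel_gate(t_emb))        # channel attention
        sp = torch.sigmoid(v_feat @ self.text_proj(t_emb))  # per-vertex attention
        return v_feat * ch * sp.unsqueeze(-1)

tdam = TextGuidedAttention(c_feat=32, c_text=512)
out = tdam(torch.randn(100, 32), torch.randn(512))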
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning
We study the budget allocation problem in online marketing campaigns that
utilize previously collected offline data. We first discuss the long-term
effect of optimizing marketing budget allocation decisions in the offline
setting. To overcome this challenge, we propose a novel game-theoretic offline
value-based reinforcement learning method using mixed policies. Whereas
previous methods may need to store infinitely many policies, the proposed
method needs only a constant number, which achieves nearly optimal policy
efficiency and makes it practical and favorable for industrial usage. We further
show that this method is guaranteed to converge to the optimal policy, which
cannot be achieved by previous value-based reinforcement learning methods for
marketing budget allocation. Our experiments on a large-scale marketing
campaign with tens of millions of users and a budget of more than one billion
verify
the theoretical results and show that the proposed method outperforms various
baseline methods. The proposed method has been successfully deployed to serve
all the traffic of this marketing campaign. Comment: WSDM 23, Best Paper Candidate.
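A mixed policy in this sense can be pictured as a small, fixed set of
deterministic policies plus mixture weights; which member acts is sampled per
decision. Everything below (the two toy budget rules, the weights) is a
made-up illustration of the data structure, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

def act_with_mixed_policy(state, policies, weights):
    # Store only a constant number of policies; sample which one to
    # follow according to the mixture weights.
    k = rng.choice(len(policies), p=weights)
    return policies[k](state)

# toy example: a conservative and an aggressive spend rule, mixed 70/30
policies = [lambda s: 0.1 * s["budget"],
            lambda s: 0.5 * s["budget"]]
spend = act_with_mixed_policy({"budget": 1e6}, policies, [0.7, 0.3])

Storing a constant number of base policies, rather than the ever-growing set
some iterative schemes accumulate, is what makes such a mixture practical to
deploy.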
Blood glucose level affects prognosis of patients who received intravenous thrombolysis after acute ischemic stroke? A meta-analysis
Background and objectives: Intravenous recombinant tissue plasminogen activator (rtPA) thrombolysis is an effective treatment for acute ischemic stroke. Hyperglycemia is a major risk factor for the occurrence, development, and prognosis of ischemic stroke. This meta-analysis estimates the association between hyperglycemia and poor prognosis in acute ischemic stroke patients receiving intravenous rtPA thrombolytic therapy.
Materials and methods: Following predefined inclusion criteria, we searched the PubMed, Web of Science, and Cochrane Library databases. The association of high blood glucose (>140 mg/dl) with symptomatic intracranial hemorrhage (sICH), poor clinical outcome, and mortality at 90 days post-rtPA thrombolysis was studied using both a common-effects model and a random-effects model. Odds ratios (ORs) were plotted on forest plots.
Results: Of a total cohort of 2565 patients who received intravenous thrombolytic therapy, 721 had high blood glucose. High glucose levels significantly increased the odds of sICH (OR 1.80; 95% confidence interval (CI): 1.30-2.50), poor clinical outcome at 90 days (OR 1.82; 95% CI: 1.52-2.19), and all-cause mortality at 90 days (OR 2.51; 95% CI: 1.65-3.82).
Conclusions: In our meta-analysis, high blood glucose was significantly associated with sICH, poor clinical outcome, and higher mortality at 90 days.
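For readers unfamiliar with how such pooled ORs arise, the standard
common-effects computation is inverse-variance weighting on the log scale; the
sketch below uses made-up study numbers for illustration, not values from this
meta-analysis.

import numpy as np

def pooled_or(ors, ci_lows, ci_highs):
    # Recover each study's standard error from its 95% CI, weight by
    # inverse variance, and pool the log odds ratios.
    log_or = np.log(ors)
    se = (np.log(ci_highs) - np.log(ci_lows)) / (2 * 1.96)
    w = 1.0 / se**2
    pooled = np.sum(w * log_or) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    ci = np.exp(pooled + np.array([-1.96, 1.96]) * pooled_se)
    return np.exp(pooled), ci

# hypothetical studies
or_hat, ci = pooled_or(np.array([1.6, 2.1, 1.9]),
                       np.array([1.1, 1.3, 1.2]),
                       np.array([2.3, 3.4, 3.0]))

A random-effects model adds a between-study variance term to each weight,
widening the interval when studies disagree.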