Progressive-Hint Prompting Improves Reasoning in Large Language Models
The performance of Large Language Models (LLMs) in reasoning tasks depends
heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency
being critical methods that enhance this ability. However, these methods do not
fully exploit the answers generated by the LLM to guide subsequent responses.
This paper proposes a new prompting method, named Progressive-Hint Prompting
(PHP), that enables automatic multiple interactions between users and LLMs by
using previously generated answers as hints to progressively guide toward the
correct answers. PHP is orthogonal to CoT and self-consistency, making it easy
to combine with state-of-the-art techniques to further improve performance. We
conducted an extensive and comprehensive evaluation to demonstrate the
effectiveness of the proposed method. Our experimental results on six
benchmarks show that combining CoT and self-consistency with PHP significantly
improves accuracy while remaining highly efficient. For instance, with
text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding
compared to Complex CoT, and a 46.17% reduction in sample paths with
self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances
on SVAMP (91.9%), GSM8K (95.5%), and AQuA (79.9%). Comment: Tech Report
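The iterative hint loop described above can be sketched as follows. This is a minimal illustration, not the paper's exact prompt template: `ask_llm` is a hypothetical stand-in for a real LLM call, and the stopping rule (two consecutive identical answers) follows the progressive-hint idea in spirit.

```python
# Hedged sketch of Progressive-Hint Prompting (PHP): previously generated
# answers are appended to the prompt as hints, and the loop stops once two
# consecutive answers agree. `ask_llm` is a hypothetical callable standing
# in for an actual LLM API; the hint phrasing is illustrative only.
def progressive_hint_prompting(question, ask_llm, max_rounds=5):
    hints = []
    prev_answer = None
    for _ in range(max_rounds):
        if hints:
            prompt = f"{question} (Hint: the answer is near {', '.join(hints)})"
        else:
            prompt = question
        answer = ask_llm(prompt)
        if answer == prev_answer:  # consecutive answers agree -> stop
            return answer
        hints.append(answer)
        prev_answer = answer
    return prev_answer
```

Because each round conditions on all earlier answers, a model that initially errs can self-correct, which is how PHP composes with CoT and self-consistency.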
An adaptive model checking test for functional linear model
Numerous studies have been devoted to the estimation and inference problems
for functional linear models (FLM). However, few works focus on the model
checking problem that ensures the reliability of results. The limited tests in
this area lack tractable null distributions or asymptotic analysis under
alternatives. Moreover, the functional predictor is usually assumed to be fully
observed, which is impractical. To address these problems, we propose an
adaptive model checking test for FLM. It combines regular moment-based and
conditional moment-based tests, and achieves model adaptivity via the dimension
of a residual-based subspace. The advantages of our test are manifold. First,
it has a tractable chi-squared null distribution and higher power under the
alternatives than either of its components. Second, asymptotic properties under
different underlying models are developed, including previously unexplored
local alternatives.
Third, the test statistic is constructed upon finite grid points, which
incorporates the discrete nature of the collected data. We derive the
relationship between the sample size and the number of grid points required to
maintain the asymptotic properties. In addition, we provide a data-driven
approach to estimate the dimension that drives model adaptivity, which is also
promising for sufficient dimension reduction. We conduct comprehensive
numerical experiments to
demonstrate the advantages the test inherits from its two simple components.
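The idea of combining a regular moment-based and a conditional moment-based component into one statistic with a chi-squared null can be illustrated in miniature. This is an assumed simplification, not the paper's construction: each component is a studentized weighted residual mean, and under independence the combined statistic is approximately chi-squared with summed degrees of freedom.

```python
# Hedged toy sketch of an adaptive combination of two moment-based tests.
# Each component forms n * mean(r*w)^2 / mean((r*w)^2), which is roughly
# chi-squared(1) under the null; summing two such components gives roughly
# chi-squared(2). The weight choices are illustrative assumptions.
def adaptive_test_statistic(residuals, weights_regular, weights_conditional):
    n = len(residuals)
    def component(weights):
        s = sum(r * w for r, w in zip(residuals, weights)) / n
        var = sum((r * w) ** 2 for r, w in zip(residuals, weights)) / n
        return n * s * s / var if var > 0 else 0.0
    return component(weights_regular) + component(weights_conditional)
```

A large value of the combined statistic relative to the chi-squared quantile rejects the posited model; either component alone can miss directions the other detects, which is the motivation for combining them.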
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
Despite the stunning ability to generate high-quality images by recent
text-to-image models, current approaches often struggle to effectively compose
objects with different attributes and relationships into a complex and coherent
scene. We propose T2I-CompBench, a comprehensive benchmark for open-world
compositional text-to-image generation, consisting of 6,000 compositional text
prompts from 3 categories (attribute binding, object relationships, and complex
compositions) and 6 sub-categories (color binding, shape binding, texture
binding, spatial relationships, non-spatial relationships, and complex
compositions). We further propose several evaluation metrics specifically
designed to evaluate compositional text-to-image generation. We introduce a new
approach, Generative mOdel fine-tuning with Reward-driven Sample selection
(GORS), to boost the compositional text-to-image generation abilities of
pretrained text-to-image models. Extensive experiments and evaluations are
conducted to benchmark previous methods on T2I-CompBench, and to validate the
effectiveness of our proposed evaluation metrics and GORS approach. Project
page is available at https://karine-h.github.io/T2I-CompBench/.
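The reward-driven sample selection step of GORS can be sketched in outline. This is a hypothetical simplification: the generator and reward model are stand-ins, and the threshold and reward-normalized loss weights are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of reward-driven sample selection in the spirit of GORS:
# generated samples scoring above a reward threshold are kept, and each
# kept sample's fine-tuning loss would be weighted by its (normalized)
# reward. `reward_fn` and the threshold are illustrative stand-ins.
def select_samples(samples, reward_fn, threshold=0.5):
    scored = [(s, reward_fn(s)) for s in samples]
    selected = [(s, r) for s, r in scored if r >= threshold]
    total = sum(r for _, r in selected)
    # Normalize rewards into per-sample loss weights for fine-tuning.
    return [(s, r / total) for s, r in selected] if total else []
```

Higher-reward generations thus contribute more to the fine-tuning gradient, nudging the pretrained model toward compositionally faithful outputs.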
Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
Video editing is a challenging task that requires manipulating videos on both
the spatial and temporal dimensions. Existing methods for video editing mainly
focus on changing the appearance or style of the objects in the video, while
keeping their structures unchanged. However, no existing method allows users to
interactively "drag" points of instances on the first frame so that they
precisely reach target points while the other frames deform consistently. In
this paper, we propose a new diffusion-based method for
interactive point-based video manipulation, called Drag-A-Video. Our method
allows users to click pairs of handle points and target points as well as masks
on the first frame of an input video. Then, our method transforms the inputs
into point sets and propagates these sets across frames. To precisely modify
the contents of the video, we employ a new video-level motion supervision to
update the features of the video and introduce the latent offsets to achieve
this update at multiple denoising timesteps. We propose a temporally consistent
point tracking module to coordinate the movement of the points in the handle
point sets. We demonstrate the effectiveness and flexibility of our method on
various videos. The website of our work is available here:
https://drag-a-video.github.io/
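The handle-to-target point update can be pictured with a toy 1-D sketch. This is an assumed simplification, not the paper's motion-supervision loss: points simply move a fixed fraction of the remaining distance toward their targets at each denoising timestep, mimicking the iterative nature of the update.

```python
# Toy sketch (assumed, not Drag-A-Video's actual update rule): each handle
# point moves a fraction `step_frac` of the way toward its target at every
# denoising timestep, producing a trajectory of intermediate positions.
def propagate_points(handles, targets, steps, step_frac=0.5):
    points = list(handles)
    trajectory = [list(points)]
    for _ in range(steps):
        points = [p + step_frac * (t - p) for p, t in zip(points, targets)]
        trajectory.append(list(points))
    return trajectory
```

In the actual method the analogous displacement is realized through latent offsets and feature-level motion supervision, and the tracking module keeps the per-frame point sets aligned.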
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
Diffusion Transformers have recently shown remarkable effectiveness in
generating high-quality 3D point clouds. However, training voxel-based
diffusion models for high-resolution 3D voxels remains prohibitively expensive
due to the cubic complexity of attention operators, which arises from the
additional dimension of voxels. Motivated by the inherent redundancy of 3D
compared to 2D, we propose FastDiT-3D, a novel masked diffusion transformer
tailored for efficient 3D point cloud generation, which greatly reduces
training costs. Specifically, we draw inspiration from masked autoencoders to
dynamically operate the denoising process on masked voxelized point clouds. We
also propose a novel voxel-aware masking strategy to adaptively aggregate
background/foreground information from voxelized point clouds. Our method
achieves state-of-the-art performance with an extreme masking ratio of nearly
99%. Moreover, to improve multi-category 3D generation, we introduce
a Mixture-of-Experts (MoE) design into the 3D diffusion model. Each category
can learn a distinct diffusion path with different experts, relieving gradient
conflicts.
Experimental results on the ShapeNet dataset demonstrate that our method
achieves state-of-the-art high-fidelity and diverse 3D point cloud generation
performance. Our FastDiT-3D improves 1-Nearest Neighbor Accuracy and Coverage
metrics when generating 128-resolution voxel point clouds, using only 6.5% of
the original training cost. Comment: Project page: https://dit-3d.github.io/FastDiT-3D
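A voxel-aware masking strategy of the kind described can be sketched as follows. The ratios and the Boolean occupancy encoding are illustrative assumptions; the point is only that empty background voxels are masked far more aggressively than occupied foreground voxels, so the denoiser trains mostly on informative positions.

```python
import random

# Hedged sketch of voxel-aware masking: background (empty) voxels are
# masked at a much higher ratio than foreground (occupied) ones. The
# specific ratios are illustrative, not the paper's exact values.
def voxel_aware_mask(occupancy, fg_ratio=0.9, bg_ratio=0.99, rng=None):
    rng = rng or random.Random(0)
    mask = []
    for occupied in occupancy:
        ratio = fg_ratio if occupied else bg_ratio
        mask.append(rng.random() < ratio)  # True = voxel is masked out
    return mask
```

Because a 3D voxel grid is overwhelmingly background, masking near 99% of voxels removes most of the cubic attention cost while retaining the foreground signal.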
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
Diffusion models have proven to be highly effective in generating
high-quality images. However, adapting large pre-trained diffusion models to
new domains remains an open challenge, which is critical for real-world
applications. This paper proposes DiffFit, a parameter-efficient strategy to
fine-tune large pre-trained diffusion models that enable fast adaptation to new
domains. DiffFit is embarrassingly simple that only fine-tunes the bias term
and newly-added scaling factors in specific layers, yet resulting in
significant training speed-up and reduced model storage costs. Compared with
full fine-tuning, DiffFit achieves 2 training speed-up and only needs
to store approximately 0.12\% of the total model parameters. Intuitive
theoretical analysis has been provided to justify the efficacy of scaling
factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior
or competitive performances compared to the full fine-tuning while being more
efficient. Remarkably, we show that DiffFit can adapt a pre-trained
low-resolution generative model to a high-resolution one by adding minimal
cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of
3.02 on the ImageNet 512×512 benchmark by fine-tuning for only 25 epochs from a
public pre-trained ImageNet 256×256 checkpoint, while being 30×
more training efficient than the closest competitor. Comment: Tech Report
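The parameter-selection rule behind DiffFit is simple enough to sketch in a framework-agnostic way. The name `gamma` for the newly added scale factors is an assumption for illustration; the real implementation marks the corresponding tensors trainable and freezes everything else.

```python
# Sketch of DiffFit-style parameter selection: only bias terms and newly
# added per-layer scale factors (assumed here to be named "gamma") are
# trainable; all other parameters stay frozen. In a real framework this
# would set requires_grad accordingly instead of returning names.
def difffit_trainable(param_names):
    return [n for n in param_names
            if n.endswith(".bias") or ".gamma" in n]
```

Since biases and scalar scales are a tiny fraction of a diffusion transformer's parameters, only that sliver needs to be stored per adapted domain.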
Pathology Steered Stratification Network for Subtype Identification in Alzheimer's Disease
Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative
disorder characterized by beta-amyloid, pathologic tau, and neurodegeneration.
There are no effective treatments for late-stage Alzheimer's disease, making
early intervention urgent. However, existing statistical inference approaches
to AD subtype identification ignore pathological domain knowledge, which can
lead to ill-posed results that are inconsistent with essential neurological
principles. Integrating systems
biology modeling with machine learning, we propose a novel pathology steered
stratification network (PSSN) that incorporates established domain knowledge in
AD pathology through a reaction-diffusion model, where we consider non-linear
interactions between major biomarkers and diffusion along the brain structural
network. Trained on longitudinal multimodal neuroimaging data, the biological
model predicts long-term trajectories that capture individual progression
patterns, filling the gaps in the sparse imaging data available. A deep
predictive neural network is then built to exploit spatiotemporal dynamics,
link neurological examinations with clinical profiles, and generate subtype
assignment probability on an individual basis. We further identify an
evolutionary disease graph to quantify subtype transition probabilities through
extensive simulations. Our stratification achieves superior performance in both
inter-cluster heterogeneity and intra-cluster homogeneity of various clinical
scores. Applying our approach to enriched samples of aging populations, we
identify six subtypes spanning AD spectrum, where each subtype exhibits a
distinctive biomarker pattern that is consistent with its clinical outcome.
PSSN provides insights into pre-symptomatic diagnosis and practical guidance on
clinical treatments, which may be further generalized to other
neurodegenerative diseases.
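The diffusion-along-a-network component of such a reaction-diffusion model can be illustrated with a forward-Euler step on a graph. The 3-node chain graph, rate, and omission of the non-linear reaction terms are all simplifying assumptions for illustration.

```python
# Toy forward-Euler sketch of biomarker diffusion along a structural brain
# network (the reaction terms of a full reaction-diffusion model are
# omitted). `adjacency` is a symmetric connectivity matrix; each step moves
# biomarker mass along edges proportionally to concentration differences.
def diffuse(concentration, adjacency, rate, steps):
    x = list(concentration)
    n = len(x)
    for _ in range(steps):
        new = []
        for i in range(n):
            flux = sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
            new.append(x[i] + rate * flux)
        x = new
    return x
```

Note that pure diffusion conserves total biomarker mass; it is the non-linear interaction terms, fitted to longitudinal data, that let the full model capture production and clearance.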
- …