Towards Accelerated Model Training via Bayesian Data Selection
Mislabeled, duplicated, or biased data in real-world scenarios can lead to
prolonged training and even hinder model convergence. Traditional solutions
prioritizing easy or hard samples lack the flexibility to handle these diverse
cases simultaneously. Recent work has proposed a more reasonable data selection
principle by examining the data's impact on the model's generalization loss.
However, its practical adoption relies on less principled approximations and
additional clean holdout data. This work solves these problems by leveraging a
lightweight Bayesian treatment and incorporating off-the-shelf zero-shot
predictors built on large-scale pre-trained models. The resulting algorithm is
efficient and easy to implement. We perform extensive empirical studies on
challenging benchmarks with considerable data noise and imbalance in the online
batch selection scenario, and observe superior training efficiency over
competitive baselines. Notably, on the challenging WebVision benchmark, our
method can achieve similar predictive performance with significantly fewer
training iterations than leading data selection methods.
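To make the selection rule concrete, below is a minimal sketch of online batch selection guided by an off-the-shelf zero-shot predictor. The scoring heuristic and the `zero_shot_logits` input are illustrative assumptions, not the paper's exact Bayesian objective.

```python
# Minimal sketch of online batch selection with a zero-shot reference model.
# The score below is an illustrative proxy, not the paper's Bayesian objective;
# `zero_shot_logits` stands in for an off-the-shelf predictor built on a
# large-scale pre-trained model (a hypothetical input for this sketch).
import torch
import torch.nn.functional as F

def select_batch(model, xs, ys, zero_shot_logits, keep_ratio=0.5):
    """Keep the fraction of candidate samples judged most useful for training."""
    with torch.no_grad():
        # Loss of the current model on each candidate sample.
        student_loss = F.cross_entropy(model(xs), ys, reduction="none")
        # Loss of the zero-shot reference; a high value flags a likely
        # mislabeled or otherwise untrustworthy sample.
        reference_loss = F.cross_entropy(zero_shot_logits, ys, reduction="none")
        # Prefer samples the model still gets wrong but the reference trusts
        # (a "learnable and probably clean" heuristic).
        score = student_loss - reference_loss
    k = max(1, int(keep_ratio * len(xs)))
    idx = torch.topk(score, k).indices
    return xs[idx], ys[idx]
```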
Clock Factorized Quantum Monte Carlo Method for Long-range Interacting Systems
Simulating long-range interacting systems is a challenging task due to its
computational complexity: the computational effort for each local update is
of order $\mathcal{O}(N)$, where $N$ is the size of the system. Recently, a
technique, hereafter called the clock factorized quantum Monte Carlo method, was
developed on the basis of the so-called factorized Metropolis filter [Phys.
Rev. E 99, 010105 (2019)]. In this work, we first explain step by step how the
clock factorized quantum Monte Carlo method is implemented to reduce the
computational overhead from $\mathcal{O}(N)$ to $\mathcal{O}(1)$. In particular, the
core ingredients, including the concepts of bound probabilities and bound
rejection events, the tree-like data structure, and the fast algorithms for
sampling an extensive set of discrete and small probabilities, are elaborated.
Next, we show how the clock factorized quantum Monte Carlo method can be
flexibly implemented in various update strategies, like the Metropolis and
worm-type algorithms, and can be generalized to simulate quantum systems.
Finally, we demonstrate the high efficiency of the clock factorized quantum
Monte Carlo algorithms in the examples of the quantum Ising model and the
Bose-Hubbard model with long-range interactions and/or long-range hopping
amplitudes. We expect the clock factorized quantum Monte Carlo algorithms
will find broad applications in statistical and condensed-matter physics.
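The core trick is easiest to see in the factorized Metropolis filter itself. Below is an illustrative sketch, assuming a single uniform bound probability shared by all factors; the paper's tree-like data structure and fast sampling of extensive sets of small discrete probabilities are omitted.

```python
# Sketch of a factorized Metropolis filter with a uniform bound probability,
# in the spirit of clock Monte Carlo. With per-factor rejection probabilities
# p_i <= q_bound, one jumps geometrically to the next candidate ("bound")
# rejection instead of scanning all N factors, so the expected cost per
# update stays O(1) when the total rejection weight is bounded.
import math
import random

def factorized_accept(p_reject, q_bound):
    """Accept a move whose factorized rejection probabilities are p_reject.

    p_reject : list of per-factor rejection probabilities, each <= q_bound.
    q_bound  : common bound probability used for the geometric skip.
    Returns True iff every factor accepts the move.
    """
    n = len(p_reject)
    i = -1
    while True:
        # Geometric jump to the next factor that *might* reject under the bound.
        u = 1.0 - random.random()  # uniform in (0, 1], avoids log(0)
        skip = int(math.log(u) / math.log(1.0 - q_bound)) if q_bound < 1.0 else 0
        i += 1 + skip
        if i >= n:
            return True           # no factor rejected: move accepted
        # Thinning: a bound event is a real rejection with probability p_i / q,
        # so each factor rejects with overall probability q * (p_i/q) = p_i.
        if random.random() < p_reject[i] / q_bound:
            return False          # bound rejection confirmed: move rejected
```

Because most factors are passed over in a single geometric jump and only the rare bound events are actually evaluated, the per-update effort no longer scales with the number of interaction terms.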
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating
high-quality images from textual descriptions. The real-world applications of
these models require particular attention to their safety and fidelity, but
this has not been sufficiently explored. One fundamental question is whether
existing T2I DMs are robust against variations over input texts. To answer it,
this work provides the first robustness evaluation of T2I DMs against
real-world attacks. Unlike prior studies that focus on malicious attacks
involving apocryphal alterations to the input texts, we consider an attack
space spanned by realistic errors (e.g., typo, glyph, phonetic) that humans can
make, to ensure semantic consistency. Given the inherent randomness of the
generation process, we develop novel distribution-based attack objectives to
mislead T2I DMs. We perform attacks in a black-box manner without any knowledge
of the model. Extensive experiments demonstrate the effectiveness of our method
for attacking popular T2I DMs and simultaneously reveal their non-trivial
robustness issues. Moreover, we provide an in-depth analysis of our method to
show that it is not designed to attack solely the text encoder in T2I DMs.
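As an illustration of the attack space, here is a minimal sketch of the three realistic perturbation families named above (typo, glyph, phonetic). The substitution tables are tiny illustrative examples, not the paper's full attack space or its distribution-based objectives.

```python
# Sketch of "realistic error" prompt perturbations: typo, glyph, phonetic.
# The tables below are small illustrative examples, and any downstream
# black-box T2I model call is outside this sketch.
import random

GLYPH_SUBS = {"a": "а", "o": "о", "e": "е"}   # Latin -> visually similar Cyrillic
PHONETIC_SUBS = {"ph": "f", "ight": "ite", "tion": "shun"}

def typo(word):
    """Swap two adjacent characters, a common human typing error."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_prompt(prompt, kind="typo"):
    """Apply one human-plausible perturbation to a random word of the prompt."""
    words = prompt.split()
    i = random.randrange(len(words))
    if kind == "typo":
        words[i] = typo(words[i])
    elif kind == "glyph":
        words[i] = "".join(GLYPH_SUBS.get(c, c) for c in words[i])
    elif kind == "phonetic":
        for src, dst in PHONETIC_SUBS.items():
            words[i] = words[i].replace(src, dst)
    return " ".join(words)

# Example: perturb_prompt("a photo of a lighthouse", kind="phonetic")
# may yield "a photo of a litehouse", which stays human-readable.
```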
BayesAdapter: Being Bayesian, Inexpensively and Reliably, via Bayesian Fine-tuning
Despite their theoretical appeal, Bayesian neural networks (BNNs) lag behind
in real-world adoption due to persistent concerns about their
scalability, accessibility, and reliability. In this work, we aim to relieve
these concerns by developing the BayesAdapter framework for learning
variational BNNs. In particular, we propose to adapt the pre-trained
deterministic NNs to be BNNs via cost-effective Bayesian fine-tuning. To make
BayesAdapter more practical, we make three technical contributions: 1) a modularized,
user-friendly implementation for the learning of variational BNNs under two
representative variational distributions, 2) a generally applicable strategy
for reducing the gradient variance in stochastic variational inference, 3) an
explanation for the unreliability issue of BNNs' uncertainty estimates, and a
corresponding prescription. Through extensive experiments on diverse
benchmarks, we show that BayesAdapter can consistently induce posteriors with
higher quality than from-scratch variational inference and other
competitive baselines, especially in large-scale settings, while significantly
reducing training overhead.
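A minimal sketch of the Bayesian fine-tuning idea follows: a mean-field Gaussian posterior is initialized at the pre-trained deterministic weights and then fine-tuned. The `BayesLinear` layer below is an illustrative construction, not the paper's packaged implementation, and the KL term of variational inference is only noted in a comment.

```python
# Sketch of Bayesian fine-tuning: a Gaussian variational posterior
# q(w) = N(mu, sigma^2) initialized at pre-trained deterministic weights.
# Illustrative construction only; the paper's modularized implementation
# and variance-reduction strategy are not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear, init_log_sigma=-6.0):
        super().__init__()
        # Posterior mean starts at the deterministic pre-trained weights.
        self.mu = nn.Parameter(pretrained.weight.detach().clone())
        # Small initial variance so fine-tuning starts near the MAP solution.
        self.log_sigma = nn.Parameter(
            torch.full_like(pretrained.weight, init_log_sigma))
        self.bias = nn.Parameter(pretrained.bias.detach().clone())

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I).
        # A full variational objective would also add KL(q(w) || p(w)) to
        # the training loss; that term is omitted in this sketch.
        eps = torch.randn_like(self.mu)
        w = self.mu + self.log_sigma.exp() * eps
        return F.linear(x, w, self.bias)
```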
Learning Sample Difficulty from Pre-trained Models for Reliable Prediction
Large-scale pre-trained models have achieved remarkable success in many
applications, but how to leverage them to improve the prediction reliability of
downstream models remains under-explored. Moreover, modern neural
networks have been found to be poorly calibrated and make overconfident
predictions regardless of inherent sample difficulty and data uncertainty. To
address this issue, we propose to utilize large-scale pre-trained models to
guide downstream model training with sample difficulty-aware entropy
regularization. Pre-trained models that have been exposed to large-scale
datasets and do not overfit the downstream training classes enable us to
measure each training sample's difficulty via feature-space Gaussian modeling
and relative Mahalanobis distance computation. Importantly, by adaptively
penalizing overconfident prediction based on the sample difficulty, we
simultaneously improve accuracy and uncertainty calibration across challenging
benchmarks (e.g., +0.55% ACC and -3.7% ECE on ImageNet1k using ResNet34),
consistently surpassing competitive baselines for reliable prediction. The
improved uncertainty estimate further improves selective classification
(abstaining from erroneous predictions) and out-of-distribution detection.
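The difficulty measure described in the abstract can be illustrated as follows; the shared-covariance choice and all function names are assumptions made for this sketch, not the paper's exact recipe.

```python
# Sketch of scoring sample difficulty with a relative Mahalanobis distance
# in a pre-trained model's feature space: compare the class-conditional
# Mahalanobis distance against a class-agnostic (background) one.
import numpy as np

def fit_gaussians(feats, labels):
    """Fit per-class means, a shared covariance, and a background Gaussian."""
    classes = np.unique(labels)
    mus = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([feats[labels == c] - mus[c] for c in classes])
    prec = np.linalg.pinv(np.cov(centered, rowvar=False))
    mu_bg = feats.mean(axis=0)
    prec_bg = np.linalg.pinv(np.cov(feats, rowvar=False))
    return mus, prec, mu_bg, prec_bg

def relative_mahalanobis(f, y, mus, prec, mu_bg, prec_bg):
    """Higher values indicate a harder (more atypical) training sample."""
    d_cls = (f - mus[y]) @ prec @ (f - mus[y])       # class-conditional distance
    d_bg = (f - mu_bg) @ prec_bg @ (f - mu_bg)       # class-agnostic distance
    return d_cls - d_bg
```

The resulting score could then weight an entropy penalty so that confident predictions on hard samples are discouraged more strongly than those on easy ones.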
Improved Operator Learning by Orthogonal Attention
Neural operators, as an efficient surrogate model for learning the solutions
of PDEs, have received extensive attention in the field of scientific machine
learning. Among them, attention-based neural operators have become one of the
mainstreams in related research. However, existing approaches overfit the
limited training data due to the considerable number of parameters in the
attention mechanism. To address this, we develop an orthogonal attention based
on the eigendecomposition of the kernel integral operator and the neural
approximation of eigenfunctions. The orthogonalization naturally poses a proper
regularization effect on the resulting neural operator, which aids in resisting
overfitting and boosting generalization. Experiments on six standard neural
operator benchmark datasets comprising both regular and irregular geometries
show that our method can outperform competing baselines with decent margins.
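A rough sketch of what an orthogonal attention layer might look like, following the stated idea of orthonormalized neural eigenfunctions; the QR-based orthonormalization and all dimensions are assumptions for this sketch, not the paper's exact construction.

```python
# Illustrative orthogonal-attention layer: a learned feature map psi plays
# the role of neural eigenfunctions, its sampled outputs are orthonormalized
# (here via QR), and the layer acts as a low-rank kernel integral operator.
import torch
import torch.nn as nn

class OrthogonalAttention(nn.Module):
    def __init__(self, dim, n_eig=64):
        super().__init__()
        self.psi = nn.Sequential(nn.Linear(dim, n_eig), nn.GELU(),
                                 nn.Linear(n_eig, n_eig))
        self.value = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, n_points, dim) samples of the input function.
        phi = self.psi(x)                    # (B, N, n_eig) eigenfunction values
        # Orthonormalize the sampled eigenfunctions; this is the step that
        # imposes the regularizing orthogonality.
        q, _ = torch.linalg.qr(phi)          # orthonormal columns: (B, N, n_eig)
        v = self.value(x)                    # (B, N, dim)
        # Project values onto the orthonormal basis and expand back:
        # a rank-n_eig approximation of the kernel integral operator.
        return q @ (q.transpose(-2, -1) @ v)
```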