
    Towards Accelerated Model Training via Bayesian Data Selection

    Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional clean holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.
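    As a rough illustration of the online batch-selection loop described above (not the paper's actual Bayesian criterion), the sketch below ranks a candidate batch by combining the training model's per-sample loss with a zero-shot predictor's trust in the given label, and keeps only the top-k samples for the gradient step; the function names and the scoring rule are assumptions.

```python
import torch
import torch.nn.functional as F

def select_subbatch(model, zero_shot_probs, x, y, k):
    """Pick the k candidates that look both clean and informative.

    zero_shot_probs: (B, C) class probabilities from an off-the-shelf
    zero-shot predictor (e.g., a pre-trained vision-language model).
    The product of label trust and training loss is an illustrative
    proxy score, not the paper's Bayesian selection objective.
    """
    with torch.no_grad():
        logits = model(x)                                           # (B, C)
        loss = F.cross_entropy(logits, y, reduction="none")         # (B,)
        label_trust = zero_shot_probs[torch.arange(len(y)), y]      # (B,)
    scores = label_trust * loss
    keep = scores.topk(k).indices
    return x[keep], y[keep]

# Usage: draw a large candidate batch, call select_subbatch, then run the
# usual forward/backward pass only on the selected samples.
```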

    Clock Factorized Quantum Monte Carlo Method for Long-range Interacting Systems

    Simulating long-range interacting systems is challenging because the computational effort for each local update is of order $\mathcal{O}(N)$, where $N$ is the system size. Recently, a technique, hereafter called the clock factorized quantum Monte Carlo method, was developed on the basis of the so-called factorized Metropolis filter [Phys. Rev. E 99, 010105 (2019)]. In this work, we first explain step by step how the clock factorized quantum Monte Carlo method is implemented to reduce the computational overhead from $\mathcal{O}(N)$ to $\mathcal{O}(1)$. In particular, the core ingredients, including the concepts of bound probabilities and bound rejection events, the tree-like data structure, and the fast algorithms for sampling an extensive set of small discrete probabilities, are elaborated. Next, we show how the clock factorized quantum Monte Carlo method can be flexibly implemented in various update strategies, such as the Metropolis and worm-type algorithms, and can be generalized to simulate quantum systems. Finally, we demonstrate the high efficiency of the clock factorized quantum Monte Carlo algorithms on the examples of the quantum Ising model and the Bose-Hubbard model with long-range interactions and/or long-range hopping amplitudes. We expect that the clock factorized quantum Monte Carlo algorithms will find broad applications in statistical and condensed-matter physics.
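    To make the "bound rejection event" idea concrete, here is a minimal sketch (in Python, with illustrative names) of how a factorized Metropolis decision can be made in expected O(1) factor evaluations when a single uniform bound is used: one jumps geometrically to the next candidate rejection event and confirms it by thinning, instead of visiting all N factors. The actual method additionally uses per-factor bounds organized in a tree-like structure.

```python
import math
import random

def clock_factorized_accept(n_factors, delta_E, q_bound, beta=1.0, rng=random):
    """Factorized Metropolis decision with geometric skipping (the "clock" trick).

    delta_E(i): energy change contributed by factor i (evaluated only for the
                few factors actually visited).
    q_bound:    uniform upper bound, in (0, 1), on every per-factor rejection
                probability q_i = 1 - min(1, exp(-beta * delta_E(i))).
    Returns True iff no factor rejects, i.e. the move is accepted.
    """
    i = -1
    while True:
        # Jump to the next candidate (bound) rejection event; the gap is
        # geometrically distributed with success probability q_bound.
        u = 1.0 - rng.random()                       # u in (0, 1]
        i += int(math.log(u) / math.log(1.0 - q_bound)) + 1
        if i >= n_factors:
            return True                              # survived all factors: accept
        # Thinning: promote the candidate to a true rejection with
        # probability q_i / q_bound, touching only this single factor.
        q_i = 1.0 - min(1.0, math.exp(-beta * delta_E(i)))
        if rng.random() < q_i / q_bound:
            return False                             # one factor rejects: reject move
```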

    Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks

    Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions. Real-world applications of these models require particular attention to their safety and fidelity, but this has not been sufficiently explored. One fundamental question is whether existing T2I DMs are robust against variations over input texts. To answer this question, this work provides the first robustness evaluation of T2I DMs against real-world attacks. Unlike prior studies that focus on malicious attacks involving apocryphal alterations to the input texts, we consider an attack space spanned by realistic errors (e.g., typos, glyph substitutions, phonetic misspellings) that humans can make, so that semantic consistency is preserved. Given the inherent randomness of the generation process, we develop novel distribution-based attack objectives to mislead T2I DMs. We perform attacks in a black-box manner, without any knowledge of the model. Extensive experiments demonstrate the effectiveness of our method for attacking popular T2I DMs and simultaneously reveal their non-trivial robustness issues. Moreover, we provide an in-depth analysis of our method to show that it is not designed solely to attack the text encoder in T2I DMs.
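    The attack space above consists of human-plausible text corruptions rather than unconstrained adversarial strings. A toy sketch of such perturbations follows; the glyph table, edit budget, and prompt are made up for illustration, and the paper's actual method further optimizes distribution-based objectives over the generated images, which this snippet does not do.

```python
import random

# Hypothetical look-alike glyphs; a realistic attack space also includes
# keyboard typos and phonetic misspellings.
GLYPH_MAP = {"a": "\u0430", "e": "\u0435", "i": "\u0456", "o": "\u03bf"}

def perturb_prompt(prompt: str, n_edits: int = 2, rng=random) -> str:
    """Apply a few human-plausible character edits (typo or glyph swap)."""
    chars = list(prompt)
    positions = [i for i, c in enumerate(chars) if c.isalpha()]
    for i in rng.sample(positions, min(n_edits, len(positions))):
        if chars[i].lower() in GLYPH_MAP and rng.random() < 0.5:
            chars[i] = GLYPH_MAP[chars[i].lower()]                 # glyph substitution
        else:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")    # random typo
    return "".join(chars)

print(perturb_prompt("a photo of a golden retriever playing in the snow"))
```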

    BayesAdapter: Being Bayesian, Inexpensively and Reliably, via Bayesian Fine-tuning

    Despite their theoretical appeal, Bayesian neural networks (BNNs) lag behind in real-world adoption due to persistent concerns about their scalability, accessibility, and reliability. In this work, we aim to relieve these concerns by developing the BayesAdapter framework for learning variational BNNs. In particular, we propose to adapt pre-trained deterministic NNs into BNNs via cost-effective Bayesian fine-tuning. To make BayesAdapter more practical, we technically contribute 1) a modularized, user-friendly implementation for learning variational BNNs under two representative variational distributions, 2) a generally applicable strategy for reducing gradient variance in stochastic variational inference, and 3) an explanation of the unreliability of BNNs' uncertainty estimates, together with a corresponding prescription. Through extensive experiments on diverse benchmarks, we show that BayesAdapter consistently induces posteriors of higher quality than from-scratch variational inference and other competitive baselines, especially in large-scale settings, while significantly reducing training overhead.
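    As a rough picture of what Bayesian fine-tuning can look like, the sketch below (hypothetical class name and hyper-parameters, standard PyTorch) wraps a pre-trained linear layer into a mean-field Gaussian variational layer whose mean is initialized at the deterministic weights; it would then be trained with the usual ELBO (data likelihood plus a KL penalty towards the prior) rather than from scratch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalLinear(nn.Module):
    """Mean-field Gaussian linear layer initialized from a pre-trained layer.

    Illustrative sketch only: the variational mean starts at the deterministic
    weights and a small per-weight log-std is learned during fine-tuning.
    """
    def __init__(self, pretrained: nn.Linear, init_log_sigma: float = -5.0):
        super().__init__()
        self.weight_mu = nn.Parameter(pretrained.weight.detach().clone())
        self.weight_log_sigma = nn.Parameter(
            torch.full_like(pretrained.weight, init_log_sigma))
        self.bias = nn.Parameter(pretrained.bias.detach().clone())

    def forward(self, x):
        # Reparameterization trick: weight = mu + sigma * eps.
        eps = torch.randn_like(self.weight_mu)
        weight = self.weight_mu + self.weight_log_sigma.exp() * eps
        return F.linear(x, weight, self.bias)

layer = VariationalLinear(nn.Linear(512, 10))   # wrap a "pre-trained" layer
out = layer(torch.randn(4, 512))                # one stochastic forward pass
```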

    Learning Sample Difficulty from Pre-trained Models for Reliable Prediction

    Large-scale pre-trained models have achieved remarkable success in many applications, but how to leverage them to improve the prediction reliability of downstream models remains under-explored. Moreover, modern neural networks have been found to be poorly calibrated, making overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization. Pre-trained models that have been exposed to large-scale datasets and do not overfit the downstream training classes enable us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident predictions based on sample difficulty, we simultaneously improve accuracy and uncertainty calibration across challenging benchmarks (e.g., +0.55% ACC and -3.7% ECE on ImageNet1k using ResNet34), consistently surpassing competitive baselines for reliable prediction. The improved uncertainty estimates further benefit selective classification (abstaining from erroneous predictions) and out-of-distribution detection. Comment: NeurIPS 2023.
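    A common recipe for the feature-space difficulty score mentioned above is the relative Mahalanobis distance: the distance to the nearest class-conditional Gaussian minus the distance to a class-agnostic background Gaussian, both fit on pre-trained features. The sketch below follows that recipe with made-up regularization constants; the paper's exact modeling choices may differ.

```python
import numpy as np

def relative_mahalanobis(feats, labels, query, eps=1e-6):
    """Relative Mahalanobis distance of one query feature vector.

    feats:  (N, D) features of the training set from a pre-trained encoder
    labels: (N,)   integer class labels
    query:  (D,)   feature vector whose difficulty we want to score
    Larger values indicate a harder / more atypical sample.
    """
    classes = np.unique(labels)
    mu_c = np.stack([feats[labels == c].mean(0) for c in classes])   # class means
    centered = feats - mu_c[np.searchsorted(classes, labels)]
    cov_c = centered.T @ centered / len(feats) + eps * np.eye(feats.shape[1])
    mu_0 = feats.mean(0)                                             # background Gaussian
    cov_0 = np.cov(feats.T) + eps * np.eye(feats.shape[1])
    inv_c, inv_0 = np.linalg.inv(cov_c), np.linalg.inv(cov_0)

    d_class = min(float((query - m) @ inv_c @ (query - m)) for m in mu_c)
    d_background = float((query - mu_0) @ inv_0 @ (query - mu_0))
    return d_class - d_background
```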

    Improved Operator Learning by Orthogonal Attention

    Neural operators, as efficient surrogate models for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the mainstream approaches in related research. However, existing approaches overfit the limited training data due to the considerable number of parameters in the attention mechanism. To address this, we develop an orthogonal attention mechanism based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally imposes a proper regularization effect on the resulting neural operator, which helps resist overfitting and boosts generalization. Experiments on six standard neural operator benchmark datasets, comprising both regular and irregular geometries, show that our method outperforms competing baselines by decent margins. Comment: 14 pages, 5 figures.
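    The following toy layer (PyTorch; the sizes, the QR-based orthonormalization, and the quadrature weighting are assumptions for illustration) shows one way orthogonalized basis functions can replace a standard softmax attention kernel, in the spirit of the eigenfunction view described above.

```python
import torch
import torch.nn as nn

class OrthogonalAttention(nn.Module):
    """Toy attention block built from orthonormalized neural basis functions."""
    def __init__(self, dim: int, n_basis: int = 16):
        super().__init__()
        self.to_basis = nn.Linear(dim, n_basis)    # neural "eigenfunctions"
        self.to_value = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, n_points, dim)
        psi = self.to_basis(x)                     # (B, N, K)
        q, _ = torch.linalg.qr(psi)                # orthonormal basis over the points
        kernel = q @ q.transpose(-1, -2)           # (B, N, N) low-rank kernel
        return kernel @ self.to_value(x) / x.shape[1]   # crude 1/N quadrature weight

layer = OrthogonalAttention(dim=32)
y = layer(torch.randn(2, 100, 32))                 # e.g. 100 mesh points per sample
```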