One Forward is Enough for Neural Network Training via Likelihood Ratio Method
While backpropagation (BP) is the mainstream approach for gradient
computation in neural network training, its heavy reliance on the chain rule of
differentiation constrains the design flexibility of network architectures
and training pipelines. We avoid the recursive computation in BP and develop a
unified likelihood ratio (ULR) method for gradient estimation with just one
forward propagation. Not only can ULR be extended to train a wide variety of
neural network architectures, but the computation flow in BP can also be
rearranged by ULR for better device adaptation. Moreover, we propose several
variance reduction techniques to further accelerate the training process. Our
experiments offer numerical results across diverse aspects, including various
neural network training scenarios, computation flow rearrangement, and
fine-tuning of pre-trained models. All findings demonstrate that ULR
effectively enhances the flexibility of neural network training by permitting
localized module training without compromising the global objective and
significantly boosts network robustness.
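As a rough illustration of the likelihood-ratio idea, the sketch below estimates a linear layer's weight gradient from noisy forward passes alone, with no backward pass. The Gaussian perturbation, sample count, and toy loss are assumptions for illustration, not the paper's exact ULR formulation.

    import numpy as np

    rng = np.random.default_rng(0)

    def lr_gradient(W, x, loss_fn, sigma=0.1, n_samples=64):
        """Estimate dE[loss]/dW for z = W @ x + sigma * eps, eps ~ N(0, I)."""
        grads = np.zeros_like(W)
        for _ in range(n_samples):
            eps = rng.standard_normal(W.shape[0])
            z = W @ x + sigma * eps          # one noisy forward pass
            # Score of N(Wx, sigma^2 I) w.r.t. W is (eps / sigma) x^T.
            grads += loss_fn(z) * np.outer(eps / sigma, x)
        return grads / n_samples

    W = rng.standard_normal((3, 5))
    x = rng.standard_normal(5)
    g = lr_gradient(W, x, loss_fn=lambda z: float(np.sum(z**2)))
    print(g.shape)  # (3, 5): same shape as W, no chain rule used

Because the estimator only ever needs forward evaluations and the injected noise, each module can be updated locally, which is the flexibility the abstract refers to.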
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Quantization of transformer language models faces significant challenges due
to the existence of detrimental outliers in activations. We observe that these
outliers are asymmetric and concentrated in specific channels. To address this
issue, we propose the Outlier Suppression+ framework. First, we introduce
channel-wise shifting and scaling operations to eliminate asymmetric
presentation and scale down problematic channels. We demonstrate that these
operations can be seamlessly migrated into subsequent modules while maintaining
equivalence. Second, we quantitatively analyze the optimal values for shifting
and scaling, taking into account both the asymmetric property and quantization
errors of weights in the next layer. Our lightweight framework incurs only
minimal performance degradation under static and standard post-training
quantization settings. Comprehensive results across various tasks and models
reveal that our approach achieves near-floating-point performance on both small
models, such as BERT, and large language models (LLMs) including OPTs, BLOOM,
and BLOOMZ at 8-bit and 6-bit settings. Furthermore, we establish a new state
of the art for 4-bit BERT.
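To make the migration-equivalence claim concrete, here is a minimal sketch of folding a per-channel shift z and scale s into the next linear layer. The choices of z and s below are illustrative placeholders, not the optimal values the paper derives.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 6)); X[:, 2] += 20.0   # one outlier channel
    W = rng.standard_normal((6, 3)); b = rng.standard_normal(3)

    z = X.mean(axis=0)                                # per-channel shift (assumed: mean)
    s = np.maximum(np.abs(X - z).max(axis=0), 1e-8)   # per-channel scale

    X_t = (X - z) / s        # symmetric, bounded activations: easier to quantize
    W_t = s[:, None] * W     # migrate the scale into the next layer's weight
    b_t = z @ W + b          # migrate the shift into the next layer's bias

    assert np.allclose(X @ W + b, X_t @ W_t + b_t)    # output is unchanged

The transformed activation X_t is what gets quantized; since the shift and scale are absorbed into W_t and b_t, the full-precision network function is exactly preserved.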
A Novel Noise Injection-based Training Scheme for Better Model Robustness
Noise injection-based methods have been shown to improve the
robustness of artificial neural networks in previous work. In this work, we
propose a novel noise injection-based training scheme for better model
robustness. Specifically, we first develop a likelihood ratio method to
estimate the gradient with respect to both synaptic weights and noise levels
for stochastic gradient descent training. Then, we design an approximation for
the vanilla noise injection-based training method to reduce memory and improve
computational efficiency. Next, we apply our proposed scheme to spiking neural
networks and evaluate classification accuracy and robustness on the MNIST and
Fashion-MNIST datasets. Experimental results show that our proposed method
achieves much better adversarial robustness and slightly better original
accuracy than the conventional gradient-based training method.
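The sketch below illustrates the general likelihood-ratio construction for estimating gradients with respect to both the weights and the noise level sigma, which is the distinctive ingredient here. The Gaussian noise model and toy setup are assumptions; the paper's spiking-network specifics and memory-saving approximation are omitted.

    import numpy as np

    rng = np.random.default_rng(0)

    def lr_grads(W, sigma, x, loss_fn, n_samples=64):
        """LR estimates of dE[loss]/dW and dE[loss]/dsigma for z = Wx + sigma*eps."""
        gW, gs = np.zeros_like(W), 0.0
        for _ in range(n_samples):
            eps = rng.standard_normal(W.shape[0])
            L = loss_fn(W @ x + sigma * eps)
            gW += L * np.outer(eps / sigma, x)              # score w.r.t. W
            gs += L * float(np.sum(eps**2 - 1.0)) / sigma   # score w.r.t. sigma
        return gW / n_samples, gs / n_samples

    W = rng.standard_normal((3, 5)); x = rng.standard_normal(5)
    gW, gs = lr_grads(W, 0.1, x, lambda z: float(np.sum(z**2)))
    print(gW.shape, gs)  # weight gradient plus a scalar noise-level gradient

Training the noise level jointly with the weights is what lets the scheme trade off accuracy against robustness, rather than treating injected noise as a fixed hyperparameter.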
ICStega: Image Captioning-based Semantically Controllable Linguistic Steganography
Social media has become the preferred communication platform for web users,
but it has also brought security threats. Linguistic steganography hides secret
data into text and sends it to the intended recipient to realize covert
communication. Compared to edit-based linguistic steganography,
generation-based approaches largely improve the payload capacity. However,
existing methods can only generate stego text in isolation. Another common behavior on
social media is sending semantically related image-text pairs. In this paper,
we put forward a novel image captioning-based stegosystem, where the secret
messages are embedded into the generated captions. Thus, the semantics of the
stego text can be controlled and the secret data can be transmitted by sending
semantically related image-text pairs. To balance the conflict between payload
capacity and semantic preservation, we propose a new sampling method called
Two-Parameter Semantic Control Sampling to cut off low-probability words.
Experimental results show that our method can control diversity, payload
capacity, security, and semantic accuracy at the same time.
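The exact sampling rule is the paper's own; as a loose illustration only, the hypothetical sketch below truncates a token distribution with two parameters (a relative probability threshold delta and a pool-size cap k) and lets secret bits select among the surviving tokens. The parameter names and the bit-embedding rule are assumptions, not the paper's method.

    import numpy as np

    def truncate_and_embed(probs, bits, delta=0.1, k=4):
        """probs: next-token distribution; bits: iterator of secret bits."""
        order = np.argsort(probs)[::-1]                      # tokens by probability
        pool = [i for i in order[:k] if probs[i] >= delta * probs[order[0]]]
        n_bits = int(np.log2(len(pool))) if len(pool) > 1 else 0
        idx = 0
        for _ in range(n_bits):                              # consume n_bits secret bits
            idx = (idx << 1) | next(bits)
        return pool[idx], n_bits                             # chosen token, payload size

    probs = np.array([0.5, 0.25, 0.15, 0.06, 0.04])
    token, n = truncate_and_embed(probs, iter([1, 0]))
    print(token, n)  # embeds log2(pool size) bits into the chosen token

A larger pool raises payload capacity but admits lower-probability (less semantically faithful) words; the two truncation parameters are what tune that trade-off.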
MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements
Hyperparameter optimization (HPO) is a fundamental problem in automatic
machine learning (AutoML). However, due to the expensive evaluation cost of
models (e.g., training deep learning models or training models on large
datasets), vanilla Bayesian optimization (BO) is typically computationally
infeasible. To alleviate this issue, Hyperband (HB) utilizes the early stopping
mechanism to speed up configuration evaluations by terminating those
badly-performing configurations in advance. This leads to two kinds of quality
measurements: (1) many low-fidelity measurements for configurations that get
early-stopped, and (2) few high-fidelity measurements for configurations that
are evaluated without being early stopped. The state-of-the-art HB-style
method, BOHB, aims to combine the benefits of both BO and HB. Instead of
sampling configurations randomly in HB, BOHB samples configurations based on a
BO surrogate model, which is constructed with the high-fidelity measurements
only. However, the scarcity of high-fidelity measurements greatly hampers the
efficiency of BO to guide the configuration search. In this paper, we present
MFES-HB, an efficient Hyperband method that is capable of utilizing both the
high-fidelity and low-fidelity measurements to accelerate the convergence of
HPO tasks. Designing MFES-HB is not trivial as the low-fidelity measurements
can be biased yet informative to guide the configuration search. Thus we
propose to build a Multi-Fidelity Ensemble Surrogate (MFES) based on the
generalized Product of Experts framework, which can integrate useful
information from multi-fidelity measurements effectively. The empirical studies
on the real-world AutoML tasks demonstrate that MFES-HB can achieve 3.3-8.9x
speedups over the state-of-the-art approach, BOHB.
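As a small illustration of the generalized Product of Experts fusion, the sketch below combines Gaussian predictions from several per-fidelity surrogates with precision weighting. The expert weights here are arbitrary stand-ins for the reliability-based weights MFES-HB actually derives from the measurements.

    import numpy as np

    def gpoe(mus, variances, betas):
        """Generalized Product of Experts over Gaussian predictions (mu_i, var_i)."""
        mus, variances, betas = map(np.asarray, (mus, variances, betas))
        precision = np.sum(betas / variances)        # combined precision
        var = 1.0 / precision
        mu = var * np.sum(betas * mus / variances)   # precision-weighted mean
        return mu, var

    # Two noisy low-fidelity experts and one confident high-fidelity expert:
    mu, var = gpoe([0.30, 0.35, 0.20], [0.10, 0.12, 0.02], [0.25, 0.25, 0.50])
    print(mu, var)

The fused posterior leans toward confident, highly weighted experts, which is how the abundant but biased low-fidelity measurements can inform the search without drowning out the scarce high-fidelity ones.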
RobustMQ: Benchmarking Robustness of Quantized Models
Quantization has emerged as an essential technique for deploying deep neural
networks (DNNs) on devices with limited resources. However, quantized models
exhibit vulnerabilities when exposed to various noises in real-world
applications. Despite the importance of evaluating the impact of quantization
on robustness, existing research on this topic is limited and often disregards
established principles of robustness evaluation, resulting in incomplete and
inconclusive findings. To address this gap, we thoroughly evaluated the
robustness of quantized models against various noises (adversarial attacks,
natural corruptions, and systematic noises) on ImageNet. The comprehensive
evaluation results empirically provide valuable insights into the robustness of
quantized models in various scenarios, for example: (1) quantized models
exhibit higher adversarial robustness than their floating-point counterparts,
but are more vulnerable to natural corruptions and systematic noises; (2) in
general, increasing the quantization bit-width results in a decrease in
adversarial robustness, an increase in natural robustness, and an increase in
systematic robustness; (3) among corruption methods, impulse noise and
glass blur are the most harmful to quantized models, while
brightness has the least impact; (4) among systematic noises,
nearest neighbor interpolation has the highest impact, while bilinear
interpolation, cubic interpolation, and area interpolation are the three least
harmful. Our research contributes to advancing the robust quantization of
models and their deployment in real-world scenarios.
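The sketch below shows, in miniature, the kind of measurement such a benchmark automates: a symmetric uniform quantizer at several bit-widths plus a salt-and-pepper stand-in for impulse noise. Both are simplified assumptions, not the benchmark's actual evaluation pipeline.

    import numpy as np

    def quantize(w, bits):
        """Symmetric uniform quantization of a weight tensor to `bits` bits."""
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        return np.round(w / scale).clip(-qmax - 1, qmax) * scale

    def impulse_noise(x, p=0.05, seed=0):
        """Flip a fraction p of entries to 0 or 1 (salt-and-pepper stand-in)."""
        rng = np.random.default_rng(seed)
        out = x.copy()
        mask = rng.random(x.shape) < p
        out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(x.dtype)
        return out

    rng = np.random.default_rng(1)
    w = rng.standard_normal((16, 16))
    for bits in (8, 6, 4):
        print(bits, "bit mean quantization error:",
              round(float(np.abs(w - quantize(w, bits)).mean()), 4))

    x = np.clip(rng.standard_normal((8, 8)) * 0.1 + 0.5, 0, 1)  # toy image
    print("corruption magnitude:", float(np.abs(x - impulse_noise(x)).mean()))

A full benchmark would sweep such quantizers and corruptions over real models and ImageNet inputs, then report accuracy deltas rather than raw tensor error.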
Single-shot compressed ultrafast photography: a review
Compressed ultrafast photography (CUP) is a burgeoning single-shot computational imaging technique that provides an imaging speed as high as 10 trillion frames per second and a sequence depth of up to a few hundred frames. This technique synergizes compressed sensing and the streak camera technique to capture nonrepeatable ultrafast transient events with a single shot. With recent unprecedented technical developments and extensions of this methodology, it has been widely used in ultrafast optical imaging and metrology, ultrafast electron diffraction and microscopy, and information security protection. We review the basic principles of CUP, its recent advances in data acquisition and image reconstruction, its fusions with other modalities, and its unique applications in multiple research fields.
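As a compact illustration of the principle, the sketch below simulates the CUP forward model: encode each frame with a pseudo-random binary mask, shear by one row per frame to mimic the streak camera's temporal deflection, and integrate everything onto a single snapshot. The dimensions and mask are illustrative, and the compressed-sensing reconstruction that inverts this model is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    T_frames, H, W = 8, 32, 32
    scene = rng.random((T_frames, H, W))               # dynamic scene x(x, y, t)
    mask = (rng.random((H, W)) > 0.5).astype(float)    # spatial encoding mask C

    snapshot = np.zeros((H + T_frames, W))             # single 2-D measurement y
    for t in range(T_frames):
        coded = mask * scene[t]                        # spatial encoding (C)
        snapshot[t:t + H, :] += coded                  # shear by t rows (S), integrate (T)

    print(snapshot.shape)  # (40, 32): one 2-D snapshot encoding a 3-D scene

Recovering the full frame sequence from this single measurement is an underdetermined inverse problem, which is where the compressed-sensing reconstruction algorithms reviewed in the paper come in.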