Graph Construction with Flexible Nodes for Traffic Demand Prediction
Graph neural networks (GNNs) have been widely applied to traffic demand
prediction, where transportation modes can be divided into station-based and
free-floating modes. Existing research on traffic graph construction
primarily relies on map matching to construct graphs based on the road network.
However, the complexity and inhomogeneity of data distribution in free-floating
traffic demand forecasting make road network matching inflexible. To tackle
these challenges, this paper introduces a novel graph construction method
tailored to free-floating traffic mode. We propose a novel density-based
clustering algorithm (HDPC-L) to determine the flexible positioning of nodes in
the graph, overcoming the computational bottlenecks of traditional clustering
algorithms and enabling effective handling of large-scale datasets.
Furthermore, we extract valuable information from ridership data to initialize
the edge weights of GNNs. Comprehensive experiments on two real-world datasets,
the Shenzhen bike-sharing dataset and the Haikou ride-hailing dataset, show
that our method significantly improves model performance, with average accuracy
gains of around 25% and 19.5% on the two datasets, respectively. It also
improves computational efficiency, reducing training time by approximately 12%
and 32.5%, respectively. We make our code available at
https://github.com/houjinyan/HDPC-L-ODInit
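
The abstract does not detail how ridership data becomes edge weights, so the
following is only a rough illustrative sketch: it counts origin-destination (OD)
trips between clustered nodes and row-normalizes the result into an adjacency
matrix that could initialize GNN edge weights. The function name and the OD input
format are assumptions for illustration, not the authors' implementation.

import numpy as np

def od_edge_weights(od_trips, num_nodes):
    """Build initial edge weights from origin-destination trip records.

    od_trips: iterable of (origin_node, destination_node) index pairs, where each
    index refers to a cluster center produced by a clustering step (HDPC-L in the
    paper; any clustering works for this sketch).
    Returns a row-normalized adjacency matrix usable as initial GNN edge weights.
    """
    counts = np.zeros((num_nodes, num_nodes), dtype=np.float64)
    for o, d in od_trips:
        counts[o, d] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for nodes with no outgoing trips.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Example: three clustered nodes and a handful of trips.
weights = od_edge_weights([(0, 1), (0, 1), (0, 2), (1, 2)], num_nodes=3)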
Benchmarking the Robustness of Quantized Models
Quantization has emerged as an essential technique for deploying deep neural
networks (DNNs) on devices with limited resources. However, quantized models
exhibit vulnerabilities when exposed to various noises in real-world
applications. Despite the importance of evaluating the impact of quantization
on robustness, existing research on this topic is limited and often disregards
established principles of robustness evaluation, resulting in incomplete and
inconclusive findings. To address this gap, we thoroughly evaluated the
robustness of quantized models against various noises (adversarial attacks,
natural corruptions, and systematic noises) on ImageNet. Extensive experiments
demonstrate that lower-bit quantization is more resilient to adversarial
attacks but is more susceptible to natural corruptions and systematic noises.
Notably, our investigation reveals that impulse noise (in natural corruptions)
and the nearest neighbor interpolation (in systematic noises) have the most
significant impact on quantized models. Our research contributes to advancing
the robust quantization of models and their deployment in real-world scenarios.
Comment: Workshop at IEEE Conference on Computer Vision and Pattern Recognition 202
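
As a concrete illustration of the adversarial part of this evaluation, the sketch
below implements FGSM, one representative attack, and measures robust accuracy.
The paper evaluates a broader set of attacks, corruptions, and systematic noises;
the quantized `model` here is a placeholder, and a fake-quantized (QAT-style)
model is assumed so that gradients can flow, since integer-only inference would
need a full-precision surrogate for the attack step.

import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon):
    """Craft FGSM adversarial examples against a (possibly quantized) classifier."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to valid pixel range.
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

def robust_accuracy(model, loader, epsilon):
    """Top-1 accuracy on FGSM-perturbed inputs."""
    correct, total = 0, 0
    for images, labels in loader:
        adv = fgsm_attack(model, images, labels, epsilon)
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total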
RobustMQ: Benchmarking Robustness of Quantized Models
Quantization has emerged as an essential technique for deploying deep neural
networks (DNNs) on devices with limited resources. However, quantized models
exhibit vulnerabilities when exposed to various noises in real-world
applications. Despite the importance of evaluating the impact of quantization
on robustness, existing research on this topic is limited and often disregards
established principles of robustness evaluation, resulting in incomplete and
inconclusive findings. To address this gap, we thoroughly evaluated the
robustness of quantized models against various noises (adversarial attacks,
natural corruptions, and systematic noises) on ImageNet. The comprehensive
evaluation results empirically provide valuable insights into the robustness of
quantized models in various scenarios, for example: (1) quantized models
exhibit higher adversarial robustness than their floating-point counterparts,
but are more vulnerable to natural corruptions and systematic noises; (2) in
general, increasing the quantization bit-width results in a decrease in
adversarial robustness, an increase in natural robustness, and an increase in
systematic robustness; (3) among corruption methods, \textit{impulse noise} and
\textit{glass blur} are the most harmful to quantized models, while
\textit{brightness} has the least impact; (4) among systematic noises, the
\textit{nearest neighbor interpolation} has the highest impact, while bilinear
interpolation, cubic interpolation, and area interpolation are the three least
harmful. Our research contributes to advancing the robust quantization of
models and their deployment in real-world scenarios.
Comment: 15 pages, 7 figures
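
A minimal sketch of how the per-corruption comparison behind finding (3) could be
computed: evaluate the quantized model on each corrupted test set and rank
corruptions by accuracy. The `corrupted_loaders` mapping (corruption name to data
loader, e.g. built from ImageNet-C) and the `model` are placeholders, not part of
any tooling released with the paper.

import torch

def corruption_accuracies(model, corrupted_loaders):
    """Return {corruption_name: top-1 accuracy} for a quantized classifier."""
    model.eval()
    results = {}
    for name, loader in corrupted_loaders.items():
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in loader:
                correct += (model(images).argmax(dim=1) == labels).sum().item()
                total += labels.size(0)
        results[name] = correct / total
    return results

# Ranking corruptions from most to least harmful for a given model:
# sorted(corruption_accuracies(model, loaders).items(), key=lambda kv: kv[1])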
Distribution-sensitive Information Retention for Accurate Binary Neural Network
Model binarization is an effective method of compressing neural networks and
accelerating their inference process. However, a significant performance gap
still exists between the 1-bit model and the 32-bit one. The empirical study
shows that binarization causes a great loss of information in the forward and
backward propagation. We present a novel Distribution-sensitive Information
Retention Network (DIR-Net) that retains the information in the forward and
backward propagation by improving internal propagation and introducing external
representations. The DIR-Net mainly relies on three technical contributions:
(1) Information Maximized Binarization (IMB): minimizing the information loss
and the binarization error of weights/activations simultaneously by weight
balance and standardization; (2) Distribution-sensitive Two-stage Estimator
(DTE): retaining the information of gradients through a distribution-sensitive
soft approximation that jointly considers the updating capability and gradient
accuracy; (3) Representation-align Binarization-aware Distillation (RBD):
retaining the representation information by distilling the representations
between full-precision and binarized networks. The DIR-Net investigates both
forward and backward processes of BNNs from the unified information
perspective, thereby providing new insight into the mechanism of network
binarization. The three techniques in our DIR-Net are versatile and effective
and can be applied in various structures to improve BNNs. Comprehensive
experiments on the image classification and object detection tasks show that
our DIR-Net consistently outperforms the state-of-the-art binarization
approaches under mainstream and compact architectures, such as ResNet, VGG,
EfficientNet, DARTS, and MobileNet. Additionally, we deploy our DIR-Net on
real-world resource-limited devices, achieving 11.1x storage savings and a
5.4x speedup.
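
The following is a minimal numpy sketch of the balance-and-standardize idea
behind IMB (and Libra-PB in the IR-Net paper below): weights are zero-centered
and standardized before taking the sign, and a per-layer scaling factor preserves
magnitude information. The scaling choice here is illustrative, not the authors'
exact implementation.

import numpy as np

def balanced_standardized_binarize(w):
    """Binarize a weight tensor after balancing (zero mean) and standardization.

    Balancing and standardizing before sign() keeps the binarized weights close
    to a maximum-entropy (half +1, half -1) distribution, which is the
    information-retention intuition behind IMB/Libra-PB.
    """
    w = w - w.mean()                 # balance: zero-center the weights
    w = w / (w.std() + 1e-8)         # standardize: unit variance
    scale = np.abs(w).mean()         # per-layer scaling factor (illustrative choice)
    return scale * np.sign(w)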
Forward and Backward Information Retention for Accurate Binary Neural Networks
Weight and activation binarization is an effective approach to deep neural
network compression and can accelerate the inference by leveraging bitwise
operations. Although many binarization methods have improved the accuracy of
the model by minimizing the quantization error in forward propagation, there
remains a noticeable performance gap between the binarized model and the
full-precision one. Our empirical study indicates that quantization brings
information loss in both forward and backward propagation, which is the
bottleneck for training accurate binary neural networks. To address these
issues, we propose an Information Retention Network (IR-Net) to retain the
information contained in the forward activations and backward gradients.
IR-Net mainly relies on two technical contributions: (1) Libra Parameter
Binarization (Libra-PB): simultaneously minimizing both quantization error and
information loss of parameters by balanced and standardized weights in forward
propagation; (2) Error Decay Estimator (EDE): minimizing the information loss
of gradients by gradually approximating the sign function in backward
propagation, jointly considering the updating ability and accurate gradients.
We are the first to investigate both forward and backward processes of binary
networks from the unified information perspective, which provides new insight
into the mechanism of network binarization. Comprehensive experiments with
various network structures on the CIFAR-10 and ImageNet datasets demonstrate
that the proposed IR-Net consistently outperforms state-of-the-art quantization
methods.
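
As a rough illustration of the EDE idea of gradually approximating the sign
function in the backward pass, the sketch below uses a scaled tanh whose
derivative replaces the (almost everywhere zero) derivative of sign. The constant
k and the schedule for the shape parameter t are illustrative assumptions, not
the paper's exact settings.

import numpy as np

def soft_sign_grad(x, t):
    """Backward approximation of d sign(x)/dx via a scaled tanh surrogate.

    The surrogate is g(x) = k * tanh(t * x) with k = max(1/t, 1) chosen here to
    keep early-stage gradient magnitudes reasonable, so its derivative
    k * t * (1 - tanh(t * x) ** 2) is used in place of the true derivative of
    sign(x). Increasing t over training makes g approach the sign function.
    """
    k = max(1.0 / t, 1.0)
    return k * t * (1.0 - np.tanh(t * x) ** 2)

# Example schedule (assumed form): t grows from 1e-2 to 1e1 over training.
# for epoch in range(num_epochs):
#     t = 10 ** (-2 + 3 * epoch / num_epochs)
#     grad_w = upstream_grad * soft_sign_grad(w, t)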
2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution
Low-bit quantization has become widespread for compressing image
super-resolution (SR) models for edge deployment, which allows advanced SR
models to enjoy compact low-bit parameters and efficient integer/bitwise
constructions for storage compression and inference acceleration, respectively.
However, it is well known that low-bit quantization degrades the accuracy of SR
models compared to their full-precision (FP) counterparts. Despite several
efforts to alleviate this degradation, transformer-based SR models still
suffer severe degradation due to their distinctive activation distributions. In
this work, we present a dual-stage low-bit post-training quantization (PTQ)
method for image super-resolution, namely 2DQuant, which achieves efficient and
accurate SR under low-bit quantization. The proposed method first analyzes the
weight and activation distributions and finds that they are characterized by
coexisting symmetry and asymmetry as well as long tails. Specifically, we propose
Distribution-Oriented Bound Initialization (DOBI), which uses different search
strategies to find coarse bounds for the quantizers. To obtain refined quantizer
parameters, we further propose Distillation Quantization Calibration (DQC),
which employs a distillation approach to make the quantized model learn from
its FP counterpart. Extensive experiments across different bit-widths and
scaling factors show that DOBI alone reaches state-of-the-art (SOTA)
performance, and after the second stage our method surpasses existing PTQ
approaches in both metrics and visual quality. 2DQuant achieves a PSNR gain of
up to 4.52 dB on Set5 (x2) over SOTA when quantized to 2 bits, along with a
3.60x compression ratio and a 5.08x speedup. The code and models will be
available at https://github.com/Kai-Liu001/2DQuant.
Comment: 9 pages, 6 figures.
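
The sketch below illustrates one plausible form of a coarse bound search in the
spirit of DOBI: it grid-searches asymmetric clipping bounds for a uniform
quantizer by minimizing reconstruction MSE on a calibration tensor. The
percentile-based candidate grid and the MSE objective are assumptions for
illustration; the paper's actual search strategies differ per tensor type.

import numpy as np

def quantize_dequantize(x, lo, hi, bits):
    """Asymmetric uniform quantization to `bits` bits within [lo, hi], then dequantize."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((np.clip(x, lo, hi) - lo) / scale), 0, levels)
    return q * scale + lo

def search_coarse_bounds(x, bits=2, num_candidates=50):
    """Grid-search clipping bounds (lo, hi) minimizing quantization MSE."""
    best = (x.min(), x.max())
    best_err = np.inf
    for p in np.linspace(0.0, 20.0, num_candidates):   # clip away up to 20% per tail
        lo, hi = np.percentile(x, p), np.percentile(x, 100.0 - p)
        if hi <= lo:
            continue
        err = np.mean((x - quantize_dequantize(x, lo, hi, bits)) ** 2)
        if err < best_err:
            best, best_err = (lo, hi), err
    return best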
OHQ: On-chip Hardware-aware Quantization
Quantization emerges as one of the most promising approaches for deploying
advanced deep models on resource-constrained hardware. Mixed-precision
quantization leverages multiple bit-width architectures to unleash the accuracy
and efficiency potential of quantized models. However, existing mixed-precision
quantization suffers from an exhaustive search space that causes immense
computational overhead. The quantization process thus relies on separate
high-performance devices rather than running locally, which also leads to a
significant gap between the considered hardware metrics and the real
deployment. In this paper, we propose
an On-chip Hardware-aware Quantization (OHQ) framework that performs
hardware-aware mixed-precision quantization without accessing online devices.
First, we construct the On-chip Quantization Awareness (OQA) pipeline, which
enables perceiving the actual efficiency metrics of the quantization operators
on the hardware. Second, we propose the Mask-guided Quantization Estimation
(MQE) technique to efficiently estimate the accuracy metrics of operators under
the constraints of on-chip-level computing power. By synthesizing network and
hardware insights
through linear programming, we obtain optimized bit-width configurations.
Notably, the quantization process occurs entirely on-chip, without any
additional computing devices or data access. We demonstrate accelerated
inference after quantization for various architectures and compression ratios,
achieving 70% and 73% accuracy for ResNet-18 and MobileNetV3, respectively. OHQ
improves latency by 15-30% compared to INT8 on deployment.
Comment: 10 pages, 6 figures
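
To make the final allocation step concrete, here is a tiny brute-force stand-in
for the bit-width assignment that OHQ solves with linear programming: each layer
picks a bit-width so that a sensitivity-weighted accuracy proxy is maximized
under a latency budget. The accuracy proxies, latency table, and budget are
made-up illustrative numbers, and exhaustive search replaces the actual linear
program only to keep the example dependency-free.

from itertools import product

# Per-layer accuracy proxy (higher is better) and latency for each bit-width.
# All numbers are illustrative placeholders, not values from the paper.
candidate_bits = [2, 4, 8]
accuracy_proxy = [            # accuracy_proxy[layer][bit_index]
    [0.80, 0.92, 0.99],
    [0.70, 0.90, 0.98],
    [0.85, 0.95, 0.99],
]
latency_ms = [                # latency_ms[layer][bit_index]
    [0.8, 1.1, 1.9],
    [1.0, 1.4, 2.5],
    [0.5, 0.7, 1.2],
]
latency_budget_ms = 4.0

best_config, best_score = None, float("-inf")
for choice in product(range(len(candidate_bits)), repeat=len(accuracy_proxy)):
    total_latency = sum(latency_ms[l][b] for l, b in enumerate(choice))
    if total_latency > latency_budget_ms:
        continue
    score = sum(accuracy_proxy[l][b] for l, b in enumerate(choice))
    if score > best_score:
        best_config, best_score = [candidate_bits[b] for b in choice], score

print("per-layer bit-widths:", best_config, "score:", round(best_score, 2))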