5,435 research outputs found
Oscillation-free Quantization for Low-bit Vision Transformers
Weight oscillation is an undesirable side effect of quantization-aware
training, in which quantized weights frequently jump between two quantized
levels, resulting in training instability and a sub-optimal final model. We
discover that the learnable scaling factor, a widely used setting in
quantization, aggravates weight oscillation. In this study, we
investigate the connection between the learnable scaling factor and quantized
weight oscillation and use ViT as a case driver to illustrate the findings and
remedies. In addition, we also find that the interdependence between the
quantized weights of the query and key in a self-attention layer makes
ViT vulnerable to oscillation. We therefore propose three techniques
accordingly: statistical weight quantization (StatsQ) to improve
quantization robustness compared to the prevalent learnable-scale-based method;
confidence-guided annealing (CGA) that freezes the weights with
high confidence and calms the oscillating weights; and
query-key reparameterization (QKR) to resolve the
query-key intertwined oscillation and mitigate the resulting gradient
misestimation. Extensive experiments demonstrate that these proposed techniques
successfully abate weight oscillation and consistently achieve substantial
accuracy improvement on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S
algorithms outperform the previous state-of-the-art by 9.8% and 7.7%,
respectively. Code and models are available at: https://github.com/nbasyl/OFQ.
Comment: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023
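The oscillation effect the abstract describes is easy to reproduce in a toy setting. The sketch below (an illustration with arbitrary values, not the paper's proposed techniques) quantizes a single latent weight with a uniform 2-bit quantizer and shows it flipping between two adjacent levels as small updates push it across a rounding boundary:

```python
import numpy as np

def quantize(w, scale, n_bits=2):
    # Uniform symmetric quantizer: round(w / scale), clamped to the 2-bit range.
    qmax = 2 ** (n_bits - 1) - 1
    return float(np.clip(np.round(w / scale), -qmax - 1, qmax) * scale)

# A latent weight sitting near a rounding boundary (0.5 * scale = 0.05)
# oscillates between the levels 0.0 and 0.1 as tiny updates nudge it
# back and forth across the boundary.
scale, w = 0.1, 0.049
levels = []
for step in range(6):
    levels.append(quantize(w, scale))
    w += 0.002 if step % 2 == 0 else -0.002  # stand-in for noisy gradient steps
print(levels)  # → [0.0, 0.1, 0.0, 0.1, 0.0, 0.1]
```

In real QAT many weights flip at once, which is what destabilizes training; a learnable scale moves the rounding boundary itself, which the abstract identifies as aggravating the effect.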
Efficient Quantization-aware Training with Adaptive Coreset Selection
The expanding model size and computation of deep neural networks (DNNs) have
increased the demand for efficient model deployment methods. Quantization-aware
training (QAT) is a representative model compression method to leverage
redundancy in weights and activations. However, most existing QAT methods
require end-to-end training on the entire dataset, which suffers from long
training time and high energy costs. Coreset selection, aiming to improve data
efficiency utilizing the redundancy of training data, has also been widely used
for efficient training. In this work, we take a new angle: using coreset
selection to improve the training efficiency of quantization-aware training.
Based on the characteristics of QAT, we propose two metrics, the error
vector score and the disagreement score, to quantify the importance of each
sample during training. Guided by these two importance metrics, we propose a
quantization-aware adaptive coreset selection (ACS) method to select the data
for the current training epoch. We evaluate our method on various networks
(ResNet-18, MobileNetV2), datasets (CIFAR-100, ImageNet-1K), and quantization
settings. Compared with previous coreset selection methods, ours significantly
improves QAT performance across different dataset fractions.
Our method achieves 68.39% accuracy with 4-bit quantized ResNet-18 on the
ImageNet-1K dataset using only a 10% subset, an absolute gain of 4.24% over
the baseline.
Comment: Code: https://github.com/HuangOwen/QAT-AC
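A disagreement-style importance score as described above can be sketched as follows; this is a hypothetical reading of the metric (KL divergence between full-precision and quantized model outputs), not the authors' exact formulation:

```python
import numpy as np

def disagreement_score(p_full, p_quant, eps=1e-12):
    # KL divergence between full-precision and quantized softmax outputs;
    # samples where the two models disagree most are most informative for QAT.
    return np.sum(p_full * (np.log(p_full + eps) - np.log(p_quant + eps)), axis=-1)

def select_coreset(p_full, p_quant, fraction=0.1):
    # Keep the top `fraction` of samples ranked by disagreement.
    scores = disagreement_score(p_full, p_quant)
    k = max(1, int(len(scores) * fraction))
    return np.argsort(scores)[::-1][:k]

# Toy batch of 4 samples, 3 classes (values are arbitrary).
p_full = np.array([[0.8, 0.1, 0.1], [0.4, 0.3, 0.3],
                   [0.9, 0.05, 0.05], [0.34, 0.33, 0.33]])
p_quant = np.array([[0.1, 0.8, 0.1], [0.4, 0.3, 0.3],
                    [0.85, 0.1, 0.05], [0.3, 0.4, 0.3]])
print(select_coreset(p_full, p_quant, fraction=0.25))  # sample 0 disagrees most
```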
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
We propose LLM-FP4 for quantizing both weights and activations in large
language models (LLMs) down to 4-bit floating-point values, in a post-training
manner. Existing post-training quantization (PTQ) solutions are primarily
integer-based and struggle with bit widths below 8 bits. Compared to integer
quantization, floating-point (FP) quantization is more flexible and can better
handle long-tail or bell-shaped distributions, and it has emerged as a default
choice in many hardware platforms. One characteristic of FP quantization is
that its performance largely depends on the choice of exponent bits and
clipping range. In this regard, we construct a strong FP-PTQ baseline by
searching for the optimal quantization parameters. Furthermore, we observe a
high inter-channel variance and low intra-channel variance pattern in
activation distributions, which makes activation quantization difficult. We
recognize this pattern to be consistent across a spectrum of transformer models
designed for diverse tasks, such as LLMs, BERT, and Vision Transformer models.
To tackle this, we propose per-channel activation quantization and show that
these additional scaling factors can be reparameterized as exponential biases
of weights, incurring a negligible cost. Our method, for the first time, can
quantize both weights and activations in the LLaMA-13B to only 4-bit and
achieves an average score of 63.1 on the common sense zero-shot reasoning
tasks, which is only 5.8 lower than the full-precision model, significantly
outperforming the previous state-of-the-art by 12.7 points. Code is available
at: https://github.com/nbasyl/LLM-FP4.
Comment: EMNLP 2023 Main Conference
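A minimal simulated FP4 (E2M1) quantizer illustrates the abstract's point about exponent bits and clipping range. This is a generic mini-float snap-to-grid sketch with an assumed exponent bias of 1, not the paper's search-based method:

```python
import numpy as np

def fp4_values(exp_bits=2, man_bits=1, bias=1):
    # Enumerate every value representable in an EeMm mini-float
    # (subnormals included), then mirror for the sign bit.
    vals = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:
                v = frac * 2.0 ** (1 - bias)        # subnormal
            else:
                v = (1 + frac) * 2.0 ** (e - bias)  # normal
            vals.add(v)
    return np.array(sorted(vals | {-v for v in vals}))

def fp_quantize(x, scale=1.0):
    # Snap each element to the nearest representable FP4 value;
    # anything beyond the grid saturates (the clipping range).
    grid = fp4_values() * scale
    idx = np.argmin(np.abs(x[..., None] - grid), axis=-1)
    return grid[idx]

x = np.array([0.2, 0.7, 2.4, 10.0])
print(fp_quantize(x))  # 10.0 saturates to the grid maximum 6.0
```

Note how spacing between representable values grows with magnitude, which is what lets FP quantization track long-tail or bell-shaped distributions better than a uniform integer grid.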
TCN-AA: A Wi-Fi-Based Temporal Convolution Network for Human-to-Human Interaction Recognition with Augmentation and Attention
The utilization of Wi-Fi-based human activity recognition (HAR) has gained
considerable interest in recent times, primarily owing to its applications in
various domains such as healthcare for monitoring breath and heart rate,
security, elderly care, and others. These Wi-Fi-based methods exhibit several
advantages over conventional state-of-the-art techniques that rely on cameras
and sensors, including lower costs and ease of deployment. However, a
major challenge for Wi-Fi-based HAR is the significant decline in
performance when the scene or subject changes. To mitigate this
issue, it is imperative to train the model on an extensive dataset. In
recent studies, the utilization of CNN-based models or sequence-to-sequence
models such as LSTM, GRU, or Transformer has become prevalent. While
sequence-to-sequence models can be more precise, they are also more
computationally intensive and require a larger amount of training data. To
tackle these limitations, we propose a novel approach that leverages a temporal
convolution network with augmentations and attention, referred to as TCN-AA.
Our proposed method is computationally efficient and exhibits improved accuracy
even when the data size is increased threefold through our augmentation
techniques. Our experiments on a publicly available dataset indicate that our
approach outperforms existing state-of-the-art methods, with a final accuracy
of 99.42%.
Comment: Submitted to IEEE Internet of Things Journal (under review)
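The core building block of a temporal convolution network is a causal, dilated 1-D convolution; a minimal NumPy sketch (illustrative only, with an assumed two-tap kernel, not the paper's architecture):

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    # Causal dilated 1-D convolution: the output at time t only sees
    # x[t], x[t - d], x[t - 2d], ... -- no future samples leak in.
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so output is causal
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(6, dtype=float)  # toy Wi-Fi CSI stream: [0, 1, 2, 3, 4, 5]
out = causal_dilated_conv1d(x, kernel=[1.0, 1.0], dilation=2)
print(out)  # each output sums x[t] and x[t-2]
```

Stacking such layers with growing dilations gives a receptive field that grows exponentially with depth, which is how a TCN matches sequence models at far lower compute.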
Intelligent Information Dissemination Scheme for Urban Vehicular Ad Hoc Networks
In vehicular ad hoc networks (VANETs), a hotspot such as a parking lot is an information source and receives inquiries from many vehicles seeking free parking spaces. With the routing protocols in the literature, each vehicle must flood route discovery (RD) packets to find a route to the hotspot before sending inquiry packets to the parking lot. As a result, a VANET near an urban area or city center may suffer a broadcast storm from the many flooded RD packets during rush hours. To avoid the broadcast storm problem, this paper presents a hotspot-enabled routing-tree-based data forwarding method, called the intelligent information dissemination scheme (IID). IID lets the hotspot automatically decide when to build the routing tree for proactive information transmission, namely when the number of vehicle route discoveries during a given period exceeds a threshold calculated from our analytical packet delivery model. The routing information is dynamically maintained by vehicles located at each intersection near the hotspot whenever the maintenance cost is less than that of letting vehicles discover routes themselves. Simulation results show that this method minimizes routing delays for vehicles with lower packet delivery overhead.
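The hotspot's trigger condition, switching to a proactive routing tree once route discoveries in a period exceed a threshold, can be sketched as follows. The class name, window length, and threshold here are hypothetical; in the paper the threshold comes from the analytical packet delivery model:

```python
from collections import deque

class HotspotRouter:
    # Hypothetical sketch of the IID decision rule: the hotspot builds a
    # proactive routing tree once route-discovery (RD) requests seen in a
    # sliding time window exceed a threshold.
    def __init__(self, threshold, window=60.0):
        self.threshold = threshold      # assumed value; paper derives it
        self.window = window            # seconds of RD history to keep
        self.rd_times = deque()

    def on_route_discovery(self, now):
        self.rd_times.append(now)
        # Drop RD arrivals that fell out of the sliding window.
        while self.rd_times and now - self.rd_times[0] > self.window:
            self.rd_times.popleft()
        return len(self.rd_times) > self.threshold  # True => build the tree

router = HotspotRouter(threshold=3, window=60.0)
decisions = [router.on_route_discovery(t) for t in [0, 10, 20, 30, 100]]
print(decisions)  # tree triggered only during the burst
```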
Density Peaks Clustering Approach for Discovering Demand Hot Spots in City-scale Taxi Fleet Dataset
In this paper, we introduce a variant of the density peaks clustering (DPC) approach for discovering demand hot spots from a low-frequency, low-quality taxi fleet operational dataset. In the literature, the DPC approach mainly uses density peaks as features to discover potential cluster centers, which requires the distances between all pairs of data points to be calculated. This implies that DPC can only be applied to cases with relatively small numbers of data points. For the domain of urban taxi operations that we are interested in, there can be millions of demand points per day, and calculating all-pair distances between demand points would be practically impossible, making DPC inapplicable. To address this issue, we project all points onto a density image and execute our variant of the DPC algorithm on the processed image. Experimental results show that our DPC variant obtains results similar to the original DPC, yet with much shorter execution time and lower memory consumption. By running our DPC variant on a real-world dataset collected in Singapore, we show that there are indeed recurrent demand hot spots within the central business district that are not covered by the current taxi stand design. Our approach could be of use to both taxi fleet operators and traffic planners in guiding drivers and siting taxi stands.
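The projection-to-density-image idea can be sketched with a 2-D histogram whose strict local maxima serve as candidate hot spots. This is an illustrative stand-in (grid size and minimum count are assumptions), not the authors' exact DPC variant:

```python
import numpy as np

def demand_hotspots(points, grid=(50, 50), min_count=5):
    # Project demand points onto a density image, then keep interior cells
    # that are strict local maxima with at least min_count points --
    # a grid-based stand-in for DPC's density peaks.
    hist, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=grid)
    peaks = []
    for i in range(1, grid[0] - 1):
        for j in range(1, grid[1] - 1):
            c = hist[i, j]
            nb = hist[i - 1:i + 2, j - 1:j + 2].copy()
            nb[1, 1] = -1  # exclude the cell itself from the neighbourhood
            if c >= min_count and c > nb.max():
                peaks.append((i, j))
    return peaks

# Toy demand data: a tight cluster at (0.5, 0.5) plus two corner points
# that fix the histogram's range to [0, 1] x [0, 1].
points = np.array([[0.5, 0.5]] * 20 + [[0.0, 0.0], [1.0, 1.0]])
print(demand_hotspots(points))  # one peak at the cluster's grid cell
```

Because the image has a fixed number of cells, the cost no longer depends on the number of raw demand points, which is the scalability gain the abstract claims.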
A protein interaction based model for schizophrenia study
Background: Schizophrenia is a complex disease with multiple factors contributing to its pathogenesis. In addition to environmental factors, genetic factors may also increase susceptibility; in other words, schizophrenia is a highly heritable disease. Some candidate genes have been deduced on the basis of their known function, and others on the basis of chromosomal location. Individuals carrying multiple candidate genes may have increased risk. However, it is not clear which gene combinations produce the disease phenotype; their collective effect remains to be studied.
Results: Most pathways, except metabolic pathways, are rich in protein-protein interactions (PPIs). Thus, the PPI network contains pathway information, even though the upstream-downstream relations of PPIs are yet to be explored. Here we constructed a PPI sub-network by extracting the nearest neighbours of the 36 candidate genes reported in the literature. Although these candidate genes were discovered by different approaches, most of the proteins formed a cluster. Two major protein interaction modules were identified on the basis of the pairwise distances among the proteins in this sub-network. Based on gene ontology annotation, the large and small clusters might play roles in synaptic transmission and signal transduction, respectively. The protein interactions in the synaptic transmission cluster were used to explain the interaction between the NRG1 and CACNG2 genes, which was found by both linkage and association studies. This working hypothesis is supported by co-expression analysis based on public microarray gene expression data.
Conclusion: On the basis of the protein interaction network, it appears that NRG1-triggered NMDAR protein internalization and CACNG2-mediated AMPA receptor recruitment may act together in the glutamatergic signalling process. Since both the NMDA and AMPA receptors are calcium channels, this process may regulate the influx of Ca2+. Reduced cation influx might be one of the disease mechanisms of schizophrenia. This PPI network analysis approach, combined with support from co-expression analysis, may provide an efficient way to propose pathogenetic mechanisms for various highly heritable diseases.
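The nearest-neighbour expansion used to build the sub-network can be sketched as follows. The interaction list is a toy example: only NRG1 and CACNG2 are taken from the abstract; the other identifiers are placeholders:

```python
def candidate_subnetwork(edges, seeds):
    # Build the sub-network spanned by the seed genes and their direct
    # PPI neighbours (the nearest-neighbour expansion described above).
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = set(seeds)
    for s in seeds:
        nodes |= adj.get(s, set())
    # Keep only interactions whose both endpoints are in the sub-network.
    sub = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub

# Toy interaction list; GENE_A/GENE_B/GENE_X/GENE_Y are placeholders.
edges = [("NRG1", "GENE_A"), ("GENE_A", "GENE_B"),
         ("CACNG2", "GENE_B"), ("GENE_X", "GENE_Y")]
nodes, sub = candidate_subnetwork(edges, seeds=["NRG1", "CACNG2"])
print(sorted(nodes))  # seeds plus their direct neighbours
```

Pairwise distances within `sub` could then be used to split the sub-network into modules, as the abstract describes.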