
    Oscillation-free Quantization for Low-bit Vision Transformers

    Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely used de facto setting in quantization, aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation, using ViT as a case driver to illustrate the findings and remedies. In addition, we find that the interdependence between the quantized weights in the query and key of a self-attention layer makes ViT vulnerable to oscillation. We therefore propose three techniques accordingly: statistical weight quantization (StatsQ) to improve quantization robustness over the prevalent learnable-scale-based method; confidence-guided annealing (CGA), which freezes weights with high confidence and calms the oscillating ones; and query-key reparameterization (QKR) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. Extensive experiments demonstrate that these techniques successfully abate weight oscillation and consistently yield substantial accuracy improvements on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively. Code and models are available at: https://github.com/nbasyl/OFQ
    Comment: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023
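    A minimal sketch of the statistics-based weight quantization idea behind StatsQ, assuming a symmetric uniform quantizer whose scale is derived from the weight tensor's absolute mean rather than learned; the paper's exact estimator, and the CGA/QKR components, are not reproduced here.

```python
import torch

def statsq_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Symmetric uniform quantization with a statistics-derived scale.

    The scale comes from the weight distribution (here, the mean absolute
    value) instead of being a learnable parameter, which is the core idea
    of StatsQ; the paper's exact statistic may differ.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 1 for 2-bit weights
    scale = w.abs().mean() / qmax              # statistics-based, not learned
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    # Straight-through estimator: forward uses the quantized values,
    # backward passes gradients through to the latent weights unchanged.
    return (q * scale - w).detach() + w
```

    Because the scale is recomputed from the current weights at each step instead of being optimized by gradient descent, it cannot itself drift and amplify the level-flipping behaviour the abstract describes.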

    Efficient Quantization-aware Training with Adaptive Coreset Selection

    The expanding model size and computation of deep neural networks (DNNs) have increased the demand for efficient model deployment methods. Quantization-aware training (QAT) is a representative model compression method that leverages redundancy in weights and activations. However, most existing QAT methods require end-to-end training on the entire dataset, which suffers from long training time and high energy costs. Coreset selection, which improves data efficiency by exploiting the redundancy of training data, has also been widely used for efficient training. In this work, we take a new angle: using coreset selection to improve the training efficiency of quantization-aware training. Based on the characteristics of QAT, we propose two metrics, the error vector score and the disagreement score, to quantify the importance of each sample during training. Guided by these two importance metrics, we propose a quantization-aware adaptive coreset selection (ACS) method that selects the data for each training epoch. We evaluate our method on various networks (ResNet-18, MobileNetV2), datasets (CIFAR-100, ImageNet-1K), and quantization settings. Compared with previous coreset selection methods, our method significantly improves QAT performance across dataset fractions. Our method achieves 68.39% accuracy with a 4-bit quantized ResNet-18 on ImageNet-1K using only a 10% subset, an absolute gain of 4.24% over the baseline.
    Comment: Code: https://github.com/HuangOwen/QAT-AC
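    As a rough illustration of the two sample-importance metrics, the sketch below scores each sample by how far the quantized model's prediction is from the label (error vector score) and by how much the quantized and full-precision models disagree (disagreement score), then keeps the top-scoring fraction. The exact formulas and the adaptive per-epoch schedule in the paper may differ.

```python
import torch
import torch.nn.functional as F

def error_vector_score(q_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-sample ||softmax(logits) - onehot(label)||_2: how far the
    quantized model's prediction is from the ground truth."""
    probs = F.softmax(q_logits, dim=1)
    onehot = F.one_hot(labels, num_classes=q_logits.size(1)).float()
    return (probs - onehot).norm(dim=1)

def disagreement_score(q_logits: torch.Tensor, fp_logits: torch.Tensor) -> torch.Tensor:
    """Distance between quantized and full-precision predictions; samples
    on which the two models disagree carry more signal for QAT."""
    return (F.softmax(q_logits, dim=1) - F.softmax(fp_logits, dim=1)).norm(dim=1)

def select_coreset(scores: torch.Tensor, fraction: float = 0.1) -> torch.Tensor:
    """Keep the indices of the highest-scoring fraction of samples."""
    k = max(1, int(fraction * scores.numel()))
    return torch.topk(scores, k).indices
```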

    LLM-FP4: 4-Bit Floating-Point Quantized Transformers

    We propose LLM-FP4 for quantizing both weights and activations in large language models (LLMs) down to 4-bit floating-point values, in a post-training manner. Existing post-training quantization (PTQ) solutions are primarily integer-based and struggle with bit widths below 8 bits. Compared to integer quantization, floating-point (FP) quantization is more flexible, can better handle long-tail or bell-shaped distributions, and has emerged as a default choice on many hardware platforms. One characteristic of FP quantization is that its performance largely depends on the choice of exponent bits and clipping range. In this regard, we construct a strong FP-PTQ baseline by searching for the optimal quantization parameters. Furthermore, we observe a pattern of high inter-channel variance and low intra-channel variance in activation distributions, which makes activation quantization difficult. We find this pattern to be consistent across a spectrum of transformer models designed for diverse tasks, such as LLMs, BERT, and Vision Transformer models. To tackle this, we propose per-channel activation quantization and show that these additional scaling factors can be reparameterized as exponential biases of the weights, incurring negligible cost. Our method, for the first time, can quantize both weights and activations in LLaMA-13B to only 4 bits and achieves an average score of 63.1 on the common sense zero-shot reasoning tasks, only 5.8 lower than the full-precision model, significantly outperforming the previous state-of-the-art by 12.7 points. Code is available at: https://github.com/nbasyl/LLM-FP4
    Comment: EMNLP 2023 Main Conference
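    To make the FP4 grid concrete, here is a hedged sketch of how the representable values of a tiny floating-point format can be enumerated and used for nearest-value rounding. The paper additionally searches the exponent width and clipping range and folds per-channel activation scales into weight exponent biases, none of which is shown here.

```python
import torch

def fp_grid(exp_bits: int, man_bits: int, bias: int) -> torch.Tensor:
    """Enumerate the representable values of a tiny signed floating-point
    format with the given exponent/mantissa widths and exponent bias."""
    vals = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            if e == 0:                                   # subnormal numbers
                v = (m / 2 ** man_bits) * 2.0 ** (1 - bias)
            else:                                        # normal numbers
                v = (1 + m / 2 ** man_bits) * 2.0 ** (e - bias)
            vals.add(v)
            vals.add(-v)
    return torch.tensor(sorted(vals))

def quantize_to_grid(x: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """Round every element of x to its nearest representable grid value."""
    idx = torch.argmin((x.unsqueeze(-1) - grid).abs(), dim=-1)
    return grid[idx]
```

    For example, fp_grid(2, 1, bias=1) yields an FP4-style grid with one sign, two exponent, and one mantissa bit; the choice of exponent width and bias is exactly the kind of parameter the paper's search optimizes.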

    TCN-AA: A Wi-Fi-Based Temporal Convolution Network for Human-to-Human Interaction Recognition with Augmentation and Attention

    Wi-Fi-based human activity recognition (HAR) has gained considerable interest in recent years, primarily owing to its applications in domains such as healthcare (e.g., monitoring breathing and heart rate), security, and elderly care. These Wi-Fi-based methods offer several advantages over conventional state-of-the-art techniques that rely on cameras and sensors, including lower cost and easier deployment. However, a major challenge for Wi-Fi-based HAR is the significant decline in performance when the scene or subject changes. Mitigating this issue requires training the model on an extensive dataset. Recent studies commonly use CNN-based models or sequence-to-sequence models such as LSTM, GRU, or Transformer. While sequence-to-sequence models can be more precise, they are also more computationally intensive and require more training data. To address these limitations, we propose a temporal convolution network with augmentations and attention, referred to as TCN-AA. Our method is computationally efficient and exhibits improved accuracy even when the data size is increased threefold through our augmentation techniques. Experiments on a publicly available dataset show that our approach outperforms existing state-of-the-art methods, with a final accuracy of 99.42%.
    Comment: Submitted to the IEEE Internet of Things Journal (under review)
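    Below is a minimal sketch of the kind of dilated, causal temporal-convolution residual block a TCN-style recognizer such as TCN-AA would stack over Wi-Fi CSI frames. The augmentation pipeline and attention module from the paper are omitted, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated temporal-convolution block with a residual connection,
    roughly the building unit a TCN-based recognizer would stack."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # causal: trim this from the right
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        y = self.conv(x)
        if self.pad:
            y = y[..., :-self.pad]        # keep the output causal, length T
        return self.relu(y + x)           # residual connection
```

    Stacking such blocks with dilations 1, 2, 4, ... grows the receptive field exponentially while keeping the per-layer cost of a plain convolution, which is why TCNs are cheaper than sequence-to-sequence models of comparable temporal reach.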

    Intelligent Information Dissemination Scheme for Urban Vehicular Ad Hoc Networks

    In vehicular ad hoc networks (VANETs), a hotspot such as a parking lot is an information source and receives inquiries from many vehicles seeking a free parking space. Under the routing protocols in the literature, each vehicle must flood route discovery (RD) packets to find a route to the hotspot before sending inquiry packets to the parking lot. As a result, a VANET near an urban area or city center may suffer a broadcast storm from the many flooded RD packets during rush hours. To avoid the broadcast-storm problem, this paper presents a hotspot-enabled, routing-tree-based data forwarding method called the intelligent information dissemination scheme (IID). Our method lets the hotspot automatically decide when to build the routing tree for proactive information transmission, namely when the number of vehicle route discoveries during a given period exceeds a threshold calculated from our analytical packet-delivery model. The routing information is dynamically maintained by vehicles located at each intersection near the hotspot whenever the maintenance cost is lower than that of letting vehicles discover routes themselves. Simulation results show that this method minimizes routing delays for vehicles while incurring lower packet-delivery overhead.
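    As a toy illustration of the decision rule just described, the sketch below has the hotspot count RD arrivals in a sliding time window and switch to proactive routing-tree dissemination once the count exceeds a threshold. In the paper the threshold comes from an analytical packet-delivery model; here it is simply a parameter, and the class and its names are hypothetical.

```python
from collections import deque

class HotspotRoutingPolicy:
    """Toy decision rule for when a hotspot should build a proactive
    routing tree: track route-discovery (RD) arrivals in a sliding
    window and compare the count against a threshold. The threshold
    would come from the paper's analytical model; here it is given."""
    def __init__(self, threshold: int, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self.arrivals = deque()           # timestamps of recent RD packets

    def on_route_discovery(self, now: float) -> bool:
        """Record one RD packet arrival; return True if the hotspot
        should switch to proactive tree-based dissemination."""
        self.arrivals.append(now)
        while self.arrivals and now - self.arrivals[0] > self.window_s:
            self.arrivals.popleft()       # drop arrivals outside the window
        return len(self.arrivals) > self.threshold
```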

    Density Peaks Clustering Approach for Discovering Demand Hot Spots in City-scale Taxi Fleet Dataset

    In this paper, we introduce a variant of the density peaks clustering (DPC) approach for discovering demand hot spots from a low-frequency, low-quality taxi fleet operational dataset. In the literature, the DPC approach uses density peaks as features to discover potential cluster centers, which requires distances between all pairs of data points to be calculated. This implies that DPC can only be applied to cases with relatively small numbers of data points. In the domain of urban taxi operations that we are interested in, there can be millions of demand points per day, and calculating all-pair distances between them would be practically impossible, making the DPC approach inapplicable. To address this issue, we project all points onto a density image and execute our variant of the DPC algorithm on the processed image. Experimental results show that our DPC variant obtains results similar to the original DPC, with much shorter execution time and lower memory consumption. By running our DPC variant on a real-world dataset collected in Singapore, we show that there are indeed recurrent demand hot spots within the central business district that are not covered by the current taxi stand design. Our approach could be of use to both taxi fleet operators and traffic planners in guiding drivers and setting up taxi stands.
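    A rough sketch of the image-based idea: rasterize demand points into a density image, then run density-peaks selection over occupied cells only, so the all-pairs computation is over grid cells rather than millions of points. The bin count and number of peaks are illustrative, and the paper's exact peak-selection rule may differ.

```python
import numpy as np

def dpc_on_density_image(points: np.ndarray, bins: int = 200, n_peaks: int = 10):
    """Density-peaks selection on a rasterized density image instead of
    raw points: all-pairs work scales with occupied cells, not points."""
    rho, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    cells = np.argwhere(rho > 0)                  # occupied cells only
    dens = rho[cells[:, 0], cells[:, 1]]
    order = np.argsort(-dens)                     # visit highest density first
    delta = np.full(len(cells), np.inf)
    for rank, i in enumerate(order):
        if rank == 0:
            continue                              # densest cell handled below
        d = np.linalg.norm(cells[order[:rank]] - cells[i], axis=1)
        delta[i] = d.min()                        # distance to nearest denser cell
    if len(order) > 1:
        delta[order[0]] = delta[order[1:]].max()  # DPC convention for the top cell
    score = dens * delta                          # high density AND far from denser cells
    peaks = cells[np.argsort(-score)[:n_peaks]]
    return peaks, (xedges, yedges)                # peak cell indices + bin edges
```

    This preserves the DPC decision rule (cluster centers have high local density and are far from any denser point) while reducing the quadratic cost from millions of points to at most bins² occupied cells.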

    A protein interaction based model for schizophrenia study

    Background: Schizophrenia is a complex disease with multiple factors contributing to its pathogenesis. In addition to environmental factors, genetic factors may also increase susceptibility; in other words, schizophrenia is a highly heritable disease. Some candidate genes have been deduced on the basis of their known function, with others found on the basis of chromosomal location. Individuals with multiple candidate genes may have increased risk. However, it is not clear what kinds of gene combinations may produce the disease phenotype, and their collective effect remains to be studied.
    Results: Most pathways, except metabolic pathways, are rich in protein-protein interactions (PPIs). Thus the PPI network contains pathway information, even though the upstream-downstream relations of PPIs are yet to be explored. Here we constructed a PPI sub-network by extracting the nearest neighbours of the 36 candidate genes reported in the literature. Although these candidate genes were discovered by different approaches, most of the proteins formed a cluster. Two major protein interaction modules were identified on the basis of the pairwise distances among the proteins in this sub-network. Based on gene ontology annotation, the large and small clusters might play roles in synaptic transmission and signal transduction, respectively. The protein interactions in the synaptic transmission cluster were used to explain the interaction between the NRG1 and CACNG2 genes, which was found by both linkage and association studies. This working hypothesis is supported by co-expression analysis based on public microarray gene expression data.
    Conclusion: On the basis of the protein interaction network, it appears that NRG1-triggered NMDAR protein internalization and CACNG2-mediated AMPA receptor recruitment may act together in the glutamatergic signalling process. Since both the NMDA and AMPA receptors are calcium channels, this process may regulate the influx of Ca²⁺. Reduced cation influx might be one of the disease mechanisms of schizophrenia. This PPI network analysis approach, combined with support from co-expression analysis, may provide an efficient way to propose pathogenetic mechanisms for various highly heritable diseases.
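    To show how such a candidate-gene sub-network might be assembled, here is a hedged sketch using networkx: take the candidate genes, pull in their first-degree interaction neighbours, and split the result into modules. The paper clusters by pairwise distance between proteins; connected components are used below only as a simpler stand-in, and the function and variable names are illustrative.

```python
import networkx as nx

def candidate_subnetwork(ppi: nx.Graph, candidate_genes: list) -> nx.Graph:
    """Extract the sub-network induced by the candidate genes plus their
    nearest (first-degree) interaction neighbours, mirroring the
    construction described in the abstract."""
    nodes = set(candidate_genes)
    for gene in candidate_genes:
        if gene in ppi:
            nodes.update(ppi.neighbors(gene))     # add direct interactors
    return ppi.subgraph(nodes).copy()

def interaction_modules(subnet: nx.Graph) -> list:
    """Split the sub-network into connected components; each component is
    a candidate interaction module. (The paper instead clusters by
    pairwise distance within the sub-network.)"""
    return [subnet.subgraph(c).copy() for c in nx.connected_components(subnet)]
```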