
    Data Transmission with Reduced Delay for Distributed Acoustic Sensors

    This paper proposes a channel access control scheme suited to dense acoustic sensor nodes in a sensor network. In the considered scenario, multiple acoustic sensor nodes within communication range of a cluster head are grouped into clusters. Acoustic sensor nodes in a cluster detect acoustic signals and convert them into electric signals (packets). Detection by acoustic sensors can be executed periodically or randomly, and random detection is event driven. As a result, each acoustic sensor generates its packets (50 bytes each) periodically or randomly over short time intervals (400 ms to 4 s) and transmits them directly to a cluster head (coordinator node). Our approach uses slotted carrier sense multiple access. All acoustic sensor nodes in a cluster are allocated to time slots, and the number of sensor nodes allocated to each time slot is uniform. All sensor nodes allocated to a time slot listen for packet transmission from the beginning of the slot for a duration proportional to their priority. The first node that detects the channel to be free for its whole listening window is allowed to transmit. The order of packet transmissions among the acoustic sensor nodes in a time slot is autonomously adjusted according to the history of packet transmissions in that slot. In simulations, the performance of the proposed scheme is demonstrated by comparison with other low-rate wireless channel access schemes. Comment: Accepted to IJDSN, final preprint version
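The priority-window access rule described in the abstract can be sketched as follows. This is an illustrative model, not the authors' code: each node assigned to a time slot listens for a window proportional to its priority, the node whose window expires first (with the channel still idle) transmits, and priorities are then rotated so transmission order adjusts autonomously across slots. The function names and `base_window_ms` parameter are hypothetical.

```python
def winner_of_slot(priorities, base_window_ms=2.0):
    """Return the index of the node that transmits in this time slot.

    Each node listens for `priority * base_window_ms`; the node with the
    shortest listening window finds the channel free first and transmits.
    """
    windows = [p * base_window_ms for p in priorities]
    return min(range(len(windows)), key=lambda i: windows[i])

def rotate_priorities(priorities, winner):
    """After a node transmits, demote it so the order rotates over slots."""
    updated = priorities[:]
    updated[winner] = max(priorities) + 1  # longest window next time
    return updated

# Example: node 1 has the shortest window and transmits first; after
# demotion, node 2 wins the next slot.
order = [3, 1, 2]
first = winner_of_slot(order)          # index 1
order = rotate_priorities(order, first)
second = winner_of_slot(order)         # index 2
```

This kind of deterministic, history-driven rotation is one simple way to realize the "autonomously adjusted" transmission order the abstract mentions.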

    FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization

    Post-training quantization (PTQ) has been gaining popularity for deploying deep neural networks on resource-limited devices since, unlike quantization-aware training, it requires neither a full training dataset nor end-to-end training. As PTQ schemes based on reconstructing each layer or block output have proven effective at enhancing quantized model performance, recent works have developed algorithms to devise and learn a new weight-rounding scheme that better reconstructs each layer or block output. In this work, we propose a simple yet effective new weight-rounding mechanism for PTQ, coined FlexRound, based on element-wise division instead of the typical element-wise addition, such that FlexRound jointly learns a common quantization grid size and a different scale for each pre-trained weight. Thanks to the reciprocal rule of derivatives induced by element-wise division, FlexRound is inherently able to exploit pre-trained weights when updating their corresponding scales, and thus flexibly quantizes pre-trained weights depending on their magnitudes. We empirically validate the efficacy of FlexRound on a wide range of models and tasks. To the best of our knowledge, our work is the first to carry out comprehensive experiments not only on image classification and natural language understanding but also on natural language generation, assuming a per-tensor uniform PTQ setting. Moreover, we demonstrate, for the first time, that large language models can be efficiently quantized, with only a negligible impact on performance compared to half-precision baselines, by reconstructing the output in a block-by-block manner. Comment: Accepted to ICML 2023
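A minimal sketch of the division-based rounding idea, under the assumption that the quantized weight has the form W_hat = s * round(W / (s * S)), where s is the common grid size and S a per-weight learnable scale. This is an illustration of the mechanism, not the authors' implementation; here s and S are fixed rather than learned.

```python
import numpy as np

def flexround_quantize(W, s, S):
    """Round W onto a grid of size s after element-wise division by S.

    Because S divides W before rounding, the gradient of the output with
    respect to S carries a factor of W (reciprocal rule), letting larger
    weights influence their scales more strongly during learning.
    """
    return s * np.round(W / (s * S))

W = np.array([[0.31, -0.12], [0.07, 0.56]])
s = 0.1                  # common quantization grid size (assumed learned)
S = np.ones_like(W)      # per-weight scales; identity here for demonstration
W_hat = flexround_quantize(W, s, S)
# With S = 1 this reduces to ordinary rounding onto the grid s.
```

Setting individual entries of `S` away from 1 shifts where each weight lands relative to the rounding boundary, which is exactly the flexibility the learned rounding exploits.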

    Sensor Fusion by Spatial Encoding for Autonomous Driving

    Sensor fusion is critical to perception systems for task domains such as autonomous driving and robotics. Recently, Transformers integrated with CNNs have demonstrated high performance in sensor fusion for various perception tasks. In this work, we introduce a method for fusing data from camera and LiDAR. By employing Transformer modules at multiple resolutions, the proposed method effectively combines local and global contextual relationships. Its performance is validated by extensive experiments on two adversarial benchmarks with lengthy routes and high-density traffic. The proposed method outperforms previous approaches on the most challenging benchmarks, achieving significantly higher driving and infraction scores. Compared with TransFuser, it achieves 8% and 19% improvements in driving score on the Longest6 and Town05 Long benchmarks, respectively. Comment: This paper has been accepted for Lecture presentation at the 2023 IEEE SENSORS conference
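One common way to realize Transformer-based camera-LiDAR fusion, sketched below with NumPy rather than a deep learning framework: feature maps from both modalities at one resolution are flattened into tokens, concatenated, and mixed by self-attention so each modality attends to the other. This is a generic illustration of the fusion pattern, not the paper's architecture; all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_tokens(cam_feats, lidar_feats):
    """Fuse camera and LiDAR tokens at one resolution via self-attention.

    cam_feats, lidar_feats: (N, C) flattened feature maps. Concatenating
    the two token sets lets attention mix information across modalities.
    """
    tokens = np.concatenate([cam_feats, lidar_feats], axis=0)      # (2N, C)
    attn = softmax(tokens @ tokens.T / np.sqrt(tokens.shape[1]))   # (2N, 2N)
    fused = attn @ tokens                                          # (2N, C)
    n = cam_feats.shape[0]
    return fused[:n], fused[n:]  # fused camera / LiDAR streams

# Applied independently at several resolutions (e.g. 1/4, 1/8, 1/16 scale),
# coarse levels capture global context while fine levels keep local detail.
cam_fused, lidar_fused = fuse_tokens(np.ones((4, 8)), np.zeros((4, 8)))
```

Running the same fusion block at multiple feature-map scales is what lets a single mechanism capture both the local and global relationships the abstract refers to.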

    Risk factors related to the recurrence of endometrioma in patients with long-term postoperative medical therapy

    Objectives: The purpose of this study was to identify clinical risk factors for the recurrence of ovarian endometrioma after ovarian cystectomy in Korean women receiving long-term postoperative medical therapy.
    Material and Methods: A total of 134 patients who were surgically treated for endometriotic cysts at Pusan National University Hospital were included in this retrospective study. All patients received long-term postoperative medical treatment for at least 12 months after first-line conservative surgery. Several epidemiologic variables were analyzed as possible risk factors for recurrence. Endometrioma recurrence was considered when a cystic mass was observed on transvaginal or transrectal sonography. Statistical analysis was performed using independent t-tests for parametric continuous variables.
    Results: The mean follow-up period for the 134 patients was 56.5 ± 14.3 months (range, 36–120 months), and the mean duration of medical therapy was 17.9 ± 17.3 months (range, 12–120 months). The overall recurrence rate was 35/134 (26.1%). Univariate analysis showed statistically significant differences between the recurrent and non-recurrent groups in weight (P = 0.013), body mass index (P = 0.007), age at the time of surgery (P = 0.013), diameter of the largest cyst (P = 0.001), presence of dysmenorrhea (P < 0.0001), and postoperative pregnancy (P = 0.016). Multivariate analysis showed that body mass index (OR 1.153, 95% CI 1.003–1.326, P = 0.046), age at the time of surgery (OR 0.924, 95% CI 0.860–0.992, P = 0.029), and presence of dysmenorrhea (OR 12.226, 95% CI 3.543–42.188, P < 0.0001) were significantly correlated with the recurrence of endometrioma.
    Conclusions: Postoperative dysmenorrhea and a younger age at the time of surgery were the strongest risk factors associated with the recurrence of endometrioma, despite long-term postoperative medication.

    nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models

    The recent advance of self-supervised learning combined with the Transformer architecture enables natural language processing (NLP) models to exhibit extremely low perplexity. Such powerful models demand ever-increasing model sizes and, thus, large amounts of computation and memory. In this paper, we propose an efficient inference framework for large-scale generative language models. As the key to reducing model size, we quantize weights with a non-uniform quantization method. Quantized matrix multiplications are then accelerated by our proposed kernel, called nuQmm, which allows a wide trade-off between compression ratio and accuracy. Our proposed nuQmm reduces the latency of not only each GPU but also the entire inference of large LMs, because a high compression ratio (through low-bit quantization) reduces the minimum required number of GPUs. Assuming 2-bit quantization, we demonstrate that nuQmm reduces the latency of generating each token for OPT-175B (which requires 8 GPUs without nuQmm) by 47.3% using 8 GPUs, or by 23.2% using only 2 GPUs. Comment: 15 pages (including 5 pages of References & Appendix), 14 figures, 7 tables
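A minimal sketch of the kind of non-uniform quantization involved here: binary-coding quantization approximates each weight vector as a sum of q binary vectors scaled by learned coefficients, w ≈ Σᵢ αᵢ bᵢ with bᵢ ∈ {-1, +1}, which lets low-bit codes be served from small lookup tables. The greedy residual fitting below is a standard heuristic for this decomposition, not necessarily the paper's exact algorithm.

```python
import numpy as np

def bcq_greedy(w, q):
    """Greedily fit w ≈ sum_i alphas[i] * bits[i] with bits in {-1, +1}."""
    residual = w.astype(float).copy()
    alphas, bits = [], []
    for _ in range(q):
        b = np.where(residual >= 0, 1.0, -1.0)
        # For fixed b = sign(residual), the least-squares optimal scale is
        # (b . residual) / n, which equals mean(|residual|).
        a = np.abs(residual).mean()
        alphas.append(a)
        bits.append(b)
        residual -= a * b
    return np.array(alphas), np.array(bits)

w = np.array([0.8, -0.5, 0.3, -0.1])
alphas, bits = bcq_greedy(w, q=2)          # 2-bit code per weight
w_hat = (alphas[:, None] * bits).sum(axis=0)
# Even q=2 tracks the sign and coarse magnitude of each weight; matmuls
# against such binary codes can be precomputed into lookup tables.
```

The trade-off the abstract mentions corresponds to choosing q: more binary components mean less compression but a smaller reconstruction residual.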

    DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation

    The Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among Transformer models, the Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which requires processing a large input context in the summarization stage, followed by a generation stage that produces a single word at a time. Conventional platforms such as GPUs are specialized for the parallel processing of large inputs in the summarization stage, but their performance degrades significantly in the generation stage due to its sequential characteristic. Therefore, an efficient hardware platform is required to address the high latency caused by the sequential nature of text generation. In this paper, we present DFX, a multi-FPGA acceleration appliance that executes GPT-2 model inference end-to-end with low latency and high throughput in both the summarization and generation stages. DFX uses model parallelism and an optimized dataflow that is model-and-hardware-aware for fast simultaneous workload execution across devices. Its compute cores operate on custom instructions and provide GPT-2 operations end-to-end. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all channels of the high-bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. DFX achieves a 5.58x speedup and 3.99x higher energy efficiency than four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21x more cost-effective than the GPU appliance, suggesting that it is a promising solution for text generation workloads in cloud datacenters. Comment: Extension of HOTCHIPS 2022 and accepted to MICRO 2022
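The two-stage latency structure described above can be made concrete with a toy cost model (illustrative only, not from the paper): the summarization stage consumes the whole prompt in a few parallel passes, while the generation stage requires one strictly sequential forward pass per output token. `pass_time` and `parallel_width` are hypothetical parameters.

```python
def text_gen_latency(prompt_len, gen_len, pass_time=1.0, parallel_width=None):
    """Toy latency model for two-stage GPT inference.

    Summarization processes the prompt in parallel chunks; generation is
    sequential, one forward pass per produced token.
    """
    width = parallel_width or prompt_len
    summarization = -(-prompt_len // width) * pass_time  # ceil division
    generation = gen_len * pass_time                     # strictly sequential
    return summarization + generation

# A 512-token prompt costs a handful of passes, but 100 output tokens cost
# 100 sequential passes, so total latency is dominated by generation.
full_parallel = text_gen_latency(512, 100)                      # 1 + 100
chunked = text_gen_latency(512, 100, parallel_width=128)        # 4 + 100
```

This is why accelerating the per-pass latency of the generation stage, as the DFX appliance targets, dominates end-to-end speedup even when the summarization stage is already fast.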