159 research outputs found
Data Transmission with Reduced Delay for Distributed Acoustic Sensors
This paper proposes a channel access control scheme suited to dense acoustic
sensor nodes in a sensor network. In the considered scenario, multiple acoustic
sensor nodes within communication range of a cluster head are grouped into
clusters. Acoustic sensor nodes in a cluster detect acoustic signals and
convert them into electric signals (packets). Detection by acoustic sensors can
be executed periodically or randomly; random detection is event driven. As a
result, each acoustic sensor generates its packets (50 bytes each) periodically
or randomly over short time intervals (400 ms to 4 s) and transmits them
directly to a cluster head (coordinator node).
Our approach uses slotted carrier sense multiple access. All acoustic sensor
nodes in a cluster are allocated to time slots, with a uniform number of sensor
nodes per slot. All sensor nodes allocated to a time slot listen for packet
transmission from the beginning of the slot, for a duration proportional to
their priority. The first node that detects the channel to be free for its
whole window is allowed to transmit. The order of packet transmissions among
the acoustic sensor nodes in a time slot is autonomously adjusted according to
the history of packet transmissions in that slot. In simulations, the
performance of the proposed scheme is demonstrated by comparison with other
low-rate wireless channel access schemes.
Comment: Accepted to IJDSN, final preprint version
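The priority-window contention and history-based reordering described above can be modeled with a toy sketch. Two assumptions here are not stated in the abstract and are purely illustrative: the listen window equals a node's priority rank times a fixed window unit, and a node that transmits is demoted to the lowest priority in its slot; the paper's actual adjustment rule may differ.

```python
# Toy model of the slotted-CSMA contention described above.
# Hypothetical policy: window = (rank + 1) * WINDOW_UNIT_MS, and the
# winner is demoted to the lowest priority after transmitting.

WINDOW_UNIT_MS = 2  # assumed carrier-sense window unit, not from the paper

def contend(order, ready):
    """Nodes in `order` (highest priority first) listen from the slot
    start; node at rank r waits (r + 1) * WINDOW_UNIT_MS. The first
    ready node thus sees the channel idle for its whole window and
    transmits. Returns (winner, new_order) with the winner demoted."""
    for node in order:            # shortest listen window first
        if node in ready:         # node has a packet to send
            new_order = [n for n in order if n != node] + [node]
            return node, new_order
    return None, order            # slot goes unused

order = ["A", "B", "C"]
winner, order = contend(order, {"A", "B", "C"})  # "A" transmits, then is demoted
```

Under this rotate-to-back policy, nodes that transmitted recently wait longer in subsequent slots, which is one plausible way the transmission history could drive the autonomous reordering.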
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization
Post-training quantization (PTQ) has been gaining popularity for the
deployment of deep neural networks on resource-limited devices since, unlike
quantization-aware training, neither a full training dataset nor end-to-end
training is required at all. As PTQ schemes based on reconstructing each layer
or block output have proven effective at enhancing quantized model performance,
recent works have developed algorithms to devise and learn new weight-rounding
schemes so as to better reconstruct each layer or block output.
In this work, we propose a simple yet effective new weight-rounding mechanism
for PTQ, coined FlexRound, based on element-wise division instead of typical
element-wise addition such that FlexRound enables jointly learning a common
quantization grid size as well as a different scale for each pre-trained
weight. Thanks to the reciprocal rule of derivatives induced by element-wise
division, FlexRound is inherently able to exploit pre-trained weights when
updating their corresponding scales, and thus, flexibly quantize pre-trained
weights depending on their magnitudes. We empirically validate the efficacy of
FlexRound on a wide range of models and tasks. To the best of our knowledge,
our work is the first to carry out comprehensive experiments on not only image
classification and natural language understanding but also natural language
generation, assuming a per-tensor uniform PTQ setting. Moreover, we
demonstrate, for the first time, that large language models can be efficiently
quantized, with only a negligible impact on performance compared to
half-precision baselines, achieved by reconstructing the output in a
block-by-block manner.
Comment: Accepted to ICML 2023
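The division-based rounding can be illustrated with a minimal numerical sketch. This keeps only a common grid size s and one learnable element-wise scale S (initialized to ones); the paper's full formulation also includes additional learnable per-channel terms, which are omitted here.

```python
import numpy as np

def flexround_quantize(W, s, S):
    """Division-based rounding sketch: common grid size s, learnable
    element-wise scale S (same shape as W, initialized to ones).
    Dividing W by S shifts each weight's rounding point; because the
    gradient w.r.t. S involves W through the reciprocal rule, learned
    updates to S naturally depend on the pre-trained weight magnitudes."""
    return s * np.round(W / (s * S))

W = np.array([0.26, -0.74])
s = 0.5
q0 = flexround_quantize(W, s, np.ones_like(W))  # S = 1: plain rounding-to-nearest
```

With S at its initialization this reduces to ordinary rounding-to-nearest; once S is learned (e.g. a value slightly above 1 for the first weight), the same weight can flip from rounding up to rounding down, which is the extra flexibility the element-wise division buys.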
Sensor Fusion by Spatial Encoding for Autonomous Driving
Sensor fusion is critical to perception systems for task domains such as
autonomous driving and robotics. Recently, the Transformer integrated with CNN
has demonstrated high performance in sensor fusion for various perception
tasks. In this work, we introduce a method for fusing data from camera and
LiDAR. By employing Transformer modules at multiple resolutions, the proposed
method effectively combines local and global contextual relationships. The
performance of the proposed method is validated by extensive experiments on
two adversarial benchmarks with lengthy routes and high-density traffic. The
proposed method outperforms previous approaches on the most challenging
benchmarks, achieving significantly higher driving and infraction scores.
Compared with TransFuser, it achieves 8% and 19% improvements in driving score
on the Longest6 and Town05 Long benchmarks, respectively.
Comment: This paper has been accepted for lecture presentation at the 2023
IEEE SENSORS conference
Risk factors related to the recurrence of endometrioma in patients with long-term postoperative medical therapy
Objectives: The purpose of this study was to identify clinical risk factors for the recurrence of ovarian endometrioma after ovarian cystectomy in Korean women receiving long-term postoperative medical therapy.
Material and Methods: A total of 134 patients who were surgically treated for endometriotic cysts at Pusan National University Hospital were included in this retrospective study. All patients received long-term postoperative medical treatment for at least 12 months after first-line conservative surgery. Several epidemiologic variables were analyzed as possible risk factors for recurrence. Endometrioma recurrence was considered when a cystic mass was observed on transvaginal or transrectal sonography. Statistical analysis was performed using independent t-tests for parametric continuous variables.
Results: The mean follow-up period for the 134 patients was 56.5 ± 14.3 months (range, 36–120 months), and the mean duration of medical therapy was 17.9 ± 17.3 months (range, 12–120 months). The overall recurrence rate was 35/134 (26.12%). Univariate analysis showed statistically significant differences between the recurrent and non-recurrent groups in weight (P = 0.013), body mass index (P = 0.007), age at the time of surgery (P = 0.013), diameter of the largest cyst (P = 0.001), presence of dysmenorrhea (P < 0.0001), and postoperative pregnancy (P = 0.016). Multivariate analysis showed that body mass index (OR 1.153, 95% CI 1.003–1.326, P = 0.046), age at the time of surgery (OR 0.924, 95% CI 0.860–0.992, P = 0.029), and presence of dysmenorrhea (OR 12.226, 95% CI 3.543–42.188, P < 0.0001) were significantly correlated with the recurrence of endometrioma.
Conclusions: Postoperative dysmenorrhea and a younger age at the time of surgery were the risk factors most strongly associated with the recurrence of endometrioma, despite long-term postoperative medication.
nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models
The recent advance of self-supervised learning associated with the
Transformer architecture enables natural language processing (NLP) to exhibit
extremely low perplexity. Such powerful models demand ever-increasing model
size and, thus, large amounts of computations and memory footprints. In this
paper, we propose an efficient inference framework for large-scale generative
language models. As the key to reducing model size, we quantize weights by a
non-uniform quantization method. Then, quantized matrix multiplications are
accelerated by our proposed kernel, called nuQmm, which allows a wide trade-off
between compression ratio and accuracy. Our proposed nuQmm reduces the latency
of not only each GPU but also the entire inference pipeline of large LMs,
because a high compression ratio (from low-bit quantization) reduces the
minimum required number of GPUs. Assuming 2-bit quantization, we demonstrate
that nuQmm can reduce the latency to generate each token for OPT-175B (which
requires 8 GPUs without nuQmm) by 47.3% using 8 GPUs, or by 23.2% using only
2 GPUs.
Comment: 15 pages (including 5 pages of references and appendix), 14 figures,
7 tables
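Since the abstract only says the quantizer is non-uniform, the following is a generic lookup-table sketch of a non-uniformly quantized matmul, not the paper's actual kernel: 2-bit codes index a per-row codebook, built here from simple quantiles purely for illustration.

```python
import numpy as np

def quantize_rows(W, bits=2):
    """Non-uniform per-row quantization (illustrative): each row gets a
    2**bits-entry codebook from its quantiles, and each weight is stored
    as the index of its nearest codebook entry."""
    n_levels = 2 ** bits
    codebooks, codes = [], []
    for row in W:
        centroids = np.quantile(row, np.linspace(0.0, 1.0, n_levels))
        idx = np.abs(row[:, None] - centroids[None, :]).argmin(axis=1)
        codebooks.append(centroids)
        codes.append(idx)
    return np.array(codebooks), np.array(codes)

def nuq_matmul(codebooks, codes, x):
    """Dequantize weights through the lookup table, then multiply.
    A real kernel would fuse the lookup into the GEMM instead of
    materializing W_hat, which is where the latency savings come from."""
    W_hat = np.take_along_axis(codebooks, codes, axis=1)
    return W_hat @ x
```

Storing 2-bit codes plus a small codebook in place of 16-bit weights is what shrinks the memory footprint enough to fit a model on fewer GPUs, matching the abstract's point that a high compression ratio reduces the minimum required GPU count.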
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Transformer is a deep learning language model widely used for natural
language processing (NLP) services in datacenters. Among transformer models,
Generative Pre-trained Transformer (GPT) has achieved remarkable performance in
text generation, or natural language generation (NLG), which needs the
processing of a large input context in the summarization stage, followed by the
generation stage that produces a single word at a time. Conventional platforms
such as GPUs are specialized for the parallel processing of large inputs in
the summarization stage, but their performance significantly degrades in the
generation stage due to its sequential nature. Therefore, an
efficient hardware platform is required to address the high latency caused by
the sequential characteristic of text generation.
In this paper, we present DFX, a multi-FPGA acceleration appliance that
executes GPT-2 model inference end-to-end with low latency and high throughput
in both summarization and generation stages. DFX uses model parallelism and
optimized dataflow that is model-and-hardware-aware for fast simultaneous
workload execution among devices. Its compute cores operate on custom
instructions and provide GPT-2 operations end-to-end. We implement the proposed
hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the
channels of the high bandwidth memory (HBM) and the maximum number of compute
resources for high hardware efficiency. DFX achieves 5.58x speedup and 3.99x
energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is
also 8.21x more cost-effective than the GPU appliance, suggesting that it is a
promising solution for text generation workloads in cloud datacenters.
Comment: Extension of HOTCHIPS 2022; accepted to MICRO 2022