DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion
In autonomous driving, data augmentation is commonly used for improving 3D
object detection. The most basic methods include insertion of copied objects
and rotation and scaling of the entire training frame. Numerous variants have
been developed as well. The existing methods, however, remain considerably
limited compared to the variety of real-world possibilities. In this work, we
develop a diversified and realistic augmentation method that can flexibly
construct a whole-body object, freely locate and rotate the object, and apply
self-occlusion and external-occlusion accordingly. To improve the diversity of
the whole-body object construction, we develop an iterative method that
stochastically combines multiple objects observed from the real world into a
single object. Unlike the existing augmentation methods, the constructed
objects can be randomly located and rotated in the training frame because
proper occlusions can be applied to the whole-body objects in the final step.
Finally, proper self-occlusion at each local object level and
external-occlusion at the global frame level are applied using the Hidden Point
Removal (HPR) algorithm that is computationally efficient. HPR is also used for
adaptively controlling the point density of each object according to the
object's distance from the LiDAR. Experiment results show that the proposed
DR.CPO algorithm is data-efficient and model-agnostic without incurring any
computational overhead. Also, DR.CPO can improve mAP performance by 2.08%
compared to the best 3D detection result known for the KITTI dataset. The code
is available at https://github.com/SNU-DRL/DRCPO.
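The HPR step the abstract relies on can be illustrated with a minimal 2-D sketch of Katz et al.'s spherical-flipping formulation. This is not the paper's code (DR.CPO operates on 3-D LiDAR point clouds); all names here are illustrative, and the convex hull is a pure-Python monotone chain to keep the sketch self-contained:

```python
import math

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hpr_visible(points, viewpoint, radius_factor=100.0):
    """Hidden Point Removal: spherically flip each point about the viewpoint,
    then keep exactly the points whose flipped images lie on the convex hull
    of the flipped set plus the viewpoint."""
    vx, vy = viewpoint
    shifted = [(x - vx, y - vy) for x, y in points]
    R = radius_factor * max(math.hypot(x, y) for x, y in shifted)
    flipped = {}
    for orig, (x, y) in zip(points, shifted):
        n = math.hypot(x, y)
        if n == 0.0:
            continue  # point coincides with the viewpoint
        flipped[(x + 2.0*(R - n)*x/n, y + 2.0*(R - n)*y/n)] = orig
    hull = convex_hull(list(flipped) + [(0.0, 0.0)])
    return [flipped[p] for p in hull if p != (0.0, 0.0)]
```

A point lying on the same ray behind a nearer point is classified as occluded, which is the property DR.CPO exploits for self- and external-occlusion.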
Energy-Based Accounting and Scheduling of Virtual Machines in a Cloud System
Virtualization enables flexible resource provisioning and improves energy efficiency by consolidating virtualized servers onto a smaller number of physical servers. It is therefore becoming an essential component of the emerging cloud computing model. Current virtualized environments, including cloud computing systems, bill users for the amount of processor time they consume or the number of virtual machine instances they run. However, accounting based only on the depreciation cost of server hardware is not an economically sound model, because the cooling and energy cost of datacenters already exceeds the cost of owning the servers. This paper proposes an estimation model that accounts for the energy consumption of each virtual machine without any dedicated measurement hardware. The model estimates a virtual machine's energy consumption from the in-processor events it generates. Building on this estimation model, the paper also proposes a virtual machine scheduling algorithm that provides computing resources according to each virtual machine's energy budget. The suggested schemes are implemented in the Xen virtualization system, and the evaluation shows that they estimate and provision energy consumption with errors of less than 5% of total energy consumption.
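The event-based estimation idea can be sketched as a linear model over hardware performance-counter deltas, plus a budget-aware scheduling check. The event names, per-event energy coefficients, and function names below are illustrative assumptions, not values from the paper:

```python
# Hypothetical per-event dynamic-energy coefficients in nanojoules per event.
# A real model would fit these by regression against measured power.
EVENT_ENERGY_NJ = {
    "instructions_retired": 0.4,
    "llc_misses": 12.0,
    "memory_accesses": 2.5,
}

def estimate_vm_energy_j(event_counts, idle_power_w=20.0,
                         cpu_share=0.5, interval_s=1.0):
    """Estimate one VM's energy over an accounting interval: dynamic energy
    is a weighted sum of the VM's counter deltas; static (idle) energy is
    apportioned by the VM's CPU-time share."""
    dynamic_nj = sum(EVENT_ENERGY_NJ.get(e, 0.0) * c
                     for e, c in event_counts.items())
    static_j = idle_power_w * interval_s * cpu_share
    return static_j + dynamic_nj * 1e-9

def schedule_within_budget(vms, interval_s=1.0):
    """Credit-style pass: charge each VM for the last interval and keep
    only VMs whose remaining energy budget is still positive."""
    runnable = []
    for vm in vms:
        spent = estimate_vm_energy_j(vm["events"], cpu_share=vm["cpu_share"],
                                     interval_s=interval_s)
        vm["budget_j"] -= spent
        if vm["budget_j"] > 0:
            runnable.append(vm["name"])
    return runnable
```

For example, a VM generating 10^9 last-level-cache misses in one second would be charged 12 J of dynamic energy on top of its idle-power share under these assumed coefficients.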
An Effective Path Selection Method in Multiple Care-of Addresses MIPv6 with Parallel Delay Measurement Technique
Abstract. In the ubiquitous society, many types of mobile access networks will surround us, and we will be able to access the Internet anytime, anywhere. A mobile device will then be able to select several links from the surrounding mobile access networks and access the Internet through multiple interfaces. Mobile IPv6 already supports mobility, and extensions exist to support registration of multiple Care-of Addresses, but there is no solution yet for selecting an effective path. An effective path has many advantages, such as reduced communication overhead. In this paper, we propose an effective path selection method for the Multiple Care-of Addresses Mobile IPv6 environment using a 'Parallel Delay Measurement' technique. With this technique, we can reduce the average packet delay.
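The "parallel" part of the technique, probing all registered care-of-address paths concurrently rather than one after another, can be sketched as follows. The probe callables stand in for real RTT measurements over each interface; all names are illustrative:

```python
import concurrent.futures
import statistics

def measure_path_delay(probe, n_probes=5):
    """Average RTT over n probes on one care-of-address path.
    `probe` is any callable returning one RTT sample in milliseconds."""
    return statistics.mean(probe() for _ in range(n_probes))

def select_best_path(paths, n_probes=5):
    """Probe every registered path in parallel and return the name of the
    path with the lowest mean delay. Probing in parallel keeps the total
    measurement time close to that of a single path."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(measure_path_delay, probe, n_probes)
                   for name, probe in paths.items()}
        return min(futures, key=lambda name: futures[name].result())
```

In a real deployment each probe would send an ICMP echo or a binding-refresh round trip over the corresponding interface.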
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Knowledge distillation (KD) has been a ubiquitous method for model
compression to strengthen the capability of a lightweight model with the
transferred knowledge from the teacher. In particular, KD has been employed in
quantization-aware training (QAT) of Transformer encoders like BERT to improve
the accuracy of the student model with the reduced-precision weight parameters.
However, little is understood about which of the various KD approaches best
fits the QAT of Transformers. In this work, we provide an in-depth analysis of
the mechanism of KD on attention recovery of quantized large Transformers. In
particular, we reveal that the previously adopted MSE loss on the attention
score is insufficient for recovering the self-attention information. Therefore,
we propose two KD methods: attention-map and attention-output losses.
Furthermore, we explore the unification of both losses to address
task-dependent preference between attention-map and output losses. The
experimental results on various Transformer encoder models demonstrate that the
proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit
weight quantization.
Comment: EMNLP 2022 Main Track Long Paper
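The two proposed losses can be sketched in a toy form: a KL divergence between softmax attention maps (versus the insufficient MSE on raw scores) and an MSE on the attention block's output. This is a pure-Python illustration of the loss definitions, not the paper's implementation, and the function names are assumptions:

```python
import math

def _softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_map_loss(teacher_scores, student_scores):
    """Mean row-wise KL(teacher || student) between softmax attention maps.
    Each argument is a list of rows of raw attention scores."""
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        t, s = _softmax(t_row), _softmax(s_row)
        total += sum(ti * (math.log(ti) - math.log(si))
                     for ti, si in zip(t, s))
    return total / len(teacher_scores)

def attention_output_loss(teacher_out, student_out):
    """MSE on the attention block output (context vectors after the
    value projection), flattened over all positions."""
    flat_t = [x for row in teacher_out for x in row]
    flat_s = [x for row in student_out for x in row]
    return sum((a - b) ** 2 for a, b in zip(flat_t, flat_s)) / len(flat_t)
```

Matching on the normalized map penalizes differences in the attention *distribution*, which an MSE on unnormalized scores can under-weight when score magnitudes differ between teacher and quantized student.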
An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems
The data movement in large-scale computing facilities (from compute nodes to
data nodes) is categorized as one of the major contributors to high cost and
energy utilization. To tackle it, in-storage processing (ISP) within storage
devices, such as Solid-State Drives (SSDs), has been explored actively. The
introduction of computational storage drives (CSDs) enabled ISP within the same
form factor as regular SSDs and made it easy to replace SSDs within traditional
compute nodes. With CSDs, host systems can offload various operations such as
search, filter, and count. However, commercialized CSDs have different hardware
resources and performance characteristics. Thus, it requires careful
consideration of hardware, performance, and workload characteristics for
building a CSD-based storage system within a compute node. Therefore, storage
architects are hesitant to build a storage system based on CSDs as there are no
tools to determine the benefits of CSD-based compute nodes to meet the
performance requirements compared to traditional nodes based on SSDs. In this
work, we propose an analytical model-based storage capacity planner called
CSDPlan for system architects to build performance-effective CSD-based compute
nodes. Our model takes into account the performance characteristics of the host
system, targeted workloads, and hardware and performance characteristics of
CSDs to be deployed and provides optimal configuration based on the number of
CSDs for a compute node. Furthermore, CSDPlan estimates and reduces the total
cost of ownership (TCO) for building a CSD-based compute node. To evaluate the
efficacy of CSDPlan, we selected two commercially available CSDs and four
representative big data analysis workloads.
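The core of such a capacity model can be sketched in two small functions: one that sizes the node (smallest CSD count that meets a throughput deadline, assuming the offloaded scan parallelizes evenly across devices) and one that estimates a simple TCO. All parameter values and names are illustrative assumptions, not CSDPlan's actual model:

```python
import math

def min_csds_for_deadline(workload_mb, deadline_s, csd_scan_mbps):
    """Smallest number of CSDs that can scan `workload_mb` within the
    deadline, assuming even data partitioning across devices."""
    required_mbps = workload_mb / deadline_s
    return max(1, math.ceil(required_mbps / csd_scan_mbps))

def node_tco(n_csds, csd_unit_cost, device_power_w, runtime_s,
             energy_price_per_kwh=0.10):
    """Toy total cost of ownership for one node over one run:
    capital cost of the devices plus the energy they consume."""
    energy_kwh = n_csds * device_power_w * runtime_s / 3.6e6
    return n_csds * csd_unit_cost + energy_kwh * energy_price_per_kwh
```

An architect can sweep these over candidate CSD models to compare a CSD-based node against a conventional SSD node (where the host CPU, not the device, pays the scan cost).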
Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization
Uniform-precision neural network quantization has gained popularity since it
simplifies the densely packed arithmetic units used for high computing
capability.
However, it ignores heterogeneous sensitivity to the impact of quantization
errors across the layers, resulting in sub-optimal inference accuracy. This
work proposes a novel neural architecture search called neural channel
expansion that adjusts the network structure to alleviate accuracy degradation
from ultra-low uniform-precision quantization. The proposed method selectively
expands channels for the quantization sensitive layers while satisfying
hardware constraints (e.g., FLOPs, PARAMs). Based on in-depth analysis and
experiments, we demonstrate that the proposed method can adapt several popular
networks' channels to achieve superior 2-bit quantization accuracy on CIFAR10
and ImageNet. In particular, we achieve the best-to-date Top-1/Top-5 accuracy
for 2-bit ResNet50 with smaller FLOPs and parameter size.
Comment: Accepted as a full paper by the TinyML Research Symposium 2023
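Two pieces of the idea can be sketched compactly: a symmetric uniform quantizer (at 2 bits it collapses weights to three levels) and a greedy pass that widens the most quantization-sensitive layers while a FLOPs budget holds. Both functions, their names, and the greedy selection rule are illustrative simplifications of the neural channel expansion search, not the paper's algorithm:

```python
def quantize_uniform(weights, n_bits=2):
    """Symmetric uniform quantization; 2-bit gives levels {-s, 0, +s}."""
    max_abs = max(abs(w) for w in weights) or 1.0
    n_levels = max(1, 2 ** (n_bits - 1) - 1)
    scale = max_abs / n_levels
    return [round(w / scale) * scale for w in weights]

def expand_sensitive_channels(layer_errors, layer_flops, flops_budget,
                              expand_ratio=0.25):
    """Greedily widen the layers with the largest quantization error,
    stopping when the hardware (FLOPs) budget would be exceeded.
    Returns a per-layer width multiplier."""
    plan = {name: 1.0 for name in layer_errors}
    used = sum(layer_flops.values())
    for name in sorted(layer_errors, key=layer_errors.get, reverse=True):
        extra = layer_flops[name] * expand_ratio
        if used + extra <= flops_budget:
            plan[name] += expand_ratio
            used += extra
    return plan
```

The sketch shows why uniform precision needs structural help: the quantizer applies the same coarse grid to every layer, so the only remaining lever for a sensitive layer is extra width.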
Enhanced Electrochemical Performances of Hollow-Structured N-Doped Carbon Derived from a Zeolitic Imidazole Framework (ZIF-8) Coated by Polydopamine as an Anode for Lithium-Ion Batteries
Doping heteroatoms such as nitrogen (N) and boron (B) into the framework of carbon materials is one of the most efficient methods to improve the electrochemical performance of carbon-based electrodes. In this study, N-doped carbon was facilely synthesized using a ZIF-8/polydopamine precursor. The polyhedral structure of ZIF-8 and the effective surface-coating capability of dopamine enabled the formation of N-doped carbon with a hollow structure: the ZIF-8 polyhedron served as a sacrificial template, and dopamine acted as the nitrogen donor. Compared to ZIF-8-derived carbon, the hollow-structured N-doped carbon (HSNC) electrode showed an improved reversible capacity of approximately 1398 mAh·g⁻¹ after 100 cycles, with excellent cycling retention over a voltage range of 0.01 to 3.0 V at a current density of 0.1 A·g⁻¹.
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Generative Language Models (GLMs) have shown impressive performance in tasks
such as text generation, understanding, and reasoning. However, the large model
size poses challenges for practical deployment. To solve this problem,
Quantization-Aware Training (QAT) has become increasingly popular. However,
current QAT methods for generative models have resulted in a noticeable loss of
accuracy. To counteract this issue, we propose a novel knowledge distillation
method specifically designed for GLMs. Our method, called token-scaled logit
distillation, prevents overfitting and provides superior learning from the
teacher model and ground truth. This research marks the first evaluation of
ternary weight quantization-aware training of large-scale GLMs with less than
1.0 degradation in perplexity and no loss of accuracy in a reasoning task.
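The per-token scaling idea can be sketched as a weighted KD loss in which each token's KL term is scaled by a weight derived from the teacher's predictive entropy, so tokens where the teacher is confident contribute more. The specific scaling rule `1/(1+H)` below is an illustrative assumption, not the paper's formula:

```python
import math

def _softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_scaled_kd_loss(teacher_logits, student_logits):
    """Weighted mean over tokens of KL(teacher || student) on the output
    distribution, each token weighted by the teacher's inverse entropy
    (an assumed stand-in for the paper's token scaling)."""
    losses, weights = [], []
    for t_row, s_row in zip(teacher_logits, student_logits):
        t, s = _softmax(t_row), _softmax(s_row)
        kl = sum(ti * (math.log(ti) - math.log(si))
                 for ti, si in zip(t, s))
        entropy = -sum(ti * math.log(ti) for ti in t)
        losses.append(kl)
        weights.append(1.0 / (1.0 + entropy))
    wsum = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / wsum
```

Down-weighting high-entropy tokens is one way to keep the ternary student from overfitting to positions where the teacher itself is uncertain.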
- ā¦