171 research outputs found

    DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion

    Full text link
    In autonomous driving, data augmentation is commonly used to improve 3D object detection. The most basic methods insert copied objects or rotate and scale the entire training frame, and numerous variants have been developed as well. The existing methods, however, are considerably limited compared to the variety of real-world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external occlusion accordingly. To improve the diversity of whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed in the real world into a single object. Unlike existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions are applied to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external occlusion at the global frame level are applied using the computationally efficient Hidden Point Removal (HPR) algorithm. HPR is also used to adaptively control the point density of each object according to its distance from the LiDAR. Experimental results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic and incurs no computational overhead. DR.CPO also improves mAP performance by 2.08% over the best 3D detection result known for the KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.
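    The HPR step this abstract relies on is the classic Hidden Point Removal algorithm of Katz et al.: translate the cloud so the viewpoint is at the origin, "spherically flip" every point, and keep the points whose flipped images land on the convex hull. A minimal sketch (function name, `radius_factor`, and the visibility test details are illustrative, not taken from the paper):

    ```python
    import numpy as np
    from scipy.spatial import ConvexHull

    def hidden_point_removal(points, viewpoint, radius_factor=100.0):
        """Classic HPR: spherical flipping followed by a convex-hull test.

        points    : (N, 3) array of XYZ coordinates
        viewpoint : (3,) sensor position (e.g. the LiDAR origin)
        Returns sorted indices of points judged visible from the viewpoint.
        """
        p = points - viewpoint                         # move viewpoint to origin
        norms = np.linalg.norm(p, axis=1, keepdims=True)
        R = radius_factor * norms.max()                # flipping-sphere radius
        flipped = p + 2.0 * (R - norms) * (p / norms)  # reflect about the sphere
        # A point is visible iff its flipped image lies on the convex hull
        # of the flipped cloud plus the viewpoint (appended as the origin).
        hull = ConvexHull(np.vstack([flipped, np.zeros(3)]))
        visible = hull.vertices[hull.vertices < len(points)]
        return np.sort(visible)
    ```

    The convex-hull test is what makes HPR cheap enough to run per object per frame, which is presumably why the paper uses it both for occlusion and for distance-dependent density control.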

    Energy-Based Accounting and Scheduling of Virtual Machines in a Cloud System

    Get PDF
    Virtualization enables flexible resource provisioning and improves energy efficiency by consolidating virtualized servers onto a smaller number of physical servers, making it an essential component of the emerging cloud computing model. Currently, virtualized environments, including cloud computing systems, bill users for the amount of processor time they consume or the number of virtual machine instances they run. However, accounting based only on the depreciation cost of server hardware is not an economically sound model, because the cooling and energy cost of datacenters already exceeds the cost of owning the servers. This paper suggests a model that estimates the energy consumption of each virtual machine without any dedicated measurement hardware, based on the in-processor events the virtual machine generates. Building on this estimation model, the paper also proposes a virtual machine scheduling algorithm that provides computing resources according to the energy budget of each virtual machine. The suggested schemes are implemented in the Xen virtualization system, and the evaluation shows that they estimate and provision energy consumption with errors of less than 5% of total energy consumption.
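    Event-based energy accounting of this kind is typically a weighted sum over hardware performance counters, plus a share of the platform's static power. A toy sketch of the idea; the event names, weights, and idle-power figure below are illustrative placeholders, not the paper's calibrated values:

    ```python
    # Hypothetical per-VM energy estimator in the spirit of the paper:
    # energy is modeled as a weighted sum of in-processor events charged
    # to each VM, plus a time-share of static platform power.

    EVENT_WEIGHTS_NJ = {            # nanojoules per event (illustrative)
        "retired_instructions": 1.2,
        "llc_misses":          60.0,
        "dram_accesses":       35.0,
    }
    IDLE_POWER_W = 40.0             # static platform power, split among VMs

    def vm_energy_joules(event_counts, interval_s, active_share):
        """Estimate one VM's energy use over a scheduling interval.

        event_counts : dict mapping event name -> count charged to this VM
        interval_s   : length of the accounting interval in seconds
        active_share : fraction of CPU time the VM received in the interval
        """
        dynamic_nj = sum(EVENT_WEIGHTS_NJ[e] * n for e, n in event_counts.items())
        static_j = IDLE_POWER_W * interval_s * active_share
        return dynamic_nj * 1e-9 + static_j
    ```

    A budget-aware scheduler like the one proposed would then deschedule or throttle a VM once its cumulative estimate approaches its energy budget.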

    An Effective Path Selection Method in Multiple Care-of Addresses MIPv6 with Parallel Delay Measurement Technique

    Get PDF
    Abstract. In the ubiquitous society, many types of mobile access networks will surround us, and we will be able to access the Internet anytime, anywhere. A mobile device will then be able to select several links from the surrounding mobile access networks and access the Internet through multiple interfaces. Mobile IPv6 already supports mobility, and efforts are under way to extend it to support registration of multiple Care-of Addresses. However, there is still no solution for selecting the most effective path, even though an effective path offers advantages such as reduced communication overhead. In this paper, we propose an effective path selection method for the Multiple Care-of Addresses Mobile IPv6 environment based on a 'Parallel Delay Measurement' technique. With this technique, we can reduce the average packet delay.
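    The core idea of measuring all candidate paths in parallel rather than sequentially can be sketched in a few lines; the function below is my own illustration (the paper's actual probing protocol is not specified here), with the probe supplied as a callable so the sketch stays network-free:

    ```python
    import concurrent.futures

    def select_best_path(coas, probe):
        """Probe every Care-of Address in parallel; return the lowest-delay one.

        coas  : list of candidate Care-of Addresses (any hashable identifiers)
        probe : callable mapping a CoA to its measured delay in seconds
        Probing in parallel bounds total measurement time by the slowest
        single probe instead of the sum over all candidate paths.
        """
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(coas)) as ex:
            delays = dict(zip(coas, ex.map(probe, coas)))
        return min(delays, key=delays.get), delays
    ```

    With a simulated probe such as `lambda c: {"wlan0": 0.030, "lte0": 0.080, "eth0": 0.012}[c]`, the selector returns `eth0`, the interface with the smallest delay.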

    Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

    Full text link
    Knowledge distillation (KD) has been a ubiquitous method for model compression, strengthening the capability of a lightweight model with knowledge transferred from a teacher. In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of student models with reduced-precision weight parameters. However, little is understood about which of the various KD approaches best fits the QAT of Transformers. In this work, we provide an in-depth analysis of the mechanism of KD for attention recovery of quantized large Transformers. In particular, we reveal that the previously adopted MSE loss on the attention score is insufficient for recovering the self-attention information. Therefore, we propose two KD methods: attention-map and attention-output losses. Furthermore, we explore the unification of both losses to address the task-dependent preference between them. Experimental results on various Transformer encoder models demonstrate that the proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit weight quantization. Comment: EMNLP 2022 Main Track Long Paper
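    The two proposed losses can be sketched as follows: the attention-map loss matches the softmax-normalized attention distributions (rather than raw scores, which the paper shows MSE handles poorly), and the attention-output loss matches the attention block's output representations. The exact normalization and reduction below are my assumptions, not the paper's definitions:

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        z = x - x.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention_map_loss(teacher_scores, student_scores):
        """KL divergence between softmax-normalized attention maps.
        Distilling the probability map rather than raw scores is the
        paper's remedy for the insufficiency of score-level MSE."""
        t = softmax(teacher_scores)
        s = softmax(student_scores)
        return np.mean(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=-1))

    def attention_output_loss(teacher_out, student_out):
        """MSE on the attention block's output representations."""
        return np.mean((teacher_out - student_out) ** 2)
    ```

    The "unification" the abstract mentions would then be some task-dependent combination of these two terms in the student's training objective.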

    An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems

    Full text link
    The data movement in large-scale computing facilities (from compute nodes to data nodes) is one of the major contributors to high cost and energy utilization. To tackle this, in-storage processing (ISP) within storage devices, such as Solid-State Drives (SSDs), has been explored actively. The introduction of computational storage drives (CSDs) enabled ISP within the same form factor as regular SSDs, making it easy to replace SSDs in traditional compute nodes. With CSDs, host systems can offload various operations such as search, filter, and count. However, commercialized CSDs have differing hardware resources and performance characteristics, so building a CSD-based storage system within a compute node requires careful consideration of hardware, performance, and workload characteristics. Storage architects are therefore hesitant to build storage systems based on CSDs, as there are no tools to determine whether CSD-based compute nodes meet performance requirements compared to traditional SSD-based nodes. In this work, we propose an analytical model-based storage capacity planner called CSDPlan that helps system architects build performance-effective CSD-based compute nodes. Our model takes into account the performance characteristics of the host system, the targeted workloads, and the hardware and performance characteristics of the CSDs to be deployed, and provides an optimal configuration based on the number of CSDs for a compute node. Furthermore, CSDPlan estimates and reduces the total cost of ownership (TCO) of building a CSD-based compute node. To evaluate the efficacy of CSDPlan, we selected two commercially available CSDs and four representative big data analysis workloads.
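    At its simplest, an analytical capacity planner of this kind answers two questions: how many devices are needed to meet a throughput target, and what the resulting TCO is. The toy model below is only in the spirit of CSDPlan; all names, parameters, and numbers are illustrative placeholders, not the paper's model:

    ```python
    import math

    def plan_csd_node(required_gbps, csd_gbps_per_dev, max_slots):
        """Smallest number of CSDs whose aggregate offload throughput meets
        the workload requirement, or None if it cannot fit in one node."""
        n = math.ceil(required_gbps / csd_gbps_per_dev)
        return n if n <= max_slots else None

    def tco(n_devices, unit_cost_usd, watts_per_dev, usd_per_kwh, years):
        """Capital cost plus energy cost over the ownership period."""
        hours = years * 365 * 24
        energy_usd = n_devices * watts_per_dev * hours / 1000 * usd_per_kwh
        return n_devices * unit_cost_usd + energy_usd
    ```

    The real model additionally folds in host-side characteristics and per-workload offload behavior, which is what makes the configuration choice non-obvious in practice.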

    Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization

    Full text link
    Uniform-precision neural network quantization has gained popularity because it simplifies the densely packed arithmetic units needed for high computing capability. However, it ignores the heterogeneous sensitivity to quantization error across layers, resulting in sub-optimal inference accuracy. This work proposes a novel neural architecture search, called neural channel expansion, that adjusts the network structure to alleviate the accuracy degradation caused by ultra-low uniform-precision quantization. The proposed method selectively expands channels for the quantization-sensitive layers while satisfying hardware constraints (e.g., FLOPs, PARAMs). Based on in-depth analysis and experiments, we demonstrate that the proposed method can adapt the channels of several popular networks to achieve superior 2-bit quantization accuracy on CIFAR10 and ImageNet. In particular, we achieve the best-to-date Top-1/Top-5 accuracy for 2-bit ResNet50 with smaller FLOPs and parameter size. Comment: Accepted as a full paper by the TinyML Research Symposium 202
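    Stripped of the architecture-search machinery, "selectively expand channels for sensitive layers under a FLOPs constraint" can be illustrated with a greedy loop. This is my own simplification for intuition; the paper's method searches rather than greedily allocates:

    ```python
    def expand_channels(sensitivity, channels, flops_per_channel, flops_budget, step=8):
        """Greedy sketch of channel expansion under a FLOPs constraint.

        sensitivity       : per-layer quantization sensitivity (higher = worse)
        channels          : current channel count per layer
        flops_per_channel : FLOPs added by widening each layer by one channel
        flops_budget      : extra FLOPs allowed on top of the current model
        Returns the widened channel list and the FLOPs actually spent.
        """
        channels = list(channels)
        used = 0
        # Visit layers from most to least quantization-sensitive.
        order = sorted(range(len(channels)), key=lambda i: -sensitivity[i])
        progress = True
        while progress:
            progress = False
            for i in order:
                cost = step * flops_per_channel[i]
                if used + cost <= flops_budget:
                    channels[i] += step   # widen the most sensitive layer that fits
                    used += cost
                    progress = True
                    break
        return channels, used
    ```

    The point of the sketch is the trade-off: width is added only where quantization hurts most, so 2-bit accuracy recovers without exceeding the FLOPs/parameter budget.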

    Enhanced Electrochemical Performances of Hollow-Structured N-Doped Carbon Derived from a Zeolitic Imidazole Framework (ZIF-8) Coated by Polydopamine as an Anode for Lithium-Ion Batteries

    Get PDF
    Doping heteroatoms such as nitrogen (N) and boron (B) into the framework of carbon materials is one of the most efficient methods to improve the electrochemical performance of carbon-based electrodes. In this study, N-doped carbon was facilely synthesized using a ZIF-8/polydopamine precursor. The polyhedral structure of ZIF-8 and the effective surface-coating capability of dopamine enabled the formation of N-doped carbon with a hollow structure: the ZIF-8 polyhedron served as a sacrificial template for the hollow structure, and dopamine acted as the nitrogen donor. Compared to ZIF-8-derived carbon, the hollow-structured N-doped carbon (HSNC) electrode showed an improved reversible capacity of approximately 1398 mAh·g⁻¹ after 100 cycles, with excellent cycling retention over a voltage range of 0.01 to 3.0 V at a current density of 0.1 A·g⁻¹.

    Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

    Full text link
    Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, their large model size poses challenges for practical deployment. To address this, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models result in a noticeable loss of accuracy. To counteract this, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from the teacher model and the ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and no loss of accuracy on a reasoning task.