24 research outputs found

    Workload-aware Automatic Parallelization for Multi-GPU DNN Training

    Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), and show competitive training throughput compared with the state-of-the-art frameworks, and also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency. Comment: This paper is accepted in ICASSP201

    Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks

    The quantization of deep neural networks (QDNNs) has been actively studied for deployment in edge devices. Recent studies employ the knowledge distillation (KD) method to improve the performance of quantized networks. In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ). SPEQ is a knowledge distillation training scheme; however, the teacher is formed by sharing the model parameters of the student network. We obtain the soft labels of the teacher by changing the bit precision of the activation stochastically at each layer of the forward-pass computation. The student model is trained with these soft labels to reduce the activation quantization noise. The cosine similarity loss is employed, instead of the KL-divergence, for KD training. As the teacher model changes continuously by random bit-precision assignment, it exploits the effect of stochastic ensemble KD. SPEQ outperforms the existing quantization training methods in various tasks, such as image classification, question-answering, and transfer learning without the need for cumbersome teacher networks
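
The training signal described in this abstract can be illustrated with a minimal sketch. The abstract states only that the teacher shares the student's parameters, that activation bit precision is chosen stochastically at each layer, and that a cosine similarity loss replaces the KL-divergence. Everything else here is an assumption made for illustration: the uniform symmetric quantizer, the tiny tanh layers, the candidate bit widths, and the function names `quantize`, `forward`, `sample_bits`, and `cosine_similarity_loss`. It is not the authors' implementation.

```python
import math
import random


def quantize(values, bits, max_abs=1.0):
    """Uniform symmetric quantizer (assumed for illustration; needs bits >= 2)."""
    levels = 2 ** (bits - 1) - 1
    step = max_abs / levels
    return [round(max(-max_abs, min(max_abs, v)) / step) * step for v in values]


def dense_tanh(x, weights):
    """One dense layer with a tanh activation; weights is a list of rows."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def forward(x, layers, bits_per_layer):
    """Forward pass in which each layer's activation is quantized to its own bit width."""
    h = x
    for weights, bits in zip(layers, bits_per_layer):
        h = quantize(dense_tanh(h, weights), bits)
    return h


def sample_bits(num_layers, candidates, rng):
    """Pick a random activation precision for every layer (the stochastic teacher)."""
    return [rng.choice(candidates) for _ in range(num_layers)]


def cosine_similarity_loss(student, teacher):
    """1 - cos(student, teacher); returns 1.0 if either vector is all zeros."""
    dot = sum(s * t for s, t in zip(student, teacher))
    norm = math.sqrt(sum(s * s for s in student)) * math.sqrt(sum(t * t for t in teacher))
    return 1.0 - dot / norm if norm > 0 else 1.0


# Hypothetical toy network: the same weights serve as student and teacher.
rng = random.Random(0)
layers = [[[0.5, -0.2], [0.1, 0.9]], [[0.3, 0.7], [-0.6, 0.4]]]
x = [1.0, 0.5]

student_out = forward(x, layers, [4, 4])                            # fixed low precision
teacher_out = forward(x, layers, sample_bits(2, (4, 6, 8), rng))    # random precision
loss = cosine_similarity_loss(student_out, teacher_out)
print(loss)
```

Because the teacher's bit widths are resampled on every call, repeated evaluations act like an ensemble of differently quantized copies of the same network, which is the effect the abstract attributes to stochastic ensemble KD.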

    DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion

    In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of real-world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the computationally efficient Hidden Point Removal (HPR) algorithm. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experimental results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for the KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.gi
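
The HPR step named in this abstract can be sketched in its commonly cited form: flip every point about a large sphere centred on the viewpoint, then keep the points whose flipped images lie on the convex hull of the flipped set plus the viewpoint. The sketch below is a 2D, standard-library-only illustration; the paper works on 3D LiDAR point clouds. The function names and the `radius_factor` parameter are assumptions, and this is not the DR.CPO code.

```python
import math


def convex_hull_indices(points):
    """Andrew's monotone chain; returns the indices of the hull vertices."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    if len(order) < 3:
        return set(order)

    def cross(o, a, b):
        (ox, oy), (ax, ay), (bx, by) = points[o], points[a], points[b]
        return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox)

    def half_hull(indices):
        chain = []
        for i in indices:
            # Pop while the last turn is clockwise or collinear.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    return set(half_hull(order)) | set(half_hull(reversed(order)))


def hidden_point_removal_2d(points, viewpoint, radius_factor=100.0):
    """Return sorted indices of points estimated visible from viewpoint.

    Assumes no point coincides with the viewpoint. A larger radius_factor
    (an assumed tuning knob) marks more points as visible.
    """
    if not points:
        return []
    shifted = [(x - viewpoint[0], y - viewpoint[1]) for x, y in points]
    norms = [math.hypot(x, y) for x, y in shifted]
    radius = radius_factor * max(norms)

    # Spherical flipping: p -> p + 2 * (R - |p|) * p / |p| = p * (2R - |p|) / |p|
    flipped = []
    for (x, y), n in zip(shifted, norms):
        scale = (2.0 * radius - n) / n
        flipped.append((x * scale, y * scale))
    flipped.append((0.0, 0.0))  # the viewpoint itself joins the hull computation

    hull = convex_hull_indices(flipped)
    return sorted(i for i in hull if i < len(points))


# Point 1 sits directly behind point 0 as seen from the origin.
cloud = [(1.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
visible = hidden_point_removal_2d(cloud, viewpoint=(0.0, 0.0))
print(visible)  # → [0, 2, 3]
```

In three dimensions the same idea applies with a 3D convex hull routine in place of the monotone chain; only the hull step changes.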

    Reduced radiation exposure to circulating blood cells in proton therapy compared with X-ray therapy in locally advanced lung cancer: Computational simulation based on circulating blood cells

    Background: We estimated the dose of circulating blood cells (CBCs) in patients with locally advanced non-small cell lung cancer for predicting severe radiation-induced lymphopenia (SRIL) and compared pencil-beam scanning proton therapy (PBSPT) and intensity-modulated (photon) radiotherapy (IMRT). Materials and methods: We reviewed 325 patients who received definitive chemoradiotherapy with PBSPT (n = 37) or IMRT (n = 164). SRIL was diagnosed when two or more events of an absolute lymphocyte count < 200/µL occurred during the treatment course. Dose information for the heart and lungs was utilized for the time-dependent computational dose calculation of CBCs. Results: The dose distribution of CBCs was significantly lower in the PBSPT group than in the IMRT group. Overall, 75 (37.3%) patients experienced SRIL during the treatment course; 72 and 3 patients were treated with IMRT and PBSPT, respectively. SRIL was associated with poor progression-free and overall survival outcomes. Upon incorporating the dose information of CBCs for predicting SRIL, CBC D90% > 2.6 GyE was associated with the development of SRIL, along with the baseline lymphocyte count and target volume. Furthermore, PBSPT significantly reduced the dose of CBC D90% (odds ratio = 0.11; p = 0.004) compared with IMRT. Conclusion: The results of this study demonstrate the significance of the dose distribution of CBCs in predicting SRIL. Furthermore, reducing the dose of CBCs after PBSPT minimized the risk of SRIL. Lymphocyte-sparing radiotherapy in PBSPT could improve outcomes, particularly in the setting of maintenance immunotherapy

    A Simple Route of Printing Explosive Crystalized Micro-Patterns by Using Direct Ink Writing

    The production of energetic crystalized micro-patterns by using one-step printing has become a recent trend in energetic materials engineering. We report a direct ink writing (DIW) approach in which micro-scale energetic composites composed of 1,3,5-trinitro-1,3,5-triazinane (RDX) crystals in selected ink formulations of a cellulose acetate butyrate (CAB) matrix are produced based on a direct phase transformation from organic, solvent-based, all-liquid ink. Using the formulated RDX ink and the DIW method, we printed crystalized RDX micro-patterns of various sizes and shapes on silicon wafers. The crystalized RDX micro-patterns contained single crystals on pristine Si wafers while the micro-patterns containing dendrite crystals were produced on UV-ozone (UVO)-treated Si wafers. The printing method and the formulated all-liquid ink make up a simple route for designing and printing energetic micro-patterns for micro-electromechanical systems

    Dependence of gold nanoparticle radiosensitization on cell geometry

    Detailed modeling of cell geometries was shown to be important to estimate radiosensitization effects of gold nanoparticles (GNPs).