165 research outputs found

YouTube AV 50K: An Annotated Corpus for Comments in Autonomous Vehicles

Authors: Choi Minsoo
Fu Kaiming
Gong Siyuan
Li Tao
Lin Lei
Wang Jian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/10/2018

With one billion monthly viewers and millions of users discussing and sharing opinions, comments below YouTube videos are rich sources of data for opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset, a freely available collection of more than 50,000 YouTube comments, with metadata, posted below autonomous vehicle (AV)-related videos. We describe its creation process, its content and data format, and its possible uses. In particular, we present a case study of the first self-driving car fatality to evaluate the dataset, and show how the dataset can be used to better understand public attitudes toward self-driving cars and public reactions to the accident. Future developments of the dataset are also discussed.

Comment: in Proceedings of the Thirteenth International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2018)

arXiv.org e-Print Archive

Crossref

Music Sequence Prediction with Mixture Hidden Markov Models

Authors: Choi Minsoo
Fu Kaiming
Li Tao
Lin Lei
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 27/09/2018

Recommendation systems that automatically generate personalized music playlists for users have attracted tremendous attention in recent years. Today, most music recommendation systems rely on item-based or user-based collaborative filtering, or on content-based approaches. In this paper, we propose a novel mixture hidden Markov model (HMM) for predicting music play sequences. We compare the mixture model with state-of-the-art methods and evaluate its predictions quantitatively and qualitatively on a large-scale real-world dataset from a Kaggle competition. Results show that our model significantly outperforms traditional methods as well as other competitors. We conclude by envisioning a next-generation music recommendation system that integrates our model with recent advances in deep learning, computer vision, and speech techniques, and that has promising potential in both academia and industry.

Comment: Accepted to the 4th International Conference on Artificial Intelligence and Applications (AI 2018)

arXiv.org e-Print Archive

Crossref
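
The abstract does not give the model's equations. As a rough illustration of how a mixture of HMMs can score a listening history and predict the next song, here is a minimal sketch under our own assumptions: each component is a discrete-emission HMM over song IDs, the mixture weights are fixed, and the next-song distribution is each component's one-step prediction weighted by that component's posterior probability given the history. The class and function names are hypothetical, and parameter estimation (e.g., by EM) is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HMM:
    start: List[float]        # start[i] = P(z_1 = i)
    trans: List[List[float]]  # trans[i][j] = P(z_{t+1} = j | z_t = i)
    emit: List[List[float]]   # emit[i][v] = P(x_t = v | z_t = i), v = song ID

    def forward(self, seq: Sequence[int]) -> Tuple[List[float], float]:
        """Scaled forward algorithm.

        Returns P(z_T | x_1..x_T) and log P(x_1..x_T); rescaling at every
        step keeps long sequences from underflowing.
        """
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][seq[0]] for i in range(n)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        log_lik = math.log(scale)
        for x in seq[1:]:
            alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][x]
                     for j in range(n)]
            scale = sum(alpha)
            alpha = [a / scale for a in alpha]
            log_lik += math.log(scale)
        return alpha, log_lik

    def predict_next(self, state_post: List[float]) -> List[float]:
        """One-step-ahead distribution P(x_{T+1} = v | x_1..x_T)."""
        n, n_items = len(self.start), len(self.emit[0])
        nxt = [sum(state_post[i] * self.trans[i][j] for i in range(n)) for j in range(n)]
        return [sum(nxt[j] * self.emit[j][v] for j in range(n)) for v in range(n_items)]


def mixture_next_item_probs(components: List[HMM], weights: List[float],
                            seq: Sequence[int]) -> List[float]:
    """Posterior-weighted average of each component's next-item prediction."""
    log_scores, preds = [], []
    for hmm, w in zip(components, weights):
        state_post, log_lik = hmm.forward(seq)
        log_scores.append(math.log(w) + log_lik)
        preds.append(hmm.predict_next(state_post))
    m = max(log_scores)  # log-sum-exp trick for numerical stability
    post = [math.exp(s - m) for s in log_scores]
    total = sum(post)
    post = [p / total for p in post]
    return [sum(post[k] * preds[k][v] for k in range(len(preds)))
            for v in range(len(preds[0]))]
```

For example, with two single-state components that favor songs 0 and 1 respectively (`HMM([1.0], [[1.0]], [[0.9, 0.1]])` and `HMM([1.0], [[1.0]], [[0.1, 0.9]])`, equal weights), the history `[0, 0, 0]` puts almost all posterior weight on the first component, so song 0 receives a predicted probability of about 0.90.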

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

Authors: Chang Du-Seong
Choi Jungwook
Hong Sukjin
Kim Minsoo
Lee Sihwa
Publication date: 20/11/2022

Knowledge distillation (KD) has been a ubiquitous model-compression method, strengthening a lightweight student model with knowledge transferred from a teacher. In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders such as BERT to improve the accuracy of a student model with reduced-precision weight parameters. However, little is understood about which of the various KD approaches best fits the QAT of Transformers. In this work, we provide an in-depth analysis of how KD recovers attention in quantized large Transformers. In particular, we show that the previously adopted MSE loss on the attention score is insufficient for recovering self-attention information. We therefore propose two KD methods: attention-map and attention-output losses. Furthermore, we explore unifying both losses to address the task-dependent preference between attention-map and attention-output losses. Experimental results on various Transformer encoder models demonstrate that the proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit weight quantization.

Comment: EMNLP 2022 Main Track Long Paper

arXiv.org e-Print Archive
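
The abstract names the two proposed losses but not their exact formulas. The sketch below is our own illustration, assuming the attention-map loss is a row-wise KL divergence between the teacher's and student's softmax attention maps, the attention-output loss is an MSE between the attention outputs softmax(scores)·V, and "attention score" means the pre-softmax logits; consult the paper for the authors' definitions. It also highlights one property worth noting (not necessarily the paper's argument): softmax is invariant to adding a constant to every score in a row, so an MSE on raw scores can be large while the attention maps are identical. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(scores: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mse(a: Matrix, b: Matrix) -> float:
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def attention_score_mse(t_scores: Matrix, s_scores: Matrix) -> float:
    """MSE on attention scores, taken here as the pre-softmax logits."""
    return mse(t_scores, s_scores)


def attention_map_loss(t_scores: Matrix, s_scores: Matrix, eps: float = 1e-12) -> float:
    """Assumed form: mean over query rows of KL(teacher map || student map)."""
    p_t, p_s = softmax_rows(t_scores), softmax_rows(s_scores)
    total = 0.0
    for row_t, row_s in zip(p_t, p_s):
        total += sum(p * (math.log(p + eps) - math.log(q + eps)) for p, q in zip(row_t, row_s))
    return total / len(p_t)


def attention_output_loss(t_scores: Matrix, t_values: Matrix,
                          s_scores: Matrix, s_values: Matrix) -> float:
    """Assumed form: MSE between attention outputs softmax(scores) @ V."""
    return mse(matmul(softmax_rows(t_scores), t_values),
               matmul(softmax_rows(s_scores), s_values))
```

For teacher scores `t = [[1.0, 2.0, 0.5], [0.0, 0.0, 1.0]]` and a student whose scores are `t` shifted by +5 in every entry, `attention_score_mse` returns 25.0 while `attention_map_loss` returns 0.0, because the two attention maps are identical.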

Interventional treatment of different localization of inflammatory vasculitis: Takayasu's arteritis

Authors: Ahn Tae-Hoon
Choi Dong-Hoon
Shim Won-Heum
Shin Eak-Kyun
Son Minsoo
Publication venue: Published by Elsevier Inc.
Publication date: 06/03/2002

Elsevier - Publisher Connector

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Authors: Chang Du-Seong
Choi Jungwook
Hong Sukjin
Kim Minsoo
Lee Janghwan
Lee Sihwa
Sung Wonyong
Publication date: 13/08/2023

Generative language models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, their large model size poses challenges for practical deployment. To address this problem, quantization-aware training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from both the teacher model and the ground truth. This research marks the first evaluation of ternary-weight quantization-aware training of large-scale GLMs, achieving less than 1.0 degradation in perplexity and no loss of accuracy on a reasoning task.

arXiv.org e-Print Archive
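
The abstract does not describe how the weights are ternarized. To illustrate what ternary weights are, the sketch below swaps in the threshold rule from Ternary Weight Networks (TWN), an earlier and separate method, purely as an example: weights whose magnitude exceeds Δ = 0.7·mean(|w|) map to ±1, the rest to 0, and a single scale α (the mean magnitude of the nonzero-coded weights) restores the range. The token-scaled logit distillation loss itself is not sketched, and the function names are hypothetical.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_factor: float = 0.7) -> Tuple[List[int], float]:
    """TWN-style ternarization: approximate each w by alpha * t with t in {-1, 0, +1}."""
    delta = delta_factor * sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def dequantize(codes: List[int], alpha: float) -> List[float]:
    """Reconstruct the real-valued weights used in the forward pass."""
    return [alpha * c for c in codes]
```

For `w = [0.9, -0.05, -1.1, 0.2, 0.0, -0.4]`, Δ ≈ 0.309, the codes are `[1, 0, -1, 0, 0, -1]`, and α ≈ 0.8. Three levels carry about log2(3) ≈ 1.58 bits of information per weight. In actual quantization-aware training, the non-differentiable rounding step is typically bypassed in the backward pass (for example, with a straight-through estimator); that part is omitted here.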

PROBE3.0: A Systematic Framework for Design-Technology Pathfinding with Improved Design Enablement

Authors: Choi Suhyeong
Jung Jinwook
Kahng Andrew B.
Kim Minsoo
Park Chul-Hong
Pramanik Bodhisatta
Yoon Dooseok
Publication date: 25/04/2023

We propose a systematic framework to conduct design-technology pathfinding for PPAC (power, performance, area, and cost) in advanced nodes. Our goal is to provide configurable, scalable generation of process design kits (PDKs) and standard-cell libraries, spanning key scaling boosters (backside PDN and buried power rail), to explore PPAC across given technology and design parameters. We build on PROBE2.0, which addressed only area and cost (AC), to include power and performance (PP) evaluations through automated generation of full design enablements. We also improve the use of artificial designs in the PPAC assessment of technology and design configurations. We generate more realistic artificial designs by applying a machine learning-based parameter tuning flow. We further employ clustering-based, cell-width-regularized placements at the core of routability assessment, enabling more realistic placement utilization and improved experimental efficiency. We demonstrate PPAC evaluation across scaling boosters and artificial designs in a predictive technology node.

Comment: 14 pages, 17 figures, submitted to IEEE Trans. on CA

arXiv.org e-Print Archive

Strain sensitive flexible magnetoelectric ceramic nanocomposites

Authors: Aktas Buse
Chen Xiang-Zhong
Choi Hongsoo
Kim Donghoon
Kim Minsoo
Nelson Bradley J.
Pané Salvador
Puigmartí-Luis Josep
Publication date: 18/10/2022

Advanced flexible electronics and soft robotics require the development and implementation of flexible functional materials. Magnetoelectric (ME) oxide materials can convert magnetic input into electric output and vice versa, making them excellent candidates for advanced sensing, actuating, data storage, and communication. However, their application has been limited to rigid devices because of their brittle nature. Here, we report flexible ME oxide composite (BaTiO3/CoFe2O4) thin-film nanostructures that can be transferred onto a stretchable substrate such as polydimethylsiloxane (PDMS). In contrast to their rigid bulk counterparts, these ceramic nanostructures display flexible behavior and exhibit reversibly tunable ME coupling via mechanical stretching. We believe our study can open up new avenues for integrating ceramic ME composites into flexible electronics and soft robotic devices.

arXiv.org e-Print Archive

DGIST Library Institutional Repository

Automatic segmentation of cardiac structures for breast cancer radiotherapy

Authors: Choi Minsoo
Jones Elizabeth C.
Jung Jae Won
Lee Choonik
Lee Choonsik
Mille Matthew M.
Mosher Elizabeth G.
Yeom Yeon Soo
Publication venue: Elsevier BV
Publication date: 22/11/2019

Background and purpose: We developed an automatic method to segment cardiac substructures in radiotherapy planning CT images, to support epidemiological studies or clinical trials examining cardiac disease endpoints after radiotherapy. Material and methods: We used a most-similar atlas selection algorithm and 3D deformation combined with 30 detailed cardiac atlases. We cross-validated our method within the atlas library by evaluating geometric comparison metrics and by comparing cardiac doses for simulated breast radiotherapy between manual and automatic contours. We analyzed the impact of the number of cardiac atlases in the library, and of the use of manual guide points, on the performance of our method. Results: The Dice similarity coefficients from the cross-validation reached up to 97% (whole heart) and 80% (chambers). The average surface distance for the coronary arteries was less than 10.3 mm on average, with the best agreement (7.3 mm) in the left anterior descending artery (LAD). The dose comparison for simulated breast radiotherapy showed differences of less than 0.06 Gy for the whole heart and atria, and 0.3 Gy for the ventricles. For the coronary arteries, the dose differences were 2.3 Gy (LAD) and 0.3 Gy (other arteries). The sensitivity analysis showed no notable improvement beyond ten atlases, and the manual guide points did not significantly improve performance. Conclusion: We developed an automated method to contour cardiac substructures on radiotherapy CTs. When combined with accurate dose calculation techniques, our method should be useful for cardiac dose reconstruction in large numbers of patients in epidemiological studies or clinical trials.

ScholarShip
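
The two geometric metrics reported in the results are standard, although surface-distance definitions vary between papers. The sketch below computes the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), on voxel sets, and one common symmetric form of the average surface distance: the mean of all nearest-neighbour distances from each surface to the other, pooled over both surfaces. It is our own illustration, not the authors' evaluation code; the voxel-set representation, the pooled definition, and the function names are assumptions, and surface points are assumed to be already extracted and expressed in millimetres.

```python
import math
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]  # surface point coordinates in mm


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) of two binary masks."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def average_surface_distance(surf_a: Sequence[Point], surf_b: Sequence[Point]) -> float:
    """Symmetric average surface distance, pooled over both directions.

    Brute-force nearest-neighbour search, O(|A| * |B|); fine for a sketch,
    while spatial indexes such as k-d trees are the usual choice for real contours.
    """
    def nn_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
        return [min(math.dist(p, q) for q in dst) for p in src]

    d = nn_distances(surf_a, surf_b) + nn_distances(surf_b, surf_a)
    return sum(d) / len(d)
```

For example, two four-voxel masks along a line that overlap in three voxels give DSC = 2·3 / (4 + 4) = 0.75.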