
55 research outputs found

Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection

Author: Hwang Sung Ju
Hwang Sunil
Lee Youngwan
Yoon Jaehong
Publication date: 19/11/2022

Self-supervised Video Representation Learning (VRL) aims to learn transferable representations from uncurated, unlabeled video streams that can be used for diverse downstream tasks. With recent advances in Masked Image Modeling (MIM), in which the model learns to predict randomly masked regions of an image given only the visible patches, MIM-based VRL methods have emerged and demonstrated their potential by significantly outperforming previous VRL methods. However, they require an excessive amount of computation due to the added temporal dimension. This is because existing MIM-based VRL methods resort to random masking strategies and so overlook the spatial and temporal inequality of information density among the patches in arriving videos, thereby wasting computation on predicting uninformative tokens/frames. To tackle these limitations of Masked Video Modeling, we propose a new token selection method that masks out the more important tokens according to the object's motions in an online manner, which we refer to as Motion-centric Token Selection. Further, we present a dynamic frame selection strategy that allows the model to focus on informative and causal frames with minimal redundancy. We validate our method on multiple benchmark and Ego4D datasets, showing that the model pre-trained with our proposed method significantly outperforms state-of-the-art VRL methods on downstream tasks, such as action recognition and object state change classification, while largely reducing memory requirements during pre-training and fine-tuning. Comment: 15 pages
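
The abstract does not state how motion is measured, so the following is only a minimal sketch of the general idea of ranking patches by motion and choosing the top ones as masking targets. It assumes absolute frame differencing as the motion proxy; the function name `select_tokens_to_mask` and its parameters `patch` and `keep` are hypothetical and are not taken from the paper.

```python
def select_tokens_to_mask(prev_frame, next_frame, patch, keep):
    """Rank patches by summed absolute frame difference (an assumed motion
    proxy, not the paper's criterion) and return the (row, col) indices of
    the `keep` highest-motion patches."""
    rows = len(prev_frame) // patch
    cols = len(prev_frame[0]) // patch
    scores = []
    for r in range(rows):
        for c in range(cols):
            diff = 0.0
            # Accumulate per-pixel change inside this patch.
            for y in range(r * patch, (r + 1) * patch):
                for x in range(c * patch, (c + 1) * patch):
                    diff += abs(next_frame[y][x] - prev_frame[y][x])
            scores.append((diff, (r, c)))
    # Stable sort: ties keep their raster order.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:keep]]


# Toy 4x4 grayscale frames split into 2x2 patches.
prev = [[0, 0, 0, 0] for _ in range(4)]
nxt = [[0, 0, 5, 5],
       [0, 0, 5, 5],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
selected = select_tokens_to_mask(prev, nxt, patch=2, keep=2)
```

In this toy input the top-right patch changes the most and the bottom-right patch changes slightly, so those two are selected while the static patches are ignored.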

arXiv.org e-Print Archive

Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same

Author: Ahn Sungjun
Lee Youngwan
Park Sung-Ik
Yim Hyun-Jeong
Publication date: 18/02/2024

This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receiving end. This proposal deviates from the traditional multimedia ecosystem, which relies entirely on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator, rather than distributing encoded data of fully finished programs. The service elements include fine-tailored text descriptions, lightweight image data of some objects, or application programming interfaces, comprehensively referred to as semantic sources, and the user terminal translates the received semantic data into video frames. Empowered by the random nature of generative AI, users can then experience super-personalized services. The proposed idea covers situations in which the user receives element packages from different service providers: a sequence of packages over time, or multiple packages at the same time. Given promised in-context coherence and content integrity, the combinatory dynamics will amplify service diversity, allowing users to always chance upon new experiences. This work particularly targets short-form videos and advertisements, where users easily become fatigued by seeing the same frame sequence every time. In those use cases, the content provider's role will be recast from that of a thorough producer to that of scripting semantic sources. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency. Comment: 13 pages, 7 figures

arXiv.org e-Print Archive

Localization Uncertainty Estimation for Anchor-Free Object Detection

Author: Hwang Joong-won
Kim Hyung-Il
Kwon Yongjin
Lee Youngwan
Yun Kimin
Publication date: 29/11/2020

Since many safety-critical systems, such as surgical robots and autonomous driving cars, operate in unstable environments with sensor noise and incomplete data, it is desirable for object detectors to take into account the confidence of localization prediction. There are three limitations of the prior uncertainty estimation methods for anchor-based object detection. 1) They model the uncertainty based on object properties having different characteristics, such as location (center point) and scale (width, height). 2) They model a box offset and ground-truth as Gaussian distribution and Dirac delta distribution, which leads to the model misspecification problem. Because the Dirac delta distribution is not exactly represented as Gaussian, i.e., for any $\mu$ and $\Sigma$. 3) Since anchor-based methods are sensitive to hyper-parameters of anchor, the localization uncertainty modeling is also sensitive to these parameters. Therefore, we propose a new localization uncertainty estimation method called Gaussian-FCOS for anchor-free object detection. Our method captures the uncertainty based on four directions of box offsets (left, right, top, bottom) that have similar properties, which makes it possible to identify which direction is uncertain and to provide a quantitative value in the range [0, 1]. To this end, we design a new uncertainty loss, negative power log-likelihood loss, to measure uncertainty by weighting IoU to the likelihood loss, which alleviates the model misspecification problem. Experiments on COCO datasets demonstrate that our Gaussian-FCOS reduces false positives and finds more missed objects by mitigating over-confidence scores with the estimated uncertainty. We hope Gaussian-FCOS serves as a crucial component for the reliability-required task.
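
The abstract names the loss but does not give its formula. As a rough illustration only, the sketch below assumes each of the four box offsets is modeled as an independent one-dimensional Gaussian with a predicted mean and standard deviation, and that the summed negative log-likelihood is simply scaled by the IoU between the predicted and ground-truth boxes. The function names and this exact form are assumptions, not the paper's definition.

```python
import math


def gaussian_nll(target, mean, sigma):
    """Negative log-likelihood of `target` under a 1-D Gaussian N(mean, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (target - mean) ** 2 / (2 * sigma ** 2)


def iou_weighted_nll(gt_offsets, pred_means, pred_sigmas, iou):
    """Sum the per-direction NLL over (left, right, top, bottom) and scale by IoU.

    Assumed form for illustration; the abstract only says IoU weights the
    likelihood loss.
    """
    total = sum(
        gaussian_nll(g, m, s)
        for g, m, s in zip(gt_offsets, pred_means, pred_sigmas)
    )
    return iou * total


# Perfect mean predictions with unit sigma in all four directions.
loss = iou_weighted_nll(
    gt_offsets=[1.0, 2.0, 3.0, 4.0],
    pred_means=[1.0, 2.0, 3.0, 4.0],
    pred_sigmas=[1.0, 1.0, 1.0, 1.0],
    iou=0.5,
)
```

Under this assumed form, a box that overlaps the ground truth poorly contributes little to the likelihood term, which is one way to read the abstract's claim that IoU weighting alleviates model misspecification.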

arXiv.org e-Print Archive

OF@TEIN: An OpenFlow-enabled SDN Testbed over International SmartX Rack Sites

Author: An Hyeong Geun
Cha ByungRae
Cho Ilkwon
Hong JiHoon
Jang DongSeok
Jang Youngwan
Kang Sun-Moo
Kim Byungchul
Kim Hyong-Soon
Kim Jongryool
Kim JongWon
Kim Namgon Lucas
Ko TaeWan
Lee Jaeyong
Min Seokhong
Noh Gyeongsoo
Park Hongsik
Song Wang-Cheol
Publication venue: 'Proceedings of the Asia-Pacific Advanced Network'
Publication date: 23/12/2013

In this paper, we discuss our ongoing effort on the OF@TEIN SDN (Software-Defined Networking) testbed, which currently spans Korea and five South-East Asian (SEA) collaborators with internationally deployed OpenFlow-enabled SmartX Racks.

Proceedings of the Asia-Pacific Advanced Network

Selective Growth and Contact Gap-Fill of Low Resistivity Si via Microwave Plasma-Enhanced CVD

Author: Myoungwoo Lee
Youn-Jea Kim
Youngwan Kim
Publication venue: 'MDPI AG'
Publication date: 12/10/2019

Low-resistivity polycrystalline Si could be selectively grown in the deep (~200 nm) and narrow patterns (~20 nm) of 20 nm pitch design rule DRAM (Dynamic Random Access Memory) by microwave plasma-enhanced chemical vapor deposition (MW-CVD). We were able to achieve the high phosphorus (CVD gap-fill in a large electrical contact area which does is affected by line pitch size) doping concentration (>2.5 × 10²¹ cm⁻³) and, thus, a low resistivity by adjusting source gas (SiH₄, H₂, PH₃) decomposition through MW-CVD with a showerhead controlling the decomposition of source gases by using two different gas injection paths. In this study, a selective growth mechanism was applied by using the deposition/etch cyclic process to achieve the bottom-up process in the L-shaped contact, using H₂ plasma that simultaneously promoted the deposition and the etch processes. Additionally, the cyclic selective growth technique was set up by controlling the SiH₄ flow rate. The bottom-up process resulted in a uniform doping distribution, as well as an excellent filling capacity without seam and center void formation. Thus, low contact resistivity and higher transistor on-current could be achieved at a high and uniform phosphorus (P) concentration. Compared to the conventional thermal, this method is expected to be a strong candidate for the complicated deep and narrow contact process.

Multidisciplinary Digital Publishing Institute