
    Panoramic Vision Transformer for Saliency Detection in 360° Videos

    360° video saliency detection is one of the challenging benchmarks for 360° video understanding, since non-negligible distortion and discontinuity occur in the projection of any 360° video format, and the capture-worthy viewpoint on the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning, but also to perform the geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information like class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement. Comment: Published at ECCV 2022
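    For readers unfamiliar with the building block, the sketch below (not the authors' code) shows how a ViT-style patch embedding can use a deformable convolution so that its sampling grid can bend around equirectangular distortion. The offsets here are predicted from each input for simplicity; PAVER performs its geometric approximation only once, which this toy version does not reproduce.

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformablePatchEmbed(nn.Module):
        """Toy deformable patch embedding for an equirectangular frame."""
        def __init__(self, in_ch=3, embed_dim=768, patch=16):
            super().__init__()
            # 2 offsets (x, y) per kernel tap, predicted per input location
            self.offset = nn.Conv2d(in_ch, 2 * patch * patch, kernel_size=patch, stride=patch)
            self.proj = DeformConv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)

        def forward(self, x):                        # x: (B, 3, H, W)
            offsets = self.offset(x)                 # (B, 2*patch*patch, H/patch, W/patch)
            feats = self.proj(x, offsets)            # (B, embed_dim, H/patch, W/patch)
            return feats.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)

    frame = torch.randn(1, 3, 224, 448)              # toy 2:1 panorama
    print(DeformablePatchEmbed()(frame).shape)       # torch.Size([1, 392, 768])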

    Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation

    Sound can convey significant information for spatial reasoning in our daily lives. To endow deep networks with such ability, we address the challenge of dense indoor prediction with sound in both 2D and 3D via cross-modal knowledge distillation. In this work, we propose a Spatial Alignment via Matching (SAM) distillation framework that elicits local correspondences between the two modalities in vision-to-audio knowledge transfer. SAM integrates audio features with visually coherent learnable spatial embeddings to resolve inconsistencies across multiple layers of a student model. Our approach does not rely on a specific input representation, allowing flexibility in input shapes or dimensions without performance degradation. With a newly curated benchmark named Dense Auditory Prediction of Surroundings (DAPS), we are the first to tackle dense indoor prediction of omnidirectional surroundings in both 2D and 3D from audio observations. Specifically, for audio-based depth estimation, semantic segmentation, and challenging 3D scene reconstruction, the proposed distillation framework consistently achieves state-of-the-art performance across various metrics and backbone architectures. Comment: Published at ICCV 2023
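    For orientation, here is a minimal sketch of the generic vision-to-audio distillation setup assumed above: a frozen visual teacher supplies spatial feature maps that an audio student learns to match. The tiny encoders and the plain MSE objective are placeholders only; SAM's spatial alignment and matching modules are not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(64, 128, 3, padding=1)).eval()   # frozen RGB encoder
    student = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(64, 128, 3, padding=1))          # audio spectrogram encoder

    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
    rgb = torch.randn(2, 3, 64, 64)     # paired RGB frames
    spec = torch.randn(2, 1, 64, 64)    # paired spectrograms, resampled to the same grid

    with torch.no_grad():
        target = teacher(rgb)           # teacher features are distillation targets
    pred = student(spec)
    loss = F.mse_loss(pred, target)     # stand-in for the alignment/distillation loss
    loss.backward()
    optimizer.step()
    print(float(loss))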

    A Mobile Robot Generating Video Summaries of Seniors' Indoor Activities

    We develop a system that generates summaries from seniors' indoor-activity videos captured by a social robot, helping remote family members keep track of their seniors' daily activities at home. Unlike traditional video summarization datasets, indoor videos captured from a moving robot pose additional challenges: (i) the video sequences are very long, (ii) a significant number of frames contain no subject or contain subjects at ill-posed locations and scales, and (iii) most of the well-posed frames contain highly redundant information. To address this problem, we propose to exploit pose estimation for detecting people in frames; this guides the robot to follow the user and capture effective videos. We use person identification to distinguish a target senior from other people. We also make use of action recognition to analyze seniors' major activities at different moments, and develop a video summarization method to select diverse and representative keyframes as summaries. Comment: accepted by MobileHCI'1
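    As a hedged illustration of the final stage only, the snippet below selects diverse keyframes from per-frame feature vectors with a greedy farthest-point heuristic; the features, selection criterion, and thresholds used in the paper are not reproduced.

    import numpy as np

    def select_keyframes(features, k):
        """Greedily pick k mutually dissimilar frames. features: (num_frames, dim)."""
        feats = features / np.linalg.norm(features, axis=1, keepdims=True)
        chosen = [0]                                # seed with the first well-posed frame
        while len(chosen) < k:
            sims = feats @ feats[chosen].T          # cosine similarity to the chosen set
            worst = sims.max(axis=1)                # closeness to the nearest chosen frame
            worst[chosen] = np.inf                  # never re-pick a chosen frame
            chosen.append(int(np.argmin(worst)))    # take the least redundant frame
        return sorted(chosen)

    frames = np.random.rand(500, 128)               # toy per-frame descriptors
    print(select_keyframes(frames, k=5))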

    Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos

    360° videos convey holistic views of the surroundings of a scene. They provide audio-visual cues beyond a predetermined normal field of view and display distinctive spatial relations on a sphere. However, previous benchmark tasks for panoramic videos are still limited in evaluating the semantic understanding of audio-visual relationships or of spherical spatial properties in the surroundings. We propose a novel benchmark named Pano-AVQA as a large-scale grounded audio-visual question answering dataset on panoramic videos. Using 5.4K 360° video clips harvested online, we collect two types of novel question-answer pairs with bounding-box grounding: spherical spatial relation QAs and audio-visual relation QAs. We train several transformer-based models on Pano-AVQA, where the results suggest that our proposed spherical spatial embeddings and multimodal training objectives contribute to a better semantic understanding of the panoramic surroundings on the dataset.
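    To make the notion of a spherical spatial embedding concrete, here is a hedged sketch of one plausible construction: map an equirectangular bounding-box centre to longitude/latitude and project sine/cosine features into an embedding. The exact formulation used in Pano-AVQA is not reproduced here.

    import math
    import torch
    import torch.nn as nn

    def box_to_sphere(cx, cy, width, height):
        """Map a pixel centre (cx, cy) of a W x H equirectangular frame to (lon, lat) in radians."""
        lon = (cx / width) * 2 * math.pi - math.pi      # [-pi, pi)
        lat = math.pi / 2 - (cy / height) * math.pi     # [-pi/2, pi/2]
        return lon, lat

    class SphericalPositionEmbedding(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            # sin/cos of both angles keeps the embedding continuous across the seam
            self.proj = nn.Linear(4, dim)

        def forward(self, lon, lat):
            feats = torch.stack([torch.sin(lon), torch.cos(lon),
                                 torch.sin(lat), torch.cos(lat)], dim=-1)
            return self.proj(feats)

    lon, lat = box_to_sphere(cx=1800.0, cy=300.0, width=3840, height=1920)
    emb = SphericalPositionEmbedding()(torch.tensor([lon]), torch.tensor([lat]))
    print(emb.shape)    # torch.Size([1, 256])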

    Transitional adaptation of pretrained models for visual storytelling

    Previous models for vision-to-language generation tasks usually pretrain a visual encoder and a language generator in their respective domains and jointly finetune them on the target task. However, this direct transfer practice may suffer from the discord between visual specificity and language fluency, since the two modules are often trained separately on large corpora of visual and text data with no common ground. In this work, we claim that a transitional adaptation task is required between pretraining and finetuning to harmonize the visual encoder and the language model for challenging downstream target tasks like visual storytelling. We propose a novel approach named Transitional Adaptation of Pretrained Model (TAPM) that adapts the multi-modal modules to each other with a simpler alignment task that uses only visual inputs and requires no text labels. Through extensive experiments, we show that the adaptation step significantly improves the performance of multiple language models for sequential video and image captioning tasks. We achieve new state-of-the-art performance on both language metrics and human evaluation in the multi-sentence description task of LSMDC 2019 [50] and the image storytelling task of VIST [18]. Our experiments reveal that this improvement in caption quality does not depend on the specific choice of language models.
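    For intuition, a minimal sketch of a text-free adaptation step in the spirit of the description above: a visual projector is tuned so that consecutive clip features remain mutually predictive before any caption finetuning. The InfoNCE objective over adjacent segments is an assumed stand-in, not TAPM's actual alignment task.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    proj = nn.Linear(512, 256)                   # maps clip features toward the LM input space
    optimizer = torch.optim.Adam(proj.parameters(), lr=1e-4)

    clips = torch.randn(8, 512)                  # 8 consecutive clip features from one video
    z = F.normalize(proj(clips), dim=-1)
    logits = z[:-1] @ z[1:].T / 0.07             # similarity of each clip to every "next" clip
    labels = torch.arange(logits.size(0))        # the true next clip sits on the diagonal
    loss = F.cross_entropy(logits, labels)       # no text labels involved
    loss.backward()
    optimizer.step()
    print(float(loss))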

    Trapped Gravitational Waves in Jackiw–Teitelboim Gravity

    We discuss the possibility that gravitational fluctuations (“gravitational waves”) are trapped in space by gravitational interactions in two-dimensional Jackiw–Teitelboim gravity. In the standard geon (gravitational electromagnetic entity) approach, the effective energy is entirely deposited in a thin layer, the active region, which achieves spatial self-confinement but raises doubts about the geon’s stability. In this paper we relinquish the “active region” approach and obtain self-confinement of “gravitational waves” that are trapped by the vacuum geometry and can be stable against the backreaction due to metric fluctuations.

    AWMC: Abnormal-Weather Monitoring and Curation Service Based on Dynamic Graph Embedding

    This paper presents a system, the abnormal-weather monitoring and curation service (AWMC), which provides people with a better understanding of abnormal weather conditions. The service analyzes a set of multivariate weather datasets (i.e., 7 meteorological datasets from 18 cities in Korea) and shows (i) which dates are most abnormal in a certain city, and (ii) which cities are most abnormal on a certain date. In particular, a dynamic graph-embedding-based anomaly detection method is employed to measure anomaly scores. We implemented the service and conducted evaluations. For monitoring abnormal weather, AWMC achieved an average precision of approximately 90.9%, recall of 93.2%, and F1 score of 92.1% across all the cities.
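    To illustrate the kind of scoring involved, a minimal sketch under stated assumptions: given per-date, per-city embeddings (as a dynamic graph-embedding model would produce), score a date as abnormal for a city by its distance from that city's recent embedding history. The windowed baseline and the synthetic data are placeholders, not AWMC's method.

    import numpy as np

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(365, 18, 16))        # (dates, cities, embedding dim)

    def anomaly_scores(emb, window=14):
        """Distance of each (date, city) embedding from the mean of the previous `window` days."""
        scores = np.zeros(emb.shape[:2])
        for t in range(window, emb.shape[0]):
            baseline = emb[t - window:t].mean(axis=0)  # (cities, dim)
            scores[t] = np.linalg.norm(emb[t] - baseline, axis=-1)
        return scores

    scores = anomaly_scores(embeddings)
    city = 3
    top_dates = np.argsort(scores[:, city])[-5:][::-1]  # most abnormal dates for this city
    print("most abnormal dates for city", city, ":", top_dates)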