
    Audio-Visual Glance Network for Efficient Video Recognition

    Deep learning has made significant strides in video understanding tasks, but the computation required to classify lengthy and massive videos using clip-level video classifiers remains impractical and prohibitively expensive. To address this issue, we propose the Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video. AVGN first divides the video into snippets of image-audio clip pairs and employs lightweight unimodal encoders to extract global visual and audio features. To identify the important temporal segments, we use an Audio-Visual Temporal Saliency Transformer (AV-TeST) that estimates the saliency score of each frame. To further increase efficiency in the spatial dimension, AVGN processes only the important patches instead of the whole images. We use an Audio-Enhanced Spatial Patch Attention (AESPA) module to produce a set of enhanced coarse visual features, which are fed to a policy network that produces the coordinates of the important patches. This approach enables us to focus only on the most important spatio-temporal parts of the video, leading to more efficient video recognition. Moreover, we incorporate various training techniques and multi-modal feature fusion to enhance the robustness and effectiveness of AVGN. By combining these strategies, AVGN sets a new state of the art on multiple video recognition benchmarks while achieving faster processing speed. Comment: ICCV 2023
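
    The temporal-saliency step described above can be illustrated with a small PyTorch sketch: per-snippet visual and audio features are fused, passed through a lightweight transformer encoder standing in for AV-TeST, and the top-k highest-scoring snippets are kept. The class name, feature dimension, additive fusion, and top-k value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TemporalSaliencySelector(nn.Module):
    """Toy stand-in for the AV-TeST step: score each image-audio snippet, keep the top-k."""
    def __init__(self, dim=256, num_heads=4, top_k=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.scorer = nn.Linear(dim, 1)
        self.top_k = top_k

    def forward(self, visual_feats, audio_feats):
        # visual_feats, audio_feats: (B, T, dim) global per-snippet features
        fused = visual_feats + audio_feats             # simple additive fusion (assumption)
        ctx = self.encoder(fused)                      # temporal context across snippets
        scores = self.scorer(ctx).squeeze(-1)          # (B, T) saliency score per snippet
        keep = scores.topk(self.top_k, dim=1).indices  # indices of the most salient snippets
        return scores, keep

selector = TemporalSaliencySelector()
v, a = torch.randn(2, 16, 256), torch.randn(2, 16, 256)
scores, keep = selector(v, a)
print(scores.shape, keep.shape)  # torch.Size([2, 16]) torch.Size([2, 8])
```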

    Towards Good Practices for Missing Modality Robust Action Recognition

    Standard multi-modal models assume that the same modalities are available at training and inference time. In practice, however, the environment in which multi-modal models operate may not satisfy this assumption, and their performance degrades drastically if any modality is missing at inference time. We ask: how can we train a model that is robust to missing modalities? This paper seeks a set of good practices for multi-modal action recognition, with a particular interest in circumstances where some modalities are not available at inference time. First, we study how to effectively regularize the model during training (e.g., with data augmentation). Second, we investigate fusion methods for robustness to missing modalities: we find that transformer-based fusion is more robust to missing modalities than summation or concatenation. Third, we propose a simple modular network, ActionMAE, which learns missing-modality predictive coding by randomly dropping modality features and reconstructing them from the remaining modality features. Coupling these good practices, we build a model that is not only effective for multi-modal action recognition but also robust to missing modalities. Our model achieves state-of-the-art results on multiple benchmarks and maintains competitive performance even in missing-modality scenarios. Code is available at https://github.com/sangminwoo/ActionMAE. Comment: AAAI 2023
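
    A minimal sketch of the ActionMAE idea, randomly dropping one modality's feature and reconstructing it from the remaining ones, might look as follows. The module name, feature dimension, mean-pooling fusion, and decoder are assumptions for illustration only, not the released code.

```python
import torch
import torch.nn as nn

class ModalityMAE(nn.Module):
    """Minimal sketch: drop one modality's feature at random and reconstruct it
    from the remaining modality features (missing-modality predictive coding)."""
    def __init__(self, dim=512):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, feats):
        # feats: dict of modality name -> (B, dim) feature vectors
        names = list(feats)
        dropped = names[torch.randint(len(names), (1,)).item()]  # random modality to drop
        target = feats[dropped]
        kept = [f for n, f in feats.items() if n != dropped]
        pooled = torch.stack(kept, dim=0).mean(0)                # fuse the remaining modalities
        recon = self.decoder(pooled + self.mask_token)           # predict the missing feature
        loss = nn.functional.mse_loss(recon, target)
        return loss, dropped

mae = ModalityMAE()
feats = {"rgb": torch.randn(4, 512), "depth": torch.randn(4, 512), "ir": torch.randn(4, 512)}
loss, dropped = mae(feats)
print(dropped, loss.item())
```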

    Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance

    Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data such as respiratory sounds is less explored. In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder. We also demonstrate a simple yet effective adversarial fine-tuning method that aligns features between synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset demonstrate that the proposed adversarial fine-tuning is effective, whereas using only the conventional augmentation method degrades performance. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes by up to 26.58%. For the supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound. Comment: accepted at the NeurIPS 2023 Workshop on Deep Generative Models for Health (DGM4H)
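
    The adversarial fine-tuning described above can be sketched as a two-step update: a discriminator learns to separate features of real recordings from diffusion-generated ones, while the feature extractor and classifier are trained to classify both and to make synthetic features indistinguishable from real ones. The toy networks, dimensions, and loss weighting below are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; the real pipeline uses an audio diffusion model as a
# conditional vocoder to synthesize minority-class respiratory sounds.
feature_extractor = nn.Sequential(nn.Linear(400, 128), nn.ReLU())
classifier = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))   # 4 ICBHI classes
discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

opt_f = torch.optim.Adam(list(feature_extractor.parameters()) + list(classifier.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

def train_step(real_x, real_y, synth_x, synth_y, lam=0.1):
    # 1) train the discriminator to tell real features from synthetic ones
    with torch.no_grad():
        fr, fs = feature_extractor(real_x), feature_extractor(synth_x)
    d_loss = bce(discriminator(fr), torch.ones(len(fr), 1)) + \
             bce(discriminator(fs), torch.zeros(len(fs), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) train classifier + extractor; push synthetic features to look "real"
    fr, fs = feature_extractor(real_x), feature_extractor(synth_x)
    cls_loss = ce(classifier(fr), real_y) + ce(classifier(fs), synth_y)
    adv_loss = bce(discriminator(fs), torch.ones(len(fs), 1))   # fool the discriminator
    loss = cls_loss + lam * adv_loss
    opt_f.zero_grad(); loss.backward(); opt_f.step()
    return d_loss.item(), loss.item()

print(train_step(torch.randn(8, 400), torch.randint(4, (8,)),
                 torch.randn(8, 400), torch.randint(4, (8,))))
```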

    Sketch-based Video Object Localization

    We introduce Sketch-based Video Object Localization (SVOL), a new task that aims to localize spatio-temporal object boxes in a video queried by an input sketch. We first outline the challenges of the SVOL task and build the Sketch-Video Attention Network (SVANet) with the following design principles: (i) to consider the temporal information of video and bridge the domain gap between sketch and video; (ii) to accurately identify and localize multiple objects simultaneously; (iii) to handle various styles of sketches; (iv) to be classification-free. In particular, SVANet is equipped with a Cross-modal Transformer that models the interaction between learnable object tokens, the query sketch, and the video through attention operations, and it is trained with a per-frame set matching strategy that enables frame-wise prediction while utilizing global video context. We evaluate SVANet on a newly curated SVOL dataset. By design, SVANet successfully learns the mapping between query sketches and video objects, achieving state-of-the-art results on the SVOL benchmark. We further confirm the effectiveness of SVANet through extensive ablation studies and visualizations. Lastly, we demonstrate its transfer capability on unseen datasets and novel categories, suggesting high scalability to real-world applications.
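
    A rough, DETR-style sketch of the cross-modal decoding idea: learnable object tokens attend over the sketch feature concatenated with per-frame video features and are decoded into boxes and objectness scores (classification-free). The real SVANet makes frame-wise predictions with per-frame set matching; the architecture, names, and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModalLocalizer(nn.Module):
    """Illustrative sketch of learnable object tokens attending to sketch + video features."""
    def __init__(self, dim=256, num_tokens=10, num_heads=8):
        super().__init__()
        self.object_tokens = nn.Parameter(torch.randn(num_tokens, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.box_head = nn.Linear(dim, 4)     # (cx, cy, w, h) in normalized coordinates
        self.score_head = nn.Linear(dim, 1)   # objectness only, since the task is classification-free

    def forward(self, sketch_feat, video_feats):
        # sketch_feat: (B, dim) global sketch feature; video_feats: (B, T, dim) per-frame features
        B = video_feats.size(0)
        memory = torch.cat([sketch_feat.unsqueeze(1), video_feats], dim=1)  # (B, 1+T, dim)
        queries = self.object_tokens.unsqueeze(0).expand(B, -1, -1)
        out = self.decoder(queries, memory)                                 # (B, num_tokens, dim)
        return self.box_head(out).sigmoid(), self.score_head(out)

loc = CrossModalLocalizer()
boxes, scores = loc(torch.randn(2, 256), torch.randn(2, 12, 256))
print(boxes.shape, scores.shape)  # torch.Size([2, 10, 4]) torch.Size([2, 10, 1])
```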

    Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification

    Respiratory sounds contain crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, the task remains challenging due to the scarcity of medical data. In this study, we demonstrate that models pretrained on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation, which randomly mixes patches between different samples, with the Audio Spectrogram Transformer (AST). We further propose a novel and effective Patch-Mix Contrastive Learning objective to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, outperforming the prior leading score by 4.08%. Comment: INTERSPEECH 2023, Code URL: https://github.com/raymin0223/patch-mix_contrastive_learning
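
    The Patch-Mix augmentation can be sketched as swapping a random subset of patch embeddings between samples in a batch, keeping both labels and the mixing ratio for the loss. The function below is an illustrative assumption about how such mixing could be applied to AST patch tokens, not the released code.

```python
import torch

def patch_mix(patch_tokens, labels, mix_ratio=0.5):
    """Swap a random fraction of patch tokens between each sample and a random partner."""
    B, N, D = patch_tokens.shape
    perm = torch.randperm(B)                            # partner sample for each item
    num_mix = int(N * mix_ratio)
    idx = torch.rand(B, N).argsort(dim=1)[:, :num_mix]  # random patch positions per sample
    idx = idx.unsqueeze(-1).expand(-1, -1, D)
    mixed = patch_tokens.clone()
    mixed.scatter_(1, idx, torch.gather(patch_tokens[perm], 1, idx))  # copy partner's patches in
    lam = 1.0 - mix_ratio                                # fraction of original patches kept
    return mixed, labels, labels[perm], lam

tokens = torch.randn(4, 196, 768)   # e.g. spectrogram patch embeddings of an AST
labels = torch.randint(0, 4, (4,))
mixed, y_a, y_b, lam = patch_mix(tokens, labels)
print(mixed.shape, lam)             # torch.Size([4, 196, 768]) 0.5
```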

    SALM5 trans-synaptically interacts with LAR-RPTPs in a splicing-dependent manner to regulate synapse development

    Synaptogenic adhesion molecules play critical roles in synapse formation. SALM5/Lrfn5, a SALM/Lrfn family adhesion molecule implicated in autism spectrum disorders (ASDs) and schizophrenia, induces presynaptic differentiation in contacting axons, but its presynaptic ligand remains unknown. We found that SALM5 interacts with the Ig domains of LAR family receptor protein tyrosine phosphatases (LAR-RPTPs; LAR, PTPδ, and PTPσ). These interactions are strongly inhibited by the splice insert B in the Ig domain region of LAR-RPTPs, and mediate SALM5-dependent presynaptic differentiation in contacting axons. In addition, SALM5 regulates AMPA receptor-mediated synaptic transmission through mechanisms involving the interaction of postsynaptic SALM5 with presynaptic LAR-RPTPs. These results suggest that postsynaptic SALM5 promotes synapse development by trans-synaptically interacting with presynaptic LAR-RPTPs and is important for the regulation of excitatory synaptic strength.

    Protective Effects of Gabapentin on Allodynia and α2δ1-Subunit of Voltage-dependent Calcium Channel in Spinal Nerve-Ligated Rats

    This study was designed to determine whether early gabapentin treatment has a protective analgesic effect on neuropathic pain, comparing it with late treatment in a rat neuropathic model; as the potential mechanism of this protective action, the α2δ1-subunit of the voltage-dependent calcium channel (α2δ1-subunit) was evaluated in both sides of the L5 dorsal root ganglia (DRG). Neuropathic pain was induced in male Sprague-Dawley rats by surgical ligation of the left L5 nerve. For the early treatment group, rats were injected with gabapentin (100 mg/kg) intraperitoneally 15 min prior to surgery and then every 24 hr during postoperative days (POD) 1-4. For the late treatment group, the same dose of gabapentin was injected every 24 hr during POD 8-12. For the control group, the L5 nerve was ligated but no gabapentin was administered. In the early treatment group, the development of allodynia was delayed up to POD 10, whereas allodynia developed by POD 2 in the control and late treatment groups (p<0.05). The α2δ1-subunit was up-regulated in all groups; however, there was no difference in the level of the α2δ1-subunit among the three groups. These results suggest that early treatment with gabapentin offers some protection against neuropathic pain, but it is unlikely that this action is mediated through modulation of the α2δ1-subunit in the DRG.