SimOn: A Simple Framework for Online Temporal Action Localization
Online Temporal Action Localization (On-TAL) aims to immediately provide
action instances from untrimmed streaming videos. The model is not allowed to
utilize future frames or any processing techniques to modify past predictions,
making On-TAL much more challenging. In this paper, we propose a simple yet
effective framework, termed SimOn, that learns to predict action instances
using the popular Transformer architecture in an end-to-end manner.
Specifically, the model takes the current frame feature as a query and a set of
past context information as keys and values of the Transformer. Different from
the prior work that uses a set of outputs of the model as past contexts, we
leverage the past visual context and the learnable context embedding for the
current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets
show that our model remarkably outperforms the previous methods, achieving a
new state-of-the-art On-TAL performance. In addition, the evaluation for Online
Detection of Action Start (ODAS) demonstrates the effectiveness and robustness
of our method in the online setting. The code is available at
https://github.com/TuanTNG/SimO
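The query/key/value arrangement described above can be illustrated with a minimal sketch of single-query scaled dot-product attention. This is not the SimOn implementation: the function `attend`, the toy feature vectors, and the fixed `context_embedding` (standing in for a learnable embedding) are all assumptions made for illustration.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity between the query and every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Hypothetical toy features (not taken from the paper).
past_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_embedding = [0.5, 0.5]  # assumption: stands in for a learnable embedding
current_frame = [1.0, 0.0]

# Past visual context plus the context embedding serve as keys and values;
# the current frame feature is the query. No future frames are used.
context = past_frames + [context_embedding]
output, weights = attend(current_frame, context, context)
```

Only past features enter `context`, which mirrors the online constraint that future frames are unavailable.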
Knowing Where to Focus: Event-aware Transformer for Video Grounding
Recent DETR-based video grounding models directly predict
moment timestamps without any hand-crafted components, such as a pre-defined
proposal or non-maximum suppression, by learning moment queries. However, their
input-agnostic moment queries inevitably overlook an intrinsic temporal
structure of a video, providing limited positional information. In this paper,
we formulate an event-aware dynamic moment query to enable the model to take
the input-specific content and positional information of the video into
account. To this end, we present two levels of reasoning: 1) Event reasoning
that captures distinctive event units constituting a given video using a slot
attention mechanism; and 2) moment reasoning that fuses the moment queries with
a given sentence through a gated fusion transformer layer and learns
interactions between the moment queries and video-sentence representations to
predict moment timestamps. Extensive experiments demonstrate the effectiveness
and efficiency of the event-aware dynamic moment queries, outperforming
state-of-the-art approaches on several video grounding benchmarks.
Comment: ICCV 2023. Code is available at https://github.com/jinhyunj/EaT
Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation
Existing techniques to adapt semantic segmentation networks across the source
and target domains within deep convolutional neural networks (CNNs) deal with
all the samples from the two domains in a global or category-aware manner. They
do not consider inter-class variation within the target domain itself or the
estimated category, which limits their ability to encode domains that have a
multi-modal data distribution. To overcome this limitation, we introduce a
learnable clustering module, and a novel domain adaptation framework called
cross-domain grouping and alignment. To cluster the samples across domains with
an aim to maximize the domain alignment without forgetting precise segmentation
ability on the source domain, we present two loss functions, in particular, for
encouraging semantic consistency and orthogonality among the clusters. We also
present a loss to address the class imbalance problem, which is another
limitation of the previous methods. Our experiments show that our method
consistently boosts the adaptation performance in semantic segmentation,
outperforming state-of-the-art methods on various domain adaptation settings.
Comment: AAAI 202
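One common way to encourage orthogonality among cluster representations is to penalize the squared cosine similarity between every pair of clusters. The sketch below shows that generic idea only. The abstract does not give the paper's loss, so the function `orthogonality_penalty` and its formulation are assumptions, not the authors' definition.

```python
import math

def orthogonality_penalty(clusters):
    """Sum of squared cosine similarities over all distinct cluster pairs.

    Assumes every cluster vector is non-zero. The value is 0 when all
    clusters are mutually orthogonal and grows as they align.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    unit = [normalize(c) for c in clusters]
    penalty = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            penalty += cos * cos
    return penalty

# Hypothetical cluster vectors for illustration.
orthogonal = orthogonality_penalty([[1.0, 0.0], [0.0, 2.0]])
aligned = orthogonality_penalty([[1.0, 1.0], [2.0, 2.0]])
```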
Cervical collar makes difficult airway: a simulation study using the LEMON criteria
Objective Endotracheal intubation is extremely difficult to perform in patients wearing a cervical collar for a head and neck injury. Therefore, we analyzed actual measurements using the look externally, evaluate 3-3-2, Mallampati score, obstruction, and neck mobility (LEMON) criteria before and after cervical collar application to investigate the causes of a difficult airway. Methods This simulation study was performed in 76 healthy volunteers. We measured the mouth opening, modified Mallampati classification, and neck extension before and after cervical collar application. Results The mean inter-incisor distance significantly decreased from 4.3 to 2.6 cm (P<0.001). Fifty-seven participants classified as I and II were newly classified as III and IV according to the modified Mallampati classification after cervical collar application (16% to 91%). The angles of neck extension significantly decreased from 44° to 22° after cervical collar application (P<0.001). Before cervical collar application, our simulations predicted that 14 of 76 participants (18%) would have a difficult airway, whereas after cervical collar application, 76 of 76 (100%) were predicted to have a difficult airway. Conclusion All values for the LEMON criteria (mouth opening, modified Mallampati classification, and neck extension) worsened significantly after cervical collar application. Additionally, a difficult airway was predicted in all participants after cervical collar application
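The reported percentages can be checked against the stated counts with a short calculation. The helper `percent` is introduced here for illustration, and reading the 16% to 91% change as a rise of 75 percentage points assumes that no participant moved to a lower Mallampati class.

```python
def percent(count, total):
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)

total = 76  # healthy volunteers in the study

difficult_before = percent(14, total)  # predicted difficult airway before collar
difficult_after = percent(76, total)   # predicted difficult airway after collar
reclassified = percent(57, total)      # moved from class I/II to III/IV

# Reported change in the share of class III/IV participants, in percentage points.
mallampati_shift = 91 - 16
```

Under that assumption, the 57 reclassified participants account for the whole reported change in the share of class III and IV.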
Efficacy and Safety of Intra-articular Injections of Hyaluronic Acid Combined With Polydeoxyribonucleotide in the Treatment of Knee Osteoarthritis
Objective To assess the clinical efficacy and safety of intra-articular injection of hyaluronic acid (HA) combined with polydeoxyribonucleotide (PDRN) in patients with knee osteoarthritis in comparison with that of HA alone. Methods The current single-center, prospective, randomized, double-blind, controlled study was conducted in 36 patients with knee osteoarthritis at our medical institution. All the eligible patients (n=30) were equally assigned to two treatment arms (trial group ‘HA+PDRN’ and control group ‘HA’). For efficacy assessment, the patients were evaluated for the visual analogue scale (VAS) scores, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Knee Society Scores (KSS), all of which served as efficacy outcome measures. We monitored time-dependent changes in efficacy outcome measures at baseline and 1, 3 and 6 months. Subsequently, we compared differences in changes in efficacy outcome measures at 6 months from baseline between the two groups. Moreover, we assessed the safety based on the treatment-emergent adverse events (TEAEs), adverse drug reactions (ADRs) and any other complications serving as safety outcome measures. Results There were significant differences in changes in the VAS scores, the WOMAC scores in all domains, except ‘Stiffness’, the total WOMAC scores, and the KSS scores in all the domains at 6 months from baseline between the two groups (p<0.05). In our series, there were no TEAEs, ADRs, or any other complications. Conclusion Intra-articular injections of HA combined with PDRN can also be considered in the treatment of knee osteoarthritis. However, further large-scale and multi-center studies are required to demonstrate the potential of the proposed combination.
Context-Aware Emotion Recognition Networks
Traditional techniques for emotion recognition have focused only on facial expression analysis, thus providing limited ability to encode context that comprehensively represents the emotional responses. We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner. The key idea is to hide human faces in a visual scene and seek other contexts based on an attention mechanism. Our networks consist of two sub-networks: two-stream encoding networks that separately extract the features of face and context regions, and adaptive fusion networks that fuse such features in an adaptive fashion. We also introduce a novel benchmark for context-aware emotion recognition, called CAER, that is more appropriate than existing benchmarks both qualitatively and quantitatively. On several benchmarks, CAER-Net demonstrates the effect of context for emotion recognition.
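Fusing face and context features "in an adaptive fashion" can be pictured as weighting each stream by a softmax over per-stream scores before concatenation. The sketch below is a simplified illustration, not the CAER-Net architecture: `adaptive_fuse` is a hypothetical helper, and the scores, which would be predicted by a learned layer in practice, are passed in directly.

```python
import math

def adaptive_fuse(face_feat, context_feat, face_score, context_score):
    """Weight two feature vectors by a softmax over their scores, then concatenate."""
    # Softmax over the two stream scores (shifted for numerical stability).
    m = max(face_score, context_score)
    ef = math.exp(face_score - m)
    ec = math.exp(context_score - m)
    w_face = ef / (ef + ec)
    w_context = ec / (ef + ec)
    # Scale each stream by its weight and concatenate.
    fused = [w_face * x for x in face_feat] + [w_context * x for x in context_feat]
    return fused, (w_face, w_context)

# Hypothetical features and scores for illustration.
fused, (w_face, w_context) = adaptive_fuse([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)
```

With equal scores both streams contribute equally. A larger context score shifts weight toward the context features, for example when the face is occluded or uninformative.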
Pin the Memory: Learning to Generalize Semantic Segmentation
The rise of deep neural networks has led to several breakthroughs for
semantic segmentation. In spite of this, a model trained on a source domain often
fails to work properly in new challenging domains, which directly concerns
the generalization capability of the model. In this paper, we present a
novel memory-guided domain generalization method for semantic segmentation
based on a meta-learning framework. Specifically, our method abstracts the
conceptual knowledge of semantic classes into a categorical memory that is
constant across domains. Following the meta-learning concept, we repeatedly
train memory-guided networks and simulate virtual tests to 1) learn how to
memorize domain-agnostic and distinct information of classes and 2) offer an
externally settled memory as class guidance to reduce the ambiguity of
representation in the test data of an arbitrary unseen domain. To this end, we
also propose memory divergence and feature cohesion losses, which encourage
learning the memory reading and update processes for category-aware domain
generalization. Extensive experiments for semantic segmentation demonstrate the
superior generalization capability of our method over state-of-the-art works on
various benchmarks.
Comment: Accepted to CVPR 202