15 research outputs found

    SimOn: A Simple Framework for Online Temporal Action Localization

    Online Temporal Action Localization (On-TAL) aims to immediately provide action instances from untrimmed streaming videos. The model is not allowed to utilize future frames or any processing techniques to modify past predictions, making On-TAL much more challenging. In this paper, we propose a simple yet effective framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture in an end-to-end manner. Specifically, the model takes the current frame feature as a query and a set of past context information as keys and values of the Transformer. Unlike prior work that uses a set of outputs of the model as past contexts, we leverage the past visual context and the learnable context embedding for the current query. Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods, achieving a new state-of-the-art On-TAL performance. In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting. The code is available at https://github.com/TuanTNG/SimO
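The query/key/value arrangement described in this abstract can be illustrated with a minimal sketch. This is not the SimOn implementation: it assumes a single attention head, omits the learned projection matrices a real Transformer layer would have, and uses hypothetical toy vectors (`past_frames`, `context_embedding`, `current_frame`) chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Single-head scaled dot-product attention.

    `query` is one vector; `context` is a list of vectors that serve
    as both keys and values (no learned projections in this sketch).
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in context
    ]
    weights = softmax(scores)
    output = [
        sum(w * vec[i] for w, vec in zip(weights, context))
        for i in range(d)
    ]
    return output, weights


# Hypothetical toy features: three past frames and the current frame.
past_frames = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
]
# Stands in for a learnable context embedding (fixed here, not trained).
context_embedding = [0.0, 0.0, 0.5, 0.5]
current_frame = [1.0, 0.0, 0.0, 0.0]

# Keys/values = past visual context plus the context embedding;
# the query is the current frame only, so no future frames are used.
context = past_frames + [context_embedding]
output, weights = attend(current_frame, context)
print(weights)
```

In this toy setup the past frame most similar to the current frame receives the largest attention weight; in a trained model the projections and the context embedding would be learned.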

    Knowing Where to Focus: Event-aware Transformer for Video Grounding

    Recent DETR-based video grounding models directly predict moment timestamps by learning moment queries, without any hand-crafted components such as pre-defined proposals or non-maximum suppression. However, their input-agnostic moment queries inevitably overlook the intrinsic temporal structure of a video, providing limited positional information. In this paper, we formulate an event-aware dynamic moment query to enable the model to take the input-specific content and positional information of the video into account. To this end, we present two levels of reasoning: 1) event reasoning that captures distinctive event units constituting a given video using a slot attention mechanism; and 2) moment reasoning that fuses the moment queries with a given sentence through a gated fusion transformer layer and learns interactions between the moment queries and video-sentence representations to predict moment timestamps. Extensive experiments demonstrate the effectiveness and efficiency of the event-aware dynamic moment queries, outperforming state-of-the-art approaches on several video grounding benchmarks. Comment: ICCV 2023. Code is available at https://github.com/jinhyunj/EaT

    Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation

    Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner. They do not consider inter-class variation within the target domain itself or within an estimated category, which limits their ability to encode domains having a multi-modal data distribution. To overcome this limitation, we introduce a learnable clustering module and a novel domain adaptation framework called cross-domain grouping and alignment. To cluster the samples across domains with the aim of maximizing domain alignment without forgetting precise segmentation ability on the source domain, we present two loss functions that encourage semantic consistency and orthogonality among the clusters. We also present a loss to address the class imbalance problem, which is another limitation of previous methods. Our experiments show that our method consistently boosts the adaptation performance in semantic segmentation, outperforming state-of-the-art methods on various domain adaptation settings. Comment: AAAI 202

    Cervical collar makes difficult airway: a simulation study using the LEMON criteria

    Objective: Endotracheal intubation is extremely difficult to perform in patients wearing a cervical collar for a head and neck injury. Therefore, we analyzed actual measurements using the look externally, evaluate 3-3-2, Mallampati score, obstruction, and neck mobility (LEMON) criteria before and after cervical collar application to investigate the causes of a difficult airway. Methods: This simulation study was performed in 76 healthy volunteers. We measured the mouth opening, modified Mallampati classification, and neck extension before and after cervical collar application. Results: The mean inter-incisor distance significantly decreased from 4.3 to 2.6 cm (P<0.001). Fifty-seven participants classified as I and II were newly classified as III and IV according to the modified Mallampati classification after cervical collar application (16% to 91%). The angles of neck extension significantly decreased from 44° to 22° after cervical collar application (P<0.001). Before cervical collar application, our simulations predicted that 14 of 76 participants (18%) would have a difficult airway, whereas after cervical collar application, 76 of 76 (100%) were predicted to have a difficult airway. Conclusion: All values for the LEMON criteria (mouth opening, modified Mallampati classification, and neck extension) worsened significantly after cervical collar application. Additionally, a difficult airway was predicted in all participants after cervical collar application

    Efficacy and Safety of Intra-articular Injections of Hyaluronic Acid Combined With Polydeoxyribonucleotide in the Treatment of Knee Osteoarthritis

    Objective: To assess the clinical efficacy and safety of intra-articular injection of hyaluronic acid (HA) combined with polydeoxyribonucleotide (PDRN) in patients with knee osteoarthritis in comparison with that of HA alone. Methods: The current single-center, prospective, randomized, double-blind, controlled study was conducted in 36 patients with knee osteoarthritis at our medical institution. All the eligible patients (n=30) were equally assigned to two treatment arms (trial group ‘HA+PDRN’ and control group ‘HA’). For efficacy assessment, the patients were evaluated using the visual analogue scale (VAS) scores, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Knee Society Scores (KSS), all of which served as efficacy outcome measures. We monitored time-dependent changes in efficacy outcome measures at baseline and 1, 3 and 6 months. Subsequently, we compared differences in changes in efficacy outcome measures at 6 months from baseline between the two groups. Moreover, we assessed safety based on treatment-emergent adverse events (TEAEs), adverse drug reactions (ADRs) and any other complications, which served as safety outcome measures. Results: There were significant differences in changes in the VAS scores, the WOMAC scores in all domains except ‘Stiffness’, the total WOMAC scores, and the KSS scores in all domains at 6 months from baseline between the two groups (p<0.05). In our series, there were no TEAEs, ADRs, or any other complications. Conclusion: Intra-articular injections of HA combined with PDRN can also be considered in the treatment of knee osteoarthritis. However, further large-scale and multi-center studies are required to demonstrate the potential of the proposed combination

    Context-Aware Emotion Recognition Networks

    Traditional techniques for emotion recognition have focused on facial expression analysis only, thus providing limited ability to encode context that comprehensively represents the emotional responses. We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner. The key idea is to hide human faces in a visual scene and seek other contexts based on an attention mechanism. Our networks consist of two sub-networks: two-stream encoding networks that separately extract the features of face and context regions, and adaptive fusion networks that fuse these features in an adaptive fashion. We also introduce a novel benchmark for context-aware emotion recognition, called CAER, that is more appropriate than existing benchmarks both qualitatively and quantitatively. On several benchmarks, CAER-Net demonstrates the effect of context for emotion recognition
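The two-stream-then-fuse idea in this abstract can be sketched roughly as follows. The abstract does not specify how the adaptive fusion is computed, so the weighting scheme here (a softmax over one scalar score per stream, followed by concatenation of the reweighted features) is an assumption made purely for illustration, and `adaptive_fusion` and its arguments are hypothetical names. In a real network the scores would come from learned layers rather than being passed in.

```python
import math


def adaptive_fusion(face_feat, context_feat, face_score, context_score):
    """Fuse two feature vectors with softmax-normalized stream weights.

    Assumed scheme (not taken from the paper): each stream gets one
    scalar score, the scores are normalized with a softmax, and the
    reweighted features are concatenated.
    """
    m = max(face_score, context_score)
    e_face = math.exp(face_score - m)
    e_ctx = math.exp(context_score - m)
    w_face = e_face / (e_face + e_ctx)
    w_ctx = e_ctx / (e_face + e_ctx)
    fused = [w_face * x for x in face_feat] + [w_ctx * x for x in context_feat]
    return fused, (w_face, w_ctx)


# Hypothetical toy features from the face stream and the context stream.
face_feat = [0.2, 0.8]
context_feat = [0.5, 0.1, 0.4]

# A higher context score shifts weight toward the context features.
fused, (w_face, w_ctx) = adaptive_fusion(face_feat, context_feat, 0.0, 1.0)
print(w_face, w_ctx, fused)
```

The point of the sketch is only that the contribution of each stream varies per input instead of being fixed.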

    Pin the Memory: Learning to Generalize Semantic Segmentation

    The rise of deep neural networks has led to several breakthroughs for semantic segmentation. In spite of this, a model trained on a source domain often fails to work properly in new challenging domains, which directly concerns the generalization capability of the model. In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on a meta-learning framework. Specifically, our method abstracts the conceptual knowledge of semantic classes into a categorical memory that is constant across domains. Following the meta-learning concept, we repeatedly train memory-guided networks and simulate virtual tests to 1) learn how to memorize domain-agnostic and distinct information of classes and 2) offer an externally settled memory as class guidance to reduce the ambiguity of representation in the test data of arbitrary unseen domains. To this end, we also propose memory divergence and feature cohesion losses, which encourage learning of the memory reading and update processes for category-aware domain generalization. Extensive experiments for semantic segmentation demonstrate the superior generalization capability of our method over state-of-the-art works on various benchmarks. Comment: Accepted to CVPR 202