
    Cross-Modal Learning with 3D Deformable Attention for Action Recognition

    An important challenge in vision-based action recognition is embedding spatiotemporal features from two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D deformability, local joint stride, and temporal stride attention. The two cross-modal tokens are input into the 3D deformable attention module to create a cross-attention token with a reflected spatiotemporal correlation. Local joint stride attention is applied to spatially combine attention and pose tokens. Temporal stride attention temporally reduces the number of input tokens in the attention module and supports temporal expression learning without the simultaneous use of all tokens. The deformable transformer iterates L times and combines the last cross-modal token for classification. The proposed 3D deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn Action datasets, and achieved results better than or comparable to those of pre-trained state-of-the-art methods, even without a pre-training process. In addition, by visualizing the important joints and correlations during action recognition through spatial joint and temporal stride attention, we demonstrate the potential of explainable action recognition.
    Comment: 10 pages, 8 figures
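
    A minimal PyTorch sketch of the temporal stride idea described above: queries attend to a temporally strided subset of key/value tokens, reducing the token count inside the attention module. The module and argument names (TemporalStrideAttention, stride) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class TemporalStrideAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, stride: int = 2):
        super().__init__()
        self.stride = stride
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) -- one token per frame for simplicity
        kv = x[:, ::self.stride]       # keep every `stride`-th temporal token
        out, _ = self.attn(x, kv, kv)  # queries see a reduced temporal set
        return out

# Example: 8 frame tokens attend to 4 strided key/value tokens.
x = torch.randn(2, 8, 64)
print(TemporalStrideAttention(64)(x).shape)  # torch.Size([2, 8, 64])
```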

    Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network

    Along with generative AI, interest in scene graph generation (SGG), which comprehensively captures the relationships and interactions between objects in an image and creates a structured graph-based representation, has increased significantly in recent years. However, because they rely on object-centric and dichotomous relationships, existing SGG methods have a limited ability to predict detailed relationships accurately. To solve these problems, a new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein. EdgeSGG is based on an edge dual scene graph and a dual message passing neural network (DualMPNN), which can capture rich contextual interactions between unconstrained objects. To facilitate the learning of edge dual scene graphs with a symmetric graph structure, the proposed DualMPNN learns both object- and relation-centric features to predict relation-aware contexts more accurately, and allows fine-grained relational updates between objects. A comparative experiment with state-of-the-art (SoTA) methods was conducted using two public datasets for SGG operations and six metrics for three subtasks. Compared with SoTA approaches, the proposed model exhibited substantial performance improvements across all SGG subtasks. Furthermore, experiments on long-tail distributions revealed that incorporating the relationships between objects effectively mitigates existing long-tail problems.
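
    A minimal sketch of building an edge dual graph from a scene graph: each relation triple (subject, predicate, object) becomes a dual node, and dual nodes are connected when their relations share an object. This is an illustrative reading of the construction, not the authors' implementation.

```python
from itertools import combinations

relations = [  # (subject, predicate, object) triples of an example scene graph
    ("person", "riding", "horse"),
    ("person", "wearing", "hat"),
    ("horse", "on", "grass"),
]

dual_nodes = list(range(len(relations)))
dual_edges = [
    (i, j)
    for i, j in combinations(dual_nodes, 2)
    if set(relations[i][::2]) & set(relations[j][::2])  # shared subject/object
]
print(dual_edges)  # [(0, 1), (0, 2)]: relations sharing "person" or "horse"
```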

    Observation of the orbital Hall effect in a light metal Ti

    The orbital angular momentum is a core ingredient of orbital magnetism, the spin Hall effect, giant Rashba spin splitting, the orbital Edelstein effect, and spin-orbit torque. However, its experimental detection is difficult. In particular, direct detection of the orbital Hall effect remains elusive despite its importance for the electrical control of magnetic nanodevices. Here we report the direct observation of the orbital Hall effect in the light metal Ti. The Kerr rotation due to the accumulated orbital magnetic moment is measured at Ti surfaces; the result agrees semiquantitatively with theoretical calculations and is supported by orbital torque measurements in Ti-based magnetic heterostructures. The results confirm the electron orbital angular momentum as an essential dynamic degree of freedom, which may provide a novel mechanism for the electric control of magnetism. The results may also deepen the understanding of spin, valley, phonon, and magnon dynamics coupled with orbital dynamics.

    Different contribution of extent of myocardial injury to left ventricular systolic and diastolic function in early reperfused acute myocardial infarction

    BACKGROUND: We sought to investigate the influence of the extent of myocardial injury on left ventricular (LV) systolic and diastolic function in patients after reperfused acute myocardial infarction (AMI). METHODS: Thirty-eight reperfused AMI patients underwent cardiac magnetic resonance (CMR) imaging after percutaneous coronary revascularization. The extent of myocardial edema and scarring was assessed by T2-weighted imaging and late gadolinium enhancement (LGE) imaging, respectively. Echocardiography was performed within a day of CMR. Using 2D speckle tracking analysis, LV longitudinal strain, circumferential strain, and twist were measured. RESULTS: The extent of LGE was significantly correlated with LV systolic functional indices such as ejection fraction (r = -0.57, p < 0.001), regional wall motion score index (r = 0.52, p = 0.001), and global longitudinal strain (r = 0.56, p < 0.001). The diastolic functional indices significantly correlated with age (r = -0.64, p < 0.001), LV twist (r = -0.39, p = 0.02), average non-infarcted myocardial circumferential strain (r = -0.52, p = 0.001), and LV end-diastolic wall stress index (r = -0.47, p = 0.003 with e'), but not or only weakly with the extent of LGE. In multivariate analysis, age and non-infarcted myocardial circumferential strain, rather than the extent of injury, independently correlated with the diastolic functional indices. CONCLUSIONS: In patients with timely reperfused AMI, age and non-infarcted myocardial function, rather than the extent of myocardial injury, were more significantly related to LV chamber diastolic function.
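
    A minimal sketch of the kind of correlation analysis reported above, using scipy's Pearson correlation. The arrays are synthetic placeholders, not the study's data; lge_extent and ejection_fraction are illustrative names.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
lge_extent = rng.uniform(5, 40, size=38)  # % of LV mass (synthetic, n = 38)
ejection_fraction = 65 - 0.6 * lge_extent + rng.normal(0, 4, size=38)

r, p = pearsonr(lge_extent, ejection_fraction)
print(f"r = {r:.2f}, p = {p:.3g}")  # expect a negative r, as in the abstract
```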

    SINE indel polymorphism of AGL gene and association with growth and carcass traits in Landrace × Jeju black pig F2 population

    Genetic polymorphisms in the glycogen debranching enzyme (AGL) gene were assessed with regard to their association with growth and carcass traits in an F2 population produced by crossbreeding Landrace and Jeju (Korea) Black pigs. Three genotypes representing the insertion/deletion (indel) polymorphism of a short interspersed nuclear element (SINE) were detected at frequencies of 0.278 (L/L), 0.479 (L/S), and 0.243 (S/S). Pigs carrying the AGL S allele showed significantly heavier body weights at birth and at the 3rd, 10th, and 20th weeks of development, and higher average daily gains during the late period, than L/L homozygous pigs (P < 0.05). However, average daily gains during the early period were not significantly associated with genotype (P > 0.05). With regard to carcass traits, S allele pigs (S/-) showed significantly heavier carcass weights and thicker backfat than L/L homozygous pigs (P < 0.05), whereas body length, meat color, and marbling score showed no significant association (P > 0.05). Consequently, a faster growth rate during the late period and backfat deposition, rather than intramuscular fat deposition, cause the differences in pig productivity according to AGL genotype. These findings indicate that AGL genotypes may serve as useful genetic markers for the improvement of Jeju Black pig-related crossbreeding systems.
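
    A short arithmetic check of the allele frequencies implied by the genotype frequencies above: each heterozygote (L/S) contributes one copy of each allele.

```python
geno = {"L/L": 0.278, "L/S": 0.479, "S/S": 0.243}
freq_S = geno["S/S"] + geno["L/S"] / 2  # 0.4825
freq_L = geno["L/L"] + geno["L/S"] / 2  # 0.5175
print(freq_L, freq_S)
```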

    A Brief Review of Facial Emotion Recognition Based on Visual Information

    Facial emotion recognition (FER) is an important topic in the fields of computer vision and artificial intelligence owing to its significant academic and commercial potential. Although FER can be conducted using multiple sensors, this review focuses on studies that exclusively use facial images, because visual expressions are one of the main information channels in interpersonal communication. This paper provides a brief review of research in the field of FER conducted over the past decades. First, conventional FER approaches are described, along with a summary of the representative categories of FER systems and their main algorithms. Deep-learning-based FER approaches using deep networks that enable “end-to-end” learning are then presented. This review also covers an up-to-date hybrid deep-learning approach that combines a convolutional neural network (CNN) for the spatial features of an individual frame with long short-term memory (LSTM) for the temporal features of consecutive frames. In the later part of this paper, a brief review of publicly available evaluation metrics is given, and a comparison with benchmark results, which serve as a standard for the quantitative comparison of FER research, is described. This review can serve as a brief guidebook for newcomers to the field of FER, providing basic knowledge and a general understanding of the latest state-of-the-art studies, as well as for experienced researchers looking for productive directions for future work.
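
    A minimal PyTorch sketch of the CNN+LSTM hybrid described above: a small CNN encodes each frame, and an LSTM aggregates the per-frame features over time. Layer sizes and names are illustrative, not taken from any reviewed paper.

```python
import torch
import torch.nn as nn

class CnnLstmFER(nn.Module):
    def __init__(self, num_classes: int = 7, feat_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 1, H, W) grayscale face crops
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # per-frame features
        _, (h, _) = self.lstm(feats)                         # temporal aggregation
        return self.head(h[-1])                              # logits per clip

print(CnnLstmFER()(torch.randn(2, 16, 1, 48, 48)).shape)  # torch.Size([2, 7])
```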

    Driver’s Facial Expression Recognition in Real-Time for Safe Driving

    In recent years, researchers of deep neural network (DNN)-based facial expression recognition (FER) have reported results showing that these approaches overcome the limitations of conventional machine learning-based FER approaches. However, as DNN-based FER approaches require an excessive amount of memory and incur high processing costs, their application in various fields is very limited and depends on the hardware specifications. In this paper, we propose a fast FER algorithm for monitoring a driver's emotions that is capable of operating on the low-specification devices installed in vehicles. For this purpose, a hierarchical weighted random forest (WRF) classifier, trained on the basis of the similarity of sample data to improve its accuracy, is employed. In the first step, facial landmarks are detected from input images and geometric features are extracted considering the spatial positions between landmarks. These feature vectors are then fed into the proposed hierarchical WRF classifier to classify facial expressions. Our method was evaluated experimentally using three databases, the extended Cohn-Kanade database (CK+), MMI, and the Keimyung University Facial Expression of Drivers (KMU-FED) database, and its performance was compared with that of state-of-the-art methods. The results show that our proposed method yields performance similar to that of deep learning FER methods, at 92.6% for CK+ and 76.7% for MMI, with a processing cost approximately 3731 times lower than that of the DNN method. These results confirm that the proposed method is optimized for real-time embedded applications with limited computing resources.
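
    A minimal sketch of the pipeline described above: pairwise distances between facial landmarks serve as geometric features, which are fed to a random forest. A plain sklearn RandomForestClassifier stands in for the paper's hierarchical weighted random forest, and the landmarks here are random placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 1, size=(100, 68, 2))  # 100 faces, 68 (x, y) points
X = np.stack([pdist(f) for f in landmarks])       # pairwise-distance features
y = rng.integers(0, 7, size=100)                  # 7 expression labels (dummy)

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```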

    Rethinking Attention Mechanisms in Vision Transformers with Graph Structures

    In this paper, we propose a new type of vision transformer (ViT) based on graph head attention (GHA). Because the multi-head attention (MHA) of a pure ViT requires many parameters and tends to lose the locality of an image, we replaced MHA with GHA by applying a graph to the attention heads of the transformer. Consequently, the proposed GHA maintains both the locality and globality of the input patches and guarantees the diversity of the attention. The proposed GHA-ViT generally outperforms pure ViT-based models on the small-sized CIFAR-10/100, MNIST, and MNIST-F datasets and on the medium-sized ImageNet-1K dataset when trained from scratch. A Top-1 accuracy of 81.7% was achieved on ImageNet-1K using GHA-B, a base model with approximately 29 M parameters. In addition, on CIFAR-10/100, the number of parameters is reduced 17-fold compared with the existing ViT while performance increases by 0.4/4.3%, respectively. The proposed GHA-ViT shows promising results in terms of the number of parameters and operations and the level of accuracy in comparison with other state-of-the-art lightweight ViT models.
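
    A minimal sketch of graph-constrained attention in the spirit of GHA: an adjacency mask built from the patch grid restricts an attention head to local neighbors, preserving locality. This is an illustrative reduction of the idea, not the paper's GHA module.

```python
import torch
import torch.nn as nn

def grid_adjacency(h: int, w: int) -> torch.Tensor:
    # True where two patches are 4-neighbors on the h x w grid (or identical)
    idx = torch.arange(h * w)
    r, c = idx // w, idx % w
    dist = (r[:, None] - r[None, :]).abs() + (c[:, None] - c[None, :]).abs()
    return dist <= 1

x = torch.randn(2, 16, 64)                 # 4x4 patch grid, embedding dim 64
attn = nn.MultiheadAttention(64, 1, batch_first=True)
mask = ~grid_adjacency(4, 4)               # True entries are blocked
out, _ = attn(x, x, x, attn_mask=mask)     # each patch attends only to neighbors
print(out.shape)                           # torch.Size([2, 16, 64])
```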