Cross-Modal Learning with 3D Deformable Attention for Action Recognition
An important challenge in vision-based action recognition is embedding spatiotemporal features from two or more heterogeneous modalities into a single
feature. In this study, we propose a new 3D deformable transformer for action
recognition with adaptive spatiotemporal receptive fields and a cross-modal
learning scheme. The 3D deformable transformer consists of three attention
modules: 3D deformability, local joint stride, and temporal stride attention.
The two cross-modal tokens are input into the 3D deformable attention module to
create a cross-attention token that reflects their spatiotemporal correlation.
Local joint stride attention is applied to spatially combine attention and pose
tokens. Temporal stride attention temporally reduces the number of input tokens
in the attention module and supports temporal expression learning without the
simultaneous use of all tokens. The deformable transformer iterates L times and
combines the last cross-modal token for classification. The proposed 3D
deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn
Action datasets, and showed results better than or similar to pre-trained
state-of-the-art methods, even without a pre-training process. In addition, by
visualizing the important joints and their correlations during action recognition
through spatial joint and temporal stride attention, we demonstrate the potential
for explainable action recognition.
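As a rough illustration of the temporal stride idea described above (a hypothetical sketch, not the authors' implementation), strided queries can attend over the full token sequence, shrinking its temporal length:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_stride_attention(tokens, stride=2):
    """Strided queries attend over the full token sequence, so the
    output has T // stride tokens instead of T."""
    q = tokens[::stride]                   # (T // stride, D) strided queries
    k = v = tokens                         # full-length keys/values
    d = tokens.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))   # (T // stride, T)
    return attn @ v                        # (T // stride, D)

out = temporal_stride_attention(np.random.randn(8, 16), stride=2)
print(out.shape)  # (4, 16)
```

Halving the token count this way quarters the cost of each subsequent attention product, which is the motivation the abstract gives for not using all tokens simultaneously.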
Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network
Along with generative AI, interest in scene graph generation (SGG), which
comprehensively captures the relationships and interactions between objects in
an image and creates a structured graph-based representation, has significantly
increased in recent years. However, relying on object-centric and dichotomous
relationships, existing SGG methods have a limited ability to accurately
predict detailed relationships. To solve these problems, a new approach to
modeling multi-object relationships, called edge dual scene graph generation
(EdgeSGG), is proposed herein. EdgeSGG is based on an edge dual scene graph and
Dual Message Passing Neural Network (DualMPNN), which can capture rich
contextual interactions between unconstrained objects. To facilitate the
learning of edge dual scene graphs with a symmetric graph structure, the
proposed DualMPNN learns both object- and relation-centric features for more
accurately predicting relation-aware contexts and allows fine-grained
relational updates between objects. A comparative experiment with
state-of-the-art (SoTA) methods was conducted using two public datasets for SGG
operations and six metrics for three subtasks. Compared with SoTA approaches,
the proposed model exhibited substantial performance improvements across all
SGG subtasks. Furthermore, experiments on long-tail distributions revealed that
incorporating the relationships between objects effectively mitigates existing
long-tail problems.
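The edge-dual construction described above is, in graph-theoretic terms, a line-graph transform; a minimal sketch (the function name is mine, not the paper's) turns relationship edges into dual nodes and links those that share an object:

```python
def edge_dual_graph(edges):
    """Each relationship edge of the scene graph becomes a dual node;
    two dual nodes are linked when the original edges share an object."""
    dual_edges = [(i, j)
                  for i, (a1, b1) in enumerate(edges)
                  for j, (a2, b2) in enumerate(edges)
                  if i < j and {a1, b1} & {a2, b2}]
    return list(edges), dual_edges

# toy scene graph: man-holds-cup, man-near-table, cup-on-table
nodes, links = edge_dual_graph([("man", "cup"), ("man", "table"), ("cup", "table")])
print(links)  # [(0, 1), (0, 2), (1, 2)]
```

Message passing on this dual graph updates relation-centric features directly, which is how the DualMPNN in the abstract can capture interactions between relationships rather than only between objects.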
Observation of the orbital Hall effect in a light metal Ti
The orbital angular momentum is a core ingredient of orbital magnetism, spin
Hall effect, giant Rashba spin splitting, orbital Edelstein effect, and
spin-orbit torque. However, its experimental detection is challenging. In
particular, direct detection of the orbital Hall effect remains elusive despite
its importance for electrical control of magnetic nanodevices. Here we report
the direct observation of the orbital Hall effect in a light metal Ti. The Kerr
rotation by the accumulated orbital magnetic moment is measured at Ti surfaces,
whose result agrees with theoretical calculations semiquantitatively and is
supported by the orbital torque measurement in Ti-based magnetic
heterostructures. The results confirm the electron orbital angular momentum as
an essential dynamic degree of freedom, which may provide a novel mechanism for
the electric control of magnetism. The results may also deepen the
understanding of spin, valley, phonon, and magnon dynamics coupled with orbital
dynamics.
Different contribution of extent of myocardial injury to left ventricular systolic and diastolic function in early reperfused acute myocardial infarction
BACKGROUND: We sought to investigate the influence of the extent of myocardial injury on left ventricular (LV) systolic and diastolic function in patients after reperfused acute myocardial infarction (AMI).
METHODS: Thirty-eight reperfused AMI patients underwent cardiac magnetic resonance (CMR) imaging after percutaneous coronary revascularization. The extents of myocardial edema and scarring were assessed by T2-weighted imaging and late gadolinium enhancement (LGE) imaging, respectively. Echocardiography was performed within a day of CMR. Using 2D speckle-tracking analysis, LV longitudinal strain, circumferential strain, and twist were measured.
RESULTS: The extent of LGE was significantly correlated with LV systolic functional indices such as ejection fraction (r = -0.57, p < 0.001), regional wall motion score index (r = 0.52, p = 0.001), and global longitudinal strain (r = 0.56, p < 0.001). The diastolic functional indices were significantly correlated with age (r = -0.64, p < 0.001), LV twist (r = -0.39, p = 0.02), average non-infarcted myocardial circumferential strain (r = -0.52, p = 0.001), and LV end-diastolic wall stress index (r = -0.47, p = 0.003 with e'), but not or only weakly with the extent of LGE. In multivariate analysis, age and non-infarcted myocardial circumferential strain, rather than the extent of injury, independently correlated with the diastolic functional indices.
CONCLUSIONS: In patients with timely reperfused AMI, age and non-infarcted myocardial function were more significantly related to LV chamber diastolic function than the extent of myocardial injury.
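The r values reported above are Pearson correlation coefficients; a minimal sketch of the statistic, using hypothetical toy values rather than the study's data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, the statistic behind the r values above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# hypothetical toy values, NOT the study's data
lge_extent = [5, 10, 20, 30, 40]          # % of LV mass with LGE
ejection_fraction = [60, 50, 52, 38, 36]  # %
r = pearson_r(lge_extent, ejection_fraction)
print(round(r, 2))  # strong negative correlation on this toy data
```

A negative r, as in the toy output, is the pattern the study reports between injury extent and systolic function.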
SINE indel polymorphism of AGL gene and association with growth and carcass traits in Landrace × Jeju black pig F2 population
Genetic polymorphisms in the glycogen debrancher enzyme (AGL) gene were assessed with regard to their association with growth and carcass traits in an F2 population crossbred from Landrace and Jeju (Korea) Black pigs. Three genotypes representing the insertion and/or deletion (indel) polymorphism of a short interspersed nuclear element were detected at frequencies of 0.278 (L/L), 0.479 (L/S), and 0.243 (S/S). Pigs carrying the AGL S allele evidenced significantly heavier body weights at birth and at the 3rd, 10th, and 20th weeks of development, and higher average daily gains during the late period, than L/L homozygous pigs (P < 0.05). However, average daily gains during the early period were not significantly associated with genotype (P > 0.05). With regard to the carcass traits, S allele pigs (S/-) evidenced significantly heavier carcass weights and thicker backfat than L/L homozygous pigs (P < 0.05), whereas body length, meat color, and marbling score did not differ significantly between genotypes (P > 0.05). Consequently, the faster growth rate during the late period and backfat deposition, rather than intramuscular fat deposition, appear to drive the differences in pig productivity among AGL genotypes. These findings indicate that the AGL genotypes may prove to be useful genetic markers for the improvement of Jeju Black pig-related crossbreeding systems.
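From the genotype frequencies reported above, the corresponding allele frequencies follow by simple gene counting; a quick check (the helper name is mine):

```python
def allele_freqs(f_LL, f_LS, f_SS):
    """Allele frequencies from genotype frequencies by gene counting:
    each homozygote carries two copies, each heterozygote one."""
    return f_LL + f_LS / 2, f_SS + f_LS / 2

# genotype frequencies reported in the abstract
f_L, f_S = allele_freqs(0.278, 0.479, 0.243)
print(round(f_L, 4), round(f_S, 4))  # 0.5175 0.4825
```

The two allele frequencies necessarily sum to one, which makes this a handy sanity check on reported genotype counts.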
A Brief Review of Facial Emotion Recognition Based on Visual Information
Facial emotion recognition (FER) is an important topic in the fields of computer vision and artificial intelligence owing to its significant academic and commercial potential. Although FER can be conducted using multiple sensors, this review focuses on studies that exclusively use facial images, because visual expressions are one of the main information channels in interpersonal communication. This paper provides a brief review of research in the field of FER conducted over the past decades. First, conventional FER approaches are described, along with a summary of the representative categories of FER systems and their main algorithms. Deep-learning-based FER approaches using deep networks that enable “end-to-end” learning are then presented. This review also focuses on an up-to-date hybrid deep-learning approach that combines a convolutional neural network (CNN) for the spatial features of an individual frame and long short-term memory (LSTM) for the temporal features of consecutive frames. In the later part of this paper, a brief review of publicly available evaluation metrics is given, and a comparison with benchmark results, which are a standard for quantitative comparisons of FER research, is described. This review can serve as a brief guidebook for newcomers to the field of FER, providing basic knowledge and a general understanding of the latest state-of-the-art studies, as well as for experienced researchers looking for productive directions for future work.
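The hybrid CNN-LSTM pipeline described above can be caricatured in a few lines of NumPy; this is a toy stand-in with random weights and a mean-pooled "CNN", not any published architecture, shown only to make the spatial-then-temporal split concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step over a per-frame feature vector x."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)                 # input/forget/output/candidate gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

F, D, H = 5, 32, 16                             # frames, feature dim, hidden dim
frames = rng.standard_normal((F, 64, 64))
W_cnn = 0.1 * rng.standard_normal((D, 64))      # stand-in for a CNN backbone
W_lstm = 0.1 * rng.standard_normal((4 * H, D + H))
h = c = np.zeros(H)
for frame in frames:
    feat = np.tanh(W_cnn @ frame.mean(axis=1))  # crude per-frame "CNN" feature
    h, c = lstm_step(feat, h, c, W_lstm)        # temporal accumulation
print(h.shape)  # (16,)
```

The final hidden state summarizes the frame sequence and would feed a classification head in a real FER system.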
Driver’s Facial Expression Recognition in Real-Time for Safe Driving
In recent years, researchers working on deep neural network (DNN)-based facial expression recognition (FER) have reported results showing that these approaches overcome the limitations of conventional machine-learning-based FER approaches. However, because DNN-based FER approaches require an excessive amount of memory and incur high processing costs, their application in various fields is very limited and depends on the hardware specifications. In this paper, we propose a fast FER algorithm for monitoring a driver’s emotions that is capable of operating on the low-specification devices installed in vehicles. For this purpose, a hierarchical weighted random forest (WRF) classifier, trained on the basis of the similarity of sample data to improve its accuracy, is employed. In the first step, facial landmarks are detected from input images and geometric features are extracted, considering the spatial positions between landmarks. These feature vectors are then fed into the proposed hierarchical WRF classifier to classify facial expressions. Our method was evaluated experimentally using three databases, the extended Cohn-Kanade database (CK+), MMI, and the Keimyung University Facial Expression of Drivers (KMU-FED) database, and its performance was compared with that of state-of-the-art methods. The results show that our proposed method yields a performance similar to that of deep-learning FER methods, achieving 92.6% for CK+ and 76.7% for MMI, at a processing cost approximately 3731 times lower than that of the DNN method. These results confirm that the proposed method is well suited to real-time embedded applications with limited computing resources.
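One simple instance of the geometric features mentioned above, i.e. spatial relations between landmarks, is the vector of pairwise landmark distances; a minimal sketch, not the paper's exact feature set:

```python
import numpy as np
from itertools import combinations

def geometric_features(landmarks):
    """Pairwise Euclidean distances between facial landmarks,
    one simple kind of geometric feature vector."""
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in combinations(range(len(landmarks)), 2)])

lm = np.array([[0, 0], [3, 4], [6, 8]], float)  # three toy landmark points
print(geometric_features(lm))  # [ 5. 10.  5.]
```

Such distance vectors are cheap to compute, which is what makes landmark-based features attractive for the low-specification in-vehicle hardware the abstract targets.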
Rethinking Attention Mechanisms in Vision Transformers with Graph Structures
In this paper, we propose a new type of vision transformer (ViT) based on graph head attention (GHA). Because the multi-head attention (MHA) of a pure ViT requires a large number of parameters and tends to lose the locality of an image, we replaced MHA with GHA by applying a graph to the attention head of the transformer. Consequently, the proposed GHA maintains both the locality and globality of the input patches and guarantees the diversity of the attention. The proposed GHA-ViT commonly outperforms pure ViT-based models on the small-sized CIFAR-10/100, MNIST, and MNIST-F datasets and the medium-sized ImageNet-1K dataset when trained from scratch. A Top-1 accuracy of 81.7% was achieved on ImageNet-1K using GHA-B, a base model with approximately 29 M parameters. In addition, on CIFAR-10/100, the number of parameters is reduced 17-fold compared with the existing ViT while performance increases by 0.4/4.3%, respectively. The proposed GHA-ViT shows promising results in terms of the number of parameters, the number of operations, and the level of accuracy in comparison with other state-of-the-art lightweight ViT models.
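A minimal sketch in the spirit of applying a graph to an attention head (my own simplification, not the paper's GHA module): masking attention scores with a patch adjacency graph restricts each token to its neighbours and thereby preserves locality:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_head_attention(tokens, adj):
    """Each patch token attends only to its graph neighbours
    (adjacency mask), which preserves locality."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = np.where(adj > 0, scores, -np.inf)  # mask non-neighbours
    return softmax(scores) @ tokens

N, D = 4, 8
adj = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)  # chain of patches
out = graph_head_attention(np.random.randn(N, D), adj)
print(out.shape)  # (4, 8)
```

Giving different heads different graphs is one plausible way such a design could encourage the attention diversity the abstract claims; the sketch above shows only the masking mechanism.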