685 research outputs found
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
An important challenge in vision-based action recognition is the embedding of
spatiotemporal features with two or more heterogeneous modalities into a single
feature. In this study, we propose a new 3D deformable transformer for action
recognition with adaptive spatiotemporal receptive fields and a cross-modal
learning scheme. The 3D deformable transformer consists of three attention
modules: 3D deformability, local joint stride, and temporal stride attention.
The two cross-modal tokens are input into the 3D deformable attention module to
create a cross-attention token with a reflected spatiotemporal correlation.
Local joint stride attention is applied to spatially combine attention and pose
tokens. Temporal stride attention temporally reduces the number of input tokens
in the attention module and supports temporal expression learning without the
simultaneous use of all tokens. The deformable transformer iterates L times and
combines the last cross-modal token for classification. The proposed 3D
deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn
Action datasets, and showed results better than or similar to pre-trained
state-of-the-art methods even without a pre-training process. In addition,
visualizing important joints and correlations during action recognition through
spatial joint and temporal stride attention shows the potential for
explainable action recognition. Comment: 10 pages, 8 figures
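As a rough illustration of the token-reduction idea behind temporal stride attention, the sketch below runs self-attention separately over frames that are `stride` steps apart, so each attention call sees about T/stride tokens instead of all T. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the one-token-per-frame layout, and the use of identity projections instead of learned query/key/value weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def temporal_stride_attention(tokens, stride):
    """Attend within groups of frames spaced `stride` apart.

    tokens: float array of shape (T, D), one token per frame (assumption).
    Frames offset, offset + stride, offset + 2 * stride, ... form one group,
    so no single attention call uses all T tokens at once.
    """
    out = np.empty_like(tokens)
    for offset in range(stride):
        idx = np.arange(offset, tokens.shape[0], stride)
        out[idx] = self_attention(tokens[idx])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(8, 4))  # 8 frames, 4-dimensional tokens
    print(temporal_stride_attention(frames, stride=2).shape)
```

With `stride=1` the sketch reduces to ordinary self-attention over all frames; larger strides trade temporal context per call for fewer tokens per call.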
Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network
Along with generative AI, interest in scene graph generation (SGG), which
comprehensively captures the relationships and interactions between objects in
an image and creates a structured graph-based representation, has significantly
increased in recent years. However, relying on object-centric and dichotomous
relationships, existing SGG methods have a limited ability to accurately
predict detailed relationships. To solve these problems, a new approach to
modeling multiobject relationships, called edge dual scene graph generation
(EdgeSGG), is proposed herein. EdgeSGG is based on an edge dual scene graph and
Dual Message Passing Neural Network (DualMPNN), which can capture rich
contextual interactions between unconstrained objects. To facilitate the
learning of edge dual scene graphs with a symmetric graph structure, the
proposed DualMPNN learns both object- and relation-centric features for more
accurately predicting relation-aware contexts and allows fine-grained
relational updates between objects. A comparative experiment with
state-of-the-art (SoTA) methods was conducted using two public datasets for SGG
operations and six metrics for three subtasks. Compared with SoTA approaches,
the proposed model exhibited substantial performance improvements across all
SGG subtasks. Furthermore, experiments on long-tail distributions revealed that
incorporating the relationships between objects effectively mitigates existing
long-tail problems.
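One way to picture an edge dual graph is as a line graph: every relation (edge) of the scene graph becomes a node, and two such nodes are linked when their relations share an object. The sketch below builds that structure in plain Python. It assumes the line-graph reading of "edge dual"; the abstract does not spell out the exact construction, so the function name, the input format, and the sharing rule are hypothetical.

```python
from itertools import combinations


def edge_dual_graph(edges):
    """Build a symmetric edge dual (line-graph style) adjacency.

    edges: list of (subject, object) pairs from a scene graph (assumed format).
    Returns a dict mapping each edge index to the set of edge indices
    that share at least one object with it.
    """
    dual = {i: set() for i in range(len(edges))}
    for (i, a), (j, b) in combinations(enumerate(edges), 2):
        if set(a) & set(b):  # the two relations touch a common object
            dual[i].add(j)
            dual[j].add(i)
    return dual


if __name__ == "__main__":
    scene_edges = [("man", "horse"), ("horse", "grass"), ("man", "hat")]
    print(edge_dual_graph(scene_edges))
```

In this toy scene, the relation man-horse is linked to both other relations, while horse-grass and man-hat share no object and stay unconnected. A message-passing network could then update relation features along these dual links.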
Axial strain dependence of all-fiber acousto-optic tunable filters
We report the axial strain dependence of two types of all-fiber acousto-optic tunable filters based on flexural and torsional acoustic waves. The experimentally observed shift in resonant wavelength under applied axial strain can be explained theoretically by a combination of acoustic and optical effects. We discuss the possibility of suppressing the strain effect in the filters, or conversely, of using the strain dependence for wavelength tuning or strain sensors.
Brillouin fiber laser pumped by a DFB laser diode
In this paper, we present a novel Brillouin fiber-ring laser utilizing an unbalanced Mach-Zehnder interferometer (UMZI) as a coupling device. The laser is pumped by a distributed-feedback laser diode and shows continuous-wave and single-frequency operation. Frequency-dependent transmission characteristics of the UMZI make it possible for the pump wave to pass through the laser-ring cavity with no resonance effect for stable pump operation, while the Brillouin laser signal still resonates in a high-finesse cavity. Single and multiple longitudinal mode operations are observed according to the relative location between longitudinal modes and Brillouin gain-curve center. A stable single-frequency operation is achieved using a simple stabilizing feedback loop based on dithering and autotracking techniques. Using this feedback loop, the laser-intensity fluctuation is strongly suppressed and remains below 4%. The Brillouin output converted from the pump power of 26.4 mW is about 3.18 mW, and the linewidth is measured to be below 1 kHz.
- …