FESNet: Spotting Facial Expressions Using Local Spatial Discrepancy and Multi-Scale Temporal Aggregation
Facial expression (FE) spotting aims to split long videos into intervals of neutral expression, macro-expression, or micro-expression. Recent works mainly rely on feature descriptors or optical flow, which struggle to capture subtle facial motion and to aggregate temporal information efficiently. This paper proposes a novel end-to-end network, named FESNet (Facial Expression Spotting Network), to address these challenges. The main idea is to model subtle facial motion as local spatial discrepancy and to capture temporal correlation with multi-scale temporal convolution. FESNet comprises a local spatial discrepancy module (LSDM) and a multi-scale temporal aggregation module (MTAM). The LSDM first extracts static spatial features from each frame with residual convolution and learns the inner spatial correlation with multi-head attention. In addition, the subtle facial motion of an expression is modeled as the discrepancy between the first frame and the current frame of the input interval, yielding frame-wise spatial proposals. Taking the local spatial discrepancy features and proposals as input, the MTAM captures temporal correlation with multi-scale temporal convolution and performs cascade refinement to make the final prediction. Furthermore, this paper proposes a smooth loss that enforces temporal consistency among the cascade-refined proposals from the MTAM. Comprehensive experiments show that FESNet achieves competitive performance compared with state-of-the-art methods.
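FESNet rests on three ideas: subtle motion is represented as the difference between each frame's features and the first frame's, that signal is aggregated over time with convolutions of several kernel sizes, and a smoothness penalty is applied to the per-frame outputs. These can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation. The per-frame features are plain vectors standing in for the LSDM output, and the temporal kernels are fixed moving averages rather than learned weights. The scales are simply averaged, and the smooth loss is assumed to be the mean squared difference between consecutive frame scores.

```python
import numpy as np

def local_spatial_discrepancy(features: np.ndarray) -> np.ndarray:
    """Subtract the first frame's feature vector from every frame.

    features: (T, D) array, one D-dim vector per frame (stand-in for LSDM
    features). Row 0 of the result is all zeros by construction.
    """
    return features - features[0:1]

def multi_scale_temporal_conv(x: np.ndarray, kernel_sizes=(3, 5, 7)) -> np.ndarray:
    """Average 1-D temporal convolutions at several kernel sizes.

    Fixed moving-average kernels stand in for learned ones (assumption).
    mode="same" keeps length T when T >= max(kernel_sizes).
    """
    outputs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k
        # convolve each feature channel independently along time
        conv = np.stack([np.convolve(x[:, d], kernel, mode="same")
                         for d in range(x.shape[1])], axis=1)
        outputs.append(conv)
    return np.mean(outputs, axis=0)

def smooth_loss(scores: np.ndarray) -> float:
    """Mean squared difference between consecutive per-frame scores (assumed form)."""
    return float(np.mean(np.diff(scores, axis=0) ** 2))

# Toy usage: 16 frames of 8-dim random features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
agg = multi_scale_temporal_conv(local_spatial_discrepancy(feats))
scores = agg.mean(axis=1)  # hypothetical per-frame proposal score
print(agg.shape, smooth_loss(scores))
```

Short kernels respond to brief micro-expressions, and long ones to slower macro-expressions. Averaging them is the simplest fusion and is chosen here only for illustration.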
Experimental Study on Loading Capacity of Glued-Laminated Timber Arches Subjected to Vertical Concentrated Loads
Glued-laminated timber arches are widely used in gymnasiums, bridges, and roof trusses. However, studies on their mechanical behaviour and design methods are still insufficient. This paper investigates the in-plane loading capacity of circular glued-laminated timber arches made of Douglas fir. Experiments were conducted on four timber-arch models with different rise-to-span ratios under concentrated loads at mid-span and quarter-point locations. The structural responses, failure modes, and loading capacities of the specimens were obtained. The results show that the arches deformed symmetrically under mid-span loading and antisymmetrically under quarter-point loading. A downward shift of the neutral axis of the cross section was observed under mid-span loading, which contributes to a higher loading capacity than under quarter-point loading. The loading condition significantly affects both the ultimate loads and the strain distribution in the cross section. Based on the design formula in current standards for timber structures, an equivalent beam-column method was introduced to estimate the loading capacity of laminated timber arches under vertical concentrated loads. The moment amplification factor in the formula was compared and discussed, and the value given in the National Design Specification for Wood Construction was recommended as providing acceptable accuracy.
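The equivalent beam-column idea can be sketched with the generic NDS-style interaction check for combined axial compression and bending about one axis. In that check, 1/(1 − f_c/F_cE) is the moment amplification factor discussed above. The abstract does not say how the paper converts an arch under a concentrated load into an equivalent column (effective length, equivalent axial force and moment). All inputs below are therefore hypothetical, and the adjusted design values F'_c, F'_b, and E'_min are taken as given rather than derived.

```python
def buckling_design_value(e_min_adj: float, le: float, d: float) -> float:
    """F_cE = 0.822 * E'_min / (le / d)**2, the NDS critical buckling design value."""
    return 0.822 * e_min_adj / (le / d) ** 2

def moment_amplification(fc: float, f_ce: float) -> float:
    """Amplification factor 1 / (1 - fc / F_cE) applied to the bending term."""
    if fc >= f_ce:
        raise ValueError("fc must be below F_cE")
    return 1.0 / (1.0 - fc / f_ce)

def interaction_ratio(fc: float, fb: float, fc_adj: float, fb_adj: float, f_ce: float) -> float:
    """(fc / F'_c)**2 + (fb / F'_b) * amplification; the member passes if <= 1.0."""
    return (fc / fc_adj) ** 2 + (fb / fb_adj) * moment_amplification(fc, f_ce)

# Hypothetical inputs in consistent units (stresses in MPa, lengths in mm).
f_ce = buckling_design_value(e_min_adj=5000.0, le=3000.0, d=200.0)
ratio = interaction_ratio(fc=4.0, fb=6.0, fc_adj=10.0, fb_adj=20.0, f_ce=f_ce)
print(f"F_cE = {f_ce:.2f} MPa, interaction ratio = {ratio:.3f}")
```

As the axial stress approaches F_cE, the amplification factor grows without bound. This is why the choice of amplification factor matters for slender arches.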
Multifunctional imaging enabled by optical bound states in the continuum with broken symmetry
For photonic crystal slab (PCS) structures, bound states in the continuum (BICs) and circularly polarized states (dubbed C-points) are important topological polarization singularities in momentum space and have attracted growing attention due to their novel topological and optical properties. In this work, the evolution of polarization singularities from BICs to C-points is achieved by breaking the in-plane C2 symmetry of a square-lattice PCS with C4v symmetry. Correspondingly, a BIC splits into two C-points with opposite chirality, producing distinct optical transmission responses under right- or left-circularly polarized (RCP or LCP) incidence. Harnessing this chirality selectivity of the C-points, we propose a multifunctional imaging system that integrates the designed PCS into a conventional 4-f imaging system. The system realizes both edge imaging and conventional bright-field imaging, selected by the circular polarization state of the light source. Beyond multifunctional imaging, our system also offers a vivid picture of the evolution of polarization singularities in PCS platforms.
Comment: 11 pages, 4 figures
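In an idealized model, a 4-f system forms its output as the inverse Fourier transform of the input spectrum multiplied by a transfer function H(k). The sketch below assumes the PCS contributes one transfer function per incident handedness. One is a flat H = 1, giving a bright-field image. The other is H ∝ |k|, a high-pass filter that keeps mainly edges. This is a textbook spatial-filtering toy, not the paper's computed PCS response. Which handedness maps to which mode, and the exact k-dependence, are assumptions.

```python
import numpy as np

def four_f_image(field: np.ndarray, mode: str) -> np.ndarray:
    """Idealized 4-f imaging: output = IFFT2(H(k) * FFT2(input)), returned as intensity.

    mode="bright": all-pass H(k) = 1 (conventional bright-field image).
    mode="edge":   H(k) = |k| / max|k|, a high-pass filter (assumed stand-in
                   for the PCS response to the other circular polarization).
    """
    spectrum = np.fft.fft2(field)
    ky = np.fft.fftfreq(field.shape[0])
    kx = np.fft.fftfreq(field.shape[1])
    kx_grid, ky_grid = np.meshgrid(kx, ky)  # both shaped like `field`
    if mode == "bright":
        h = np.ones_like(kx_grid)
    elif mode == "edge":
        k = np.hypot(kx_grid, ky_grid)
        h = k / k.max()  # zero at k = 0, so uniform regions are removed
    else:
        raise ValueError("mode must be 'bright' or 'edge'")
    return np.abs(np.fft.ifft2(spectrum * h)) ** 2

# Square aperture as the object; the "polarization" switch picks the filter.
obj = np.zeros((64, 64))
obj[24:40, 24:40] = 1.0
bright = four_f_image(obj, "bright")  # reproduces the square
edges = four_f_image(obj, "edge")     # interior largely suppressed, boundary emphasized
```

Switching the source polarization in the physical system corresponds here to switching `mode`, with no change to the optical path.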
CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation
Surface defect inspection is of great importance for industrial manufacturing and production. Although deep-learning-based defect inspection methods have made significant progress, challenges remain, such as weak defects that are hard to distinguish and defect-like interference in the background. To address these issues, we propose CINFormer, a UNet-like transformer network with multi-stage CNN (Convolutional Neural Network) feature injection for surface defect segmentation. CINFormer presents a simple yet effective feature integration mechanism that injects the multi-level CNN features of the input image into different stages of the transformer encoder. This retains both the CNN's strength in capturing fine details and the transformer's strength in suppressing background noise, which facilitates accurate defect detection. In addition, CINFormer introduces a Top-K self-attention module that focuses on the tokens carrying the most important information about defects, further reducing the impact of redundant background. Extensive experiments on the surface defect datasets DAGM 2007, Magnetic tile, and NEU show that CINFormer achieves state-of-the-art performance in defect detection.
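Top-K self-attention can be illustrated as ordinary scaled dot-product attention with one change: for each query, all but the k highest-scoring keys are masked to negative infinity before the softmax. Low-similarity (for example, background) tokens therefore receive exactly zero weight. The sketch assumes a single head, uses the raw tokens as queries, keys, and values (no learned projections), and applies a fixed per-query k. The abstract does not specify how CINFormer chooses k.

```python
import numpy as np

def topk_self_attention(x: np.ndarray, k: int) -> np.ndarray:
    """Single-head self-attention keeping only the k largest scores per query.

    x: (N, D) tokens, used directly as queries, keys, and values (no learned
    projections; a simplification). Dropped keys get exactly zero weight.
    """
    n, d = x.shape
    k = max(1, min(k, n))
    scores = x @ x.T / np.sqrt(d)                          # (N, N)
    top_idx = np.argpartition(scores, -k, axis=1)[:, -k:]  # k largest per row
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, top_idx,
                      np.take_along_axis(scores, top_idx, axis=1), axis=1)
    masked -= masked.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(masked)                     # exp(-inf) == 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

tokens = np.random.default_rng(0).normal(size=(6, 4))
out = topk_self_attention(tokens, k=2)  # each token mixes only its 2 best matches
```

With k equal to the number of tokens, this reduces to standard softmax attention. Smaller k trades context for robustness to irrelevant tokens.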
Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects
Surface defect inspection is a very challenging task in which surface defects usually show weak appearances or exist in complex backgrounds. Most high-accuracy defect detection methods require expensive computation and storage, making them impractical for resource-constrained defect detection applications. Although some lightweight methods achieve real-time inference with fewer parameters, they show poor detection accuracy in complex defect scenarios. To this end, we develop a Global Context Aggregation Network (GCANet), built on an encoder-decoder structure, for lightweight saliency detection of surface defects. First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module. The proposed DSA computes element-wise similarity along the channel dimension while maintaining linear complexity. In addition, we introduce a novel Channel Reference Attention (CRA) module before each decoder block to strengthen the representation of multi-level features in the bottom-up path. The proposed CRA exploits the channel correlation between features at different layers to adaptively enhance feature representation. Experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods. Specifically, GCANet achieves competitive accuracy (91.79%, 93.55%, and 97.35%) on SD-saliency-900 while running at 272 fps on a single GPU.
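The abstract does not spell out the DSA computation. The following is therefore only a hedged illustration of one familiar way to obtain attention that is linear in the number of tokens: compute similarities between channels rather than between spatial positions, in the spirit of cross-covariance ("transposed") attention. The resulting C × C map costs O(N·C²) for N tokens. The cosine similarity, the temperature, and the single-head, projection-free form are all assumptions, not GCANet's actual DSA.

```python
import numpy as np

def channel_self_attention(x: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Attention over channels instead of spatial tokens.

    x: (N, C) with N spatial tokens and C channels. The attention map is
    (C, C), so the cost is O(N * C**2): linear in N.
    """
    # cosine similarity between channels, each normalized over the token axis
    xn = x / (np.linalg.norm(x, axis=0, keepdims=True) + 1e-8)
    scores = xn.T @ xn / temperature                # (C, C)
    scores -= scores.max(axis=1, keepdims=True)     # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # each row sums to 1
    return x @ attn.T  # output channel i = sum_j attn[i, j] * input channel j

feature_map = np.random.default_rng(0).normal(size=(32 * 32, 16))  # 1024 tokens, 16 channels
enhanced = channel_self_attention(feature_map)  # same shape as the input
```

Doubling the spatial resolution quadruples N but leaves the C × C attention map unchanged. This property is what makes channel-dimension attention attractive for lightweight, high-frame-rate models.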
A robotic learning and generalization framework for curved surface based on modified DMP
Learning from demonstration (LfD) enables robots to obtain reference trajectory information quickly. How to reproduce and generalize the skills acquired through demonstration is an active research topic. First, because many industrial robots are difficult to drag continuously and smoothly during demonstration, a compliant continuous drag-demonstration system based on a discrete admittance model was designed. Then, to address the poor generalization ability of the classical dynamic movement primitive (DMP) on curved surfaces, the DMP was modified to include a scaling factor and a force coupling term. Finally, curve-drawing experiments were carried out on a 6-DoF robot. Experimental results show the effectiveness of the proposed learning and generalization framework.
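For reference, a classical one-dimensional discrete DMP can be rolled out with simple Euler integration. It consists of a transformation system, a canonical system, and a Gaussian-basis forcing term. The abstract does not give the form of the paper's scaling factor or force coupling term, so they appear here only as hypothetical hooks. `scale` replaces the classical g − y0 multiplier on the forcing term. `coupling` is an extra additive term that could, for example, be driven by a force reading. With the defaults, the code reduces to the classical DMP. The gains and basis-width heuristic are common choices, not values from the paper.

```python
import math
from typing import Callable, List, Optional

def dmp_rollout(y0: float, g: float, weights: List[float], tau: float = 1.0,
                duration: float = 1.0, dt: float = 0.001,
                alpha_z: float = 25.0, alpha_x: float = 1.0,
                scale: Optional[float] = None,
                coupling: Callable[[float], float] = lambda t: 0.0) -> List[float]:
    """Euler-integrate a 1-D discrete DMP and return its position trajectory.

    tau*dz/dt = alpha_z*(beta_z*(g - y) - z) + f(x) + coupling(t)
    tau*dy/dt = z
    tau*dx/dt = -alpha_x*x
    `scale` multiplies the forcing term (classical DMP: g - y0; hypothetical hook).
    `coupling` is a hypothetical extra term, e.g. fed by a force sensor.
    """
    beta_z = alpha_z / 4.0                                  # critical damping
    n = len(weights)
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c / alpha_x for c in centres]      # common heuristic
    s = (g - y0) if scale is None else scale
    y, z, x, t = y0, 0.0, 1.0, 0.0
    traj = [y]
    for _ in range(int(round(duration / dt))):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        # normalized weighted sum of basis functions, gated by the phase x
        f = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10) * x * s
        dz = (alpha_z * (beta_z * (g - y) - z) + f + coupling(t)) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x, t = z + dz * dt, y + dy * dt, x + dx * dt, t + dt
        traj.append(y)
    return traj

path = dmp_rollout(y0=0.0, g=1.0, weights=[0.0, 20.0, -10.0, 5.0, 0.0])
```

With zero weights, the system is a critically damped spring that converges to the goal g. Nonzero weights shape the transient, and because the forcing term is multiplied by the decaying phase x, its influence fades over time.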
Element detection and segmentation of mathematical function graphs based on improved Mask R-CNN
There are approximately 2.2 billion people around the world with some degree of visual impairment. Among them, individuals with severe visual impairment rely predominantly on hearing and touch to gather external information. At present, reading materials for the visually impaired are limited and mostly take the form of audio or text, which cannot meet their need to comprehend graphical content. Many scholars have investigated methods for converting visual images into tactile graphics. However, tactile graphic translation still fails to meet the reading needs of visually impaired individuals because of the diversity of image types and the limitations of image recognition technology. The primary goal of this paper is to help the visually impaired gain a greater understanding of the natural sciences by transforming images of mathematical functions into an electronic format for the production of tactile graphics. To improve the accuracy and efficiency of graph element recognition and segmentation in function graphs, this paper proposes an MA Mask R-CNN model. It uses MA ConvNeXt as its improved feature extraction backbone and MA BiFPN as its improved feature fusion network, both of which are introduced in this paper. The model combines information from local relations, global relations, and different channels into an attention mechanism that establishes multiple connections. By also combining a variety of multi-scale features, it improves the original Mask R-CNN's ability to detect slender and multi-type targets. Finally, the experimental results show that MA Mask R-CNN attains an 89.6% mAP for target detection and a 72.3% mAP for target segmentation in the instance segmentation of function graphs. These are improvements of 9% and 12.8% mAP, respectively, over the original Mask R-CNN.
Fine structures of radio bursts from flare star AD Leo with FAST observations
Radio bursts from nearby active M dwarfs have been frequently reported and extensively studied in solar or planetary paradigms. However, their sub-structures or fine structures remain rarely explored, despite their potential significance for diagnosing the plasma and magnetic field properties of the star. Such studies have been limited in the past by the sensitivity of radio telescopes. Here we report new results from high-time-resolution observations of the known flare star AD Leo with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). We detected many radio bursts over the two days of observations, with fine structures in the form of numerous millisecond-scale sub-bursts. Sub-bursts on the first day display stripe-like shapes with nearly uniform frequency drift rates and are possibly stellar analogs of Jovian S-bursts. Sub-bursts on the second day, however, show a different, blob-like shape with random occurrence patterns and are akin to solar radio spikes. The new observational results suggest that the intense emission from AD Leo is driven by electron cyclotron maser instability, which may be related to stellar flares or to interactions with a planetary companion.
Comment: 25 pages, 12 figures, accepted for publication in Ap
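The frequency drift rate that characterizes the first day's stripe-like sub-bursts is the slope df/dt of a sub-burst's track in the dynamic spectrum. A minimal least-squares estimate is sketched below. It assumes the peak time in each frequency channel has already been extracted (a hypothetical input format); the paper's actual measurement procedure is not described in the abstract. A straight-line fit suits sub-bursts whose drift is nearly uniform, as reported for the first day.

```python
from typing import Sequence

def drift_rate(times_s: Sequence[float], freqs_mhz: Sequence[float]) -> float:
    """Least-squares slope df/dt (MHz/s) of a sub-burst's time-frequency track.

    times_s[i], freqs_mhz[i]: peak time and centre frequency of the emission
    in successive frequency channels (hypothetical input format).
    """
    n = len(times_s)
    if n < 2:
        raise ValueError("need at least two points")
    mt = sum(times_s) / n
    mf = sum(freqs_mhz) / n
    sxx = sum((t - mt) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("all points share one time; slope undefined")
    sxf = sum((t - mt) * (f - mf) for t, f in zip(times_s, freqs_mhz))
    return sxf / sxx

# Hypothetical sub-burst track: peak times (s) in four channels (MHz).
rate = drift_rate([0.000, 0.002, 0.004, 0.006], [1250.0, 1246.0, 1242.0, 1238.0])
print(f"drift rate = {rate:.0f} MHz/s")
```

A negative slope means the emission drifts from high to low frequency. Comparing slopes across many sub-bursts is one way to check whether drift rates are "nearly uniform".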