
    FESNet: Spotting Facial Expressions Using Local Spatial Discrepancy and Multi-Scale Temporal Aggregation

    Facial expression (FE) spotting aims to split long videos into intervals of neutral expression, macro-expression, or micro-expression. Recent works mainly rely on feature descriptors or optical flow, which struggle to capture subtle facial motion and to aggregate temporal information efficiently. This paper proposes a novel end-to-end network, named FESNet (Facial Expression Spotting Network), to address these challenges. The main idea is to model subtle facial motion as local spatial discrepancy and to incorporate temporal correlation through multi-scale temporal convolution. FESNet comprises a local spatial discrepancy module (LSDM) and a multi-scale temporal aggregation module (MTAM). The LSDM first extracts static spatial features from each frame by residual convolution and learns the inner spatial correlation by multi-head attention. Moreover, the subtle facial motion of an expression is modeled as the discrepancy between the first frame and the current frame of the input interval, producing frame-wise spatial proposals. Taking the local spatial discrepancy features and proposals as input, the MTAM incorporates temporal correlation by multi-scale temporal convolution and performs cascade refinement to make the final prediction. Furthermore, this paper proposes a smooth loss to ensure the temporal consistency of the cascade-refined proposals from the MTAM. Comprehensive experiments show that FESNet achieves competitive performance compared to state-of-the-art methods.
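
    The two ideas at the core of FESNet, treating motion as the difference from the first frame and aggregating over several temporal scales, can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the learned residual convolution, multi-head attention, and learned temporal kernels are replaced by fixed moving averages, the per-frame "motion score" (the L2 norm of the discrepancy) is an assumption, and all function names are hypothetical. The `smooth_loss` shown is only one plausible form of a temporal-consistency penalty; the abstract does not give the paper's exact loss.

```python
from typing import List, Sequence

Vector = List[float]


def local_spatial_discrepancy(frames: Sequence[Vector]) -> List[Vector]:
    """Model motion as each frame's feature vector minus the first frame's."""
    first = frames[0]
    return [[x - x0 for x, x0 in zip(frame, first)] for frame in frames]


def temporal_conv(seq: Sequence[float], kernel_size: int) -> List[float]:
    """Same-length moving average over time (a fixed, uniform 1-D kernel)."""
    half = kernel_size // 2
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): t + half + 1]  # truncated at the borders
        out.append(sum(window) / len(window))
    return out


def multi_scale_aggregate(seq: Sequence[float], kernel_sizes=(1, 3, 5)) -> List[float]:
    """Average the outputs of several temporal kernel sizes (the 'scales')."""
    branches = [temporal_conv(seq, k) for k in kernel_sizes]
    return [sum(vals) / len(vals) for vals in zip(*branches)]


def smooth_loss(scores: Sequence[float]) -> float:
    """Mean squared difference between consecutive frame-wise scores."""
    if len(scores) < 2:
        return 0.0
    diffs = [(b - a) ** 2 for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs)


# Usage with toy 2-D "features" for four frames (hypothetical values).
frames = [[0.0, 0.0], [0.0, 0.1], [0.0, 0.9], [0.0, 0.2]]
motion = [sum(d * d for d in v) ** 0.5 for v in local_spatial_discrepancy(frames)]
scores = multi_scale_aggregate(motion, kernel_sizes=(1, 3))
print(scores, smooth_loss(scores))
```

    Averaging several window sizes lets a short spike (micro-expression-like) and a longer rise (macro-expression-like) both leave a trace in the aggregated score, which is the intuition behind using multiple temporal scales.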

    Experimental Study on Loading Capacity of Glued-Laminated Timber Arches Subjected to Vertical Concentrated Loads

    Glued-laminated timber arches are widely used in gymnasiums, bridges, and roof trusses. However, studies on their mechanical behaviour and design methods are still insufficient. This paper investigates the in-plane loading capacity of circular glued-laminated timber arches made of Douglas fir. Experiments were conducted on four timber-arch models with different rise-to-span ratios under concentrated loads at the mid-span and quarter-point locations. The structural responses, failure modes, and loading capacities of the timber arch specimens were obtained. The results show that the timber arches exhibited symmetric and antisymmetric deformation under the mid-point and quarter-point loading conditions, respectively. A downward shift of the neutral axis of the cross section was observed under the mid-point loading condition, which contributes to a higher loading capacity than under the quarter-point loading condition. The loading condition significantly affects the ultimate loads and the strain distribution in the cross section. Based on the design formula in current standards for timber structures, an equivalent beam-column method was introduced to estimate the loading capacity of the laminated timber arches under vertical concentrated loads. The moment amplification factor in the formula was compared and discussed, and the value provided in the National Design Specification for Wood Construction is recommended as having acceptable accuracy.
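
    The moment amplification factor discussed above enters a beam-column check through an interaction equation. A minimal sketch, assuming the NDS form for combined axial compression and bending about a single axis, (fc/F'c)^2 + fb / [F'b (1 - fc/FcE)] <= 1, where 1/(1 - fc/FcE) is the amplification factor and FcE is the critical buckling design value. The paper's equivalent beam-column formulation, its effective-length assumptions, and its recommended factor are not reproduced here, and the stresses below are hypothetical.

```python
def moment_amplification(fc: float, FcE: float) -> float:
    """Amplification factor 1 / (1 - fc/FcE) applied to first-order bending stress."""
    if not 0 <= fc < FcE:
        raise ValueError("requires 0 <= fc < FcE (below the critical buckling stress)")
    return 1.0 / (1.0 - fc / FcE)


def interaction_ratio(fc: float, Fc_adj: float, fb: float,
                      Fb_adj: float, FcE: float) -> float:
    """NDS-style combined compression + bending check for one bending axis.

    fc: axial compressive stress, fb: bending stress (from the arch analysis),
    Fc_adj / Fb_adj: adjusted design values, FcE: critical buckling design value.
    All stresses share one unit (e.g. MPa). The section passes if the ratio <= 1.
    """
    return (fc / Fc_adj) ** 2 + fb * moment_amplification(fc, FcE) / Fb_adj


# Hypothetical stresses in MPa, not values from the tests described above.
ratio = interaction_ratio(fc=5.0, Fc_adj=10.0, fb=2.0, Fb_adj=8.0, FcE=10.0)
print(f"interaction ratio = {ratio:.2f}")  # interaction ratio = 0.75
```

    Because the factor grows without bound as fc approaches FcE, the choice of factor (and of the buckling length behind FcE) dominates the predicted capacity of slender arches, which is why its value is worth comparing across standards.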

    Multifunctional imaging enabled by optical bound states in the continuum with broken symmetry

    For photonic crystal slab (PCS) structures, bound states in the continuum (BICs) and circularly polarized states (dubbed C-points) are important topological polarization singularities in momentum space and have attracted growing attention due to their novel topological and optical properties. In this work, the evolution of polarization singularities from BICs to C-points is achieved by breaking the in-plane C2 symmetry of a square-lattice PCS structure with C4v symmetry. Correspondingly, a BIC splits into two C-points with opposite chirality, producing distinct optical transmission responses under right or left circularly polarized (RCP or LCP) incidence. Harnessing this chirality selectivity of the C-points, we propose a multifunctional imaging system that integrates the designed PCS into a conventional 4-f imaging system to realize both edge imaging and conventional bright-field imaging, selected by the circular polarization state of the light source. In addition to multifunctional imaging, our system also provides a vivid picture of the evolution of the singularities of PCS platforms. Comment: 11 pages, 4 figures
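
    Whether a far-field state at a given point in momentum space is BIC-like, a C-point, or neither can be read off from its normalized Stokes parameter S3/S0, computed from the Jones vector (Ex, Ey). The sketch below assumes S3 = 2 Im(Ex* Ey); which sign corresponds to RCP and which to LCP depends on the time-harmonic and viewing conventions, so the handedness mapping is left to the reader's convention. The tolerance and function names are hypothetical; in practice the Jones vectors would come from simulated far-field eigenmodes.

```python
def normalized_s3(ex: complex, ey: complex) -> float:
    """Normalized Stokes S3/S0 of a Jones vector: +1 or -1 if circular, 0 if linear."""
    ex, ey = complex(ex), complex(ey)
    s0 = abs(ex) ** 2 + abs(ey) ** 2
    if s0 == 0:
        raise ValueError("zero field: polarization undefined")
    s3 = 2.0 * (ex.conjugate() * ey).imag
    return s3 / s0


def classify(ex: complex, ey: complex, tol: float = 1e-6) -> str:
    """Label the far-field polarization state sampled at one k-point."""
    s0 = abs(complex(ex)) ** 2 + abs(complex(ey)) ** 2
    if s0 < tol:
        return "undefined"  # no radiation, as at a BIC (vortex center)
    s3 = normalized_s3(ex, ey)
    if abs(abs(s3) - 1.0) < tol:
        return "circular"   # a C-point if this is an isolated point in k-space
    if abs(s3) < tol:
        return "linear"
    return "elliptical"


print(classify(1, 1j), classify(1, -1j), classify(1, 1), classify(0, 0))
# circular circular linear undefined
```

    The two C-points produced by splitting a BIC would show S3/S0 of opposite sign, which is the chirality selectivity the imaging system exploits.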

    CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

    Surface defect inspection is of great importance for industrial manufacturing and production. Although defect inspection methods based on deep learning have made significant progress, they still face challenges such as indistinguishable weak defects and defect-like interference in the background. To address these issues, we propose a transformer network with multi-stage CNN (Convolutional Neural Network) feature injection for surface defect segmentation, a UNet-like structure named CINFormer. CINFormer presents a simple yet effective feature integration mechanism that injects the multi-level CNN features of the input image into different stages of the transformer network in the encoder. This preserves both the CNN's strength in capturing detailed features and the transformer's strength in suppressing background noise, which facilitates accurate defect detection. In addition, CINFormer presents a Top-K self-attention module that focuses on the tokens carrying the most important information about the defects, further reducing the impact of the redundant background. Extensive experiments on the surface defect datasets DAGM 2007, Magnetic tile, and NEU show that the proposed CINFormer achieves state-of-the-art performance in defect detection.
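
    The Top-K self-attention idea can be illustrated for a single query: compute scaled dot-product scores, keep only the k largest, and renormalize the softmax over those tokens so that all other tokens contribute nothing. This is a minimal sketch under stated assumptions (no learned projections, a single head, selection by raw score); the abstract does not specify how CINFormer chooses K or applies the selection, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def top_k_attention(query: Sequence[float], keys: Sequence[Sequence[float]],
                    values: Sequence[Sequence[float]], k: int) -> List[float]:
    """Scaled dot-product attention for one query that keeps only the k highest scores."""
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    # Indices of the k most relevant tokens; every other token gets zero weight.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for a numerically stable softmax
    exp_w = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exp_w.values())
    out = [0.0] * len(values[0])
    for i, w in exp_w.items():
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
print(top_k_attention(query, keys, values, k=1))  # [10.0]
print(top_k_attention(query, keys, values, k=3))
```

    With k equal to the number of tokens this reduces to ordinary softmax attention; shrinking k removes low-scoring (for example, background) tokens from the weighted sum entirely rather than merely down-weighting them.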

    Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects

    Surface defect inspection is a very challenging task in which surface defects usually show weak appearances or exist under complex backgrounds. Most high-accuracy defect detection methods require expensive computation and storage overhead, making them less practical in some resource-constrained defect detection applications. Although some lightweight methods have achieved real-time inference speed with fewer parameters, they show poor detection accuracy in complex defect scenarios. To this end, we develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects, built on an encoder-decoder structure. First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module. The proposed DSA performs element-wise similarity in the channel dimension while maintaining linear complexity. In addition, we introduce a novel Channel Reference Attention (CRA) module before each decoder block to strengthen the representation of multi-level features in the bottom-up path. The proposed CRA exploits the channel correlation between features at different layers to adaptively enhance feature representation. The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency than 17 other state-of-the-art methods. Specifically, GCANet achieves competitive accuracy (91.79% $F_{\beta}^{w}$, 93.55% $S_\alpha$, and 97.35% $E_\phi$) on SD-saliency-900 while running at 272 fps on a single GPU.
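
    One generic way to get attention whose cost is linear in the number of tokens N is to compute similarities between channels rather than between tokens, giving a C x C map that costs about N*C^2 operations instead of N^2*C. The sketch below is that swapped-in channel-wise attention, offered only to illustrate the complexity argument; it is not GCANet's DSA, whose element-wise formulation the abstract does not detail, and it omits learned projections and normalization. All names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # N tokens x C channels


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def channel_attention(x: Matrix) -> Matrix:
    """Attention over channels: a C x C weight map instead of an N x N token map."""
    n, c = len(x), len(x[0])
    # Channel-to-channel similarity, averaged over all N tokens (one pass over N).
    sim = [[sum(x[t][i] * x[t][j] for t in range(n)) / n for j in range(c)]
           for i in range(c)]
    attn = [softmax(row) for row in sim]
    # Re-mix each token's channels with the C x C weights.
    return [[sum(attn[i][j] * x[t][j] for j in range(c)) for i in range(c)]
            for t in range(n)]


def channel_attention_mults(n_tokens: int, channels: int) -> int:
    """Multiplications: similarity (N*C*C) plus re-mixing (N*C*C); linear in N."""
    return 2 * n_tokens * channels * channels


def token_attention_mults(n_tokens: int, channels: int) -> int:
    """Standard self-attention for comparison: QK^T (N*N*C) plus AV (N*N*C)."""
    return 2 * n_tokens * n_tokens * channels


print(channel_attention([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]))
print(channel_attention_mults(4096, 64), token_attention_mults(4096, 64))
```

    Doubling the number of tokens doubles the channel-attention cost but quadruples the token-attention cost, which is what makes channel-dimension attention attractive on high-resolution feature maps under a tight compute budget.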

    A robotic learning and generalization framework for curved surface based on modified DMP

    Learning from demonstration (LfD) can enable robots to quickly obtain reference trajectory information. How to reproduce and generalize skills acquired through demonstration is an active research topic. First, because many industrial robots are difficult to drag continuously and smoothly during demonstration, a compliant continuous drag demonstration system based on a discrete admittance model was designed. Then, to address the poor generalization ability of the classical dynamic movement primitive (DMP) on curved surfaces, a modified DMP incorporating a scaling factor and a force coupling term was proposed. Finally, curve drawing experiments were carried out on a 6-DoF robot. The experimental results show the effectiveness of the proposed learning and generalization framework.
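
    For reference, the classical discrete DMP couples a canonical system tau*dx/dt = -alpha_x*x with a transformation system tau*dz/dt = alpha_z*(beta_z*(g - y) - z) + f(x) and tau*dy/dt = z, where the forcing term f(x) is a normalized weighted sum of Gaussian basis functions multiplied by x and by a spatial scale, classically (g - y0). The sketch below integrates this system with explicit Euler steps and exposes two hooks, a `scale` override and an additive `coupling(t, y)` term, to show where a scaling factor and a force coupling term could enter. The hook names and their placement are assumptions; the paper's modified DMP may define both terms differently. The gains alpha_z = 25 and beta_z = alpha_z/4 are common defaults that give critical damping.

```python
import math
from typing import Callable, List, Optional, Sequence


def dmp_rollout(y0: float, g: float, weights: Sequence[float],
                centers: Sequence[float], widths: Sequence[float],
                tau: float = 1.0, dt: float = 0.001, steps: int = 2000,
                alpha_z: float = 25.0, beta_z: float = 6.25, alpha_x: float = 4.0,
                scale: Optional[float] = None,
                coupling: Optional[Callable[[float, float], float]] = None) -> List[float]:
    """Euler rollout of a 1-D discrete DMP.

    scale:    spatial scaling of the forcing term; the classical choice is (g - y0).
    coupling: extra acceleration term c(t, y), e.g. from a force sensor (hypothetical).
    """
    s = (g - y0) if scale is None else scale
    x, y, z = 1.0, y0, 0.0  # canonical phase, position, scaled velocity
    traj = [y]
    for k in range(steps):
        psi = [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]
        total = sum(psi)
        # Normalized basis-weighted forcing term, fading out as x -> 0.
        f = sum(p * w for p, w in zip(psi, weights)) / total * x * s if total > 1e-12 else 0.0
        c_t = coupling(k * dt, y) if coupling is not None else 0.0
        dz = (alpha_z * (beta_z * (g - y) - z) + f + c_t) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        traj.append(y)
    return traj


traj = dmp_rollout(0.0, 1.0, weights=[0.0] * 3, centers=[1.0, 0.5, 0.1], widths=[10.0] * 3)
print(round(traj[-1], 3))  # 1.0
```

    With zero weights the forcing term vanishes and the rollout simply converges to the goal. A constant coupling term c shifts the steady state to g + c/(alpha_z*beta_z), one way to see how a force term biases the reproduced trajectory; note also that the classical (g - y0) scale collapses the forcing term whenever start and goal coincide, a known weakness that motivates a separate scaling factor.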

    Element detection and segmentation of mathematical function graphs based on improved Mask R-CNN

    There are approximately 2.2 billion people around the world with varying degrees of visual impairment. Among them, individuals with severe visual impairments rely predominantly on hearing and touch to gather external information. At present, reading materials for the visually impaired are limited and mostly take the form of audio or text, which cannot meet their need to comprehend graphical content. Although many scholars have investigated methods for converting visual images into tactile graphics, tactile graphic translation still fails to meet the reading needs of visually impaired individuals because of the diversity of image types and the limitations of image recognition technology. The primary goal of this paper is to help the visually impaired gain a greater understanding of the natural sciences by transforming images of mathematical functions into an electronic format for the production of tactile graphics. To improve the accuracy and efficiency of graph element recognition and segmentation in function graphs, this paper proposes an MA Mask R-CNN model that uses MA ConvNeXt, a novel feature extraction network proposed in this paper, as its improved backbone, and MA BiFPN, a novel feature fusion network also introduced in this paper, as its improved feature fusion network. The model combines local relations, global relations, and channel information into an attention mechanism that establishes multiple connections, and by combining a variety of multi-scale features it improves the original Mask R-CNN's ability to detect slender and multi-type targets. Finally, the experimental results show that MA Mask R-CNN attains 89.6% mAP for target detection and 72.3% mAP for target segmentation in the instance segmentation of function graphs, a 9% mAP improvement for target detection and a 12.8% mAP improvement for target segmentation compared to the original Mask R-CNN.
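
    Both mAP figures reported above rest on intersection-over-union (IoU) matching between predictions and ground truth: box IoU for detection and mask IoU for segmentation. A minimal sketch of the two IoU computations follows; the 0.5 matching threshold is an assumption, since the abstract does not state which IoU thresholds or averaging scheme were used, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """IoU of two axis-aligned boxes (the matching criterion for detection mAP)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """IoU of two same-sized binary masks (the criterion for segmentation mAP)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


THRESHOLD = 0.5  # assumed matching threshold for a true positive

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
print(mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]) >= THRESHOLD)  # True
```

    Slender targets such as function curves are especially sensitive to mask IoU: a prediction that is off by one or two pixels across a thin line can lose most of its overlap, which is one reason segmentation mAP tends to trail detection mAP on such data.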

    Fine structures of radio bursts from flare star AD Leo with FAST observations

    Radio bursts from nearby active M dwarfs have been frequently reported and extensively studied in solar or planetary paradigms. However, their sub-structures, or fine structures, remain rarely explored despite their potential significance for diagnosing the plasma and magnetic field properties of the star. Such studies have been limited in the past by the sensitivity of radio telescopes. Here we report results from high time-resolution observations of the known flare star AD Leo with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). We detected many radio bursts over the two days of observations, with fine structures in the form of numerous millisecond-scale sub-bursts. Sub-bursts on the first day display stripe-like shapes with nearly uniform frequency drift rates and are possibly stellar analogs of Jovian S-bursts. Sub-bursts on the second day, however, show a different, blob-like shape with random occurrence patterns and are akin to solar radio spikes. The new observational results suggest that the intense emission from AD Leo is driven by the electron cyclotron maser instability, which may be related to stellar flares or to interactions with a planetary companion. Comment: 25 pages, 12 figures, accepted for publication in Ap
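
    A sub-burst's frequency drift rate can be estimated by fitting a straight line to the (time, frequency) positions of its intensity peaks, and under the electron cyclotron maser interpretation the emission frequency maps to a field strength through f_ce ≈ 2.8 MHz per gauss times the harmonic number. The sketch below assumes least-squares fitting and fundamental-harmonic emission by default; the input values are synthetic, not FAST measurements, and the abstract does not describe the paper's own fitting procedure. Function names are hypothetical.

```python
from typing import Sequence

# Electron cyclotron frequency per unit field: f_ce is about 2.8 MHz per gauss.
F_CE_MHZ_PER_GAUSS = 2.8


def drift_rate(times_s: Sequence[float], freqs_mhz: Sequence[float]) -> float:
    """Least-squares slope df/dt (MHz/s) through a sub-burst's (time, frequency) peaks."""
    n = len(times_s)
    if n < 2 or n != len(freqs_mhz):
        raise ValueError("need at least two matching (time, frequency) points")
    mt = sum(times_s) / n
    mf = sum(freqs_mhz) / n
    sxx = sum((t - mt) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("all times are identical")
    sxy = sum((t - mt) * (f - mf) for t, f in zip(times_s, freqs_mhz))
    return sxy / sxx


def ecm_field_gauss(freq_mhz: float, harmonic: int = 1) -> float:
    """Field strength implied if the emission is at the given cyclotron harmonic."""
    return freq_mhz / (harmonic * F_CE_MHZ_PER_GAUSS)


# Synthetic peak positions for one stripe-like sub-burst (not FAST data).
times = [0.000, 0.001, 0.002, 0.003]
freqs = [1250.0, 1248.0, 1246.0, 1244.0]
print(f"drift rate: {drift_rate(times, freqs):.0f} MHz/s")
print(f"fundamental-ECM field at 1250 MHz: {ecm_field_gauss(1250.0):.0f} G")
```

    A nearly constant fitted slope across many sub-bursts corresponds to the "uniform drift rate" stripes described for the first day, whereas randomly scattered blobs would not yield a consistent slope.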