
    The pre-stack migration imaging technique for damages identification in concrete structures

    Get PDF
    The pre-stack migration imaging (PMI) method, used in geophysical exploration for its single-sided detection and visual display capabilities, can be applied to identify the location, orientation, and severity of damage in concrete structures. In particular, this letter focuses on an experimental study using a finite number of sensors, with a view to practical applications. A concrete structure with a surface-mounted linear array of PZT transducers is illustrated. Three types of damage have been studied: horizontal cracks, dipping cracks, and V-shaped cracks. A pre-stack reverse-time migration technique is used to back-propagate the scattered waves and image damage in the concrete structure. The migration results from the scattered waves of an artificial damage are presented. They show that the existence of damage in a concrete structure is correctly revealed through the migration process.
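
    As a rough, self-contained illustration of the reverse-time migration idea described above (not the authors' implementation), the following Python sketch back-propagates time-reversed receiver traces with a toy 2D finite-difference acoustic solver and forms an image with a zero-lag cross-correlation against the forward source wavefield. The grid, wave speed, Ricker source, and the file scattered_traces.npy are all assumed placeholders.

```python
# Toy 2D reverse-time migration: all parameters below are assumptions, and
# "scattered_traces.npy" is a hypothetical recording from a surface array.
import numpy as np

nx = nz = 100                     # grid points
dx = 0.005                        # grid spacing [m]
c = 4000.0 * np.ones((nz, nx))    # assumed P-wave speed in concrete [m/s]
dt = 0.4 * dx / c.max()           # CFL-stable time step
nt = 600

def step(u_prev, u_now, src=None):
    """One explicit finite-difference step of the 2D scalar wave equation."""
    lap = (-4.0 * u_now
           + np.roll(u_now, 1, 0) + np.roll(u_now, -1, 0)
           + np.roll(u_now, 1, 1) + np.roll(u_now, -1, 1)) / dx**2
    u_next = 2.0 * u_now - u_prev + (c * dt) ** 2 * lap
    if src is not None:
        u_next += src
    return u_now, u_next

# Forward-model the source wavefield from one surface transducer (Ricker pulse).
t = np.arange(nt) * dt
f0 = 50e3                                          # assumed centre frequency [Hz]
arg = (np.pi * f0 * (t - 1.5 / f0)) ** 2
ricker = (1.0 - 2.0 * arg) * np.exp(-arg)

fwd = np.zeros((nt, nz, nx))
u0 = np.zeros((nz, nx)); u1 = np.zeros((nz, nx))
for it in range(nt):
    src = np.zeros((nz, nx)); src[0, nx // 2] = ricker[it]
    u0, u1 = step(u0, u1, src)
    fwd[it] = u1

# Back-propagate the time-reversed scattered traces and apply the
# zero-lag cross-correlation imaging condition.
traces = np.load("scattered_traces.npy")           # shape (nt, n_receivers)
rec_x = np.linspace(10, nx - 10, traces.shape[1]).astype(int)
image = np.zeros((nz, nx))
u0 = np.zeros((nz, nx)); u1 = np.zeros((nz, nx))
for it in range(nt - 1, -1, -1):
    src = np.zeros((nz, nx)); src[0, rec_x] = traces[it]
    u0, u1 = step(u0, u1, src)
    image += fwd[it] * u1                          # bright spots mark damage
```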

    Focusing Modeling of OPFC Linear Array Transducer by Using Distributed Point Source Method

    Get PDF
    Improving ultrasonic phased array detection technology is a major concern of the engineering community. Orthotropic piezoelectric fiber composite (OPFC) can be formed into multielement linear arrays that are readily applied as actuators and sensors. Owing to their orthotropic performance, such phased array transducers can deliver strong, highly directional actuation power and high sensitivity. A focused beam from the linear phased array transducer is obtained simply by applying a parabolic time delay across the elements. In this work, the distributed point source method (DPSM), a recently developed mesh-free numerical technique for solving a variety of engineering problems, is used to model the ultrasonic field. This work presents the basic theory of the method and applies it to the new OPFC phased array transducer. The interaction of two OPFC linear phased array transducers in the same medium is also modeled, showing that the pressure beam produced by the new transducer is narrower, i.e. more collimated, than that produced by a conventional transducer at various angles. DPSM can thus be used to analyze and optimally design OPFC linear phased array transducers.
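
    The parabolic time delay mentioned in the abstract is the standard focal law for a linear array; below is a minimal sketch, with the array geometry and wave speed as assumed example values (this is independent of the paper's DPSM field computation).

```python
# Focal-law delays for a linear phased array focusing at depth F on axis.
# Geometry and wave speed are assumed example values.
import numpy as np

n_elem = 16        # number of elements (assumed)
pitch = 1.0e-3     # element spacing [m] (assumed)
c = 1480.0         # wave speed in the coupling medium [m/s] (assumed)
focus = 30e-3      # focal depth [m] (assumed)

x = (np.arange(n_elem) - (n_elem - 1) / 2) * pitch  # element offsets from centre
path = np.sqrt(focus**2 + x**2)                     # element-to-focus distances
fire_early = (path - focus) / c                     # exact focal law [s]
delays = fire_early.max() - fire_early              # non-negative firing delays;
                                                    # outermost elements fire first

# Small-angle (parabolic) approximation of the same law:
parabolic = x**2 / (2.0 * focus * c)
print(f"max deviation from exact law: {np.abs(fire_early - parabolic).max():.1e} s")
```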

    Attention-enhanced connectionist temporal classification for discrete speech emotion recognition

    Get PDF
    Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. This approach, however, is limited in that it can yield models that fail to capture temporal changes in the speech signal, including those indicative of a particular emotion. One potential solution is to model SER as a sequence-to-sequence task instead. To this end, we have developed an attention-based bidirectional long short-term memory (BLSTM) neural network combined with a connectionist temporal classification (CTC) objective function (Attention-BLSTM-CTC) for SER. We also assessed the benefits of incorporating two contemporary attention mechanisms, namely component attention and quantum attention, into the CTC framework. To the best of the authors' knowledge, this is the first time that such a hybrid architecture has been employed for SER. We demonstrated the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and FAU-Aibo Emotion corpora; the experimental results show that the proposed model outperforms current state-of-the-art approaches. The work presented in this paper was substantially supported by the National Natural Science Foundation of China (Grant No. 61702370), the Key Program of the Natural Science Foundation of Tianjin (Grant No. 18JCZDJC36300), the Open Projects Program of the National Laboratory of Pattern Recognition, and the Senior Visiting Scholar Program of Tianjin Normal University. Interspeech 2019. ISSN: 1990-977
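
    For readers unfamiliar with the pipeline, here is a minimal PyTorch sketch of the BLSTM-CTC backbone only. Layer sizes and the 4-class emotion inventory are assumptions, and the paper's component and quantum attention mechanisms are not reproduced.

```python
# Generic BLSTM-CTC backbone in PyTorch; sizes and label set are assumed.
import torch
import torch.nn as nn

class BLSTMCTC(nn.Module):
    def __init__(self, n_feats=40, n_hidden=128, n_emotions=4):
        super().__init__()
        self.blstm = nn.LSTM(n_feats, n_hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * n_hidden, n_emotions + 1)  # +1 for CTC blank

    def forward(self, x):                  # x: (batch, frames, n_feats)
        h, _ = self.blstm(x)
        return self.proj(h).log_softmax(-1)

model = BLSTMCTC()
ctc = nn.CTCLoss(blank=0)

x = torch.randn(8, 300, 40)                # a batch of 300-frame feature sequences
log_probs = model(x).transpose(0, 1)       # CTCLoss expects (frames, batch, classes)
targets = torch.randint(1, 5, (8, 1))      # one utterance-level label per sequence
loss = ctc(log_probs, targets,
           input_lengths=torch.full((8,), 300),
           target_lengths=torch.full((8,), 1))
loss.backward()
```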

    YOLO-FaceV2: A Scale and Occlusion Aware Face Detector

    Full text link
    In recent years, face detection algorithms based on deep learning have made great progress. These algorithms fall broadly into two categories: two-stage detectors such as Faster R-CNN and one-stage detectors such as YOLO. Because of their better balance between accuracy and speed, one-stage detectors have been widely used in many applications. In this paper, we propose a real-time face detector based on the one-stage detector YOLOv5, named YOLO-FaceV2. We design a Receptive Field Enhancement module, called RFE, to enlarge the receptive field for small faces, and use an NWD loss to compensate for the sensitivity of IoU to location deviations of tiny objects. To handle face occlusion, we present an attention module named SEAM and introduce a Repulsion Loss. Moreover, we use a weighting function, Slide, to address the imbalance between easy and hard samples, and use information from the effective receptive field to design the anchors. Experimental results on the WiderFace dataset show that our face detector outperforms YOLO and its variants on all of the easy, medium, and hard subsets. Source code: https://github.com/Krasjet-Yu/YOLO-FaceV
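
    The NWD loss mentioned here builds on the normalized Gaussian Wasserstein distance for tiny objects; a hedged NumPy sketch of that metric follows (the normalizing constant C is an assumed dataset-dependent value, and this is not YOLO-FaceV2's exact loss code).

```python
# Normalized Wasserstein distance between two boxes (cx, cy, w, h), each modeled
# as a 2D Gaussian; the normalizing constant C is dataset-dependent (assumed).
import numpy as np

def nwd(box_a, box_b, C=12.8):
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between N([cx, cy], diag(w/2, h/2)^2)
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)
    return np.exp(-np.sqrt(w2_sq) / C)     # in (0, 1]; a loss can use 1 - nwd

# A 2-pixel shift of a tiny 4x4 box keeps NWD high (~0.86), whereas IoU for the
# same pair drops to 0.33 -- the smoothness to location deviation noted above.
print(nwd((10, 10, 4, 4), (12, 10, 4, 4)))
```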

    Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition

    Get PDF
    Automatic emotion recognition from speech, an important and challenging task in the field of affective computing, heavily relies on the effectiveness of the speech features used for classification. Previous approaches to emotion recognition have mostly focused on extracting carefully hand-crafted features, and how to model spatio-temporal dynamics for speech emotion recognition effectively is still under active investigation. In this paper, we propose a method for extracting emotion-relevant features from speech by combining attention-based bidirectional long short-term memory recurrent neural networks (BLSTM-RNNs) with fully convolutional networks (FCNs) to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions than existing emotion recognition algorithms.
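
    A minimal PyTorch sketch of the general recipe described above: a small convolutional stack extracts spatial features from a mel-spectrogram, a BLSTM models temporal dynamics, and attention pooling produces an utterance vector for a classifier. All layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Attention-BLSTM over FCN features; sizes are assumed for illustration.
import torch
import torch.nn as nn

class AttnBLSTMFCN(nn.Module):
    def __init__(self, n_mels=64, n_emotions=4):
        super().__init__()
        self.fcn = nn.Sequential(                       # input: (B, 1, T, n_mels)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                       # pool frequency axis only
        )
        self.blstm = nn.LSTM(32 * (n_mels // 2), 128,
                             bidirectional=True, batch_first=True)
        self.attn = nn.Linear(256, 1)                   # scalar score per frame
        self.dnn = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                                 nn.Linear(64, n_emotions))

    def forward(self, spec):                            # spec: (B, T, n_mels)
        f = self.fcn(spec.unsqueeze(1))                 # (B, 32, T, n_mels/2)
        f = f.permute(0, 2, 1, 3).flatten(2)            # (B, T, 32 * n_mels/2)
        h, _ = self.blstm(f)                            # (B, T, 256)
        w = self.attn(h).softmax(dim=1)                 # attention over frames
        utt = (w * h).sum(dim=1)                        # weighted temporal pooling
        return self.dnn(utt)

logits = AttnBLSTMFCN()(torch.randn(2, 300, 64))        # 2 utterances, 300 frames
```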

    Hierarchical attention transfer networks for depression assessment from speech

    Get PDF

    M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

    Full text link
    Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, vision-language models supporting multiple languages, e.g., both Chinese and English, have lagged behind due to the relative scarcity of large-scale pretraining datasets. To this end, we introduce a comprehensive bilingual (Chinese-English) dataset, BM-6B, with over 6 billion image-text pairs, aimed at enhancing multimodal foundation models so that they understand images well in both languages. To handle a dataset of this scale, we propose a novel grouped aggregation approach for image-text contrastive loss computation, which significantly reduces communication overhead and GPU memory demands, yielding a 60% increase in training speed. We pretrain on BM-6B a series of bilingual image-text foundation models with enhanced fine-grained understanding; the resulting models, dubbed M^2-Encoders (pronounced "M-Square"), set new benchmarks in both languages for multimodal retrieval and classification tasks. Notably, our largest M^2-Encoder-10B model achieves top-1 accuracies of 88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot classification setting, surpassing previously reported SoTA methods by 2.2% and 21.1%, respectively. The M^2-Encoder series represents one of the most comprehensive bilingual image-text foundation models to date, and we are making it available to the research community for further exploration and development.
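
    For context, here is a minimal sketch of the standard symmetric image-text contrastive (CLIP-style) objective such models are pretrained with; the paper's grouped aggregation trick for cutting cross-GPU communication is not reproduced here.

```python
# Symmetric InfoNCE over a batch of paired image and text embeddings.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so dot products are cosine similarities
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(len(img))              # matching pairs on the diagonal
    # symmetric cross-entropy: image-to-text and text-to-image
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(32, 512), torch.randn(32, 512))
```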