405 research outputs found
The pre-stack migration imaging technique for damages identification in concrete structures
Abstract: The pre-stack migration imaging (PMI) method, used in geophysical exploration for single-sided detection and visual display, can be applied to identify the location, orientation, and severity of damage in concrete structures. In particular, this letter focuses on an experimental study using a finite number of sensors, with a view to further practical applications. A concrete structure with a surface-mounted linear array of PZT transducers is illustrated. Three types of damage, namely horizontal, dipping, and V-shaped cracks, have been studied. A pre-stack reverse time migration technique is used to back-propagate the scattered waves and to image damage in the concrete structure. The migration results from the scattered waves of an artificial damage are presented. It is shown that the existence of the damage in the concrete structure is correctly revealed through the migration process.
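The imaging principle behind migration can be illustrated with a toy delay-and-sum (Kirchhoff-style) sketch rather than the paper's full reverse-time wavefield extrapolation: each recorded trace is sampled at the pulse-echo travel time to every candidate image point, so scattered energy stacks coherently only at true damage locations. All names and parameter choices below are illustrative assumptions, not the paper's setup.

```python
import math

def migrate(traces, sensor_x, dt, c, grid):
    """Delay-and-sum migration image: for each image point, sample every
    recorded pulse-echo trace at the two-way travel time sensor -> point ->
    sensor, so scattered energy stacks coherently at true scatterer locations."""
    image = []
    for gx, gz in grid:
        stack = 0.0
        for trace, sx in zip(traces, sensor_x):
            t = 2.0 * math.hypot(gx - sx, gz) / c  # two-way travel time
            i = int(round(t / dt))
            if 0 <= i < len(trace):
                stack += trace[i]
        image.append(stack)
    return image
```

With a few sensors and a synthetic point scatterer, the stacked amplitude peaks at the scatterer position and stays near zero elsewhere, which is the behavior the migration results in the letter rely on.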
Focusing Modeling of OPFC Linear Array Transducer by Using Distributed Point Source Method
Improving ultrasonic phased array detection technology is a major concern of the engineering community. Orthotropic piezoelectric fiber composite (OPFC) can be constructed into multielement linear arrays that are conveniently applied as actuators and sensors. Thanks to their orthotropic performance, these phased array transducers generate strongly directional actuation power and high sensitivity. A focused beam from the linear phased array transducer is obtained simply by applying a parabolic time delay across the elements. In this work, the distributed point source method (DPSM), a recently developed mesh-free numerical technique for solving a variety of engineering problems, is used to model the ultrasonic field. This work presents the basic theory of the method and applies it to the new OPFC phased array transducer. The interaction of two OPFC linear phased array transducers in the same medium is also modeled, showing that the pressure beam produced by the new transducer is narrower, or more collimated, than that produced by a conventional transducer at different steering angles. DPSM can therefore be used to analyze and optimally design OPFC linear phased array transducers.
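The parabolic focal law mentioned above can be sketched in a few lines: each element is delayed so that its wavefront reaches the on-axis focal point at the same instant as the wavefronts from the outermost elements. The exact delay follows from the path lengths; for elements close to the axis (x much smaller than the focal depth F) it reduces to the parabolic form x²/(2cF). The element layout and wave speed below are illustrative assumptions, not values from the paper.

```python
import math

def focusing_delays(element_x, c, focal_depth):
    """Focal-law delays for a linear array focusing on axis at depth F:
    the outermost elements (longest path to the focus) fire first and the
    center element fires last, so all wavefronts arrive at the focal point
    simultaneously. For x << F the center-vs-edge delay reduces to the
    parabolic law x**2 / (2 * c * F)."""
    path = [math.hypot(x, focal_depth) for x in element_x]
    t_max = max(path) / c
    return [t_max - p / c for p in path]
```

For a symmetric array the edge elements get zero delay, the center element the largest, and the exact center delay agrees with the parabolic approximation to within about one percent at typical element pitches.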
Attention-enhanced connectionist temporal classification for discrete speech emotion recognition
Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. This approach, however, is limited in that it can result in models that do not capture temporal changes in the speech signal, including those indicative of a particular emotion. One potential solution to overcome this limitation is to model SER as a sequence-to-sequence task instead. In this regard, we have developed an attention-based bidirectional long short-term memory (BLSTM) neural network in combination with a connectionist temporal classification (CTC) objective function (Attention-BLSTM-CTC) for SER. We also assessed the benefits of incorporating two contemporary attention mechanisms, namely component attention and quantum attention, into the CTC framework. To the best of the authors' knowledge, this is the first time that such a hybrid architecture has been employed for SER. We demonstrated the effectiveness of our approach on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and FAU-Aibo Emotion corpora. The experimental results demonstrate that our proposed model outperforms current state-of-the-art approaches. The work presented in this paper was substantially supported by the National Natural Science Foundation of China (Grant No. 61702370), the Key Program of the Natural Science Foundation of Tianjin (Grant No. 18JCZDJC36300), the Open Projects Program of the National Laboratory of Pattern Recognition, and the Senior Visiting Scholar Program of Tianjin Normal University.
Interspeech 2019
ISSN: 1990-977
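The core idea shared by attention mechanisms such as those above is weighting frame-level features by their learned relevance before pooling. The following is a minimal generic softmax attention pooling sketch, not the paper's exact component or quantum attention; the frame features and scores are assumed inputs produced upstream (e.g., by a BLSTM).

```python
import math

def attention_pool(frames, scores):
    """Softmax the per-frame relevance scores into attention weights, then
    pool the frame features as their weighted sum so salient frames dominate
    the utterance-level representation."""
    m = max(scores)                          # subtract max for numerical stability
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
```

With uniform scores this degenerates to mean pooling; a strongly scored frame pulls the pooled vector toward its own features.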
YOLO-FaceV2: A Scale and Occlusion Aware Face Detector
In recent years, face detection algorithms based on deep learning have made
great progress. These algorithms can generally be divided into two categories,
i.e., two-stage detectors like Faster R-CNN and one-stage detectors like YOLO.
Because of their better balance between accuracy and speed, one-stage detectors
have been widely used in many applications. In this paper, we propose a
real-time face detector based on the one-stage detector YOLOv5, named
YOLO-FaceV2. We design a Receptive Field Enhancement module, called RFE, to
enlarge the receptive field for small faces, and use an NWD loss to compensate
for the sensitivity of IoU to the location deviation of tiny objects. For face
occlusion, we present an attention module named SEAM and introduce a Repulsion
Loss to address it. Moreover, we use a weighting function, Slide, to address the
imbalance between easy and hard samples, and use the information of the
effective receptive field to design the anchors. Experimental results on the
WiderFace dataset show that our face detector outperforms YOLO and its variants
in all of the easy, medium, and hard subsets. Source code is available at
https://github.com/Krasjet-Yu/YOLO-FaceV
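The NWD loss mentioned above builds on the Normalized Wasserstein Distance between bounding boxes, which stays smooth for tiny objects where IoU collapses to zero under small location deviations. A minimal sketch, assuming boxes in (cx, cy, w, h) form and treating the normalizing constant C as a dataset-dependent hyperparameter (the default value below is an assumption, not taken from the paper):

```python
import math

def nwd(box_a, box_b, c_norm=12.8):
    """Normalized Wasserstein Distance between boxes (cx, cy, w, h): model
    each box as a 2-D Gaussian; the squared 2-Wasserstein distance between
    the two Gaussians has the closed form below. Unlike IoU, the result
    decays smoothly with offset even when the boxes no longer overlap."""
    (ax, ay, aw, ah), (bx, by, bw, bh) = box_a, box_b
    w2 = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2) / c_norm)
```

A one-pixel shift of a 4x4 box leaves IoU near zero but only slightly lowers NWD, which is the property a loss term of the form 1 - NWD exploits for tiny faces.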
Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Automatic emotion recognition from speech, an important and challenging task in the field of affective computing, heavily relies on the effectiveness of the speech features used for classification. Previous approaches to emotion recognition have mostly focused on the extraction of carefully hand-crafted features, and how to model spatio-temporal dynamics for speech emotion recognition effectively is still under active investigation. In this paper, we propose a method to tackle the problem of emotion-relevant feature extraction from speech by combining attention-based bidirectional long short-term memory recurrent neural networks with fully convolutional networks in order to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. Experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions than other existing emotion recognition algorithms.
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Vision-language foundation models like CLIP have revolutionized the field of
artificial intelligence. Nevertheless, VLMs supporting multiple languages,
e.g., both Chinese and English, have lagged behind due to the relative scarcity
of large-scale pretraining datasets. Toward this end, we introduce BM-6B, a
comprehensive bilingual (Chinese-English) dataset with over 6 billion
image-text pairs, aimed at enhancing multimodal foundation models so that they
understand images well in both languages. To handle a dataset of this scale, we
propose a novel grouped aggregation approach for image-text contrastive loss
computation, which significantly reduces communication overhead and GPU memory
demands, facilitating a 60% increase in training speed. We pretrain a series of
bilingual image-text foundation models with enhanced fine-grained understanding
on BM-6B. The resulting models, dubbed M2-Encoders (pronounced "M-Square"),
set new benchmarks in both languages for multimodal retrieval and
classification tasks. Notably, our largest M2-Encoder-10B model achieves top-1
accuracies of 88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot
classification setting, surpassing previously reported SoTA methods by 2.2% and
21.1%, respectively. The M2-Encoder series represents one of the most
comprehensive bilingual image-text foundation models to date, so we are making
it available to the research community for further exploration and development.
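The image-text contrastive objective at the heart of such pretraining is the symmetric InfoNCE loss over an in-batch similarity matrix. The grouped aggregation described above is a systems-level optimization of how this loss is computed across GPUs; the minimal single-process sketch below shows only the base objective, not that optimization.

```python
import math

def clip_contrastive_loss(sim):
    """Symmetric InfoNCE over an n x n image-text similarity matrix: each
    image should match its own caption (softmax over rows) and each caption
    its own image (softmax over columns); the loss averages the two
    cross-entropies against the diagonal targets."""
    n = len(sim)

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)                                   # stable log-sum-exp
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[i]                        # -log softmax[i]
        return total / n

    cols = [[sim[j][i] for j in range(n)] for i in range(n)]
    return 0.5 * (cross_entropy(sim) + cross_entropy(cols))
```

With an uninformative all-zero similarity matrix the loss is log(n), the entropy of a uniform guess; a strongly diagonal matrix drives it toward zero.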