14,357 research outputs found

    Multi-level Attention Model for Weakly Supervised Audio Classification

    In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently, Google published a large-scale weakly labelled dataset called Audio Set, where each audio clip is labelled only with the presence or absence of audio events, without their onset and offset times. Our multi-level attention model is an extension of the previously proposed single-level attention model. It consists of several attention modules applied to intermediate neural network layers. The outputs of these attention modules are concatenated into a vector, followed by a multi-label classifier that makes the final prediction for each class. Experiments show that our model achieves a mean average precision (mAP) of 0.360, outperforming the state-of-the-art single-level attention model (0.327) and the Google baseline (0.314). Comment: 5 pages, 3 figures, Submitted to Eusipco 201
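    The structure described in this abstract (attention modules on several intermediate layers, concatenation, then a multi-label classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the softmax-over-time attention, the per-level weight vectors `attn_ws`, the classifier parameters `clf_w` and `clf_b`, and the function names. The abstract does not specify these details, so this is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, attn_w):
    """Pool T frame vectors (each of length D) into one D-vector.

    attn_w is a hypothetical learned weight vector of length D; each
    frame's score is its dot product with attn_w, normalised over time.
    """
    scores = [sum(w * x for w, x in zip(attn_w, f)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]

def multi_level_predict(levels, attn_ws, clf_w, clf_b):
    """Attention-pool each level, concatenate, then apply a sigmoid
    classifier with one output per class (multi-label prediction)."""
    pooled = []
    for frames, attn_w in zip(levels, attn_ws):
        pooled.extend(attention_pool(frames, attn_w))
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(clf_w, clf_b)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

    In this sketch, `levels` holds the frame-level outputs of two or more intermediate layers for one clip, and the returned list contains one probability per audio event class. Clip-level labels are sufficient to train such a model because pooling over time removes the need for onset and offset annotations.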

    Bridging the Granularity Gap for Acoustic Modeling

    While the Transformer has become the de facto standard for speech, modeling fine-grained frame-level features remains an open challenge, both in capturing long-distance dependencies and in distributing the attention weights. We propose Progressive Down-Sampling (PDS), which gradually compresses the acoustic features into coarser-grained units containing more complete semantic information, like text-level representations. In addition, we develop a representation fusion method to alleviate the information loss that inevitably occurs during high compression. In this way, we compress the acoustic features to 1/32 of the initial length while achieving better or comparable performance on the speech recognition task. As a bonus, it yields inference speedups ranging from 1.20× to 1.47×. By reducing the modeling burden, we also achieve competitive results when training on the more challenging speech translation task. Comment: ACL 2023 Finding
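    The 1/32 compression figure can be made concrete with a small sketch of stage-wise down-sampling. The choices here are assumptions for illustration only: five stages with stride 2 (since 2^5 = 32), mean pooling as a stand-in for whatever learned down-sampling layer the paper uses, and the function names. The representation fusion step is not shown.

```python
def downsample(frames, stride=2):
    """Average each group of `stride` adjacent frames into one frame.

    Mean pooling is a placeholder for a learned down-sampling layer.
    """
    out = []
    for i in range(0, len(frames), stride):
        group = frames[i:i + stride]
        dim = len(group[0])
        out.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return out

def progressive_downsample(frames, num_stages=5, stride=2):
    """Apply down-sampling stage by stage and keep every stage's output.

    Keeping the intermediate outputs mirrors the idea that coarser units
    are built gradually rather than in a single aggressive step.
    """
    stages = []
    for _ in range(num_stages):
        frames = downsample(frames, stride)
        stages.append(frames)
    return stages

# Usage: 64 one-dimensional frames shrink to 32, 16, 8, 4, then 2 frames.
frames = [[float(i)] for i in range(64)]
lengths = [len(s) for s in progressive_downsample(frames)]
```

    Because self-attention cost grows with the square of the sequence length, a sequence shortened by a factor of 32 is much cheaper to attend over, which is consistent with the inference speedups the abstract reports.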

    First impressions: A survey on vision-based apparent personality trait analysis

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. In the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most widely used cues for analyzing personality. Recently, however, the computer vision community has shown increasing interest in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and the potential impact that such methods could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, we review aspects of subjectivity in data labeling and evaluation, as well as current datasets and challenges organized to advance research in the field. Peer Reviewed. Postprint (author's final draft

    Predictability of catastrophic events: material rupture, earthquakes, turbulence, financial crashes and human birth

    We propose that catastrophic events are "outliers" with statistically different properties from the rest of the population and result from mechanisms involving amplifying critical cascades. Applications and the potential for prediction are discussed in relation to the rupture of composite materials, great earthquakes, turbulence and abrupt changes of weather regimes, financial crashes and human parturition (birth). Comment: Latex document of 22 pages including 6 ps figures, in press in PNA

    Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

    Automatic speech recognition (ASR) has recently become an important challenge for deep learning (DL). It requires large-scale training datasets and substantial computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. There are also situations where real data is difficult or expensive to gather, or occurs only rarely, and so cannot meet the data requirements of DL models. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models using real datasets that are small, or that differ slightly from but are related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to describe the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Finally, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.

    Attention-Block Deep Learning Based Features Fusion in Wearable Social Sensor for Mental Wellbeing Evaluations

    With the progressive increase of stress, anxiety and depression in working and living environments, mental health assessment has become an important social interaction research topic. Generally, clinicians evaluate the psychology of participants through psychological evaluations and questionnaires. However, these methods suffer from subjectivity and memory effects. In this paper, a new multi-sensing wearable device has been developed and applied in self-designed psychological tests. Speech under different emotions as well as behavior signals are captured and analyzed. The mental state of the participants is objectively assessed through a group of psychological questionnaires. In particular, we propose an attention-based block deep learning architecture within the device for multi-feature classification and fusion analysis. This enables the deep learning architecture to learn the optimum fusion weights of different domain features autonomously during training. The proposed attention-based architecture improves performance compared with the direct-connection fusion method. Experimental studies have been carried out to verify the effectiveness and robustness of the proposed architecture. The results show that wearable multi-sensing devices equipped with the attention-based block deep learning architecture can effectively classify mental state with better performance.

    Earthquakes: from chemical alteration to mechanical rupture

    In the standard rebound theory of earthquakes, elastic deformation energy is progressively stored in the crust until a threshold is reached at which it is suddenly released in an earthquake. We review three important paradoxes, the strain paradox, the stress paradox and the heat flow paradox, that are difficult to account for in this picture, either individually or when taken together. Resolutions of these paradoxes usually call for additional assumptions on the nature of the rupture process (such as novel modes of deformation and rupture) prior to and/or during an earthquake, on the nature of the fault and on the effect of trapped fluids within the crust at seismogenic depths. We review the evidence for the essential importance of water and its interaction with the modes of deformation. Water is usually seen to have mainly mechanical effects: on the one hand it decreases the normal lithostatic stress in the fault core, and on the other hand it weakens rock materials via hydrolytic weakening and stress corrosion. We also review the evidence that water plays a major role in the alteration of minerals subjected to finite strains into other structures in out-of-equilibrium conditions. This suggests exciting new routes to understanding what an earthquake is, which require developing a truly multidisciplinary approach involving mineral chemistry, geology, rupture mechanics and statistical physics. Comment: 44 pages, 1 figure, submitted to Physics Report

    Artificial Intelligence for Multimedia Signal Processing

    Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. A great deal of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years attempts have been made to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.

    Prestressing wire breakage monitoring using sound event detection

    Detecting prestressed wire breakage in concrete bridges is essential for ensuring safety and longevity and for preventing catastrophic failures. This study proposes a novel approach to wire breakage detection using Mel-frequency cepstral coefficients (MFCCs) and a back-propagation neural network (BPNN). Experimental data from two bridges in Italy were acquired to train and test the models. To overcome the limited availability of real-world training data, data augmentation techniques were employed to increase the data set size, enhancing the capability of the models and preventing over-fitting. The proposed method uses MFCCs to extract features from the acoustic emission signals produced by wire breakage, which are then classified by the BPNN. The results show that the proposed method can detect and classify sound events effectively, demonstrating the promising potential of BPNNs for real-time monitoring and diagnosis of bridges. The significance of this work lies in its contribution to improving bridge safety and preventing catastrophic failures. The combination of MFCCs and a BPNN offers a new approach to wire breakage detection, while the use of real-world data and data augmentation techniques is a significant contribution to overcoming the limited availability of training data. The proposed method has the potential to be a generalized and robust model for real-time monitoring of bridges, ultimately leading to safer and longer-lasting infrastructure.
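    As background for the MFCC features mentioned in this abstract, the sketch below shows how filter centre frequencies are commonly spaced on the mel scale. It uses the widely used conversion mel = 2595 · log10(1 + f / 700). The filter count, frequency range and function names are assumptions for illustration, since the abstract does not state the study's MFCC settings. The later MFCC steps (filterbank energies, logarithm, discrete cosine transform) and the BPNN classifier are not shown.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (common 2595/700 convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Centre frequencies in Hz of filters evenly spaced on the mel scale.

    The two band edges are excluded, so the result has num_filters entries.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]

# Usage: 10 hypothetical filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(10, 0.0, 8000.0)
```

    Even spacing in mels places the filters close together at low frequencies and further apart at high frequencies, which is the property that distinguishes MFCCs from a linearly spaced spectral representation.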