Multi-level Attention Model for Weakly Supervised Audio Classification
In this paper, we propose a multi-level attention model to solve the weakly
labelled audio classification problem. The objective of audio classification is
to predict the presence or absence of audio events in an audio clip. Recently,
Google published a large-scale weakly labelled dataset called Audio Set, in which
each audio clip is labelled only with the presence or absence of audio events,
without the onset and offset times of those events. Our multi-level
attention model is an extension of the previously proposed single-level
attention model. It consists of several attention modules applied to
intermediate neural network layers. The outputs of these attention modules are
concatenated into a vector, followed by a multi-label classifier that makes the final
prediction for each class. Experiments show that our model achieves a mean
average precision (mAP) of 0.360, outperforming the state-of-the-art single-level
attention model (0.327) and the Google baseline (0.314). Comment: 5 pages, 3 figures, Submitted to Eusipco 201
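As a rough illustration of the architecture described above, the following pure-Python sketch attaches one attention module to each of two intermediate layers and concatenates their pooled outputs before a multi-label classifier. It is not the authors' implementation. The layer sizes, the ReLU dense layers, the softmax normalization of attention weights over time, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def dense_relu(frames, weights):
    """Hypothetical fully connected layer with ReLU, applied to each frame."""
    return [[max(0.0, dot(row, h)) for row in weights] for h in frames]


def attention_module(frames, cls_w, att_w):
    """Pool a sequence of frame embeddings into one probability per class."""
    pooled = []
    for wc, wa in zip(cls_w, att_w):
        probs = [sigmoid(dot(wc, h)) for h in frames]    # frame-level class probability
        weights = softmax([dot(wa, h) for h in frames])  # attention over time, sums to 1
        pooled.append(sum(a * p for a, p in zip(weights, probs)))
    return pooled


def multi_level_attention(frames, layers, att_params, out_w):
    """Attach an attention module after each layer, concatenate, then classify."""
    concat = []
    x = frames
    for weights, (cls_w, att_w) in zip(layers, att_params):
        x = dense_relu(x, weights)
        concat.extend(attention_module(x, cls_w, att_w))
    # Multi-label output: an independent sigmoid per class.
    return [sigmoid(dot(row, concat)) for row in out_w]


random.seed(0)
T, D, K, LEVELS = 10, 8, 3, 2  # frames, embedding size, classes, attention levels (all hypothetical)
frames = rand_matrix(T, D)
layers = [rand_matrix(D, D) for _ in range(LEVELS)]
att_params = [(rand_matrix(K, D), rand_matrix(K, D)) for _ in range(LEVELS)]
out_w = rand_matrix(K, K * LEVELS)
scores = multi_level_attention(frames, layers, att_params, out_w)
print([round(s, 3) for s in scores])
```

Each attention module turns a variable-length sequence of frame embeddings into one clip-level score per class. This is what lets such a model be trained from clip-level (weak) labels alone.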
Bridging the Granularity Gap for Acoustic Modeling
While the Transformer has become the de facto standard for speech, modeling
fine-grained frame-level features remains an open challenge: it is hard to capture
long-distance dependencies and to distribute the attention weights. We propose
Progressive Down-Sampling (PDS), which gradually compresses the
acoustic features into coarser-grained units containing more complete semantic
information, similar to text-level representations. In addition, we develop a
representation fusion method to alleviate the information loss that inevitably
occurs during high compression. In this way, we compress the acoustic
features to 1/32 of their initial length while achieving better or comparable
performance on the speech recognition task. As a bonus, it yields
inference speedups ranging from 1.20× to 1.47×. By reducing the
modeling burden, we also achieve competitive results when training on the more
challenging speech translation task. Comment: ACL 2023 Findings
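The arithmetic behind the 1/32 figure can be sketched as five successive 2× down-sampling stages (2^5 = 32). In the pure-Python sketch below, the stage count and per-stage factor are assumptions. Averaging adjacent frames stands in for whatever down-sampling operator the paper actually uses. `fuse` is a hypothetical weighted average of all stage outputs, meant only to illustrate reusing less-compressed representations, and is not claimed to match the paper's representation fusion method.

```python
def downsample2(seq):
    """Halve the length by averaging adjacent frames (odd lengths repeat the last frame)."""
    if len(seq) % 2:
        seq = seq + [seq[-1]]
    return [[(a + b) / 2 for a, b in zip(seq[i], seq[i + 1])]
            for i in range(0, len(seq), 2)]


def pool_to(seq, length):
    """Average-pool seq into `length` contiguous chunks; assumes len(seq) >= length."""
    n = len(seq)
    assert n >= length
    out = []
    for j in range(length):
        chunk = seq[j * n // length:(j + 1) * n // length]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out


def progressive_downsample(frames, stages=5):
    """Five halving stages give 2**5 = 32x compression; return every stage's output."""
    outputs, x = [], frames
    for _ in range(stages):
        x = downsample2(x)  # a full model would apply encoder layers between stages (omitted)
        outputs.append(x)
    return outputs


def fuse(outputs, weights):
    """Hypothetical fusion: weighted average of all stage outputs at the final length."""
    final_len, dim = len(outputs[-1]), len(outputs[-1][0])
    aligned = [pool_to(o, final_len) for o in outputs]
    total = sum(weights)
    return [[sum(w * a[t][k] for w, a in zip(weights, aligned)) / total
             for k in range(dim)]
            for t in range(final_len)]


frames = [[float(t % 7), 1.0, 2.0, 3.0] for t in range(320)]  # 320 frames, 4 features
outputs = progressive_downsample(frames)
print([len(o) for o in outputs])  # → [160, 80, 40, 20, 10]
fused = fuse(outputs, weights=[1.0, 1.0, 1.0, 2.0, 3.0])
print(len(fused), len(fused[0]))  # → 10 4
```

Because attention cost grows quadratically with sequence length, shrinking 320 frames to 10 sharply reduces the work per encoder layer. That reduction is the intuition behind the reported inference speedups, although real speedups also depend on the rest of the model.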
First impressions: A survey on vision-based apparent personality trait analysis
© 2019 IEEE. Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. Over the past few years, it has also become an attractive research area in visual computing. From a computational point of view, speech and text have by far been the most considered cues for analyzing personality. Recently, however, there has been increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches can accurately analyze human faces, body postures, and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and the potential impact that such methods could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of subjectivity in data labeling and evaluation, as well as current datasets and challenges organized to push research in the field, are reviewed. Peer Reviewed. Postprint (author's final draft)
Predictability of catastrophic events: material rupture, earthquakes, turbulence, financial crashes and human birth
We propose that catastrophic events are "outliers" with statistically
different properties from the rest of the population and that they result from
mechanisms involving amplifying critical cascades. Applications and the potential
for prediction are discussed in relation to the rupture of composite materials,
great earthquakes, turbulence and abrupt changes of weather regimes, financial
crashes, and human parturition (birth). Comment: LaTeX document of 22 pages including 6 ps figures, in press in PNAS
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
for deep learning (DL). It requires large-scale training datasets and
high computational and storage resources. Moreover, DL techniques, and machine
learning (ML) approaches in general, assume that training and testing data
come from the same domain, with the same input feature space and data
distribution characteristics. This assumption, however, does not hold in
some real-world artificial intelligence (AI) applications. In addition, there are
situations where gathering real data is challenging or expensive, or where the
events of interest rarely occur, so the data requirements of DL models cannot be
met. Deep transfer learning (DTL) has been introduced to overcome these issues;
it helps develop high-performing models using real datasets that are small or
slightly different from, but related to, the training data. This paper presents a
comprehensive survey of DTL-based ASR frameworks to shed light on the latest
developments and help academics and professionals understand current challenges.
Specifically, after presenting the DTL background, a well-designed taxonomy is
adopted to organize the state of the art. A critical analysis is then conducted to
identify the limitations and advantages of each framework. Moving on, a comparative
study is introduced to highlight the current challenges before deriving
opportunities for future research.
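One common DTL strategy is to freeze pretrained lower layers and train only a new output head on the small target-domain set. The pure-Python sketch below shows that idea under loud assumptions: the "pretrained" extractor is a fixed hypothetical function rather than a real acoustic model, the target data are synthetic, and logistic regression stands in for whatever head an ASR system would use. It illustrates one strategy only, not the survey's taxonomy.

```python
import math
import random


def frozen_extractor(x):
    """Hypothetical stand-in for pretrained lower layers; never updated."""
    return [math.tanh(x[0]), math.tanh(x[1]), math.tanh(x[0] - x[1])]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=500, lr=0.5):
    """Fit a new logistic-regression head on the small target-domain set only."""
    feats = [(frozen_extractor(x), y) for x, y in data]  # computed once: lower layers are frozen
    dim = len(feats[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for f, y in feats:
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y  # d(log loss)/d(logit)
            gw = [g + err * fi for g, fi in zip(gw, f)]
            gb += err
        n = len(feats)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]  # only the head's parameters change
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    f = frozen_extractor(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) > 0.5)


random.seed(1)
# Hypothetical small target set: the label is 1 when the first input is positive.
target = []
for _ in range(20):
    x0 = random.choice([-1, 1]) * random.uniform(0.5, 2.0)
    target.append(([x0, random.uniform(-2.0, 2.0)], int(x0 > 0)))

w, b = train_head(target)
acc = sum(predict(w, b, x) == y for x, y in target) / len(target)
print(f"training accuracy of the new head: {acc:.2f}")
```

Freezing the extractor means only a handful of parameters are fitted. This is why the approach can work with small target datasets, at the cost of relying on the source-domain features being relevant.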
Attention-Block Deep Learning Based Features Fusion in Wearable Social Sensor for Mental Wellbeing Evaluations
With the progressive increase of stress, anxiety, and depression in working and living environments, mental health assessment has become an important social interaction research topic. Generally, clinicians evaluate the psychology of participants through effective psychological evaluations and questionnaires. However, these methods suffer from subjectivity and memory effects. In this paper, a new multi-sensing wearable device has been developed and applied in self-designed psychological tests. Speech under different emotions, as well as behavior signals, is captured and analyzed. The mental state of the participants is objectively assessed through a group of psychological questionnaires. In particular, we propose an attention-based block deep learning architecture within the device for multi-feature classification and fusion analysis. This enables the architecture to learn the optimal fusion weights for features from different domains autonomously during training. The proposed attention-based architecture improves performance compared with the direct-connection fusion method. Experimental studies have been carried out to verify the effectiveness and robustness of the proposed architecture. The results show that the wearable multi-sensing device equipped with the attention-based block deep learning architecture can effectively classify mental states with better performance.
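The contrast between learned fusion weights and direct connection can be sketched as follows. Each domain's feature vector (for example, speech and behavior features, assumed already projected to a common length) is scored. The scores are softmax-normalized into fusion weights, and the fused vector is their weighted sum, whereas the baseline simply concatenates. In the paper the weights are learned inside the network during training. Here `score_vec` is a fixed hypothetical vector, and the single dot-product scoring is an assumption, not the authors' attention block.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(domain_feats, score_vec):
    """Fuse equal-length feature vectors from several domains with attention weights.

    domain_feats: one vector per domain (e.g. speech, behavior), same length.
    score_vec: hypothetical scoring vector; a trained model would learn it.
    """
    scores = [sum(s * f for s, f in zip(score_vec, feat)) for feat in domain_feats]
    weights = softmax(scores)  # one non-negative weight per domain, summing to 1
    dim = len(domain_feats[0])
    fused = [sum(w * feat[k] for w, feat in zip(weights, domain_feats)) for k in range(dim)]
    return fused, weights


def concat_fuse(domain_feats):
    """Baseline 'direct connection': plain concatenation with no weighting."""
    return [x for feat in domain_feats for x in feat]


speech = [0.9, 0.1, 0.4]     # hypothetical speech-domain features
behavior = [0.2, 0.7, 0.3]   # hypothetical behavior-domain features
score_vec = [1.0, 0.5, 0.0]
fused, weights = attention_fuse([speech, behavior], score_vec)
print([round(w, 3) for w in weights])
print(len(concat_fuse([speech, behavior])))  # → 6
```

With attention fusion the output stays the size of one domain vector, and the network can shift weight toward whichever domain is more informative for a given input. Concatenation, by contrast, leaves all weighting to later layers.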
Earthquakes: from chemical alteration to mechanical rupture
In the standard rebound theory of earthquakes, elastic deformation energy is
progressively stored in the crust until a threshold is reached at which it is
suddenly released in an earthquake. We review three important paradoxes, the
strain paradox, the stress paradox and the heat flow paradox, that are
difficult to account for in this picture, either individually or when taken
together. Resolutions of these paradoxes usually call for additional
assumptions about the nature of the rupture process (such as novel modes of
deformation and rupture) prior to and/or during an earthquake, about the nature
of the fault, and about the effect of trapped fluids within the crust at
seismogenic depths. We review the evidence for the essential importance of
water and its interaction with the modes of deformation. Water is usually thought
to act mainly mechanically, decreasing the normal lithostatic stress in the
fault core on the one hand, and weakening rock materials via hydrolytic
weakening and stress corrosion on the other. We also review the
evidence that water plays a major role in the alteration of minerals subjected
to finite strains into other structures under out-of-equilibrium conditions. This
suggests exciting new routes to understanding what an earthquake is, which
require developing a truly multidisciplinary approach involving mineral
chemistry, geology, rupture mechanics, and statistical physics. Comment: 44 pages, 1 figure, submitted to Physics Reports
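The standard rebound picture in the first sentence can be caricatured as a single-block stick-slip loop. Stress rises at a constant loading rate until it reaches a failure threshold, then drops abruptly to a residual level. The sketch below uses arbitrary units and hypothetical parameter values. It shows only that textbook picture, not the chemical-alteration mechanisms the review discusses, and its strictly periodic output reflects the idealized model rather than observed seismicity.

```python
def rebound_cycle(steps, loading_rate, threshold, residual):
    """Return the steps at which 'earthquakes' occur and the stress history.

    All parameters are in arbitrary, hypothetical units.
    """
    stress, events, history = residual, [], []
    for t in range(steps):
        stress += loading_rate      # tectonic loading progressively stores elastic energy
        if stress >= threshold:     # failure threshold reached
            events.append(t)
            stress = residual       # sudden release: the stress drops back
        history.append(stress)
    return events, history


events, history = rebound_cycle(steps=100, loading_rate=1.0, threshold=10.0, residual=0.0)
print(events)  # → [9, 19, 29, 39, 49, 59, 69, 79, 89, 99]
```

In this toy model the recurrence interval is simply (threshold − residual) / loading rate, here 10 steps.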
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing. Much research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years efforts have been made to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, such as computer vision, speech/sound/text processing, and content analysis/information mining.
Prestressing wire breakage monitoring using sound event detection
Detecting prestressed wire breakage in concrete bridges is essential for ensuring safety and longevity and for preventing catastrophic failures. This study proposes a novel approach for wire breakage detection using Mel-frequency cepstral coefficients (MFCCs) and a back-propagation neural network (BPNN). Experimental data from two bridges in Italy were acquired to train and test the models. To overcome the limited availability of real-world training data, data augmentation techniques were employed to increase the dataset size, improving the capability of the models and preventing overfitting. The proposed method uses MFCCs to extract features from the acoustic emission signals produced by wire breakage, which are then classified by the BPNN. The results show that the proposed method can detect and classify sound events effectively, demonstrating the promising potential of BPNNs for real-time monitoring and diagnosis of bridges. The significance of this work lies in its contribution to improving bridge safety and preventing catastrophic failures. The combination of MFCCs and a BPNN offers a new approach to wire breakage detection, while the use of real-world data and data augmentation techniques is a significant contribution to overcoming the limited availability of training data. The proposed method has the potential to serve as a generalized and robust model for real-time monitoring of bridges, ultimately leading to safer and longer-lasting infrastructure.
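The MFCC front end named above follows a standard pipeline: framing and windowing, power spectrum, a mel-spaced triangular filterbank, a logarithm, and a DCT. The pure-Python sketch below walks through those steps using the common 2595·log10(1 + f/700) mel convention. The sample rate, frame and hop lengths, 20 filters, and 13 coefficients are illustrative assumptions rather than the study's settings. A naive DFT replaces an FFT to avoid dependencies, and the BPNN classifier and data augmentation are not shown.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centres equally spaced on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2)
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):   # rising edge
            filt[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge
            filt[k] = (right - k) / max(right - centre, 1)
        bank.append(filt)
    return bank


def mfcc(signal, sr, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (parameter values are assumptions)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_filters, frame_len, sr)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        # Small offset avoids log(0) for empty or silent filters.
        log_e = [math.log(sum(f * p for f, p in zip(filt, spec)) + 1e-10) for filt in bank]
        # DCT-II decorrelates the log mel energies; keep the first n_coeffs.
        feats.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                          for m, e in enumerate(log_e))
                      for c in range(n_coeffs)])
    return feats


SR = 8000  # hypothetical sample rate
tone = [math.sin(2 * math.pi * 440 * t / SR) for t in range(800)]  # 0.1 s test tone
features = mfcc(tone, SR)
print(len(features), len(features[0]))  # → 5 13
```

Each row of `features` would be one input vector for a classifier such as the BPNN described above. In practice an FFT-based library would replace the naive DFT.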