Search CORE

14,357 research outputs found

Multi-level Attention Model for Weakly Supervised Audio Classification

Author: Barsim Karim Said
Kong Qiuqiang
Yang Bin
Yu Changsong
Publication date: 01/01/2018

In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently, Google published a large-scale weakly labelled dataset called Audio Set, where each audio clip is labelled only with the presence or absence of the audio events, without their onset and offset times. Our multi-level attention model is an extension of the previously proposed single-level attention model. It consists of several attention modules applied to intermediate neural network layers. The outputs of these attention modules are concatenated into a vector, followed by a multi-label classifier that makes the final prediction for each class. Experiments show that our model achieves a mean average precision (mAP) of 0.360, outperforming the state-of-the-art single-level attention model (0.327) and the Google baseline (0.314). Comment: 5 pages, 3 figures, Submitted to Eusipco 201
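The pooling-and-concatenation idea in this abstract can be illustrated with a minimal sketch. The abstract does not specify the attention formula, so the softmax-over-time weighting, the function names (`attention_pool`, `multi_level_predict`) and the linear-plus-sigmoid classifier below are all assumptions chosen for illustration, not the authors' implementation.

```python
import math


def attention_pool(values, scores):
    """Pool T segment vectors into one vector (assumed softmax attention).

    values: T lists of K floats (per-segment features from one layer)
    scores: T lists of K floats (unnormalised attention scores)
    """
    num_dims = len(values[0])
    pooled = []
    for k in range(num_dims):
        # Softmax over the time axis for dimension k (shifted for stability).
        column = [row[k] for row in scores]
        peak = max(column)
        weights = [math.exp(s - peak) for s in column]
        total = sum(weights)
        pooled.append(sum(w / total * row[k] for w, row in zip(weights, values)))
    return pooled


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_level_predict(levels, class_weights, class_biases):
    """levels: list of (values, scores) pairs, one per intermediate layer.

    The pooled vectors are concatenated, then a hypothetical linear layer
    with one independent sigmoid per class gives multi-label probabilities.
    """
    concat = []
    for values, scores in levels:
        concat.extend(attention_pool(values, scores))
    return [
        sigmoid(sum(w * x for w, x in zip(ws, concat)) + b)
        for ws, b in zip(class_weights, class_biases)
    ]
```

With equal attention scores the pooling reduces to a plain average over segments, which is a convenient sanity check for a sketch like this.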

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

Bridging the Granularity Gap for Acoustic Modeling

Author: Hu Chi
Jiao Chengbo
Liu Xiaoqian
Ma Anxiang
Wang Huizhen
Xiao Tong
Xu Chen
Zeng Xin
Zhang Yuhao
Zhu JingBo
Publication date: 26/05/2023

While Transformer has become the de-facto standard for speech, modeling upon the fine-grained frame-level features remains an open challenge of capturing long-distance dependencies and distributing the attention weights. We propose Progressive Down-Sampling (PDS) which gradually compresses the acoustic features into coarser-grained units containing more complete semantic information, like text-level representation. In addition, we develop a representation fusion method to alleviate information loss that occurs inevitably during high compression. In this way, we compress the acoustic features into 1/32 of the initial length while achieving better or comparable performances on the speech recognition task. And as a bonus, it yields inference speedups ranging from 1.20× to 1.47×. By reducing the modeling burden, we also achieve competitive results when training on the more challenging speech translation task. Comment: ACL 2023 Finding
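To make the 1/32 compression figure concrete, here is a minimal sketch of staged down-sampling. Only the overall 1/32 ratio comes from the abstract; the five stride-2 stages and the use of simple frame averaging are assumptions made for illustration, and the function names are hypothetical.

```python
def downsample(frames, stride):
    """Average each run of `stride` consecutive frames (ragged tail dropped)."""
    usable = len(frames) - len(frames) % stride
    return [sum(frames[i:i + stride]) / stride for i in range(0, usable, stride)]


def progressive_downsample(frames, strides=(2, 2, 2, 2, 2)):
    """Apply the stages in order and record the sequence length after each.

    The default strides multiply to 32, matching the overall ratio quoted
    in the abstract; the per-stage split itself is an assumption.
    """
    lengths = [len(frames)]
    for stride in strides:
        frames = downsample(frames, stride)
        lengths.append(len(frames))
    return frames, lengths
```

Under these assumptions a 3,200-frame input shrinks to 100 units, with the length halving at each stage.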

arXiv.org e-Print Archive

First impressions: A survey on vision-based apparent personality trait analysis

Author: Andújar Gran Carlos Antonio
Baró Solé Xavier
Escalante Balderas Hugo Jair
Escalera Guerrero Sergio
Guyon Isabelle
Güçlü Umut
Güçlütürk Yagmur
Jacques Junior Julio
Pérez Quintana Marc
van Gerven Marcel A. J.
van Lier Rob
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. In the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research in the field, are reviewed. Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

VBN

Radboud Repository

Predictability of catastrophic events: material rupture, earthquakes, turbulence, financial crashes and human birth

Author: Broecker
D. Sornette
Geller
Gould
Jones
Lamaign re
McWilliams
Papiernik
Sornette
Sornette
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 09/07/2001

We propose that catastrophic events are "outliers" with statistically different properties than the rest of the population and result from mechanisms involving amplifying critical cascades. Applications and the potential for prediction are discussed in relation to the rupture of composite materials, great earthquakes, turbulence and abrupt changes of weather regimes, financial crashes and human parturition (birth). Comment: Latex document of 22 pages including 6 ps figures, in press in PNA

arXiv.org e-Print Archive

Crossref

HAL-UNICE

PubMed Central

Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

Author: Al-Maadeed Somaya
Amira Abbes
Bensaali Faycal
Himeur Yassine
Kheddar Hamza
Publication date: 27/04/2023

Automatic speech recognition (ASR) has recently become an important challenge when using deep learning (DL). It requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. Moreover, there are situations where gathering real data is challenging or expensive, or where the events of interest rarely occur, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models using real datasets that are small or slightly different but related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to inform the state-of-the-art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Moving on, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.

arXiv.org e-Print Archive

Attention-Block Deep Learning Based Features Fusion in Wearable Social Sensor for Mental Wellbeing Evaluations

Author: Gao Bin
Jin Jikun
Luo Lizhu
Woo Wai Lok
Yang Sihao
Zhao Bingmei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/05/2020

With the progressive increase of stress, anxiety and depression in working and living environments, mental health assessment has become an important social interaction research topic. Generally, clinicians evaluate the psychology of participants through psychological evaluations and questionnaires. However, these methods suffer from subjectivity and memory effects. In this paper, a new multi-sensing wearable device has been developed and applied in self-designed psychological tests. Speech under different emotions as well as behavior signals are captured and analyzed. The mental state of the participants is objectively assessed through a group of psychological questionnaires. In particular, we propose an attention-based block deep learning architecture within the device for multi-feature classification and fusion analysis. This enables the deep learning architecture to autonomously train to obtain the optimum fusion weights of different domain features. The proposed attention-based architecture improves performance compared with the direct-connection fusion method. Experimental studies have been carried out to verify the effectiveness and robustness of the proposed architecture. The results show that wearable multi-sensing devices equipped with the attention-based block deep learning architecture can effectively classify mental state with better performance.

Northumbria Research Link

Earthquakes: from chemical alteration to mechanical rupture

Author: Abercrombie
Aines
Aki
Aki
Aki
Alekseevskaya
Anderson
Anderson
Andrews
Anooshehpoor
Atkinson
Backus
Backus
Badro
Bailey
Bak
Bambauer
Bardet
Barton
Baumberger
Beeler
Beeler
Beeman
Behrmann
Ben-Zion
Ben-Zion
Beroza
Biarez
Bird
Bird
Blacic
Blanpied
Blanpied
Bocquet
Bolt
Bowden
Bowman
Brace
Brace
Brehm
Brodie
Bruhn
Brune
Bufe
Bufe
Burridge
Burst
Byerlee
Byerlee
Campillo
Carlson
Caroli
Caroli
Chen
Chester
Chester
Chirone
Choy
Cochard
Cochard
Cotton
Cowie
Dahl
David
Dieterich
Dieterich
Dieterich
Dieterich
Dieterich
Dieterich
Dieterich
Dieterich
Dodge
Ellsworth
Ellworth
Etheridge
Evans
Evans
Fisher
Fleischmann
Freiman
Freund
Frischbutter
Frohlich
Frohlich
Gabrielov
Gavrilenko
Geller
Gilbert
Gokhberg
Grasso
Green II
Griggs
Griggs
Gu
Hadizadeh
Harrison
Heaton
Heggie
Heggie
Heggie
Heine
Henyey
Herrmann
Heuze
Hickman
Hickman
Hill
Hobbs
Hobbs
Hobbs
Houston
Ishimaru
Ito
Janecke
Jeffreys
Jensen
Johansen
Jones
Jones
Jordan
Kagan
Kagan
Kanamori
Kanamori
Kanamori
Keilis-Borok
King
King
King
Kirby
Kirby
Klein
Knipe
Knipe
Knopoff
Knopoff
Knopoff
Knopoff
Knopoff
Kronenberg
Kuge
Lachenbruch
Lachenbruch
Lachenbruch
Lachenbruch
Lachenbruch
Langer
Li
Linde
Lindh
Lockner
Lomnitz
Lomnitz-Adler
Lonsdale
Madariaga
Maeda
Main
Massonnet
Massonnet
Melosh
Miltenberger
Mogi
Moore
Mora
Mori
Morrow
Mulargia
Nicolas
Nitsan
Nur
O'Neil
Ogawa
Ouillon
Paquet
Paterson
Paterson
Peacock
Pearson
Pecher
Pietronero
Pinkston
Pisarenko
Poole
Poole
Poole
Poole
Poole
Post
Purton
Ranalli
Rao
Renard
Renard
Rice
Rice
Robert
Rubie
Ruina
Rundle
Russo
Rydelek
Saleur
Salje
Salje
Salje
Sass
Schallamach
Schmittbuhl
Schmittbuhl
Scholz
Scholz
Scholz
Scholz
Scholz
Scholz
Scholz
Sciortino
Scott
Scott
Segall
Shaw
Shaw
Shaw
Shaw
Shen
Shen
Sibson
Sibson
Sibson
Sibson
Silver
Sleep
Smirnov
Smith
Snay
Sokoloff
Sornette
Sornette
Sornette
Sornette
Sornette
Sornette
Sornette
Sornette
Stel
Streit
Sturtevant
Sykes
Tanguy
Tanguy
Tsunogai
Tsunogai
Tsuruoka
Tsutsumi
Tsutsumi
Turcotte
Turner
Van Albada
Van Tiggelen
Varotsos
Vernon
Walcott
Walcott
Walmann
Weber
Westwood
Westwood
Wintsch
Wyss
Wyss
Xu
Zhang
Zhao
Zoback
Zoback
Zoback
Publication venue: 'Elsevier BV'
Publication date: 22/07/1998

In the standard rebound theory of earthquakes, elastic deformation energy is progressively stored in the crust until a threshold is reached at which it is suddenly released in an earthquake. We review three important paradoxes, the strain paradox, the stress paradox and the heat flow paradox, that are difficult to account for in this picture, either individually or when taken together. Resolutions of these paradoxes usually call for additional assumptions on the nature of the rupture process (such as novel modes of deformations and ruptures) prior to and/or during an earthquake, on the nature of the fault and on the effect of trapped fluids within the crust at seismogenic depths. We review the evidence for the essential importance of water and its interaction with the modes of deformations. Water is usually seen to have mainly the mechanical effect of decreasing the normal lithostatic stress in the fault core on the one hand and of weakening rock materials via hydrolytic weakening and stress corrosion on the other hand. We also review the evidence that water plays a major role in the alteration of minerals subjected to finite strains into other structures in out-of-equilibrium conditions. This suggests exciting novel routes to understanding what an earthquake is, which requires developing a truly multidisciplinary approach involving mineral chemistry, geology, rupture mechanics and statistical physics. Comment: 44 pages, 1 figure, submitted to Physics Report

arXiv.org e-Print Archive

Crossref

Artificial Intelligence for Multimedia Signal Processing

Author
Publication venue: 'MDPI AG'
Publication date: 16/09/2022

Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, such as computer vision, speech/sound/text processing, and content analysis/information mining.

Directory of Open Access Books (DOAB)

Prestressing wire breakage monitoring using sound event detection

Author: Borla O.
Corrado M.
Farhadi S.
Ventura G.
Publication venue: WILEY
Publication date: 01/01/2023

Detecting prestressed wire breakage in concrete bridges is essential for ensuring safety and longevity and preventing catastrophic failures. This study proposes a novel approach for wire breakage detection using Mel-frequency cepstral coefficients (MFCCs) and a back-propagation neural network (BPNN). Experimental data from two bridges in Italy were acquired to train and test the models. To overcome the limited availability of real-world training data, data augmentation techniques were employed to increase the data set size, enhancing the capability of the models and preventing over-fitting. The proposed method uses MFCCs to extract features from acoustic emission signals produced by wire breakage, which are then classified by the BPNN. The results show that the proposed method can detect and classify sound events effectively, demonstrating the promising potential of BPNN for real-time monitoring and diagnosis of bridges. The significance of this work lies in its contribution to improving bridge safety and preventing catastrophic failures. The combination of MFCCs and BPNN offers a new approach to wire breakage detection, while the use of real-world data and data augmentation techniques helps overcome the limited availability of training data. The proposed method has the potential to be a generalized and robust model for real-time monitoring of bridges, ultimately leading to safer and longer-lasting infrastructure.
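As background to the MFCC features mentioned in this abstract, the sketch below shows the mel-scale mapping that underlies a mel filterbank. It uses the widely used formula mel = 2595 · log10(1 + f/700); the abstract does not say which variant or which filterbank settings the study used, so the function names and the parameters `num_filters`, `f_min` and `f_max` are illustrative assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(num_filters, f_min, f_max):
    """Return num_filters + 2 edge frequencies (Hz), equally spaced in mels.

    Triangular filter i would span edges[i] to edges[i + 2] and peak at
    edges[i + 1]; building the filters themselves is omitted here.
    """
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (num_filters + 1)
    return [mel_to_hz(low + i * step) for i in range(num_filters + 2)]
```

Because the edges are equally spaced in mels, they crowd together at low frequencies and spread out at high frequencies, which is the property MFCC pipelines rely on.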

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


The role of HG in the analysis of temporal iteration and interaural correlation

Author: Barrett DJK
Hall DA
Publication date: 01/01/2004

Nottingham Trent Institutional Repository (IRep)