875 research outputs found

Deep Learning for Audio Signal Processing

Author: Chang Shuo-yiin
Li Bo
Purwins Hendrik
Sainath Tara
Schlüter Jan
Virtanen Tuomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 pdf figures

arXiv.org e-Print Archive

VBN

Spectro-temporal modelling for human activity recognition using a radar sensor network

Author: Bodanese E
Khan S
Luo F
Wu K
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/04/2023

Queen Mary Research Online

TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks

Author: A Graves
A Huaulmé
A Newell
AP Twinanda
C Lea
G Lecuyer
H Al Hajj
I Funke
L Maier-Hein
N Padoy
O Zisimopoulos
S Bodenstedt
S Hochreiter
U Klank
Y Jin
Publication date: 24/03/2020

Automatic surgical phase recognition is a challenging and crucial task with the potential to improve patient safety and become an integral part of intra-operative decision-support systems. In this paper, we propose, for the first time in workflow analysis, a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition. Causal, dilated convolutions allow for a large receptive field and online inference with smooth predictions even during ambiguous transitions. Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information. Outperforming various state-of-the-art LSTM approaches, we verify the suitability of the proposed causal MS-TCN for surgical phase recognition. Comment: 10 pages, 2 figures
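As an illustration of why causal, dilated convolutions give a large receptive field while still allowing online inference, the following is a minimal NumPy sketch. It is not the paper's implementation: the function names, the kernel size of 3, and the doubling dilation schedule are assumptions chosen for the example.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Compute y[t] = sum_j w[j] * x[t - j*dilation], treating x as zero before t=0.

    Only current and past samples are read, so y[t] can be produced
    as soon as x[t] arrives (online inference).
    """
    x = np.asarray(x, dtype=float)
    k = len(w)
    pad = (k - 1) * dilation                  # left padding only: no future samples
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for t in range(len(x)):
        for j in range(k):
            y[t] += w[j] * xp[pad + t - j * dilation]
    return y

def receptive_field(kernel_size, dilations):
    """Number of input steps seen by one output of a stack of dilated conv layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Hypothetical stack: kernel size 3 with dilations doubling per layer (an assumption).
rf = receptive_field(3, [1, 2, 4, 8])

# An impulse at t=0 only influences outputs at t = 0, 2, 4 for dilation 2.
impulse = np.zeros(8)
impulse[0] = 1.0
response = causal_dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2)
```

With this schedule the receptive field grows exponentially with depth while the number of layers grows only linearly, which is the property the abstract relies on.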

arXiv.org e-Print Archive

Crossref

Deep Learning in Cardiology

Author: Bizopoulos Paschalis
Koutsouris Dimitrios
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/02/2021

The medical field is creating large amounts of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient in solving complicated medical tasks or for creating insights using big data. Deep learning has emerged as a more accurate and effective technology in a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply in medicine in general, while proposing certain directions as the most viable for clinical use. Comment: 27 pages, 2 figures, 10 tables

arXiv.org e-Print Archive

Enhanced Exploration of Neural Network Models for Indoor Human Monitoring

Author: Lavagno Luciano
Lazarescu Mihai T.
Subbicini Giorgia
Publication venue: IEEE
Publication date: 01/01/2023

Indoor human monitoring can enable or enhance a wide range of applications, from medical to security and home or building automation. For effective ubiquitous deployment, the monitoring system should be easy to install and unobtrusive, reliable, low cost, tagless, and privacy-aware. Long-range capacitive sensors are good candidates, but they can be susceptible to environmental electromagnetic noise and require special signal processing. Neural networks (NNs), especially 1D convolutional neural networks (1D-CNNs), excel at extracting information and rejecting noise, but they lose important relationships in max/average pooling operations. We investigate the performance of NN architectures for time series analysis without this shortcoming: capsule networks, which use dynamic routing, and temporal convolutional networks (TCNs), which use dilated convolutions to preserve input resolution across layers and extend their receptive field with fewer layers. The networks are optimized for both inference accuracy and resource consumption using two independent state-of-the-art methods, neural architecture search and knowledge distillation. Experimental results show that the TCN architecture performs the best, achieving 12.7% lower inference loss with 73.3% less resource consumption than the best 1D-CNN when processing noisy capacitive sensor data for indoor human localization and tracking.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Local Temporal Bilinear Pooling for Fine-grained Action Parsing

Author: Jarvers Christian
Muandet Krikamol
Neumann Heiko
Tang Siyu
Zhang Yan
Publication date: 01/01/2019

Fine-grained temporal action parsing is important in many applications, such as daily activity understanding, human motion analysis, surgical robotics and others requiring subtle and precise operations over a long period. In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. In contrast to other work, our proposed bilinear pooling is learnable and hence can capture more complex local statistics than the conventional counterpart. In addition, we introduce exact lower-dimension representations of our bilinear forms, so that the dimensionality is reduced with neither information loss nor extra computation. We perform intensive experiments to quantitatively analyze our model and show superior performance over other state-of-the-art work on various datasets. Comment: 11 pages, 2 figures. Cam.

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Simultaneous lesion and neuroanatomy segmentation in Multiple Sclerosis using deep neural networks

Author: Aschwanden Fabian
Chan Andrew
Grunder Lorenz
McKinley Richard
Muri Raphaela
Reyes Mauricio
Rummel Christian
Salmen Anke
Verma Rajeev
Wagner Franca
Weisstanner Christian
Wepfer Rik
Wiest Roland
Publication date: 06/05/2019

Segmentation of both white matter lesions and deep grey matter structures is an important task in the quantification of magnetic resonance imaging in multiple sclerosis. Typically these tasks are performed separately: in this paper we present a single segmentation solution based on convolutional neural networks (CNNs) for providing fast, reliable segmentations of multimodal magnetic resonance images into lesion classes and normal-appearing grey- and white-matter structures. We show substantial, statistically significant improvements in both Dice coefficient and in lesion-wise specificity and sensitivity, compared to previous approaches, and agreement with individual human raters in the range of human inter-rater variability. The method is trained on data gathered from a single centre: nonetheless, it performs well on data from centres, scanners and field-strengths not represented in the training dataset. A retrospective study found that the classifier successfully identified lesions missed by the human raters. Lesion labels were provided by human raters, while weak labels for other brain structures (including CSF, cortical grey matter, cortical white matter, cerebellum, amygdala, hippocampus, subcortical GM structures and choroid plexus) were provided by Freesurfer 5.3. The segmentations of these structures compared well, not only with Freesurfer 5.3, but also with FSL-First and Freesurfer 6.0.
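For readers unfamiliar with the Dice coefficient mentioned in the abstract, a minimal NumPy sketch of the standard definition for two binary masks is shown below. The function name and the handling of two empty masks are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Toy masks: one voxel overlaps, each mask marks two voxels.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

The score ranges from 0 (no overlap) to 1 (identical masks); in the toy case above the overlap of one voxel against four marked voxels in total gives 0.5.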

arXiv.org e-Print Archive

Bern Open Repository and Information System (BORIS)