Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks
Automatic surgical phase recognition is a challenging and crucial task with
the potential to improve patient safety and become an integral part of
intra-operative decision-support systems. In this paper, we propose, for the
first time in workflow analysis, a Multi-Stage Temporal Convolutional Network
(MS-TCN) that performs hierarchical prediction refinement for surgical phase
recognition. Causal, dilated convolutions allow for a large receptive field and
online inference with smooth predictions even during ambiguous transitions. Our
method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy
videos with and without the use of additional surgical tool information.
Outperforming various state-of-the-art LSTM approaches, we verify the
suitability of the proposed causal MS-TCN for surgical phase recognition.
Comment: 10 pages, 2 figures
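The claim that causal, dilated convolutions give a large receptive field while still permitting online inference can be illustrated with a minimal pure-Python sketch. This is not the TeCNO implementation: the kernel size, the doubling dilation schedule, the all-ones weights, and the function names `causal_dilated_conv1d` and `receptive_field` are assumptions chosen only for illustration.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends
    on future inputs (the "causal" property needed for online use).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples (present and past) seen by a layer stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical stack: kernel size 3, dilation doubling at each layer.
dilations = [1, 2, 4, 8]

# Unit impulse at t = 5; its response shows which outputs it can affect.
signal = [0.0] * 40
signal[5] = 1.0

out = signal
for d in dilations:
    out = causal_dilated_conv1d(out, [1.0, 1.0, 1.0], dilation=d)

nonzero = [t for t, v in enumerate(out) if v != 0.0]
first, last = nonzero[0], nonzero[-1]

print(receptive_field(3, dilations))  # 31
print(first, last)                    # 5 35
```

Under these assumptions, four layers cover 31 time steps, and the impulse influences only outputs at or after its own position, which is what allows frame-by-frame prediction without access to future frames.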
Deep Learning in Cardiology
The medical field is creating a large amount of data that physicians are unable
to decipher and use efficiently. Moreover, rule-based expert systems are
inefficient at solving complicated medical tasks or creating insights from
big data. Deep learning has emerged as a more accurate and effective technology
in a wide range of medical problems such as diagnosis, prediction and
intervention. Deep learning is a representation learning method that consists
of layers that transform the data non-linearly, thus revealing hierarchical
relationships and structures. In this review we survey deep learning
application papers that use structured data, signal and imaging modalities from
cardiology. We discuss the advantages and limitations of applying deep learning
in cardiology that also apply to medicine in general, while proposing certain
directions as the most viable for clinical use.
Comment: 27 pages, 2 figures, 10 tables
Enhanced Exploration of Neural Network Models for Indoor Human Monitoring
Indoor human monitoring can enable or enhance a wide range of applications, from medical to security and home or building automation. For effective ubiquitous deployment, the monitoring system should be easy to install, unobtrusive, reliable, low cost, tagless, and privacy-aware. Long-range capacitive sensors are good candidates, but they can be susceptible to environmental electromagnetic noise and require special signal processing. Neural networks (NNs), especially 1D convolutional neural networks (1D-CNNs), excel at extracting information and rejecting noise, but they lose important relationships in max/average pooling operations. We investigate the performance of two NN architectures for time series analysis that avoid this shortcoming: capsule networks, which use dynamic routing, and temporal convolutional networks (TCNs), which use dilated convolutions to preserve input resolution across layers and extend their receptive field with fewer layers. The networks are optimized for both inference accuracy and resource consumption using two independent state-of-the-art methods, neural architecture search and knowledge distillation. Experimental results show that the TCN architecture performs best, achieving 12.7% lower inference loss with 73.3% less resource consumption than the best 1D-CNN when processing noisy capacitive sensor data for indoor human localization and tracking.
Local Temporal Bilinear Pooling for Fine-grained Action Parsing
Fine-grained temporal action parsing is important in many applications, such
as daily activity understanding, human motion analysis, surgical robotics and
others requiring subtle and precise operations in a long-term period. In this
paper we propose a novel bilinear pooling operation, which is used in
intermediate layers of a temporal convolutional encoder-decoder net. In
contrast to other work, our proposed bilinear pooling is learnable and hence
can capture more complex local statistics than the conventional counterpart. In
addition, we introduce exact lower-dimension representations of our bilinear
forms, so that the dimensionality is reduced with neither information loss nor
extra computation. We perform intensive experiments to quantitatively analyze
our model and show superior performance over other state-of-the-art work on
various datasets.
Comment: 11 pages, 2 figures. Cam.
Simultaneous lesion and neuroanatomy segmentation in Multiple Sclerosis using deep neural networks
Segmentation of both white matter lesions and deep grey matter structures is
an important task in the quantification of magnetic resonance imaging in
multiple sclerosis. Typically these tasks are performed separately: in this
paper we present a single segmentation solution based on convolutional neural
networks (CNNs) for providing fast, reliable segmentations of multimodal
magnetic resonance images into lesion classes and normal-appearing grey- and
white-matter structures. We show substantial, statistically significant
improvements in both Dice coefficient and in lesion-wise specificity and
sensitivity, compared to previous approaches, and agreement with individual
human raters in the range of human inter-rater variability. The method is
trained on data gathered from a single centre: nonetheless, it performs well on
data from centres, scanners and field-strengths not represented in the training
dataset. A retrospective study found that the classifier successfully
identified lesions missed by the human raters.
Lesion labels were provided by human raters, while weak labels for other
brain structures (including CSF, cortical grey matter, cortical white matter,
cerebellum, amygdala, hippocampus, subcortical GM structures and choroid
plexus) were provided by Freesurfer 5.3. The segmentations of these structures
compared well, not only with Freesurfer 5.3, but also with FSL-First and
Freesurfer 6.0.
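For readers unfamiliar with the overlap metrics named in this abstract, the following is a minimal sketch of the Dice coefficient, with sensitivity and specificity computed per voxel on flattened binary masks. It is not the authors' evaluation code: the abstract reports lesion-wise sensitivity and specificity, whereas this sketch counts individual voxels, and the function names, the toy masks, and the convention for empty masks are assumptions.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def sensitivity_specificity(pred, truth):
    """Voxel-wise sensitivity TP / (TP + FN) and specificity TN / (TN + FP)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Assumed convention: an undefined ratio (zero denominator) is reported as 1.0.
    sens = tp / (tp + fn) if (tp + fn) else 1.0
    spec = tn / (tn + fp) if (tn + fp) else 1.0
    return sens, spec


# Toy flattened masks (1 = lesion voxel), invented purely for illustration.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]

dice = dice_coefficient(predicted, reference)
sens, spec = sensitivity_specificity(predicted, reference)
print(round(dice, 3), round(sens, 3), round(spec, 3))  # 0.667 0.667 0.667
```

A lesion-wise variant would first group voxels into connected components and count detected or missed lesions rather than voxels; that step is omitted here.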
- …