47,272 research outputs found
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.Comment: 15 pages, 2 pdf figure
Multi-task CNN Model for Attribute Prediction
This paper proposes a joint multi-task learning algorithm to better predict
attributes in images using deep convolutional neural networks (CNN). We
consider learning binary semantic attributes through a multi-task CNN model,
where each CNN will predict one binary attribute. The multi-task learning
allows CNN models to simultaneously share visual knowledge among different
attribute categories. Each CNN will generate attribute-specific feature
representations, and then we apply multi-task learning on the features to
predict their attributes. In our multi-task framework, we propose a method to
decompose the overall model's parameters into a latent task matrix and
combination matrix. Furthermore, under-sampled classifiers can leverage shared
statistics from other classifiers to improve their performance. Natural
grouping of attributes is applied such that attributes in the same group are
encouraged to share more knowledge. Meanwhile, attributes in different groups
will generally compete with each other, and consequently share less knowledge.
We show the effectiveness of our method on two popular attribute datasets.Comment: 11 pages, 3 figures, ieee transaction pape
Exploring Context with Deep Structured models for Semantic Segmentation
State-of-the-art semantic image segmentation methods are mostly based on
training deep convolutional neural networks (CNNs). In this work, we proffer to
improve semantic segmentation with the use of contextual information. In
particular, we explore `patch-patch' context and `patch-background' context in
deep CNNs. We formulate deep structured models by combining CNNs and
Conditional Random Fields (CRFs) for learning the patch-patch context between
image regions. Specifically, we formulate CNN-based pairwise potential
functions to capture semantic correlations between neighboring patches.
Efficient piecewise training of the proposed deep structured model is then
applied in order to avoid repeated expensive CRF inference during the course of
back propagation. For capturing the patch-background context, we show that a
network design with traditional multi-scale image inputs and sliding pyramid
pooling is very effective for improving performance. We perform comprehensive
evaluation of the proposed method. We achieve new state-of-the-art performance
on a number of challenging semantic segmentation datasets including ,
-, , -, -,
-, and datasets. Particularly, we report an
intersection-over-union score of on the - dataset.Comment: 16 pages. Accepted to IEEE T. Pattern Analysis & Machine
Intelligence, 2017. Extended version of arXiv:1504.0101
- …