1,809 research outputs found
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.Comment: 15 pages, 2 pdf figure
A Transition-Based Directed Acyclic Graph Parser for UCCA
We present the first parser for UCCA, a cross-linguistically applicable
framework for semantic representation, which builds on extensive typological
work and supports rapid annotation. UCCA poses a challenge for existing parsing
techniques, as it exhibits reentrancy (resulting in DAG structures),
discontinuous structures and non-terminal nodes corresponding to complex
semantic units. To our knowledge, the conjunction of these formal properties is
not supported by any existing parser. Our transition-based parser, which uses a
novel transition set and features based on bidirectional LSTMs, has value not
just for UCCA parsing: its ability to handle more general graph structures can
inform the development of parsers for other semantic DAG structures, and in
languages that frequently use discontinuous structures.Comment: 16 pages; Accepted as long paper at ACL201
A Brief Introduction to Machine Learning for Engineers
This monograph aims at providing an introduction to key concepts, algorithms,
and theoretical results in machine learning. The treatment concentrates on
probabilistic models for supervised and unsupervised learning problems. It
introduces fundamental concepts and algorithms by building on first principles,
while also exposing the reader to more advanced topics with extensive pointers
to the literature, within a unified notation and mathematical framework. The
material is organized according to clearly defined categories, such as
discriminative and generative models, frequentist and Bayesian approaches,
exact and approximate inference, as well as directed and undirected models.
This monograph is meant as an entry point for researchers with a background in
probability and linear algebra.Comment: This is an expanded and improved version of the original posting.
Feedback is welcom
Nature of the learning algorithms for feedforward neural networks
The neural network model (NN) comprised of relatively simple computing elements, operating in parallel, offers an attractive and versatile framework for exploring a variety of learning
structures and processes for intelligent systems. Due to the amount of research developed in
the area many types of networks have been defined. The one of interest here is the multi-layer
perceptron as it is one of the simplest and it is considered a powerful representation tool whose
complete potential has not been adequately exploited and whose limitations need yet to be
specified in a formal and coherent framework. This dissertation addresses the theory of generalisation performance and architecture selection for the multi-layer perceptron; a subsidiary
aim is to compare and integrate this model with existing data analysis techniques and exploit
its potential by combining it with certain constructs from computational geometry creating a
reliable, coherent network design process which conforms to the characteristics of a generative
learning algorithm, ie. one including mechanisms for manipulating the connections and/or
units that comprise the architecture in addition to the procedure for updating the weights of
the connections. This means that it is unnecessary to provide an initial network as input to
the complete training process.After discussing in general terms the motivation for this study, the multi-layer perceptron
model is introduced and reviewed, along with the relevant supervised training algorithm, ie.
backpropagation. More particularly, it is argued that a network developed employing this model
can in general be trained and designed in a much better way by extracting more information
about the domains of interest through the application of certain geometric constructs in a preprocessing stage, specifically by generating the Voronoi Diagram and Delaunav Triangulation
[Okabe et al. 92] of the set of points comprising the training set and once a final architecture which performs appropriately on it has been obtained, Principal Component Analysis
[Jolliffe 86] is applied to the outputs produced by the units in the network's hidden layer to
eliminate the redundant dimensions of this space
Data-efficient deep representation learning
Current deep learning methods succeed in many data-intensive applications, but they are still not able to produce robust performance due to the lack of training samples. To investigate how to improve the performance of deep learning paradigms when training samples are limited, data-efficient deep representation learning (DDRL) is proposed in this study. DDRL as a sub area of representation learning mainly addresses the following problem: How can the performance of a deep learning method be maintained when the number of training samples is significantly reduced? This is vital for many applications where collecting data is highly costly, such as medical image analysis. Incorporating a certain kind of prior knowledge into the learning paradigm is key to achieving data efficiency.
Deep learning as a sub-area of machine learning can be divided into three parts (locations) in its learning process, namely Data, Optimisation and Model. Integrating prior knowledge into these three locations is expected to bring data efficiency into a learning paradigm, which can dramatically increase the model performance under the condition of limited training data.
In this thesis, we aim to develop novel deep learning methods for achieving data-efficient training, each of which integrates a certain kind of prior knowledge into three different locations respectively. We make the following contributions. First, we propose an iterative solution based on deep learning for medical image segmentation tasks, where dynamical systems are integrated into the segmentation labels in order to improve both performance and data efficiency. The proposed method not only shows a superior performance and better data efficiency compared to the state-of-the-art methods, but also has better interpretability and rotational invariance which are desired for medical imagining applications. Second, we propose a novel training framework which adaptively selects more informative samples for training during the optimization process. The adaptive selection or sampling is performed based on a hardness-aware strategy in the latent space constructed by a generative model.
We show that the proposed framework outperforms a random sampling method, which demonstrates effectiveness of the proposed framework. Thirdly, we propose a deep neural network model which produces the segmentation maps in a coarse-to-fine manner. The proposed architecture is a sequence of computational blocks containing a number of convolutional layers in which each block provides its successive block with a coarser segmentation map as a reference. Such mechanisms enable us to train the network with limited training samples and produce more interpretable results.Open Acces
Style Transfer with Generative Adversarial Networks
This dissertation is focused on trying to use concepts from style transfer and image-to-image translation to address the problem of defogging. Defogging (or dehazing) is the ability to remove fog from an image, restoring it as if the photograph was taken during optimal weather conditions. The task of defogging is of particular interest in many fields, such as surveillance or self driving cars.
In this thesis an unpaired approach to defogging is adopted, trying to translate a foggy image to the correspondent clear picture without having pairs of foggy and ground truth haze-free images during training. This approach is particularly significant, due to the difficult of gathering an image collection of exactly the same scenes with and without fog.
Many of the models and techniques used in this dissertation already existed in literature, but they are extremely difficult to train, and often it is highly problematic to obtain the desired behavior. Our contribute was a systematic implementative and experimental activity, conducted with the aim of attaining a comprehensive understanding of how these models work, and the role of datasets and training procedures in the final results. We also analyzed metrics and evaluation strategies, in order to seek to assess the quality of the presented model in the most correct and appropriate manner.
First, the feasibility of an unpaired approach to defogging was analyzed, using the cycleGAN model. Then, the base model was enhanced with a cycle perceptual loss, inspired by style transfer techniques. Next, the role of the training set was investigated, showing that improving the quality of data is at least as important as the utilization of more powerful models. Finally, our approach is compared with state-of-the art defogging methods, showing that the quality of our results is in line with preexisting approaches, even if our model was trained using unpaired data
- …