7,279 research outputs found
Feature Learning from Spectrograms for Assessment of Personality Traits
Several methods have recently been proposed to analyze speech and
automatically infer the personality of the speaker. These methods often rely on
prosodic and other hand crafted speech processing features extracted with
off-the-shelf toolboxes. To achieve high accuracy, numerous features are
typically extracted using complex and highly parameterized algorithms. In this
paper, a new method based on feature learning and spectrogram analysis is
proposed to simplify the feature extraction process while maintaining a high
level of accuracy. The proposed method learns a dictionary of discriminant
features from patches extracted in the spectrogram representations of training
speech segments. Each speech segment is then encoded using the dictionary, and
the resulting feature set is used to perform classification of personality
traits. Experiments indicate that the proposed method achieves state-of-the-art
results with a significant reduction in complexity when compared to the most
recent reference methods. The number of features, and difficulties linked to
the feature extraction process are greatly reduced as only one type of
descriptors is used, for which the 6 parameters can be tuned automatically. In
contrast, the simplest reference method uses 4 types of descriptors to which 6
functionals are applied, resulting in over 20 parameters to be tuned.Comment: 12 pages, 3 figure
Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/ licenses/by/4.0
Multilingual Adaptation of RNN Based ASR Systems
In this work, we focus on multilingual systems based on recurrent neural
networks (RNNs), trained using the Connectionist Temporal Classification (CTC)
loss function. Using a multilingual set of acoustic units poses difficulties.
To address this issue, we proposed Language Feature Vectors (LFVs) to train
language adaptive multilingual systems. Language adaptation, in contrast to
speaker adaptation, needs to be applied not only on the feature level, but also
to deeper layers of the network. In this work, we therefore extended our
previous approach by introducing a novel technique which we call "modulation".
Based on this method, we modulated the hidden layers of RNNs using LFVs. We
evaluated this approach in both full and low resource conditions, as well as
for grapheme and phone based systems. Lower error rates throughout the
different conditions could be achieved by the use of the modulation.Comment: 5 pages, 1 figure, to appear in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP 2018
Multi-modal dictionary learning for image separation with application in art investigation
In support of art investigation, we propose a new source separation method
that unmixes a single X-ray scan acquired from double-sided paintings. In this
problem, the X-ray signals to be separated have similar morphological
characteristics, which brings previous source separation methods to their
limits. Our solution is to use photographs taken from the front and back-side
of the panel to drive the separation process. The crux of our approach relies
on the coupling of the two imaging modalities (photographs and X-rays) using a
novel coupled dictionary learning framework able to capture both common and
disparate features across the modalities using parsimonious representations;
the common component models features shared by the multi-modal images, whereas
the innovation component captures modality-specific information. As such, our
model enables the formulation of appropriately regularized convex optimization
procedures that lead to the accurate separation of the X-rays. Our dictionary
learning framework can be tailored both to a single- and a multi-scale
framework, with the latter leading to a significant performance improvement.
Moreover, to improve further on the visual quality of the separated images, we
propose to train coupled dictionaries that ignore certain parts of the painting
corresponding to craquelure. Experimentation on synthetic and real data - taken
from digital acquisition of the Ghent Altarpiece (1432) - confirms the
superiority of our method against the state-of-the-art morphological component
analysis technique that uses either fixed or trained dictionaries to perform
image separation.Comment: submitted to IEEE Transactions on Images Processin
- …