Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Traditional convolutional layers extract features from patches of data by
applying a non-linearity on an affine function of the input. We propose a model
that enhances this feature extraction process for the case of sequential data,
by feeding patches of the data into a recurrent neural network and using the
outputs or hidden states of the recurrent units to compute the extracted
features. By doing so, we exploit the fact that a window containing a few
frames of the sequential data is a sequence itself and this additional
structure might encapsulate valuable information. In addition, we allow for
more steps of computation in the feature extraction process, which is
potentially beneficial as an affine function followed by a non-linearity can
result in too simple features. Using our convolutional recurrent layers we
obtain an improvement in performance in two audio classification tasks,
compared to traditional convolutional layers. TensorFlow code for the
convolutional recurrent layers is publicly available at
https://github.com/cruvadom/Convolutional-RNN
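The core idea above can be sketched in a few lines: treat each sliding window of the sequence as a short sequence of its own, run a recurrent cell over it, and use the final hidden state as the extracted feature. The following is a minimal NumPy sketch, not the authors' TensorFlow implementation; all shapes, weights, and the vanilla-tanh RNN cell are illustrative assumptions.

```python
import numpy as np

def conv_rnn_features(x, W_xh, W_hh, b_h, window, stride=1):
    """Convolutional-RNN feature extraction (illustrative sketch).

    Instead of a single affine map + non-linearity per patch, each
    sliding window is processed frame-by-frame by a vanilla RNN; the
    final hidden state is the feature for that window.

    x: (T, d) input sequence; returns (num_windows, h_dim) features.
    """
    T, d = x.shape
    h_dim = b_h.shape[0]
    feats = []
    for start in range(0, T - window + 1, stride):
        h = np.zeros(h_dim)
        for t in range(start, start + window):
            # One RNN step per frame of the patch: more computation
            # than a single affine function over the whole patch.
            h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)
        feats.append(h)  # hidden state after the whole window = feature
    return np.stack(feats)

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 8))           # a 20-frame sequence, 8 features/frame
W_xh = rng.standard_normal((8, 16)) * 0.1  # input-to-hidden weights (hypothetical sizes)
W_hh = rng.standard_normal((16, 16)) * 0.1
b_h = np.zeros(16)
features = conv_rnn_features(x, W_xh, W_hh, b_h, window=5, stride=2)
print(features.shape)  # one 16-dim feature per window
```

With T=20, window=5, stride=2 this yields 8 windows, so the output is (8, 16): one feature vector per patch, exactly where a traditional convolutional layer would emit one activation vector.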
The Many-to-Many Mapping Between the Concordance Correlation Coefficient and the Mean Square Error
We derive the mapping between two of the most pervasive utility functions,
the mean square error (MSE) and the concordance correlation coefficient
(CCC). Despite its drawbacks, MSE is one of the most popular performance
metrics (and a loss function), used alongside CCC lately in many of the
sequence prediction challenges. Despite their ever-growing simultaneous usage,
e.g., in inter-rater agreement and assay validation, a mapping between the two
metrics has been missing to date. While minimisation of the norm of the errors
or of its positive powers (e.g., MSE) is aimed at CCC maximisation, we
reason the often-witnessed ineffectiveness of this popular loss function with
graphical illustrations. The discovered formula uncovers not only the
counterintuitive revelation that `lower MSE' does not imply
`higher CCC', but also provides the precise range for the CCC
metric for a given MSE. We discover the conditions for CCC optimisation
for a given MSE; and as a logical next step, for a given set of errors. We
generalise and discover the conditions for any given Lp norm, for an even p.
We present newly discovered, albeit apparent, mathematical paradoxes. The study
inspires and anticipates a growing use of CCC-inspired loss functions,
replacing the traditional L2-norm loss functions in multivariate regressions.
Comment: Why this discovery, or the mapping formulation, is important:
a lower MSE does not imply a higher CCC. In other words, MSE
minimisation does not necessarily guarantee CCC maximisation.
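The many-to-many nature of the mapping is easy to demonstrate numerically. Below is a small NumPy sketch using the standard definition of the CCC; the toy predictions are our own construction, chosen so that two predictors with identical MSE have very different CCC.

```python
import numpy as np

def mse(y, p):
    return float(np.mean((p - y) ** 2))

def ccc(y, p):
    # Concordance correlation coefficient:
    # CCC = 2*cov(y, p) / (var(y) + var(p) + (mean(y) - mean(p))**2)
    cov = np.mean((y - y.mean()) * (p - p.mean()))
    return float(2 * cov / (y.var() + p.var() + (y.mean() - p.mean()) ** 2))

y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # ground truth
a = np.full(5, 2.0)                        # constant prediction at the mean
b = np.array([-2.0, 0.0, 2.0, 4.0, 6.0])  # over-scaled but perfectly correlated

# Identical MSE, very different CCC: MSE alone does not determine CCC.
print(mse(y, a), ccc(y, a))  # 2.0, 0.0
print(mse(y, b), ccc(y, b))  # 2.0, 0.8
```

Both predictors incur a mean square error of 2.0, yet the constant predictor has zero concordance while the over-scaled one reaches 0.8, which is exactly why minimising MSE alone is an unreliable proxy for maximising CCC.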
Scaling Speech Enhancement in Unseen Environments with Noise Embeddings
We address the problem of speech enhancement generalisation to unseen
environments by performing two manipulations. First, we embed an additional
recording from the environment alone, and use this embedding to alter
activations in the main enhancement subnetwork. Second, we scale the number of
noise environments present at training time to 16,784 different environments.
Experiment results show that both manipulations reduce word error rates of a
pretrained speech recognition system and improve enhancement quality according
to a number of performance measures. Specifically, our best model reduces the
word error rate from 34.04% on noisy speech to 15.46% on the enhanced speech.
Enhanced audio samples can be found at
https://speechenhancement.page.link/samples
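The first manipulation above, embedding a noise-only recording and using it to alter activations in the enhancement subnetwork, can be sketched as feature-wise conditioning. Note this sketch is an assumption about the mechanism: the abstract does not specify how the embedding alters activations, so we use a common scale-and-shift scheme with hypothetical weights and shapes.

```python
import numpy as np

def embed_noise(noise, W_e):
    """Pool a noise-only recording into a fixed-size environment embedding."""
    return np.tanh(noise.mean(axis=0) @ W_e)  # mean-pool frames, then project

def condition(h, z, W_gamma, W_beta):
    """Alter enhancement-network activations using the noise embedding z.

    One plausible mechanism (an assumption here): a feature-wise
    scale (gamma) and shift (beta) computed from the embedding.
    """
    gamma = 1.0 + z @ W_gamma   # scale around identity
    beta = z @ W_beta           # per-channel shift
    return gamma * h + beta

rng = np.random.default_rng(1)
noise = rng.standard_normal((50, 40))      # 50 frames of noise-only features
W_e = rng.standard_normal((40, 8)) * 0.1   # embedding projection (hypothetical)
z = embed_noise(noise, W_e)                # 8-dim environment embedding

h = rng.standard_normal((50, 32))          # activations inside the enhancement net
W_gamma = rng.standard_normal((8, 32)) * 0.1
W_beta = rng.standard_normal((8, 32)) * 0.1
h_cond = condition(h, z, W_gamma, W_beta)
print(h_cond.shape)  # activations keep their shape, now environment-aware
```

The appeal of this design is that the same enhancement subnetwork can adapt to any of the thousands of training environments, because the environment identity enters only through the compact embedding.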
Calibrated Prediction Intervals for Neural Network Regressors
Ongoing developments in neural network models are continually advancing the
state of the art in terms of system accuracy. However, the predicted labels
should not be regarded as the only core output; also important is a
well-calibrated estimate of the prediction uncertainty. Such estimates and
their calibration are critical in many practical applications. Despite their
obvious aforementioned advantage in relation to accuracy, contemporary neural
networks can, generally, be regarded as poorly calibrated and as such do not
produce reliable output probability estimates. Further, while post-processing
calibration solutions can be found in the relevant literature, these tend to be
for systems performing classification. In this regard, we herein present two
novel methods for acquiring calibrated prediction intervals for neural network
regressors: empirical calibration and temperature scaling. In experiments using
different regression tasks from the audio and computer vision domains, we find
that both our proposed methods are indeed capable of producing calibrated
prediction intervals for neural network regressors with any desired confidence
level, a finding that is consistent across all datasets and neural network
architectures we experimented with. In addition, we derive an additional
practical recommendation for producing more accurate calibrated prediction
intervals. The source code implementing our proposed methods for computing
calibrated prediction intervals is publicly available
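To make the idea of empirical calibration concrete, here is a minimal sketch under our own assumptions (it is not the paper's exact recipe): given a regressor that outputs a mean and an uncertainty estimate per example, find, on a held-out set, the single scale factor for the predicted standard deviations that makes the interval cover the desired fraction of targets.

```python
import numpy as np

def calibrate_scale(mu, sigma, y, confidence=0.9):
    """Empirical interval calibration (illustrative sketch).

    Finds a scale s such that mu +/- s*sigma covers `confidence`
    of the held-out targets y.
    """
    # How many predicted std devs away is each validation target?
    z = np.abs(y - mu) / sigma
    # The s achieving the desired coverage is the matching quantile of z.
    return float(np.quantile(z, confidence))

def interval(mu, sigma, s):
    return mu - s * sigma, mu + s * sigma

rng = np.random.default_rng(2)
mu = rng.standard_normal(1000)              # network mean predictions
sigma = np.full(1000, 0.5)                  # overconfident predicted std devs
y = mu + rng.standard_normal(1000) * 1.0    # true noise is larger than predicted

s = calibrate_scale(mu, sigma, y, confidence=0.9)
lo, hi = interval(mu, sigma, s)
coverage = np.mean((y >= lo) & (y <= hi))
print(s, coverage)  # s > 1 corrects the overconfidence; coverage near 0.9
```

Because the scale is read off as a quantile of observed normalised errors, the procedure attains any desired confidence level regardless of how miscalibrated the raw uncertainty estimates were, which mirrors the paper's finding that calibration holds across datasets and architectures.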
Fast Single-Class Classification and the Principle of Logit Separation
We consider neural network training, in applications in which there are many
possible classes, but at test-time, the task is a binary classification task of
determining whether the given example belongs to a specific class, where the
class of interest can be different each time the classifier is applied. For
instance, this is the case for real-time image search. We define the Single
Logit Classification (SLC) task: training the network so that at test-time, it
would be possible to accurately identify whether the example belongs to a given
class in a computationally efficient manner, based only on the output logit for
this class. We propose a natural principle, the Principle of Logit Separation,
as a guideline for choosing and designing losses suitable for the SLC. We show
that the cross-entropy loss function is not aligned with the Principle of Logit
Separation. In contrast, there are known loss functions, as well as novel batch
loss functions that we propose, which are aligned with this principle. In
total, we study seven loss functions. Our experiments show that indeed in
almost all cases, losses that are aligned with the Principle of Logit
Separation obtain at least 20% relative accuracy improvement in the SLC task
compared to losses that are not aligned with it, and sometimes considerably
more. Furthermore, we show that fast SLC does not cause any drop in binary
classification accuracy, compared to standard classification in which all
logits are computed, and yields a speedup which grows with the number of
classes. For instance, we demonstrate a 10x speedup when the number of classes
is 400,000. TensorFlow code for optimizing the new batch losses is publicly
available at https://github.com/cruvadom/Logit_Separation
Comment: Published as a conference paper in ICDM 2018.
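The efficiency argument is simple to demonstrate: standard softmax probabilities require all K logits for normalisation, whereas SLC reads exactly one. The sketch below uses our own toy logits and a plain sigmoid as the single-logit score; the specific losses from the paper are not reproduced here.

```python
import numpy as np

def softmax_prob(logits, k):
    """Standard multi-class probability for class k: needs ALL K logits."""
    e = np.exp(logits - logits.max())
    return float(e[k] / e.sum())

def slc_score(logit_k):
    """Single Logit Classification: score one class from its logit alone.

    This only works when training keeps logits 'separated': correct-class
    logits above a fixed threshold (here 0), incorrect ones below it.
    """
    return float(1.0 / (1.0 + np.exp(-logit_k)))

# Hypothetical logits from a network trained toward logit separation:
logits = np.array([-3.1, -2.7, 4.2, -3.5])  # class 2 is present

p_full = softmax_prob(logits, 2)  # touches every logit: O(K) per query
p_slc = slc_score(logits[2])      # reads exactly one logit: O(1) in K
print(p_full > 0.5, p_slc > 0.5)  # both identify class 2
```

With cross-entropy training only the relative logit values are constrained, so a single logit in isolation carries no threshold; losses aligned with the Principle of Logit Separation restore that property, which is what makes the O(1) single-logit test accurate and yields the reported speedups for large K.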
Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. As a potentially crucial technique for the
development of the next generation of emotional AI systems, we herein provide a
comprehensive overview of the application of adversarial training to affective
computing and sentiment analysis. Various representative adversarial training
algorithms are explained and discussed accordingly, aimed at tackling diverse
challenges associated with emotional AI systems. Further, we highlight a range
of potential future research directions. We expect that this overview will help
facilitate the development of adversarial training for affective computing and
sentiment analysis in both the academic and industrial communities.