18 research outputs found
Recurrent neural networks for polyphonic sound event detection
The objective of this thesis is to investigate how a deep learning model called the recurrent neural network (RNN) performs in the task of detecting overlapping sound events in real-life environments. Examples of such sound events include dog barking, footsteps, and crowd applauding. When several sound sources are active simultaneously, as is often the case in everyday contexts, identifying individual sound events from their polyphonic mixture is a challenging task. Other factors, such as noise and distortions, make it even more difficult to explicitly implement a computer program to solve the detection task.
We present an approach to polyphonic sound event detection in real-life recordings based on an RNN architecture called bidirectional long short-term memory (BLSTM). A multilabel BLSTM RNN is trained to map the time-frequency representation of a mixture signal consisting of sounds from multiple sources to binary activity indicators of each event class. Our method is tested on two large databases of recordings, both containing sound events from more than 60 different classes, and in one case from 10 different everyday contexts. Furthermore, in order to reduce overfitting, we propose several data augmentation techniques: time stretching, sub-frame time shifting, and block mixing.
The proposed approach outperforms the previous state-of-the-art method, despite using half of the parameters, and the results are further improved substantially using the block mixing data augmentation technique. Overall, for the first dataset our approach achieves an average F1-score of 65.5% on 1-second blocks and 64.7% on single frames, a relative improvement over the previous state-of-the-art approach of 6.8% and 15.1%, respectively. For the second dataset our system achieves an average F1-score of 84.4% on 1-second blocks and 85.1% on single frames, a relative improvement over the baseline approach of 38.4% and 35.9%, respectively.
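Of the augmentation techniques above, block mixing is the easiest to sketch in code: two feature blocks are combined in the linear magnitude domain and their event labels are unioned. The following is a minimal illustration, not the thesis code; the function name and data layout (log-magnitude features as lists of frames, labels as binary lists) are assumptions.

```python
import math

def block_mix(feats_a, labels_a, feats_b, labels_b):
    """Block mixing sketch: sum two log-magnitude feature blocks in the
    linear domain, and take the union of their active-event labels."""
    mixed = [
        [math.log(math.exp(fa) + math.exp(fb)) for fa, fb in zip(row_a, row_b)]
        for row_a, row_b in zip(feats_a, feats_b)
    ]
    # an event is active in the mixture if it was active in either block
    labels = [la or lb for la, lb in zip(labels_a, labels_b)]
    return mixed, labels
```

Mixing in the linear domain (rather than adding log values directly) matters because the acoustic energies of the two recordings add, not their log-spectra.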
Learning Independent Causal Mechanisms
Statistical learning relies upon data sampled from a distribution, and we
usually do not care what actually generated it in the first place. From the
point of view of causal modeling, the structure of each distribution is induced
by physical mechanisms that give rise to dependences between observables.
Mechanisms, however, can be meaningful autonomous modules of generative models
that make sense beyond a particular entailed data distribution, lending
themselves to transfer between problems. We develop an algorithm to recover a
set of independent (inverse) mechanisms from a set of transformed data points.
The approach is unsupervised and based on a set of experts that compete for
data generated by the mechanisms, driving specialization. We analyze the
proposed method in a series of experiments on image data. Each expert learns to
map a subset of the transformed data back to a reference distribution. The
learned mechanisms generalize to novel domains. We discuss implications for
transfer learning and links to recent trends in generative modeling.
Comment: ICML 201
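The competition between experts described above can be sketched on a toy one-dimensional problem: data from a standard-normal reference distribution is shifted by an unknown amount, and each "expert" is a candidate inverse shift. Only the expert whose output looks most like the reference gets updated, driving specialization. All parameter values and the reference score are illustrative assumptions, not the paper's setup.

```python
import random

random.seed(0)

def log_ref_density(z):
    # log-density of the standard-normal reference, up to a constant
    return -0.5 * z * z

# unknown mechanism: shift by +3; each expert is an inverse map x -> x - theta
thetas = [0.5, -0.5]
lr = 0.1
for _ in range(500):
    x = random.gauss(0.0, 1.0) + 3.0            # transformed data point
    scores = [log_ref_density(x - t) for t in thetas]
    w = scores.index(max(scores))                # winner-take-all competition
    # only the winning expert is updated (gradient ascent on its own score)
    thetas[w] += lr * (x - thetas[w])
```

After training, the winning expert's parameter approaches the true shift of 3, while the losing expert barely moves, which is the specialization effect the abstract describes.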
Tempered Adversarial Networks
Generative adversarial networks (GANs) have been shown to produce realistic
samples from high-dimensional distributions, but training them is considered
hard. A possible explanation for training instabilities is the inherent
imbalance between the networks: While the discriminator is trained directly on
both real and fake samples, the generator only has control over the fake
samples it produces since the real data distribution is fixed by the choice of
a given dataset. We propose a simple modification that gives the generator
control over the real samples, which leads to a tempered learning process for
both generator and discriminator. The real data distribution passes through a
lens before being revealed to the discriminator, balancing the generator and
discriminator by gradually revealing more detailed features necessary to
produce high-quality results. The proposed module automatically adjusts the
learning process to the current strength of the networks, yet is generic and
easy to add to any GAN variant. In a number of experiments, we show that this
can improve quality, stability and/or convergence speed across a range of
different GAN architectures (DCGAN, LSGAN, WGAN-GP).
Comment: accepted to ICML 201
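In the paper the lens is itself a learned module that adapts to the networks' current strength; as a hand-crafted stand-in, one can picture it as an interpolation between a blurred and a raw version of the real data, with a reveal parameter `alpha` that grows over training. The blur kernel and the explicit `alpha` schedule below are illustrative assumptions.

```python
def moving_average_blur(x, k):
    """Simple 1-D blur: average over a window of size 2*k+1, clipped at edges."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def lens(x, alpha, k=2):
    """Tempering lens sketch: alpha=0 reveals only coarse structure to the
    discriminator, alpha=1 reveals the raw real sample."""
    blurred = moving_average_blur(x, k)
    return [alpha * xi + (1 - alpha) * bi for xi, bi in zip(x, blurred)]
```

Early in training the discriminator only sees low-detail real samples, so the generator is not overwhelmed; as `alpha` rises, finer features are gradually revealed.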
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Sound events often occur in unstructured environments where they exhibit wide
variations in their frequency content and temporal structure. Convolutional
neural networks (CNN) are able to extract higher level features that are
invariant to local spectral and temporal variations. Recurrent neural networks
(RNNs) are powerful in learning the longer term temporal context in the audio
signals. CNNs and RNNs as classifiers have recently shown improved performance
over established methods in various sound recognition tasks. We combine these
two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it
to a polyphonic sound event detection task. We compare the performance of the
proposed CRNN method with CNN, RNN, and other established methods, and observe
a considerable improvement for four different datasets consisting of everyday
sound events.
Comment: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysis
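The division of labor in a CRNN can be made concrete by tracing tensor shapes through the stack: convolution and frequency-only pooling shrink the spectral axis while preserving time resolution, the recurrent layer then models temporal context, and a per-class sigmoid yields frame-level activity. The layer sizes below are illustrative assumptions, not the paper's exact configuration.

```python
def crnn_shapes(time_steps, mel_bands, conv_pool=(5, 4, 2), filters=96, classes=6):
    """Trace (time, frequency) shapes through an illustrative CRNN stack.
    Pooling is applied along frequency only, so the time resolution of the
    event-activity output is preserved."""
    shapes = [("input", (time_steps, mel_bands, 1))]
    freq = mel_bands
    for i, p in enumerate(conv_pool):
        freq //= p  # convolution keeps the shape; max-pooling shrinks frequency
        shapes.append((f"conv{i + 1}+pool", (time_steps, freq, filters)))
    # frequency and filter axes are stacked, then a recurrent layer runs over time
    shapes.append(("recurrent over time", (time_steps, filters * freq)))
    shapes.append(("sigmoid per class", (time_steps, classes)))
    return shapes
```

Because every stage keeps the `time_steps` axis intact, the network can emit one multilabel prediction per frame, which is what polyphonic detection requires.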
Avoiding Discrimination through Causal Reasoning
Recent work on fairness in machine learning has focused on various
statistical discrimination criteria and how they trade off. Most of these
criteria are observational: They depend only on the joint distribution of
predictor, protected attribute, features, and outcome. While convenient to work
with, observational criteria have severe inherent limitations that prevent them
from resolving matters of fairness conclusively.
Going beyond observational criteria, we frame the problem of discrimination
based on protected attributes in the language of causal reasoning. This
viewpoint shifts attention from "What is the right fairness criterion?" to
"What do we want to assume about the causal data generating process?" Through
the lens of causality, we make several contributions. First, we crisply
articulate why and when observational criteria fail, thus formalizing what was
before a matter of opinion. Second, our approach exposes previously ignored
subtleties and why they are fundamental to the problem. Finally, we put forward
natural causal non-discrimination criteria and develop algorithms that satisfy
them.
Comment: Advances in Neural Information Processing Systems 30, 2017
http://papers.nips.cc/paper/6668-avoiding-discrimination-through-causal-reasonin
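An observational criterion, in the paper's sense, is one computable from the joint distribution alone. Demographic parity is a standard example (used here as an illustration; the paper discusses several such criteria), and its observational nature is visible in code: the function never touches the causal process, only probability mass over (attribute, prediction) pairs.

```python
def demographic_parity_gap(joint):
    """joint: dict mapping (a, y_hat) -> probability mass, for protected
    attribute a and binary prediction y_hat.
    Returns |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|.
    Purely observational: it uses only the joint distribution, which is
    exactly the limitation highlighted above."""
    def cond(a):
        num = sum(p for (ai, yh), p in joint.items() if ai == a and yh == 1)
        den = sum(p for (ai, yh), p in joint.items() if ai == a)
        return num / den
    return abs(cond(0) - cond(1))
```

Two very different causal data-generating processes can induce the same joint distribution and hence the same gap, which is why such criteria cannot resolve fairness questions conclusively.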
Predicting Ordinary Differential Equations with Transformers
We develop a transformer-based sequence-to-sequence model that recovers
scalar ordinary differential equations (ODEs) in symbolic form from irregularly
sampled and noisy observations of a single solution trajectory. We demonstrate
in extensive empirical evaluations that our model performs better than or on
par with existing methods in terms of accurate recovery across various settings.
Moreover, our method is efficiently scalable: after one-time pretraining on a
large set of ODEs, we can infer the governing law of a new observed solution in
a few forward passes of the model.
Comment: Published at ICML 202
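The model's inputs are irregularly sampled, noisy observations of a single solution trajectory; generating such an observation from a known ODE can be sketched with a standard RK4 integrator. This is an assumed data pipeline for illustration, not the paper's actual training-data generator.

```python
import math
import random

random.seed(1)

def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def sample_trajectory(f, x0, t_max, n_points, noise_std):
    """Irregularly sampled, noisy observations of one solution trajectory."""
    times = sorted(random.uniform(0, t_max) for _ in range(n_points))
    obs, t, x = [], 0.0, x0
    for ti in times:
        steps = 20  # integrate from the previous sample time to this one
        h = (ti - t) / steps
        for _ in range(steps):
            x = rk4_step(f, t, x, h)
            t += h
        obs.append((ti, x + random.gauss(0.0, noise_std)))
    return obs
```

For example, `sample_trajectory(lambda t, x: -x, 1.0, 2.0, 5, 0.01)` yields noisy samples of exponential decay at random times, the kind of (times, values) sequence the transformer consumes.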
Discovering ordinary differential equations that govern time-series
Natural laws are often described through differential equations yet finding a
differential equation that describes the governing law underlying observed data
is a challenging and still mostly manual task. In this paper we take a step
towards the automation of this process: we propose a transformer-based
sequence-to-sequence model that recovers scalar autonomous ordinary
differential equations (ODEs) in symbolic form from time-series data of a
single observed solution of the ODE. Our method is efficiently scalable: after
one-time pretraining on a large set of ODEs, we can infer the governing laws of
a new observed solution in a few forward passes of the model. We then show that
our model performs better than or on par with existing methods in various test
cases in terms of accurate symbolic recovery of the ODE, especially for more
complex expressions.
Comment: Workshop paper at NeurIPS 2022 workshop "AI for Science
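"Accurate symbolic recovery" is often checked numerically: a recovered right-hand side is compared against the ground truth on random evaluation points, since the ODEs here are scalar and autonomous (f depends on x only). The protocol below is a common proxy used for illustration, not necessarily the paper's exact metric.

```python
import math
import random

def odes_agree(f_true, f_pred, n=200, tol=1e-8, seed=0):
    """Numerically test whether a recovered autonomous ODE dx/dt = f(x)
    matches the ground truth on random points in a fixed interval."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        if abs(f_true(x) - f_pred(x)) > tol:
            return False
    return True
```

This catches algebraically equivalent recoveries (e.g. `x*(1-x)` versus `x - x**2`) that a string comparison of symbolic expressions would miss.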