TasNet: time-domain audio separation network for real-time, single-channel speech separation
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition introduces inherent problems such as phase/magnitude decoupling and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on the encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal and noncausal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where a low-power, real-time implementation is desirable, such as in hearable and telecommunication devices.

Comment: Camera-ready version for ICASSP 2018, Calgary, Canada
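The abstract describes an encoder-mask-decoder pipeline that operates directly on the waveform. As a rough illustration only, here is a minimal PyTorch sketch of that structure; the layer sizes, the 1x1-convolution separator (the paper uses an LSTM), and the softmax mask normalization are placeholder assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TasNetSketch(nn.Module):
    """Minimal sketch of the encoder-mask-decoder idea (hypothetical sizes)."""

    def __init__(self, n_basis=512, kernel=40, stride=20, n_sources=2):
        super().__init__()
        self.n_sources = n_sources
        # Encoder: learned 1-D conv basis; ReLU keeps mixture weights nonnegative.
        self.encoder = nn.Conv1d(1, n_basis, kernel, stride=stride, bias=False)
        # Separator: estimates one mask per source (stand-in for the paper's LSTM).
        self.separator = nn.Sequential(
            nn.Conv1d(n_basis, n_basis, 1),
            nn.ReLU(),
            nn.Conv1d(n_basis, n_basis * n_sources, 1),
        )
        # Decoder: transposed conv resynthesizes waveforms from masked weights.
        self.decoder = nn.ConvTranspose1d(n_basis, 1, kernel, stride=stride, bias=False)

    def forward(self, mixture):                        # mixture: (batch, 1, time)
        w = torch.relu(self.encoder(mixture))          # nonnegative encoder outputs
        masks = self.separator(w)                      # (batch, basis * sources, frames)
        b, _, f = masks.shape
        masks = masks.view(b, self.n_sources, -1, f).softmax(dim=1)  # masks sum to 1
        sources = [self.decoder(w * masks[:, s]) for s in range(self.n_sources)]
        return torch.stack(sources, dim=1)             # (batch, sources, 1, time)
```

Because the encoder and decoder use short strided convolutions rather than a long STFT window, the minimum input-output latency is set by the kernel and stride, which is what makes the low-latency claim possible.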
Structured Dropout for Weak Label and Multi-Instance Learning and Its Application to Score-Informed Source Separation
Many success stories involving deep neural networks are instances of supervised learning, where available labels power gradient-based learning methods. Creating such labels, however, can be expensive, and there is thus increasing interest in weak labels, which provide only coarse information, with uncertainty regarding time, location, or value. Using such labels often leads to considerable challenges for the learning process. Current methods for weak-label training often employ standard supervised approaches that additionally reassign or prune labels during the learning process. The information gain, however, is often limited, as these methods only boost the importance of labels for which the network already yields reasonable results. We propose treating weak-label training as an unsupervised problem and using the labels to guide representation learning to induce structure. To this end, we propose two autoencoder extensions: class activity penalties and structured dropout. We demonstrate the capabilities of our approach in the context of score-informed source separation of music.
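To make the structured-dropout idea concrete, here is a minimal sketch assuming PyTorch. The partition of the latent code into per-class subspaces and the group-wise drop rule are assumptions for illustration, not the paper's exact formulation: weak labels (e.g. score-informed instrument activity) decide which subspaces may carry signal, and whole active subspaces are randomly dropped so each must encode its class on its own.

```python
import torch

def structured_dropout(latent, class_activity, units_per_class,
                       drop_p=0.5, training=True):
    """Label-guided structured dropout (assumed variant, for illustration).

    latent:         (batch, n_classes * units_per_class) autoencoder code
    class_activity: (batch, n_classes) binary weak labels, 1 = may be active
    Inactive classes are always zeroed; active classes are dropped as whole
    groups during training.
    """
    b, _ = latent.shape
    groups = latent.view(b, -1, units_per_class)   # (batch, classes, units)
    keep = class_activity.float()
    if training:
        # Randomly drop entire class subspaces among the active ones.
        keep = keep * (torch.rand_like(keep) > drop_p).float()
    return (groups * keep.unsqueeze(-1)).reshape(b, -1)
```

At inference one would pass training=False, so every subspace permitted by the weak labels is kept and only the known-inactive ones stay zeroed.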
Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution
Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains such as finance and medicine. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models, or approximating local predictions. However, these methods provide no guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed-form solution, named OpenBox, to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The main idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply OpenBox to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. Extensive experiments on both synthetic and real-world data sets clearly demonstrate the exactness and consistency of our interpretation.

Comment: KDD 2018
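The closed-form construction rests on the fact that, within a single activation region, a piecewise linear network is exactly an affine classifier. The sketch below illustrates that core fact for a ReLU network in PyTorch; it is not the paper's full OpenBox algorithm, just the per-region linearization it builds on.

```python
import torch
import torch.nn as nn

def local_linear_form(model, x):
    """Recover the exact linear classifier a ReLU network (a PLNN)
    realizes on the activation region containing input x.

    model: nn.Sequential of nn.Linear and nn.ReLU layers.
    Returns (W, b) such that model(x) == W @ x + b for every input
    sharing x's activation pattern.
    """
    W = torch.eye(x.numel())
    b = torch.zeros(x.numel())
    h = x
    for layer in model:
        if isinstance(layer, nn.Linear):
            W = layer.weight @ W
            b = layer.weight @ b + layer.bias
            h = layer(h)
        elif isinstance(layer, nn.ReLU):
            mask = (h > 0).float()       # activation pattern fixes the region
            W = mask.unsqueeze(1) * W    # zero the rows of inactive units
            b = mask * b
            h = layer(h)
    return W, b

# Consistency check on a small random PLNN (illustrative only):
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(4)
W, b = local_linear_form(model, x)
assert torch.allclose(model(x), W @ x + b, atol=1e-5)
```

Because (W, b) is derived algebraically from the weights and the fixed activation pattern rather than fitted to samples, the resulting per-region interpretation is exact by construction, which is the sense of "exactness" claimed in the abstract.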