5,841 research outputs found
NeuTM: A Neural Network-based Framework for Traffic Matrix Prediction in SDN
This paper presents NeuTM, a framework for network Traffic Matrix (TM)
prediction based on Long Short-Term Memory Recurrent Neural Networks (LSTM
RNNs). TM prediction is defined as the problem of estimating the future network
traffic matrix from previously observed network traffic data. It is
widely used in network planning, resource management and network security. Long
Short-Term Memory (LSTM) is a specific recurrent neural network (RNN)
architecture that is well-suited to learn from data and classify or predict
time series with time lags of unknown size. LSTMs have been shown to model
long-range dependencies more accurately than conventional RNNs. NeuTM is an LSTM
RNN-based framework for predicting TM in large networks. By validating our
framework on real-world data from the GEANT network, we show that our model
converges quickly and gives state-of-the-art TM prediction performance.
Comment: Submitted to NOMS18. arXiv admin note: substantial text overlap with
arXiv:1705.0569
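As a rough illustration of the prediction setup described in the abstract above, the sketch below frames a series of traffic matrices as (input window, next matrix) pairs, the usual supervised form for training a sequence model such as an LSTM. The function name `make_windows`, the window length and the toy data are assumptions made for illustration; they are not taken from the NeuTM paper.

```python
def make_windows(tm_series, window):
    """Turn a time-ordered list of traffic matrices into training pairs.

    tm_series: list of T matrices, each given as a list of rows.
    window:    number of past matrices used to predict the next one
               (hypothetical parameter, chosen by the user).
    Returns (inputs, targets), where inputs[i] holds `window` flattened
    matrices and targets[i] is the flattened matrix that follows them.
    """
    # Flatten each N x N matrix into a single vector of length N * N.
    flat = [[value for row in tm for value in row] for tm in tm_series]
    if not 1 <= window < len(flat):
        raise ValueError("window must satisfy 1 <= window < len(tm_series)")
    inputs = [flat[i:i + window] for i in range(len(flat) - window)]
    targets = flat[window:]
    return inputs, targets


# Toy series of six 2 x 2 matrices (made-up values).
series = [[[t, t + 0.5], [t + 1, t + 1.5]] for t in range(6)]
inputs, targets = make_windows(series, window=3)
```

Each entry of `inputs` could then be fed to a recurrent model one time step at a time, with the matching entry of `targets` as the regression target.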
The Microsoft 2017 Conversational Speech Recognition System
We describe the 2017 version of Microsoft's conversational speech recognition
system, in which we update our 2016 system with recent developments in
neural-network-based acoustic and language modeling to further advance the
state of the art on the Switchboard speech recognition task. The system adds a
CNN-BLSTM acoustic model to the set of model architectures we combined
previously, and includes character-based and dialog session aware LSTM language
models in rescoring. For system combination we adopt a two-stage approach,
whereby subsets of acoustic models are first combined at the senone/frame
level, followed by word-level voting via confusion networks. We also added a
confusion network rescoring step after system combination. The resulting system
yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
The Microsoft 2016 Conversational Speech Recognition System
We describe Microsoft's conversational speech recognition system, in which we
combine recent developments in neural-network-based acoustic and language
modeling to advance the state of the art on the Switchboard recognition task.
Inspired by machine learning ensemble techniques, the system uses a range of
convolutional and recurrent neural networks. I-vector modeling and lattice-free
MMI training provide significant gains for all acoustic model architectures.
Language model rescoring with multiple forward and backward running RNNLMs, and
word posterior-based system combination provide a 20% boost. The best single
system uses a ResNet architecture acoustic model with RNNLM rescoring, and
achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The
combined system has an error rate of 6.2%, representing an improvement over
previously reported results on this benchmark task.
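The word error rates quoted in these abstracts are conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch, assuming whitespace tokenization and unit costs for substitutions, insertions and deletions; `word_error_rate` is an illustrative name, and official benchmark scoring applies additional text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```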
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
Automatic speech recognition with deep neural networks for impaired speech
The final publication is available at https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10
Automatic Speech Recognition has reached almost human performance in some
controlled scenarios. However, recognition of impaired speech is a difficult
task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this
work we train different architectures on a database of dysarthric speech. A
comparison between architectures shows that, even with a small database, hybrid
DNN-HMM models outperform classical GMM-HMM models according to word error rate
measures. A DNN improves the recognition word error rate by 13% for subjects
with dysarthria with respect to the best classical architecture. This
improvement is higher than the one given by other deep neural networks such as
CNNs, TDNNs and LSTMs. All the experiments were done with the Kaldi toolkit
for speech recognition, for which we have adapted several recipes to deal with
dysarthric speech and work on the TORGO database. These recipes are publicly
available.
Peer Reviewed. Postprint (author's final draft)
A Long Short-Term Memory Recurrent Neural Network Framework for Network Traffic Matrix Prediction
Network Traffic Matrix (TM) prediction is defined as the problem of
estimating future network traffic from previously observed network
traffic data. It is widely used in network planning, resource management and
network security. Long Short-Term Memory (LSTM) is a specific recurrent neural
network (RNN) architecture that is well-suited to learn from experience to
classify, process and predict time series with time lags of unknown size. LSTMs
have been shown to model temporal sequences and their long-range dependencies
more accurately than conventional RNNs. In this paper, we propose an LSTM RNN
framework for predicting short- and long-term Traffic Matrix (TM) in large
networks. By validating our framework on real-world data from the GEANT network,
we show that our LSTM models converge quickly and give state-of-the-art TM
prediction performance for relatively small models.
Comment: Submitted for peer review. arXiv admin note: text overlap with
arXiv:1402.1128 by other authors