6,455 research outputs found
Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks
Multivariate time series forecasting is an important machine learning problem
across many domains, including predictions of solar plant energy output,
electricity consumption, and traffic jam situation. Temporal data arise in
these real-world applications often involves a mixture of long-term and
short-term patterns, for which traditional approaches such as Autoregressive
models and Gaussian Process may fail. In this paper, we proposed a novel deep
learning framework, namely Long- and Short-term Time-series network (LSTNet),
to address this open challenge. LSTNet uses the Convolution Neural Network
(CNN) and the Recurrent Neural Network (RNN) to extract short-term local
dependency patterns among variables and to discover long-term patterns for time
series trends. Furthermore, we leverage traditional autoregressive model to
tackle the scale insensitive problem of the neural network model. In our
evaluation on real-world data with complex mixtures of repetitive patterns,
LSTNet achieved significant performance improvements over that of several
state-of-the-art baseline methods. All the data and experiment codes are
available online.Comment: Accepted by SIGIR 201
Boosting local search thanks to {CDCL}
International audienceIn this paper, a novel hybrid and complete approach for propositional satisfiability, called SAT HYS (Sat Hybrid Solver), is introduced. It efficiently combines the strength of both local search and CDCL based SAT solvers. Considering the consistent partial assignment under construction by the CDCL SAT solver, local search is used to extend it to a model of the Boolean formula, while the CDCL component is used by the local search one as a strategy to escape from a local minimum. Additionally, both solvers heavily cooperate thanks to relevant information gathered during search. Experimentations on SAT instances taken from the last competitions demonstrate the efficiency and the robustness of our hybrid solver with respect to the state-of-the-art CDCL based, local search and hybrid SAT solvers
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particular
for Deep Learning may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when allowing one to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures
Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled
In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions of the audio to be processed. Any mismatch will decrease the accuracy of the recognition. If it is unpredictable what kind of data can be expected, or in other words if the conditions of the audio to be processed are unknown, it is impossible to tune the models. If the material consists of `surprise data' the output of the system is likely to be poor. In this thesis methods are presented for which no external training data is required for training models. These novel methods have been implemented in a large vocabulary continuous speech recognition system called SHoUT. This system consists of three subsystems: speech/non-speech classification, speaker diarization and automatic speech recognition. The speech/non-speech classification subsystem separates speech from silence and unknown audible non-speech events. The type of non-speech present in audio recordings can vary from paper shuffling in recordings of meetings to sound effects in television shows. Because it is unknown what type of non-speech needs to be detected, it is not possible to train high quality statistical models for each type of non-speech sound. The speech/non-speech classification subsystem, also called the speech activity detection subsystem, does not attempt to classify all audible non-speech in a single run. Instead, first a bootstrap speech/silence classification is obtained using a standard speech activity component. Next, the models for speech, silence and audible non-speech are trained on the target audio using the bootstrap classification. This approach makes it possible to classify speech and non-speech with high accuracy, without the need to know what kinds of sound are present in the audio recording. Once all non-speech is filtered out of the audio, it is the task of the speaker diarization subsystem to determine how many speakers occur in the recording and exactly when they are speaking. The speaker diarization subsystem applies agglomerative clustering to create clusters of speech fragments for each speaker in the recording. First, statistical speaker models are created on random chunks of the recording and by iteratively realigning the data, retraining the models and merging models that represent the same speaker, accurate speaker models are obtained for speaker clustering. This method does not require any statistical models developed on a training set, which makes the diarization subsystem insensitive for variation in audio conditions. Unfortunately, because the algorithm is of complexity , this clustering method is slow for long recordings. Two variations of the subsystem are presented that reduce the needed computational effort, so that the subsystem is applicable for long audio recordings as well. The automatic speech recognition subsystem developed for this research, is based on Viterbi decoding on a fixed pronunciation prefix tree. Using the fixed tree, a flexible modular decoder could be developed, but it was not straightforward to apply full language model look-ahead efficiently. In this thesis a novel method is discussed that makes it possible to apply language model look-ahead effectively on the fixed tree. Also, to obtain higher speech recognition accuracy on audio with unknown acoustical conditions, a selection from the numerous known methods that exist for robust automatic speech recognition is applied and evaluated in this thesis. The three individual subsystems as well as the entire system have been successfully evaluated on three international benchmarks. The diarization subsystem has been evaluated at the NIST RT06s benchmark and the speech activity detection subsystem has been tested at RT07s. The entire system was evaluated at N-Best, the first automatic speech recognition benchmark for Dutch
- …