An Investigation on Deep Learning with Beta Stabilizer
Artificial neural networks (ANN) have been used in many applications such as handwriting recognition and speech recognition. It is well known that the learning rate is a crucial value in the training procedure for artificial neural networks. It has been shown that the initial value of the learning rate can strongly affect the final result, and in practice this value is almost always set manually. A new parameter called the beta stabilizer has been introduced to reduce the sensitivity to the initial learning rate. However, this method has so far only been
proposed for deep neural networks (DNN) with the sigmoid activation function. In this paper we extend the beta stabilizer to long short-term memory (LSTM) networks and investigate the effects of beta stabilizer parameters on different models, including LSTM and DNN with the ReLU activation function. We conclude that beta stabilizer parameters can reduce the sensitivity to the learning rate with almost the same performance on both DNN with the ReLU activation function and LSTM. However, the effects of the beta stabilizer on DNN with the ReLU activation function and on LSTM are smaller than its effects on DNN with the sigmoid activation function.
Comment: Accepted by ICSP-201
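The stabilizer idea can be sketched roughly as follows. This is a hypothetical illustration only, assuming (as in self-stabilized networks) that each layer's pre-activation is multiplied by a learnable scalar exp(beta) before the nonlinearity; the names `W`, `b`, and `beta` are illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stabilized_layer(x, W, b, beta):
    """Sigmoid layer with a learnable scalar stabilizer exp(beta).

    Because exp(beta) scales the pre-activation, training beta can
    adapt the layer's effective step size, which is one way such a
    parameter could reduce sensitivity to the initial learning rate.
    """
    return sigmoid(np.exp(beta) * (x @ W + b))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # batch of 4 inputs, 8 features
W = rng.standard_normal((8, 3)) * 0.1
b = np.zeros(3)

# beta = 0 recovers a plain sigmoid layer, since exp(0) = 1.
y_plain = stabilized_layer(x, W, b, beta=0.0)
y_scaled = stabilized_layer(x, W, b, beta=1.0)
print(y_plain.shape, y_scaled.shape)  # (4, 3) (4, 3)
```

With beta learned per layer (initialized at 0), the network starts as a standard sigmoid DNN and adjusts the scaling during training.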
Future Vector Enhanced LSTM Language Model for LVCSR
Language models (LM) play an important role in large vocabulary continuous speech recognition (LVCSR). However, traditional language models only predict the next single word given the history, while consecutive predictions over a sequence of words are usually demanded and useful in LVCSR. The mismatch between single-word prediction during training and the long-term sequence prediction required in real use may lead to performance degradation. In
this paper, a novel enhanced long short-term memory (LSTM) LM using a future vector is proposed. In addition to the given history, the rest of the sequence is also embedded by future vectors. This future vector can be incorporated into the LSTM LM, giving it the ability to model much longer-term sequence-level information. Experiments show that the proposed LSTM LM achieves better BLEU scores for long-term sequence prediction. For speech recognition rescoring, although the proposed LSTM LM obtains only very slight gains by itself, the new model appears to be highly complementary to the conventional LSTM LM: rescoring with both the new and the conventional LSTM LMs achieves a very large improvement in word error rate.
Comment: Accepted by ASRU-201
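As a rough illustration of the rescoring setup, the sketch below combines an acoustic score with a log-linear interpolation of two LM scores, one per model. The function names, interpolation weights, and toy scores are all hypothetical stand-ins, not the paper's actual configuration.

```python
def rescore(hypotheses, lm_conv, lm_future, am_weight=1.0,
            w_conv=0.5, w_future=0.5):
    """Sort n-best hypotheses by acoustic score plus an interpolated
    LM score from two complementary language models."""
    def total(hyp):
        text, am_score = hyp
        lm_score = w_conv * lm_conv(text) + w_future * lm_future(text)
        return am_weight * am_score + lm_score
    return sorted(hypotheses, key=total, reverse=True)

# Toy stand-in LMs: fixed log-probabilities per hypothesis string.
conv_scores = {"the cat sat": -4.0, "the cat sad": -3.5}
future_scores = {"the cat sat": -3.0, "the cat sad": -5.0}

# Each n-best entry is (hypothesis text, acoustic log-score).
nbest = [("the cat sat", -10.0), ("the cat sad", -9.8)]
best = rescore(nbest, conv_scores.get, future_scores.get)[0][0]
print(best)  # the cat sat
```

The point of interpolating both LMs is that each can rescue hypotheses the other scores poorly, which matches the complementarity observed in the abstract.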