10,267 research outputs found
A hypothesize-and-verify framework for Text Recognition using Deep Recurrent Neural Networks
Deep LSTM is an ideal candidate for text recognition. However text
recognition involves some initial image processing steps like segmentation of
lines and words which can induce error to the recognition system. Without
segmentation, learning very long range context is difficult and becomes
computationally intractable. Therefore, alternative soft decisions are needed
at the pre-processing level. This paper proposes a hybrid text recognizer using
a deep recurrent neural network with multiple layers of abstraction and long
range context along with a language model to verify the performance of the deep
neural network. In this paper we construct a multi-hypotheses tree architecture
with candidate segments of line sequences from different segmentation
algorithms at its different branches. The deep neural network is trained on
perfectly segmented data and tests each of the candidate segments, generating
unicode sequences. In the verification step, these unicode sequences are
validated using a sub-string match with the language model and best first
search is used to find the best possible combination of alternative hypothesis
from the tree structure. Thus the verification framework using language models
eliminates wrong segmentation outputs and filters recognition errors
LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices
Recent developments in speech synthesis have produced systems capable of
outcome intelligible speech, but now researchers strive to create models that
more accurately mimic human voices. One such development is the incorporation
of multiple linguistic styles in various languages and accents.
HMM-based Speech Synthesis is of great interest to many researchers, due to
its ability to produce sophisticated features with small footprint. Despite
such progress, its quality has not yet reached the level of the predominant
unit-selection approaches that choose and concatenate recordings of real
speech. Recent efforts have been made in the direction of improving these
systems.
In this paper we present the application of Long-Short Term Memory Deep
Neural Networks as a Postfiltering step of HMM-based speech synthesis, in order
to obtain closer spectral characteristics to those of natural speech. The
results show how HMM-voices could be improved using this approach.Comment: 5 pages, 5 figure
- …