2 research outputs found
Text normalization using memory augmented neural networks
We perform text normalization, i.e., the transformation of words from the
written to the spoken form, using a memory-augmented neural network. With the
addition of a dynamic memory access and storage mechanism, we present a neural
architecture that serves as a language-agnostic text normalization system
while avoiding the kinds of unacceptable errors made by LSTM-based recurrent
neural networks. By reducing the frequency of such mistakes, we show that this
novel architecture is a better alternative. Our proposed system also requires
significantly less data, training time, and compute resources. Additionally,
we perform data up-sampling to circumvent the data sparsity problem in some
semiotic classes, showing that a sufficient number of examples in any
particular class can improve the performance of our text normalization system.
Although a few occurrences of these errors remain in certain semiotic classes,
we demonstrate that memory-augmented networks with meta-learning capabilities
can open many doors to a superior text normalization system.
Comment: 9 pages, 10 tables, 3 figures
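
To make the task concrete: text normalization maps written-form tokens to their spoken form, and the correct mapping depends on the token's semiotic class (cardinal, time, money, and so on). The Python sketch below illustrates the task and the class-balancing up-sampling idea; the example triples, class labels, and helper functions are hypothetical illustrations, not the paper's model or data.

```python
import random
from collections import Counter

# Illustrative (written, semiotic class, spoken) triples -- invented for
# this sketch, not drawn from the paper's dataset.
EXAMPLES = [
    ("123",   "CARDINAL", "one hundred twenty three"),
    ("3:45",  "TIME",     "three forty five"),
    ("$5",    "MONEY",    "five dollars"),
    ("5 km",  "MEASURE",  "five kilometers"),
    ("Jan 5", "DATE",     "january fifth"),
]

TABLE = {written: spoken for written, _, spoken in EXAMPLES}

def normalize(token: str) -> str:
    """Toy lookup standing in for a learned model's prediction;
    ordinary words pass through unchanged."""
    return TABLE.get(token, token)

def upsample(data, key=lambda ex: ex[1]):
    """Duplicate examples from sparse semiotic classes until every class
    matches the size of the largest one (the up-sampling idea above)."""
    counts = Counter(key(ex) for ex in data)
    target = max(counts.values())
    out = list(data)
    for cls, n in counts.items():
        pool = [ex for ex in data if key(ex) == cls]
        out.extend(random.choices(pool, k=target - n))
    return out

for written, cls, spoken in EXAMPLES:
    print(f"{cls:>8}: {written!r} -> {normalize(written)!r}")

# With one extra CARDINAL example, every other class gets duplicated once.
balanced = upsample(EXAMPLES + [("456", "CARDINAL", "four hundred fifty six")])
print(Counter(cls for _, cls, _ in balanced))
```

A real system replaces the lookup with a sequence model; the point of the sketch is that per-class example counts, not just total data size, drive performance on sparse classes.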
Neural Inverse Text Normalization
While there have been several contributions exploring state-of-the-art
techniques for text normalization, the problem of inverse text normalization
(ITN) remains relatively unexplored. The best-known approaches leverage
finite-state transducer (FST) based models, which rely on manually curated
rules and are hence not scalable. We propose an efficient and robust neural
solution for ITN that leverages transformer-based seq2seq models and FST-based
text normalization techniques for data preparation. We show that this approach
can be easily extended to other languages without the need for a linguistic
expert to manually curate the rules. We then present a hybrid framework that
integrates neural ITN with an FST to overcome common recoverable errors in
production environments. Our empirical evaluations show that the proposed
solution minimizes incorrect perturbations (insertions, deletions, and
substitutions) to ASR output and maintains high quality even on out-of-domain
data. A transformer-based model infused with pretraining consistently achieves
a lower WER across several datasets and is able to outperform baselines on English, Spanish,
German, and Italian datasets.
Comment: 5 pages, accepted to ICASSP 2021
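
The hybrid framework can be pictured as a two-stage pipeline: the neural seq2seq model proposes an inverse-normalized hypothesis, and a deterministic FST-style rule layer repairs recoverable errors before the text is emitted. The Python sketch below illustrates only this control flow; the `neural_itn` stub, its toy lookup, and the regex guard rules are hypothetical stand-ins, not the paper's actual model or FST.

```python
import re

def neural_itn(spoken: str) -> str:
    """Stand-in for the transformer seq2seq ITN model; a toy lookup is
    used here purely for illustration."""
    toy = {
        "it costs five dollars": "it costs $ 5",  # note the stray space
        "call me at three forty five": "call me at 3:45",
    }
    return toy.get(spoken, spoken)

# Hypothetical FST-like guard rules: deterministic rewrites that repair
# common recoverable errors in the neural hypothesis, in the spirit of
# the neural-plus-FST hybrid described above.
GUARDS = [
    (re.compile(r"\$\s+(\d)"), r"$\1"),          # "$ 5"    -> "$5"
    (re.compile(r"(\d)\s*:\s*(\d)"), r"\1:\2"),  # "3 : 45" -> "3:45"
]

def hybrid_itn(spoken: str) -> str:
    """Neural hypothesis first, then deterministic repair rules."""
    hypothesis = neural_itn(spoken)
    for pattern, replacement in GUARDS:
        hypothesis = pattern.sub(replacement, hypothesis)
    return hypothesis

print(hybrid_itn("it costs five dollars"))        # -> "it costs $5"
print(hybrid_itn("call me at three forty five"))  # -> "call me at 3:45"
```

The design choice this mirrors: the neural model supplies coverage and scalability across languages, while the rule layer bounds the damage of its occasional, easily characterized mistakes in production.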