Numeracy of Language Models: Joint Modelling of Words and Numbers

Spithourakis, Georgios P

Authors: Georgios P Spithourakis
Publication date: 28 May 2019
Publisher: UCL (University College London)

Abstract

Numeracy and literacy are the abilities to understand and work with numbers and words, respectively. While both skills are necessary for reading and writing documents in clinical, scientific, and other technical domains, existing statistical language models focus on words at the expense of numbers: numbers are ignored, masked, or treated similarly to words, which can obscure numerical content and cause sparsity issues, e.g. high out-of-vocabulary rates. In this thesis, we investigate whether the performance of neural language models can be improved by (i) considering numerical information as additional inputs and (ii) explicitly modelling the output of numerical tokens. In experiments with numbers as input, we find that numerical input features improve perplexity by 33% on a clinical dataset. In assisted text entry and verification tasks, numerical input features improve recall from 25.03% to 71.28% for word prediction with a list of 5 suggestions, keystroke savings from 34.35% to 44.81% for word completion, and the F1 score by 5 points for semantic error correction. Numerical information from an accompanying knowledge base improves performance further. In experiments with numerical tokens as output, we consider different strategies, e.g. memorisation and digit-by-digit composition, and propose a novel neural component based on Gaussian mixture density estimation. We propose the use of regression metrics to evaluate numerical accuracy and an adjusted perplexity metric that accounts for the high out-of-vocabulary rate of numerals. Our evaluation on clinical and scientific datasets shows that perplexity can be improved by more than 2 and 4 orders of magnitude, respectively, by modelling words and numerals with different sub-models through a hierarchical softmax. For the same datasets, our proposed mixture of Gaussians model achieved a 32% and 54% reduction in mean absolute percentage error, respectively, over the contender strategy, digit-by-digit composition.
We conclude with a critical reflection on this thesis and suggestions for future work.
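
As a rough illustration of two ideas named in the abstract, the sketch below evaluates a Gaussian mixture density over a numeral's value and scores point predictions with a percentage-error regression metric (taken here to be the usual mean absolute percentage error). It is not the thesis implementation: the function names, the two-component mixture, and all numbers are hypothetical, and in the thesis the mixture parameters would come from a neural model rather than being fixed by hand.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Density of a Gaussian mixture at x (weights are assumed to sum to 1)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def mape(targets, predictions):
    """Mean absolute percentage error, in percent (targets must be non-zero)."""
    errors = [abs((t - p) / t) for t, p in zip(targets, predictions)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical mixture over the value of a numeral (illustrative parameters only).
weights = [0.7, 0.3]
means = [60.0, 120.0]
stds = [5.0, 10.0]

# Density assigned to the value 60.0 by the mixture.
density = mixture_pdf(60.0, weights, means, stds)

# One possible point prediction: the mean of the mixture.
mixture_mean = sum(w * m for w, m in zip(weights, means))

# Percentage error of two made-up predictions against two made-up targets.
error = mape([50.0, 100.0], [45.0, 110.0])
```

A continuous density of this kind can assign probability mass to numerals never seen in training, which is one plausible reading of why the abstract contrasts it with memorisation and digit-by-digit composition.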

Available Versions

UCL Discovery

oai:eprints.ucl.ac.uk.OAI2:100...

Last time updated on 10/07/2019