The Transformer architecture is shown to provide a powerful framework as an
end-to-end model for building expression trees from online handwritten gestures
corresponding to glyph strokes. In particular, the attention mechanism was
successfully used to encode, learn and enforce the underlying syntax of
expressions, creating latent representations that are correctly decoded into the
exact mathematical expression tree, providing robustness to ablated inputs and
unseen glyphs. For the first time, the encoder is fed with spatio-temporal data
tokens that potentially form an infinitely large vocabulary, an approach with
applications beyond online gesture recognition. A new supervised
dataset of online handwriting gestures is provided for training models on
generic handwriting recognition tasks and a new metric is proposed for the
evaluation of the syntactic correctness of the output expression trees. A small
Transformer model suitable for edge inference was successfully trained to an
average normalised Levenshtein accuracy of 94%, with 94% of its predictions
forming a valid postfix (RPN) tree representation.

Comment: 12 pages, 3 figures, 4 tables
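The two reported figures, normalised Levenshtein accuracy and the share of predictions that form a valid postfix (RPN) tree, can be illustrated with a short sketch. This is not the authors' proposed metric or evaluation code. It assumes that accuracy is normalised by the longer of the two token sequences, and that validity means the token stream reduces to exactly one tree under a stack-depth check. The arity table `ARITY` and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical arity table (assumption): binary and unary operators.
# Any token not listed is treated as a leaf (arity 0), e.g. a digit or variable.
ARITY: Dict[str, int] = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2, "sqrt": 1}


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (insertions, deletions, substitutions) between token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalised_levenshtein_accuracy(pred: Sequence[str], target: Sequence[str]) -> float:
    """1 - distance / longer length (assumed normalisation); 1.0 for two empty sequences."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest


def is_valid_rpn(tokens: Sequence[str], arity: Dict[str, int] = ARITY) -> bool:
    """True if the postfix tokens build exactly one expression tree."""
    depth = 0  # number of subtrees currently on the stack
    for tok in tokens:
        n = arity.get(tok, 0)
        if depth < n:          # operator lacks enough operands
            return False
        depth = depth - n + 1  # pop n subtrees, push one new subtree
    return depth == 1


def evaluate(preds: List[List[str]], targets: List[List[str]]) -> Tuple[float, float]:
    """Mean normalised Levenshtein accuracy and fraction of syntactically valid predictions."""
    accs = [normalised_levenshtein_accuracy(p, t) for p, t in zip(preds, targets)]
    valid = [is_valid_rpn(p) for p in preds]
    return sum(accs) / len(accs), sum(valid) / len(valid)


if __name__ == "__main__":
    preds = [["a", "b", "+", "2", "*"], ["a", "+"]]
    targets = [["a", "b", "+", "c", "*"], ["a", "b", "+"]]
    print(evaluate(preds, targets))
```

In this sketch, the two metrics can diverge. A prediction can sit close to the target in edit distance and still fail to parse as a tree, as `["a", "+"]` does in the example.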
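The abstract states that the encoder consumes spatio-temporal tokens that can form an infinitely large vocabulary. A common way to feed continuous inputs to a Transformer is to replace the discrete embedding lookup table with an affine projection of each token's feature vector. The following is a minimal sketch of that general idea. It is not claimed to be the authors' tokenisation. The per-token features `[x, y, t, pen_down]`, the class name `LinearTokenEmbedding`, and the random initialisation that stands in for learned weights are all assumptions.

```python
import random
from typing import List

# Hypothetical per-token features (assumption): [x, y, t, pen_down]
Point = List[float]


class LinearTokenEmbedding:
    """Affine projection of a continuous feature vector to d_model dimensions.

    Unlike a lookup table indexed by token id, this accepts any real-valued
    input, so the set of possible input tokens is unbounded.
    """

    def __init__(self, in_features: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / in_features ** 0.5
        # Random weights stand in for parameters that would be learned in training.
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_features)]
                       for _ in range(d_model)]
        self.bias = [0.0] * d_model

    def __call__(self, point: Point) -> List[float]:
        # y = W x + b, one output per row of W
        return [sum(w * x for w, x in zip(row, point)) + b
                for row, b in zip(self.weight, self.bias)]

    def embed_sequence(self, points: List[Point]) -> List[List[float]]:
        """Embed every sample of a stroke; the result is the encoder's input sequence."""
        return [self(p) for p in points]


if __name__ == "__main__":
    stroke = [[0.10, 0.20, 0.00, 1.0],
              [0.12, 0.25, 0.01, 1.0],
              [0.15, 0.31, 0.02, 0.0]]
    emb = LinearTokenEmbedding(in_features=4, d_model=8)
    seq = emb.embed_sequence(stroke)
    print(len(seq), len(seq[0]))
```

A real model would typically add positional information and learn the projection jointly with the encoder. The sketch shows only the step that removes the fixed-vocabulary constraint.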