The Impact of Word, Multiple Word, and Sentence Input on Virtual Keyboard Decoding Performance
Entering text on non-desktop computing devices is often done
via an onscreen virtual keyboard. Input on such keyboards
normally consists of a sequence of noisy tap events that specify
some amount of text, most commonly a single word. But
is single word-at-a-time entry the best choice? This paper
compares user performance and recognition accuracy of word-at-a-time, phrase-at-a-time, and sentence-at-a-time text entry
on a smartwatch keyboard. We evaluate the impact of differing
amounts of input in both text copy and free composition tasks.
We found that providing input of an entire sentence significantly
improved entry rates from 26 wpm to 32 wpm while keeping
character error rates below 4%. In offline experiments with
more processing power and memory, sentence input was recognized
with a much lower 2.0% error rate. Our findings suggest
virtual keyboards can enhance performance by encouraging
users to provide more input per recognition event. This work was supported by Google Faculty awards (K.V. and P.O.K.).
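To make the decoding idea concrete, the following is a minimal sketch of the noisy-channel view that sentence-at-a-time entry exploits: candidate text is scored by combining the spatial likelihood of each tap with a language-model prior, and longer input gives the language model more cross-word context with which to disambiguate noisy taps. The key coordinates, noise parameter, and toy unigram scores below are illustrative assumptions, not the paper's actual models.

```python
import math

# Assumed key centers (normalized x, y) and tap noise for a toy layout.
KEY_CENTERS = {
    "t": (0.45, 0.17), "h": (0.55, 0.50), "e": (0.25, 0.17),
    "i": (0.75, 0.17), "s": (0.15, 0.50),
}
TAP_SIGMA = 0.06  # assumed standard deviation of spatial tap noise

def tap_log_likelihood(tap, char):
    """Log-likelihood of an (x, y) tap under an isotropic Gaussian
    centered on the intended key."""
    cx, cy = KEY_CENTERS[char]
    return -((tap[0] - cx) ** 2 + (tap[1] - cy) ** 2) / (2 * TAP_SIGMA ** 2)

def lm_log_prob(sentence):
    """Stand-in for a real language model; sentence-at-a-time decoding
    benefits because this term can score cross-word context."""
    toy_scores = {"this": -2.0, "is": -2.2, "tis": -9.0}
    return sum(toy_scores.get(w, -12.0) for w in sentence.split())

def decode(taps, candidates):
    """Pick the candidate maximizing spatial likelihood + LM prior.
    (A real decoder searches a lattice rather than a fixed list and
    handles insertions/deletions; zip() here assumes one tap per letter.)"""
    def score(sentence):
        letters = sentence.replace(" ", "")
        spatial = sum(tap_log_likelihood(t, c) for t, c in zip(taps, letters))
        return spatial + lm_log_prob(sentence)
    return max(candidates, key=score)

# Ambiguous taps near the t/h/i/s keys: the LM prior favors "this is".
taps = [(0.48, 0.20), (0.57, 0.47), (0.73, 0.20), (0.17, 0.52),
        (0.76, 0.15), (0.13, 0.48)]
print(decode(taps, ["this is", "tis is"]))
```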
VelociWatch: Designing and evaluating a virtual keyboard for the input of challenging text
© 2019 Association for Computing Machinery. Virtual keyboard typing is typically aided by an auto-correct method that decodes a user’s noisy taps into their intended text. This decoding process can reduce error rates and possibly increase entry rates by allowing users to type faster but less precisely. However, virtual keyboard decoders sometimes make mistakes that change a user’s desired word into another. This is particularly problematic for challenging text such as proper names. We investigate whether users can guess words that are likely to cause auto-correct problems and whether users can adjust their behavior to assist the decoder. We conduct computational experiments to decide what predictions to offer in a virtual keyboard and design a smartwatch keyboard named VelociWatch. Novice users were able to use the features of VelociWatch to enter challenging text at 17 words-per-minute with a corrected error rate of 3%. Interestingly, they wrote slightly faster and just as accurately on a simpler keyboard with limited correction options. Our findings suggest users may be able to type difficult words on a smartwatch simply by tapping precisely, without the use of auto-correct.
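For reference, error rates like the 3% reported above are typically computed as character-level edit distance against the intended text. Below is a minimal sketch of the standard character error rate computation (standard metric code, not code from the paper): Levenshtein distance between reference and entered text, divided by the reference length.

```python
def character_error_rate(reference, hypothesis):
    """CER = Levenshtein edit distance / reference length.
    Standard dynamic programming over insertions, deletions, substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

# One substitution in an 11-character word -> roughly 9% CER.
print(character_error_rate("velociwatch", "velocowatch"))
```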
Fast and precise touch-based text entry for head-mounted augmented reality with variable occlusion
We present the VISAR keyboard: an augmented reality (AR) head-mounted display (HMD) system that supports text entry via a virtualised input surface. Users select keys on the virtual keyboard by imitating the process of single-hand typing on a physical touchscreen display. Our system uses a statistical decoder to infer users’ intended text and to provide error-tolerant predictions. There is also a high-precision fall-back mechanism to support users in indicating which keys should be unmodified by the auto-correction process. A unique advantage of leveraging the well-established touch input paradigm is that our system enables text entry with minimal visual clutter on the see-through display, thus preserving the user’s field-of-view. We iteratively designed and evaluated our system and show that the final iteration of the system supports a mean entry rate of 17.75 wpm with a mean character error rate of less than 1%. This performance represents a 19.6% improvement relative to the state-of-the-art baseline investigated: a gaze-then-gesture text entry technique derived from the system keyboard on the Microsoft HoloLens. Finally, we validate that the system is effective in supporting text entry in a fully mobile usage scenario likely to be encountered in industrial applications of AR HMDs. Per Ola Kristensson was supported in part by a Google Faculty research award and EPSRC grants EP/N010558/1 and EP/N014278/1. Keith Vertanen was supported in part by a Google Faculty research award. John Dudley was supported by the Trimble Fund.
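As an illustration of the error-tolerant input idea, here is a minimal sketch of a probabilistic touch model with a literal fall-back, assuming Gaussian tap noise around key centers. The key layout, noise width, and the literal flag below are illustrative assumptions, not the VISAR implementation.

```python
import math

# Assumed key centers (in key-width units) for a tiny patch of the layout.
KEYS = {"q": (0.0, 0.0), "w": (1.0, 0.0), "a": (0.3, 1.0), "s": (1.3, 1.0)}
SIGMA = 0.5  # assumed touch noise

def key_posteriors(tap):
    """P(key | tap) under an isotropic Gaussian with a uniform key prior."""
    weights = {k: math.exp(-((tap[0] - x) ** 2 + (tap[1] - y) ** 2)
                           / (2 * SIGMA ** 2))
               for k, (x, y) in KEYS.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def resolve(tap, literal=False):
    """A tap marked literal snaps to the nearest key and is exempt from
    auto-correction; otherwise the full distribution goes to the decoder."""
    posteriors = key_posteriors(tap)
    if literal:
        return max(posteriors, key=posteriors.get)
    return posteriors

print(resolve((0.4, 0.2)))                # ambiguous: q and w both likely
print(resolve((0.4, 0.2), literal=True))  # forced to the nearest key
```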
Adapting End-to-End Speech Recognition for Readable Subtitles
Automatic speech recognition (ASR) systems are primarily evaluated on
transcription accuracy. However, in some use cases such as subtitling, verbatim
transcription would reduce output readability given limited screen size and
reading time. Therefore, this work focuses on ASR with output compression, a
task challenging for supervised approaches due to the scarcity of training
data. We first investigate a cascaded system, where an unsupervised compression
model is used to post-edit the transcribed speech. We then compare several
methods of end-to-end speech recognition under output length constraints. The
experiments show that, with limited data far less than what is needed to
train a model from scratch, we can adapt a Transformer-based ASR model to incorporate
both transcription and compression capabilities. Furthermore, the best
performance in terms of WER and ROUGE scores is achieved by explicitly modeling
the length constraints within the end-to-end ASR system.
Comment: IWSLT 202
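One simple way to picture an output-length constraint during decoding is as a penalty on hypotheses that exceed a compression budget. The sketch below is only an external illustration of that scoring idea (the paper instead builds the constraint into the end-to-end model); the budget ratio and penalty weight are assumed values.

```python
def constrained_score(log_prob, hyp_len, src_len, ratio=0.7, weight=2.0):
    """Score a hypothesis: model log-probability minus a linear penalty
    for each token beyond ratio * source length (the compression budget)."""
    overflow = max(0.0, hyp_len - ratio * src_len)
    return log_prob - weight * overflow

def rerank(hypotheses, src_len):
    """Pick the best (log_prob, text) pair under the length constraint."""
    return max(hypotheses,
               key=lambda h: constrained_score(h[0], len(h[1].split()),
                                               src_len))

# A fluent but verbatim hypothesis loses to a shorter compressed one
# once the length penalty is applied (toy numbers).
hyps = [(-10.0, "the quick brown fox jumps over the lazy sleeping dog"),
        (-12.0, "the fox jumps over the dog")]
print(rerank(hyps, src_len=10))
```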