
    A tool for facilitating OCR postediting in historical documents

    Optical character recognition (OCR) for historical documents is a complex procedure subject to a unique set of material issues, including inconsistent typefaces and low-quality scanning. Consequently, even the most sophisticated OCR engines produce errors. This paper reports on a tool built for postediting the output of Tesseract, more specifically for correcting common errors in digitized historical documents. The proposed tool suggests alternatives for word forms not found in a specified vocabulary. In the post-edited text, the assumed error is replaced by a presumably correct alternative chosen according to the scores of a Language Model (LM). The tool is tested on a chapter of the book An Essay Towards Regulating the Trade and Employing the Poor of this Kingdom. As demonstrated below, the tool is successful in correcting a number of common errors. Although sometimes unreliable, it is also transparent and subject to human intervention.
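    The workflow described in this abstract (flag word forms missing from a vocabulary, propose alternatives, pick one by a language-model score) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the names `edits1` and `suggest` are hypothetical, candidates are limited to a single character edit, and the "language model" is reduced to plain unigram counts from a toy corpus.

```python
from collections import Counter
import string


def edits1(word):
    """All strings one deletion, substitution, or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)


def suggest(word, counts):
    """Return (chosen form, ranked candidates) for one OCR token.

    counts doubles as the vocabulary and as a toy unigram language model
    (an assumption; a real tool would likely use a stronger model).
    """
    if word in counts:
        # In-vocabulary forms are left untouched.
        return word, []
    candidates = [w for w in edits1(word) if w in counts]
    # Rank by descending frequency, breaking ties alphabetically.
    candidates.sort(key=lambda w: (-counts[w], w))
    # Keep the original token when no candidate exists, so a human can review it.
    return (candidates[0] if candidates else word), candidates


# Toy corpus standing in for period-appropriate training text (hypothetical).
corpus = "the poor of this kingdom and the trade of the kingdom".split()
counts = Counter(corpus)

print(suggest("tbe", counts))      # an OCR-style confusion of "h" and "b"
print(suggest("kingdom", counts))  # already in the vocabulary
```

    Returning the ranked candidate list alongside the chosen form keeps each decision inspectable, in the spirit of the transparency and human intervention the abstract mentions.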

    Multidimensional Pareto optimization of touchscreen keyboards for speed, familiarity and improved spell checking

    The paper presents a new optimization technique for keyboard layouts based on Pareto front optimization. We used this multifactorial technique to create two new touchscreen phone keyboard layouts based on three design metrics: minimizing finger travel distance in order to maximize text entry speed, a new metric that maximizes spell-correction quality by minimizing neighbouring-key ambiguity, and maximizing familiarity through a similarity function with the standard Qwerty layout. The paper describes the optimization process and resulting layouts for a standard trapezoid-shaped keyboard and a more rectangular layout. Fitts' law modelling shows a predicted 11% improvement in entry speed without taking into account the significantly improved error correction potential and the subsequent effect on speed. In initial user tests, typing speed dropped from approx. 21 wpm with Qwerty to 13 wpm (64%) on first use of our layout but recovered to 18 wpm (85%) within four short trial sessions, and was still improving. NASA TLX forms showed no significant difference in load between Qwerty and our new layout in the fourth session. Together, we believe these results show that the new layouts are faster and can be quickly adopted by users.
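    As a rough illustration of the Pareto front idea this abstract relies on, the sketch below filters a set of candidate layouts down to the non-dominated ones. It is a minimal sketch under stated assumptions, not the paper's optimizer: the function names and the score tuples are hypothetical, and all three metrics are expressed as costs to minimize (travel distance, neighbouring-key ambiguity, and dissimilarity to Qwerty).

```python
def dominates(a, b):
    """True if a is no worse than b on every cost and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (travel, ambiguity, dissimilarity) costs for four layouts.
layouts = [
    (1.0, 0.5, 0.9),
    (0.8, 0.6, 0.7),
    (1.2, 0.7, 0.95),  # worse than the first layout on every metric
    (0.9, 0.4, 1.0),
]

print(pareto_front(layouts))
```

    The front keeps every layout that represents a distinct trade-off between the metrics, so the final choice among them remains a design decision rather than the output of a single weighted score.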

    Year 1 phonics screening check consultation

    "The Government is committed to raising children's achievement in reading, and has expressed the intention to establish a phonics screening check for children in Year 1. This will be a short, light-touch screening check designed to confirm that children have grasped the basics of phonic decoding and to identify those pupils who need extra help at an early stage, so that schools can provide support. The results of the screening check will provide valuable information to parents. The screening check will be part of the arrangements for the statutory assessment of children in respect of the first Key Stage. This consultation seeks views on proposals around the purpose, structure and administration of the screening check" -- front cover

    Learner autonomy and awareness through distance collaborative group work in English for Academic Purposes

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-40956-6_13. Learner autonomy is considered to be both an important skill and an attitude of learners, involving responsibility for and control of the learning process. A key notion in autonomy is interdependence, which is developed through collaboration and results in heightened awareness. This is precisely the concept that lies at the core of technology applications which facilitate interaction and collaboration at a distance. With a growing number of online ESP situations, more attention needs to be paid to virtual classrooms and the development of learner autonomy through collaboration. In the context of a distance EAP course, this chapter examines how students carry out a collaborative language awareness task, considering that peer interaction can be an appropriate setting to develop language awareness, whether in face-to-face or online situations. Based on the framework of 'community of inquiry' (Garrison et al. 2000), this study looks at how group members interact through forum posts and wiki edits, showing how students initiate, manage and carry out the task, together with the social, cognitive, and meta-cognitive processes that are generated. Given the nature of the task, creating a language learning activity, special attention is paid to students’ focus on and discussion of topics related to language and learning. From these observations we can derive implications for online language teaching and materials design. Peer Reviewed. Preprint.

    Linguistic errors in the biomedical domain: Towards an error typology for Spanish

    The objective of this work is the analysis of errors contained in a corpus of medical reports in natural language and the design of a typology of errors, as there was no systematic review on verification and correction of errors in clinical documentation in Spanish. In the development of automatic detection and correction systems, it is of great interest to delve into the nature of the linguistic errors that occur in clinical reports, in order to detect and treat them properly. The results show that omission errors are the most frequent ones in the analyzed sample, and that word length certainly influences error frequency. The typification of error patterns provided is enabling the development of a module based on linguistic knowledge, which is currently in progress. This will help to improve the performance of error detection and correction systems for the biomedical domain. This work was supported by the Spanish National Research Agency (AEI) through project LaTe4PSP (PID2019-107652RB-I00/AEI/10.13039/501100011033). Furthermore, the main author is supported by Ministerio de Universidades of Spain through the national program Ayudas para la formación de profesorado universitario (FPU), with reference FPU16/0332