7,833 research outputs found

    READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

    Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets, since these either comprise clean data or simple, homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy demand binarized data that is annotated at the pixel level. Producing ground truth by these means is laborious and not needed to determine a method's quality. In this paper we propose a new evaluation scheme that is based on baselines. The proposed scheme has no need for binarization and can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm. Comment: Submitted to DAS201
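
    As a concrete illustration of a baseline-based evaluation in the spirit of the scheme described above, the Python sketch below treats predicted and ground-truth baselines as polylines and counts a point as matched if it lies within a pixel tolerance of the other set. The tolerance value and the matching rule are illustrative assumptions, not the paper's exact definition.

        import numpy as np

        def point_to_polyline_dist(p, polyline):
            # Minimum distance from point p to any segment of the polyline.
            best = float("inf")
            for a, b in zip(polyline[:-1], polyline[1:]):
                a, b, q = np.asarray(a, float), np.asarray(b, float), np.asarray(p, float)
                ab = b - a
                t = 0.0 if not ab.any() else np.clip(np.dot(q - a, ab) / np.dot(ab, ab), 0.0, 1.0)
                best = min(best, float(np.linalg.norm(q - (a + t * ab))))
            return best

        def baseline_f1(pred, gt, tol=15.0):
            # Precision: share of predicted points close to some ground-truth line;
            # recall: share of ground-truth points close to some predicted line.
            def coverage(src, ref):
                pts = [p for line in src for p in line]
                hits = sum(1 for p in pts
                           if min(point_to_polyline_dist(p, r) for r in ref) <= tol)
                return hits / max(len(pts), 1)
            precision, recall = coverage(pred, gt), coverage(gt, pred)
            return 2 * precision * recall / max(precision + recall, 1e-9)

        # One predicted baseline vs. one slightly shifted ground-truth baseline.
        print(baseline_f1([[(0, 100), (500, 100)]], [[(0, 103), (500, 104)]]))  # 1.0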

    A survey of validity and utility of electronic patient records in a general practice

    Objective: To develop methods of measuring the validity and utility of electronic patient records in general practice. Design: A survey of the main functional areas of a practice and use of independent criteria to measure the validity of the practice database. Setting: A fully computerised general practice in Skipton, North Yorkshire. Subjects: The records of all registered practice patients. Main outcome measures: Validity of the main functional areas of the practice clinical system. Measures of the completeness, accuracy, validity, and utility of the morbidity data for 15 clinical diagnoses using recognised diagnostic standards to confirm diagnoses and identify further cases. Development of a method and statistical toolkit to validate clinical databases in general practice. Results: The practice electronic patient records were valid, complete, and accurate for prescribed items (99.7%), consultations (98.1%), laboratory tests (100%), hospital episodes (100%), and childhood immunisations (97%). The morbidity data for 15 clinical diagnoses were complete (mean sensitivity=87%) and accurate (mean positive predictive value=96%). The presence of the Read codes for the 15 diagnoses was strongly indicative of the true presence of those conditions (mean likelihood ratio=3917). New interpretations of descriptive statistics are described that can be used to estimate both the number of true cases that are unrecorded and quantify the benefits of validating a clinical database for coded entries. Conclusion: This study has developed a method and toolkit for measuring the validity and utility of general practice electronic patient records.
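
    The completeness and accuracy figures above follow from standard diagnostic test statistics. As a hedged illustration (the counts below are invented for a small worked example, not the study's data), the metrics can be computed from a two-by-two comparison of coded entries against a reference standard:

        def validity_metrics(tp, fp, fn, tn):
            # Diagnostic validity of coded entries against a reference standard.
            # tp: coded and confirmed; fp: coded but not confirmed;
            # fn: unrecorded true cases; tn: correctly absent.
            sensitivity = tp / (tp + fn)              # completeness of the register
            ppv = tp / (tp + fp)                      # accuracy of a recorded code
            specificity = tn / (tn + fp)
            lr_pos = sensitivity / (1 - specificity)  # how strongly a code indicates disease
            return {"sensitivity": sensitivity, "ppv": ppv, "lr_pos": lr_pos}

        # Illustrative counts only: a condition in a registered population of
        # roughly 10,000 patients with a near-perfect register.
        print(validity_metrics(tp=87, fp=4, fn=13, tn=9896))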

    A relational post-processing approach for forms recognition

    Optical Character Recognition (OCR) is used to convert paper documents into electronic form. Unfortunately the technology is not perfect and the output can be erroneous. Conversion is therefore generally augmented by manual error detection and correction procedures, which can be very costly. One approach to minimizing cost is to apply an OCR post-processing system that reduces the amount of manual correction required. The post-processor takes advantage of knowledge associated with a particular project. In this thesis, we look into the feasibility of using integrity constraints to detect and correct errors in forms recognition. The general idea is to construct a database of form values that can be used to direct recognition and, consequently, make automatic correction possible.
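
    A minimal Python sketch of the idea, assuming the simplest kind of integrity constraint, a set of valid values for a field: an OCR result is snapped to the closest valid value within a small edit distance, and flagged for manual correction otherwise. The function names and distance threshold are illustrative, not taken from the thesis.

        def levenshtein(a: str, b: str) -> int:
            # Classic dynamic-programming edit distance.
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
                prev = cur
            return prev[-1]

        def correct_field(ocr_value: str, valid_values: set, max_dist: int = 2):
            # Snap an OCR field to the closest entry in a database of valid form
            # values; flag it for manual review if nothing is close enough.
            if ocr_value in valid_values:
                return ocr_value, False
            best = min(valid_values, key=lambda v: levenshtein(ocr_value, v))
            if levenshtein(ocr_value, best) <= max_dist:
                return best, False
            return ocr_value, True  # needs manual correction

        print(correct_field("SKIPT0N", {"SKIPTON", "LEEDS", "YORK"}))  # ('SKIPTON', False)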

    An Efficient Automated Attendance Entering System by Eliminating Counterfeit Signatures using Kolmogorov Smirnov Test

    Maintaining the attendance database of thousands of students has become a tedious task in universities in Sri Lanka. This paper comprises three phases, signature extraction, signature recognition, and signature verification, to automate the process. We applied the necessary image processing techniques and extracted useful features from each signature. A Support Vector Machine (SVM), a multiclass Support Vector Machine, and the Kolmogorov-Smirnov test are used for signature classification, recognition, and verification respectively. The method described in this report represents an effective and accurate approach to automatic signature recognition and verification. It is capable of matching, classifying, and verifying the test signatures against the database with 83.33%, 100%, and 100% accuracy respectively.
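
    The pipeline can be sketched with standard Python libraries. The random feature matrix below is a stand-in for the image-derived signature features the paper extracts, so the data are illustrative only: a multiclass SVM recognises the signer, and a two-sample Kolmogorov-Smirnov test verifies the signature against that signer's enrolled examples.

        import numpy as np
        from scipy.stats import ks_2samp
        from sklearn.svm import SVC

        # Stand-in features: in practice these come from image processing of
        # each signature image, not from a random generator.
        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(60, 16)) + np.repeat(np.arange(3), 20)[:, None]
        y_train = np.repeat(np.arange(3), 20)  # three enrolled signers

        clf = SVC(kernel="rbf", decision_function_shape="ovr")  # multiclass SVM
        clf.fit(X_train, y_train)

        def verify(test_features, claimed_id, alpha=0.05):
            # Two-sample Kolmogorov-Smirnov test between the test signature's
            # feature values and the claimed signer's enrolled feature values.
            ref = X_train[y_train == claimed_id].ravel()
            stat, p = ks_2samp(test_features.ravel(), ref)
            return p >= alpha  # distributions agree -> accept as genuine

        test = rng.normal(size=(1, 16)) + 1  # a signature resembling signer 1
        pred = int(clf.predict(test)[0])
        print("recognised as", pred, "genuine:", verify(test, pred))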

    Extraction and parsing of herbarium specimen data: Exploring the use of the Dublin core application profile framework

    Herbaria around the world house millions of plant specimens; botanists and other researchers value these resources as ingredients in biodiversity research. Even when the specimen sheets are digitized and made available online, the critical information about the specimen stored on the sheet is not in a usable (i.e., machine-processable) form. This paper describes a current research and development project that is designing and testing high-throughput workflows that combine machine and human processes to extract and parse the specimen label data. The primary focus of the paper is the metadata needs for the workflow and the creation of the structured metadata records describing the plant specimen. In the project, we are exploring the use of the new Dublin Core Metadata Initiative framework for application profiles. First articulated as the Singapore Framework for Dublin Core Application Profiles in 2007, the use of this framework is in its infancy. Its promises of maximum interoperability, of documented metadata use for maximum reusability, and of support for metadata applications that conform to Web architectural principles provide the incentive to explore the framework and to contribute implementation experience.
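
    To suggest what such a structured record might look like, here is a flat Python dictionary: the Dublin Core (dc:) and Darwin Core-style (dwc:) terms are plausible choices for a parsed specimen label, but the specimen values are invented and the project's actual application profile may use different terms.

        # Hypothetical parsed label for one herbarium sheet; every value below
        # is illustrative, not drawn from the project's data.
        specimen_record = {
            "dc:identifier": "herbarium:sheet:000123",
            "dc:type": "PhysicalObject",
            "dc:title": "Asclepias tuberosa L.",
            "dc:creator": "J. Smith (collector)",
            "dc:date": "1923-06-14",
            "dwc:scientificName": "Asclepias tuberosa",
            "dwc:locality": "Open prairie, Riley County, Kansas",
            "dwc:recordedBy": "J. Smith",
        }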

    HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

    Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Probably the single most important identifier for linking is the personal name. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of error are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database which consists of more than 1.1 million images of handwritten word-groups. The database is a collection of personal names, containing more than 105 thousand unique names with a total of more than 3.3 million examples. In addition, we present benchmark results for deep learning models that can automatically transcribe the personal names from the scanned documents. Focusing mainly on personal names, due to their vital role in linking, we hope to foster more sophisticated, accurate, and robust models for handwritten text recognition by making more challenging large-scale databases publicly available. This paper describes the data source, the collection process, and the image-processing procedures and methods involved in extracting the handwritten personal names, and handwritten text in general, from the forms.
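
    A deliberately small PyTorch stand-in for the kind of model such benchmarks involve: it treats name transcription as classification of a fixed-size name-image crop over a set of unique names. Real transcription models are sequence predictors; the architecture below is an assumption for illustration, not the paper's benchmark model.

        import torch
        from torch import nn

        class NameClassifier(nn.Module):
            # Classifies a 64x256 grayscale name crop into one of n_names classes.
            def __init__(self, n_names: int):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.head = nn.Sequential(
                    nn.Flatten(), nn.Linear(64 * 16 * 64, 256), nn.ReLU(),
                    nn.Linear(256, n_names),
                )

            def forward(self, x):  # x: (batch, 1, 64, 256)
                return self.head(self.features(x))

        # HANA contains over 105 thousand unique names; a subset keeps this light.
        model = NameClassifier(n_names=1000)
        logits = model(torch.randn(2, 1, 64, 256))
        print(logits.shape)  # torch.Size([2, 1000])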

    Analysis of errors presented by illiterate adults throughout a computerized program to teach reading and writing skills

    The typology of errors presented by children in the acquisition of reading and writing has been widely explored. Error analyses allow inferring sources of behavior control throughout the learning process and are an important tool for improving programs that teach reading and writing. Nevertheless, few studies have explored the types of errors made by illiterate adults. This is a descriptive study aiming to identify and analyze the errors made by adults participating in the process of learning to read and write using a computerized teaching program. The purpose was to evaluate the adequacy of the program and to point out whether there is a need to develop specific procedures for this population. Fifteen adults were individually submitted to the program, which comprises a sequence of teaching steps and assessments (pre- and post-tests and intermediate tests). Errors made by the students were categorized and analyzed according to categories described in the literature as well as new ones created specifically for this study. The data show a high concentration of errors in some categories, particularly for the first teaching module, with partial indication of error type specificity for the population in focus. This study also shows the participants' difficulties in writing (construction spelling task), requiring improvement of the computerized program when applied to adult literacy.

    Conducting inspections of local authority and voluntary adoption agencies : guidance on the inspection of adoption agencies

    "This guidance is designed to assist inspectors from the Office for Standards in Education, Children’s Services and Skills (Ofsted) when conducting inspections of local authority and voluntary adoption agencies. It should be read in conjunction with the inspection framework and the evaluation schedule" - front cover