    An investigation into the use of linguistic context in cursive script recognition by computer

    The automatic recognition of hand-written text has been a goal for over thirty-five years. The highly ambiguous nature of cursive writing (with high variability between not only different writers, but even between different samples from the same writer) means that systems based only on visual information are prone to errors. It is suggested that the application of linguistic knowledge to the recognition task may improve recognition accuracy. If a low-level (pattern recognition based) recogniser produces a candidate lattice (i.e. a directed graph giving a number of alternatives at each word position in a sentence), then linguistic knowledge can be used to find the 'best' path through the lattice. There are many forms of linguistic knowledge that may be used to this end. This thesis looks specifically at the use of collocation as a source of linguistic knowledge. Collocation describes the statistical tendency of certain words to co-occur in a language, within a defined range. It is suggested that this tendency may be exploited to aid automatic text recognition. The construction and use of a post-processing system incorporating collocational knowledge is described, as are a number of experiments designed to test the effectiveness of collocation as an aid to text recognition. The results of these experiments suggest that collocational statistics may be a useful form of knowledge for this application and that further research may produce a system of real practical use.
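
    The idea of choosing the 'best' path through a candidate lattice can be illustrated with a minimal sketch. This is not the thesis's system: the function name `best_path`, the additive scoring, the `weight` parameter, the restriction to adjacent word pairs (rather than a wider collocation range), and all scores in the example are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice type: one list per word position, each entry a
# (candidate word, recogniser score) pair. Candidates at a position are
# assumed to be distinct words.
Lattice = List[List[Tuple[str, float]]]


def best_path(
    lattice: Lattice,
    colloc: Dict[Tuple[str, str], float],
    weight: float = 1.0,
) -> List[str]:
    """Return the highest-scoring word sequence through the lattice.

    Path score = sum of recogniser scores
               + weight * sum of collocation scores of adjacent word pairs.
    Dynamic programming (Viterbi-style) keeps, for every candidate at the
    current position, only the best path that ends in that candidate.
    """
    # best[word] = (score of the best path ending in word, the path itself)
    best = {word: (score, [word]) for word, score in lattice[0]}
    for candidates in lattice[1:]:
        nxt = {}
        for word, score in candidates:
            # Pick the predecessor that maximises the accumulated score
            # plus the collocation bonus for the pair (predecessor, word).
            prev_score, prev_path = max(
                (
                    (p_score + weight * colloc.get((p_word, word), 0.0), p_path)
                    for p_word, (p_score, p_path) in best.items()
                ),
                key=lambda item: item[0],
            )
            nxt[word] = (prev_score + score, prev_path + [word])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Made-up example: the recogniser slightly prefers "clog" over "dog",
# but the (invented) collocation scores favour "the dog barked".
lattice = [
    [("the", 0.9), ("he", 0.8)],
    [("clog", 0.6), ("dog", 0.5)],
    [("barked", 0.7), ("baked", 0.7)],
]
colloc = {("the", "dog"): 1.0, ("dog", "barked"): 1.5}

print(best_path(lattice, colloc))
```

    With `weight=0.0` the collocation scores are ignored and the path follows the recogniser alone, which makes it easy to compare the two settings on the same lattice.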

    Towards robust real-world historical handwriting recognition

    In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as an extremely diverse collection of (pixel) images. Despite the great results, current deep learning (DL) methods are very data-greedy and time-consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse amount of labeled data. We present several approaches towards dealing with these problems, aiming to improve the robustness of current methods and to improve the autonomy in training. We applied our novel word and line text recognition approaches on nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, a level of accuracy was achieved which required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.
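
    As a rough illustration of ensemble voting, the sketch below takes the word hypotheses of several recognisers and returns the most frequent one. The abstract does not specify the voting scheme, so plurality voting over whole-word labels, the tie-breaking rule, the function name `vote`, and the example strings are all assumptions.

```python
from collections import Counter
from typing import List


def vote(predictions: List[str]) -> str:
    """Plurality vote over the word hypotheses of several recognisers.

    Ties are broken in favour of the hypothesis that appears first in the
    input list (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Scan in input order so the tie-break is deterministic.
    for hypothesis in predictions:
        if counts[hypothesis] == top:
            return hypothesis
    raise AssertionError("unreachable: some hypothesis has the top count")


# Hypothetical outputs of five networks for one word image.
print(vote(["huis", "hius", "huis", "huis", "buis"]))
```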

    Advances in Character Recognition

    This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. It is intended to serve as a reference source for academic research, for professionals working in the character recognition field, and for all interested in the subject.

    Cognitive Information Processing

    Contains research objectives, summary of research and reports on four research projects. National Institutes of Health (Grant 5 PO1 GM14940-02); National Institutes of Health (Grant 5 P01 GM15006-03); Joint Services Electronics Programs (U. S. Army, U. S. Navy, and U. S. Air Force) under Contract DA 28-043-AMC-02536(E); National Institutes of Health (Grant 5 T01 GM01555-03)

    Understanding Relations Between Scripts II

    Contexts of and Relations between Early Writing Systems (CREWS) is a project funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 677758), and based in the Faculty of Classics, University of Cambridge. Understanding Relations Between Scripts II: Early Alphabets is the first volume in this series, bringing together ten experts on ancient writing, languages and archaeology to present a set of diverse studies on the early development of alphabetic writing systems and their spread across the Levant and Mediterranean during the second and first millennia BC. By taking an interdisciplinary perspective, it sheds new light on alphabetic writing not just as a tool for recording language but also as an element of culture.

    Redesigning Arabic Learning Books, An exploration of the role of graphic communication and typography as visual pedagogic tools in Arabic-Latin bilingual design

    What are ‘educational typefaces’ and why are they needed today? Do Arabic beginners need special typefaces that can simplify learning further? If so, what features should they have? Research findings on the complexity of learning Arabic confirm that the majority of language textbooks and pedagogic materials lead to challenging learning environments due to the poor quality of book design, text-heavy content and the restricted amount of visuals used. The complexity of the data and insufficient design quality of the learning materials reviewed in this practice-based research demand serious thought toward simplification, involving experts in the fields of graphic communication, learning and typeface design. The study offers solutions to some of the problems that arise in the course of designing language-learning books by reviewing selected English learning and information design books and methods of guidance for developing uniform learning material for basic Arabic. Key findings from this study confirm the significant role of Arabic designers and educators in the production of efficient and effective learning materials. Their role involves working closely with Arabic instructors, mastering good language skills and being aware of the knowledge available. It also involves selecting legible typefaces with distinct design characteristics that help fulfil the various objectives of the learning unit. This study raises awareness of the need for typefaces that can attract people to learn Arabic more easily within a globalized world. The absence of such typefaces led to the exploration of simplified twentieth-century Arabic typefaces that share a similar idea of facilitating reading and writing, and resolving script and language complexity issues. This study traces their historical context and studies their functional, technical and aesthetic features to incorporate their thinking and reassign them as learning tools within the right context.
    The final outcome is the construction of an experimental bilingual Arabic-English language book series for Arab and non-Arab adult beginners. The learning tools used to create the book series were tested through workshops in Kuwait and London to measure their level of simplification and accessibility. They have confirmed both accessibility and incompatibility within different areas of the learning material of the books and helped improve the final outcome of the practice. The tools have established the significant role of educational typefaces, bilingual and graphic communication within visual Arabic learning.

    The Economy Of Typography (the Arrangement or Mode of Operation of Typography)

    The thesis will show that the current research into legibility and readability regarding certain aspects or characters of type is incomplete, and will demonstrate what further research is necessary to complete the analysis of these aspects or characters in the economy of typography in continuous text. Chapter 1 will show that the development of reading depends on the legibility of the typography and characters (‘recognizing patterns, planning strategy, and feeling’); in other words, reading and writing are interdependent, and both depend in part on the construction of the characters and their relationship to each other. It will also show that readable writing is desirable and important for the reader’s sake. Chapter 2 will deal with the practical presentation of the characters of what the reading public read, and the role played by legibility and readability of typography in conveying their message. Printers and designers also have a working knowledge and experience of legibility and readability, which is incorporated into typography presentations, and this too is taken into account in chapter 2. Chapter 3 reviews the criteria and methods used in typography readability and legibility research. The research will show that readability is the ease with which the eye can absorb the message and move along the line, and legibility is based on the ease with which one letter can be identified from another. Chapter 4, entitled Analysis and Recommendations, concludes the thesis with a summary of chapters 1, 2 and 3 before presenting a comparative analysis of current research into legibility, with particular emphasis on misreading or misrecognition of characters, and provides illustrations of the conclusions reached by way of bar chart and tables. Appendix One of the thesis contains a comprehensive list of the research into legibility and readability.
    Appendix Two contains the graphics of Benjamin Sherbow showing typography layout supportive of type spacing matters discussed in chapter 2. The thesis has an extensive bibliography of the works referred to throughout the thesis.