An investigation into the use of linguistic context in cursive script recognition by computer
The automatic recognition of hand-written text has been a goal
for more than thirty-five years. The highly ambiguous nature of cursive
writing (with high variability not only between different writers, but
even between different samples from the same writer) means that
systems based only on visual information are prone to errors.
It is suggested that the application of linguistic knowledge to
the recognition task may improve recognition accuracy. If a low-level
(pattern recognition based) recogniser produces a candidate lattice
(i.e. a directed graph giving a number of alternatives at each word
position in a sentence), then linguistic knowledge can be used to find
the 'best' path through the lattice.
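The lattice search described above can be sketched as a Viterbi-style dynamic programme. This is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes the recogniser attaches a log-probability to each alternative, that the linguistic knowledge is a hypothetical table `pair_log_score` of log scores for adjacent word pairs, and that unseen pairs receive a fixed floor `UNSEEN_PAIR_LOG_SCORE`. All of these names are illustrative.

```python
import math

# Hypothetical floor (an assumption) for word pairs with no recorded linguistic score.
UNSEEN_PAIR_LOG_SCORE = math.log(1e-4)


def best_path(lattice, pair_log_score, weight=1.0):
    """Return (words, total_score) for the highest-scoring path through the lattice.

    lattice: list of word positions; each position is a list of
             (word, visual_log_score) alternatives from the recogniser.
    pair_log_score: dict mapping (previous_word, word) -> log score (hypothetical).
    weight: how much the linguistic score counts relative to the visual score.
    """
    # best[i][j]: best total score of any path ending at alternative j of position i.
    best = [[visual for _, visual in lattice[0]]]
    back = [[None] * len(lattice[0])]
    for i in range(1, len(lattice)):
        scores, pointers = [], []
        for word, visual in lattice[i]:
            # Score of reaching this word from each alternative at the previous position.
            candidates = [
                best[i - 1][k]
                + weight * pair_log_score.get((prev_word, word), UNSEEN_PAIR_LOG_SCORE)
                for k, (prev_word, _) in enumerate(lattice[i - 1])
            ]
            k_best = max(range(len(candidates)), key=candidates.__getitem__)
            scores.append(candidates[k_best] + visual)
            pointers.append(k_best)
        best.append(scores)
        back.append(pointers)
    # Trace back from the best alternative at the final position.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    words = []
    for i in range(len(lattice) - 1, -1, -1):
        words.append(lattice[i][j][0])
        j = back[i][j]
    return list(reversed(words)), total


if __name__ == "__main__":
    lattice = [
        [("the", math.log(0.6)), ("tho", math.log(0.4))],
        [("cat", math.log(0.5)), ("cot", math.log(0.5))],
    ]
    words, _ = best_path(lattice, {("the", "cat"): math.log(0.1)})
    print(words)  # prints ['the', 'cat']
```

Because each step only looks back one word, the search takes time proportional to the number of positions times the square of the number of alternatives per position, rather than enumerating every path. A model that scores words over a wider range would need a larger state than the single previous word.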
There are many forms of linguistic knowledge that may be used
to this end. This thesis looks specifically at the use of collocation as a
source of linguistic knowledge. Collocation describes the statistical
tendency of certain words to co-occur in a language, within a defined
range. It is suggested that this tendency may be exploited to aid
automatic text recognition.
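One simple way to gather the collocation statistics described above is to count how often each ordered word pair co-occurs within a fixed window. This is a minimal sketch assuming that the "defined range" is a window of `span` following words and that raw counts are the statistic of interest. The thesis may use a different range or association measure, and `collocation_counts` is a hypothetical helper.

```python
from collections import Counter


def collocation_counts(tokens, span=2):
    """Count ordered pairs (w1, w2) where w2 occurs within `span` words after w1."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        # Slicing past the end of the list is safe; it just yields fewer words.
        for w2 in tokens[i + 1 : i + 1 + span]:
            counts[(w1, w2)] += 1
    return counts


if __name__ == "__main__":
    tokens = "strong tea and strong coffee".split()
    counts = collocation_counts(tokens, span=2)
    print(counts[("strong", "coffee")])  # prints 1
    print(counts[("tea", "coffee")])     # prints 0: the words are three apart
```

Setting `span=1` reduces this to ordinary bigram counting. Larger spans capture looser associations but also count more incidental pairs.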
The construction and use of a post-processing system
incorporating collocational knowledge is described, as are a number
of experiments designed to test the effectiveness of collocation as an
aid to text recognition. The results of these experiments suggest that
collocational statistics may be a useful form of knowledge for this
application and that further research may produce a system of real
practical use.
Towards robust real-world historical handwriting recognition
In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as extremely diverse collections of (pixel) images. Despite their strong results, current deep learning (DL) methods are very data-hungry and time-consuming, depend heavily on humanities experts for labeling, and require machine-learning experts to design the models. Ideally, deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse labeled data. We present several approaches to these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word and line text recognition approaches to nine data sets that differ in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluating each training epoch without the need for labeled data.
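The abstract does not say how the outputs of the five networks are combined. A minimal sketch, assuming plain plurality voting over the word strings each network proposes, might look like this. `vote` is a hypothetical helper, and the returned agreement fraction is only one possible way to flag uncertain words for human checking, not something the abstract describes.

```python
from collections import Counter


def vote(transcriptions):
    """Plurality vote over word strings proposed by several recognisers.

    Returns (winning_string, fraction_of_networks_that_agreed).
    Ties go to the string seen first, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    if not transcriptions:
        raise ValueError("need at least one transcription")
    winner, count = Counter(transcriptions).most_common(1)[0]
    return winner, count / len(transcriptions)


if __name__ == "__main__":
    # Made-up outputs from five hypothetical networks for one word image.
    outputs = ["Batavia", "Batavia", "Batavla", "Batavia", "Betavia"]
    print(vote(outputs))  # prints ('Batavia', 0.6)
```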
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for all interested in the subject.
Cognitive Information Processing
Contains research objectives, a summary of research, and reports on four research projects. National Institutes of Health (Grant 5 PO1 GM14940-02); National Institutes of Health (Grant 5 P01 GM15006-03); Joint Services Electronics Programs (U. S. Army, U. S. Navy, and U. S. Air Force) under Contract DA 28-043-AMC-02536(E); National Institutes of Health (Grant 5 T01 GM01555-03).
Understanding Relations Between Scripts II
Contexts of and Relations between Early Writing Systems (CREWS) is a project funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 677758), and based in the Faculty of Classics, University of Cambridge. Understanding Relations Between Scripts II: Early Alphabets is the first volume in this series, bringing together ten experts on ancient writing, languages and archaeology to present a set of diverse studies on the early development of alphabetic writing systems and their spread across the Levant and Mediterranean during the second and first millennia BC. By taking an interdisciplinary perspective, it sheds new light on alphabetic writing not just as a tool for recording language but also as an element of culture.
Redesigning Arabic Learning Books: An exploration of the role of graphic communication and typography as visual pedagogic tools in Arabic-Latin bilingual design
What are ‘educational typefaces’ and why are they needed today? Do Arabic beginners need special typefaces that can simplify learning further? If so, what features should they have? Research findings on the complexity of learning Arabic confirm that the majority of language textbooks and pedagogic materials create challenging learning environments because of poor-quality book design, text-heavy content and the limited use of visuals. The complexity of the data and the insufficient design quality of the learning materials reviewed in this practice-based research demand serious thought toward simplification, involving experts in the fields of graphic communication, learning and typeface design. The study offers solutions to some of the problems that arise in the course of designing language-learning books by reviewing selected English-learning and information design books and methods of guidance for developing uniform learning material for basic Arabic.
Key findings from this study confirm the significant role of Arabic designers and educators in producing efficient and effective learning materials. Their role involves working closely with Arabic instructors, having strong language skills and being aware of the knowledge available. It also involves selecting legible typefaces with distinct design characteristics to help fulfil the various objectives of the learning unit.
This study raises awareness of the need for typefaces that can attract people to learning Arabic and make it easier to learn in a globalized world. The absence of such typefaces led to the exploration of simplified twentieth-century Arabic typefaces that share a similar aim of facilitating reading and writing and resolving script and language complexity issues. This study traces their historical context and examines their functional, technical and aesthetic features in order to incorporate their thinking and repurpose them as learning tools within the right context. The final outcome is an experimental bilingual Arabic-English language book series for Arab and non-Arab adult beginners. The learning tools used to create the book series were tested through workshops in Kuwait and London to measure their level of simplification and accessibility. These tests revealed both accessibility and incompatibility in different areas of the learning material and helped improve the final outcome of the practice. The tools have established the significant role of educational typefaces and of bilingual and graphic communication in visual Arabic learning.
The Economy of Typography (the Arrangement or Mode of Operation of Typography)
The thesis will show that current research into the legibility and readability of certain aspects or characters of type is incomplete, and will demonstrate what further research is necessary to complete the analysis of these aspects or characters in the economy of typography in continuous text. Chapter 1 will show that the development of reading depends on the legibility of the typography and characters (‘recognizing patterns, planning strategy, and feeling’); in other words, reading and writing are interdependent, and all depend in some part on the construction of the characters and their relationship to each other. It will also show that readable writing is desirable and important for the reader’s sake. Chapter 2 will deal with the practical presentation of the characters that the reading public reads, and with the role played by the legibility and readability of typography in conveying the message. Printers and designers also have working knowledge and experience of legibility and readability, which they incorporate into typographic presentation, and this too is taken into account in chapter 2. Chapter 3 reviews the criteria and methods used in typographic readability and legibility research. The research will show that readability is the ease with which the eye can absorb the message and move along the line, whereas legibility is the ease with which one letter can be distinguished from another. Chapter 4, entitled ‘Analysis and Recommendations’, concludes the thesis with a summary of chapters 1, 2 and 3 before presenting a comparative analysis of current research into legibility, with particular emphasis on misreading or misrecognition of characters, and illustrates the conclusions reached with bar charts and tables. Appendix One of the thesis contains a comprehensive list of the research into legibility and readability.
Appendix Two contains the graphics of Benjamin Sherbow showing typographic layouts that support the type-spacing matters discussed in chapter 2. The thesis has an extensive bibliography of the works referred to throughout.
- …