Advances in Document Layout Analysis
Handwritten Text Segmentation (HTS) is a task within the Document Layout Analysis field that aims to detect and extract the page regions of interest found in handwritten documents. HTS remains an active research topic that has gained importance over the years, owing to the increasing demand to provide textual access to the myriad handwritten document collections held by archives and libraries.
This thesis considers HTS as a task that must be tackled in two specialized phases: detection and extraction. We see the detection phase fundamentally as a recognition problem that yields the vertical position of each region of interest as a by-product. The extraction phase consists of calculating the best contour coordinates of the region using the position information provided by the detection phase.
Our proposed detection approach allows us to address both higher-level regions (paragraphs, diagrams, etc.) and lower-level regions such as text lines. In the case of text line detection, we model the problem to ensure that the vertical position yielded by the system approximates the fictitious line that connects the lower part of the grapheme bodies in a text line, commonly known as the baseline.
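To make the baseline notion concrete, here is a minimal sketch (not the thesis's method) that estimates a straight baseline from a binarized text-line image by fitting the lowermost ink pixel of each column; the input format and the plain least-squares fit are assumptions for illustration.

```python
import numpy as np

def estimate_baseline(binary_line: np.ndarray) -> tuple:
    """Fit a straight baseline to the lowest ink pixel of each column.

    `binary_line` is a 2D array where ink pixels are True/1 and the
    background is False/0 (a hypothetical input format).
    Returns (slope, intercept) of y = slope * x + intercept.
    """
    cols, ys = [], []
    for x in range(binary_line.shape[1]):
        ink_rows = np.nonzero(binary_line[:, x])[0]
        if ink_rows.size:                 # column contains ink
            cols.append(x)
            ys.append(ink_rows.max())     # lowest ink pixel in the column
    # Least-squares fit; descenders below the body would pull the fit
    # down slightly, so a robust fit would be preferable in practice.
    slope, intercept = np.polyfit(cols, ys, deg=1)
    return slope, intercept
```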
One of the main contributions of this thesis is that the proposed modelling approach allows us to include prior information regarding the layout of the documents being processed. This is performed via a Vertical Layout Model (VLM).
We develop a Hidden Markov Model (HMM) based framework to tackle both region detection and classification as an integrated task, and we study the performance and ease of use of the proposed approach on many corpora. We review the modelling simplicity of our approach for processing regions at different levels of information: text lines, paragraphs, titles, etc. We also study the impact of adding deterministic and/or probabilistic prior information and restrictions via the VLM that our approach provides.
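As a rough illustration of how an HMM can segment a page vertically, the sketch below Viterbi-decodes the per-row ink density of a binarized page into "blank" and "text" runs. The two-state model, transition probabilities, and Gaussian emission parameters are illustrative assumptions, far simpler than the thesis's VLM, which would add more region types and layout-specific transitions.

```python
import numpy as np

# Two-state toy vertical layout: the transition matrix is the prior that
# blank margins and text bands alternate and tend to persist row-to-row.
STATES = ["blank", "text"]
log_A = np.log([[0.95, 0.05],   # blank -> blank, blank -> text
                [0.10, 0.90]])  # text  -> blank, text  -> text

def log_gauss(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def viterbi(density):
    """Label each pixel row of a page as 'blank' or 'text'.

    `density` is the fraction of ink pixels per row, e.g.
    binary_page.mean(axis=1). Emission means/deviations are guesses.
    """
    mus, sigmas = (0.01, 0.15), (0.02, 0.08)
    log_B = np.stack([log_gauss(density, m, s)
                      for m, s in zip(mus, sigmas)], axis=1)
    T, S = log_B.shape
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = np.log([0.99, 0.01]) + log_B[0]   # pages start with a margin
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```

The boundaries between decoded "text" runs directly yield the vertical positions that the detection phase is after.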
Having a separate phase that accurately yields the detection position (baselines in the case of text lines) of each region greatly simplifies the problem that must be tackled during the extraction phase. In this thesis we propose to use a distance map that takes into consideration the grey-scale information in the image. This allows us to yield extraction frontiers that are equidistant to the adjacent text regions. We study how the accuracy of our approach scales proportionally to the quality of the provided detection position, and our extraction approach gives near-perfect results when human-reviewed baselines are provided.
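A minimal sketch of the extraction idea, under simplifying assumptions: plain Euclidean distance transforms stand in for the thesis's grey-scale-aware distance map, with a crude darkness penalty so the frontier avoids cutting through strokes; `row_a` and `row_b` are hypothetical baseline rows supplied by the detection phase.

```python
import numpy as np
from scipy import ndimage

def separating_frontier(grey: np.ndarray, row_a: int, row_b: int) -> np.ndarray:
    """Per-column frontier rows between two detected baselines (row_a < row_b).

    Sketch only: Euclidean distances to each baseline approximate the
    grey-scale-aware distance map; an ink penalty keeps the frontier
    away from dark pixels.
    """
    h, w = grey.shape
    mask_a = np.ones((h, w), dtype=bool); mask_a[row_a] = False
    mask_b = np.ones((h, w), dtype=bool); mask_b[row_b] = False
    d_a = ndimage.distance_transform_edt(mask_a)   # distance to baseline a
    d_b = ndimage.distance_transform_edt(mask_b)   # distance to baseline b
    band = slice(row_a, row_b + 1)                 # search between the lines
    ink_penalty = (255.0 - grey[band]) / 255.0     # darker pixels cost more
    score = np.abs(d_a[band] - d_b[band]) + 5.0 * ink_penalty
    return score.argmin(axis=0) + row_a            # one frontier row per column
```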
Bosch Campos, V. (2020). Advances in Document Layout Analysis [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/138397
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for anyone interested in the subject.
Neural Networks for Document Image and Text Processing
Nowadays, the main libraries and document archives are investing considerable effort in digitizing their collections. Indeed, most of them are scanning the documents and publishing the resulting images without their corresponding transcriptions, which seriously limits the document exploitation possibilities. When a transcription is necessary, it is performed manually by human experts, which is a very expensive and error-prone task. Obtaining transcriptions of the required level of quality demands the intervention of human experts to review and correct the output of the recognition engines. To this end, it is extremely useful to provide interactive tools to obtain and edit the transcription.
Although text recognition is the final goal, several previous steps (known as preprocessing) are necessary in order to get a good transcription from a digitized image. Document cleaning, enhancement, and binarization (if needed) are the first stages of the recognition pipeline. Historical handwritten documents, in addition, exhibit degradations, stains, ink bleed-through, and other artifacts. Therefore, more sophisticated and elaborate methods are required when dealing with this kind of document, and in some cases even expert supervision is needed. Once the images have been cleaned, the main zones of the image have to be detected: those that contain text and other parts such as images, decorations, and versal letters. Moreover, the relations among them and the final text have to be detected. These preprocessing steps are critical for the final performance of the system, since an error at this point will be propagated through the rest of the transcription process.
The ultimate goal of the Document Image Analysis pipeline is to obtain the transcription of the text (Optical Character Recognition and Handwritten Text Recognition). In this thesis we aimed to improve the main stages of the recognition pipeline, from the scanned documents as input to the final transcription. We focused our effort on applying Neural Networks and deep learning techniques directly to the document images to extract suitable features for the different tasks addressed in this work: Image Cleaning and Enhancement (Document Image Binarization), Layout Extraction, Text Line Extraction, Text Line Normalization, and finally decoding (or text line recognition). As one can see, this work focuses on small improvements across the several Document Image Analysis stages, but it also deals with some real challenges: historical manuscripts and documents without clear layouts, or very degraded documents.
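To make the stage chaining explicit, here is a hypothetical end-to-end skeleton; every body is a trivial placeholder (naive thresholding, blank-row splitting, crude rescaling) rather than the neural models the thesis actually develops.

```python
import numpy as np

def clean(page: np.ndarray) -> np.ndarray:
    return (page < 128).astype(np.uint8)            # naive binarization

def extract_lines(page: np.ndarray) -> list:
    """Split at blank rows; assumes the page starts and ends blank."""
    edges = np.flatnonzero(np.diff(page.any(axis=1).astype(int)))
    return [page[a + 1:b + 1] for a, b in zip(edges[::2], edges[1::2])]

def normalize(line: np.ndarray, height: int = 64) -> np.ndarray:
    reps = max(1, round(height / line.shape[0]))    # crude row duplication
    scaled = np.repeat(line, reps, axis=0)[:height]
    return np.pad(scaled, ((0, height - scaled.shape[0]), (0, 0)))

def transcribe(page: np.ndarray) -> list:
    # The real pipeline would end in an optical model decoding each line;
    # here the normalized line images themselves are returned.
    return [normalize(line) for line in extract_lines(clean(page))]
```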
Neural Networks are a central topic of the work collected in this document. Different convolutional models have been applied for document image cleaning and enhancement. Connectionist models have been used for text line extraction as well: first, for detecting interest points and combining them into text segments, extracting the lines by means of aggregation techniques; and second, for pixel labeling to extract the main body area of the text and then the limits of the lines. For text line preprocessing, i.e., normalizing the text lines before recognizing them, similar models have been used to detect the main body area and then height-normalize the images, giving more importance to the central area of the text. Finally, Convolutional Neural Networks and deep multilayer perceptrons have been combined with hidden Markov models to significantly improve our transcription engine.
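A minimal sketch of the zone-weighted height normalization just described: the line image is split at the main-body limits (which the thesis predicts with a neural model; here the caller supplies them), and each zone is rescaled so the central body receives most of the output rows. The zone heights and the uint8 grey input are assumptions.

```python
import numpy as np
from PIL import Image

def normalize_line(line: np.ndarray, body_top: int, body_bot: int,
                   zones: tuple = (10, 40, 10)) -> np.ndarray:
    """Rescale ascender/body/descender zones to fixed heights, giving the
    central main-body zone most of the output rows. `line` is a uint8
    grey image; `body_top`/`body_bot` delimit the main body area.
    """
    parts = [line[:body_top], line[body_top:body_bot], line[body_bot:]]
    width = line.shape[1]
    out = []
    for img, h in zip(parts, zones):
        if img.shape[0] == 0:                       # zone absent in this line
            img = np.zeros((1, width), dtype=line.dtype)
        out.append(np.asarray(Image.fromarray(img).resize((width, h))))
    return np.concatenate(out, axis=0)              # fixed height: sum(zones)
```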
The suitability of all these approaches has been tested on different corpora for each of the stages addressed, giving competitive results for most of the methodologies presented.
Pastor Pellicer, J. (2017). Neural Networks for Document Image and Text Processing [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90443
Document and Natural Image Applications of Deep Learning
A tremendous amount of digital visual data is collected every day, and we need efficient and effective algorithms to extract useful information from it. Considering the complexity of visual data and the expense of human labor, we expect algorithms to have enhanced generalization capability and to depend less on domain knowledge. While many topics in computer vision have benefited from machine learning, some document analysis and image quality assessment problems still have not found the best way to utilize it. In the context of document images, a compelling need exists for reliable methods to categorize and extract key information from captured images. In natural image content analysis, accurate quality assessment has become a critical component of many applications. Most current approaches, however, rely on heuristics designed from human observations on severely limited data. These approaches typically work only on specific types of images and are hard to generalize to complex data from real applications.
This dissertation looks to address the challenges of processing heterogeneous visual data by applying effective learning methods that directly model the data with minimal preprocessing and feature engineering. We focus on three important problems: text line detection, document image categorization, and image quality assessment. The data we work on typically contains unconstrained layouts, styles, or noise, which resembles the real data from applications. First, we present a graph-based method, learning the line structure from training data for text line segmentation in handwritten document images, and a general framework to detect multi-oriented scene text lines using Higher-Order Correlation Clustering. Our method depends less on domain knowledge and is robust to variations in fonts or languages. Second, we introduce a general approach for document image genre classification using Convolutional Neural Networks (CNN). The introduction of CNNs for document image genre classification largely reduces the need for hand-crafted features or domain knowledge. Third, we present our CNN-based methods for general-purpose No-Reference Image Quality Assessment (NR-IQA). Our methods bridge the gap between NR-IQA and CNNs and open the door to a broad range of deep learning methods. With excellent local quality estimation ability, our methods demonstrate state-of-the-art performance on both distortion identification and quality estimation.
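As an illustration of patch-based NR-IQA, the sketch below (in PyTorch, with an arbitrary toy architecture rather than the dissertation's actual network) scores each patch with a small CNN regressor and averages the local estimates into one image-level quality score.

```python
import torch
import torch.nn as nn

class PatchQualityNet(nn.Module):
    """Tiny patch-level quality regressor (toy architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.AdaptiveMaxPool2d(1))
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 1))

    def forward(self, patches):                     # (N, 1, 32, 32)
        return self.head(self.features(patches)).squeeze(-1)

def image_quality(model: PatchQualityNet, image: torch.Tensor,
                  patch: int = 32) -> float:
    """Average per-patch predictions over non-overlapping patches."""
    c, h, w = image.shape
    tiles = (image.unfold(1, patch, patch).unfold(2, patch, patch)
                  .reshape(c, -1, patch, patch).transpose(0, 1))
    with torch.no_grad():
        return model(tiles).mean().item()
```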
A framework for ancient and machine-printed manuscripts categorization
Document image understanding (DIU) has attracted a lot of attention and has become an active field of research. Although the ultimate goal of DIU is extracting the textual information of a document image, many steps are involved in such a process, such as categorization, segmentation, and layout analysis. All of these steps are needed in order to obtain an accurate result from character recognition or word recognition of a document image. One of the important steps in DIU is document image categorization (DIC), which is needed in many situations, such as document images written or printed in more than one script, font, or language. This step provides useful information for the recognition system and helps reduce its error by allowing a category-specific Optical Character Recognition (OCR) or word recognition (WR) system to be incorporated. This research focuses on the problem of DIC across different categories of scripts, styles, and languages and establishes a framework for flexible representation and feature extraction that can be adapted to many DIC problems. The current methods for DIC have many limitations and drawbacks that restrict their practical usage. We propose an efficient framework for the categorization of document images based on patch representation and Non-negative Matrix Factorization (NMF). This framework is flexible and can be adapted to different categorization problems.
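A rough sketch of the patch-plus-NMF representation, using grid patches and scikit-learn's plain NMF for simplicity (the thesis positions patches on the skeleton map and uses more specialized factorizations); the thresholds and component counts are placeholder values.

```python
import numpy as np
from sklearn.decomposition import NMF

def patches(page: np.ndarray, size: int = 16) -> np.ndarray:
    """Flattened grid patches that contain some ink (the thesis instead
    positions patches on the skeleton map of the foreground)."""
    tiles = [page[y:y + size, x:x + size].reshape(-1)
             for y in range(0, page.shape[0] - size + 1, size)
             for x in range(0, page.shape[1] - size + 1, size)]
    return np.asarray([t for t in tiles if t.min() < 128], dtype=float)

def page_descriptor(nmf: NMF, page: np.ndarray) -> np.ndarray:
    """Mean NMF code over the page's patches: one fixed-length vector
    per page that any off-the-shelf classifier can consume."""
    return nmf.transform(patches(page)).mean(axis=0)

# Hypothetical usage with a list `train_pages` of 2D grey arrays:
# nmf = NMF(n_components=40, max_iter=400).fit(
#     np.vstack([patches(p) for p in train_pages]))
# X = np.stack([page_descriptor(nmf, p) for p in train_pages])
```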
Many methods exist for script identification of document images, but few of them address the problem in handwritten manuscripts, and those have many limitations and drawbacks. Therefore, our first goal is to introduce a novel method for script identification of ancient manuscripts. The proposed method is based on a patch representation in which the patches are extracted using the skeleton map of a document image. This representation overcomes the current methods' restriction to a fixed level of layout. The proposed feature extraction scheme, based on Projective Non-negative Matrix Factorization (PNMF), is robust against noise and handwriting variation and can be used for different scripts. The proposed method achieves higher performance than state-of-the-art methods and can be applied to different levels of layout.
The current methods for font (style) identification are mostly designed to be applied to machine-printed document images, and many of them can only be used for a specific level of layout. Therefore, we propose a new method for font and style identification of printed and handwritten manuscripts based on patch representation and Non-negative Matrix Tri-Factorization (NMTF). The images are represented by overlapping patches obtained from the foreground pixels. The positions of these patches are set based on the skeleton map to reduce the number of patches. Non-negative Matrix Tri-Factorization is used to learn bases for each font (style), and these bases are then used to classify a new image based on minimum representation error. The proposed method can easily be extended to new fonts, as the bases for each font are learned separately from the other fonts. This method is tested on two datasets of machine-printed and ancient manuscripts, and the results confirm its performance compared with state-of-the-art methods.
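The minimum-representation-error classification can be sketched as follows, with ordinary NMF standing in for the tri-factorization (NMTF): one basis is learned per font, and a new image is assigned to the font whose basis reconstructs its patches with the smallest error. The names and component count are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

def train_font_bases(patches_by_font: dict, k: int = 30) -> dict:
    """One NMF basis per font (plain NMF standing in for NMTF). Adding a
    new font just fits one more model; existing bases are untouched."""
    return {font: NMF(n_components=k, max_iter=400).fit(X)
            for font, X in patches_by_font.items()}

def classify_font(models: dict, patches: np.ndarray) -> str:
    """Assign the font whose basis reconstructs the patches best."""
    def error(m: NMF) -> float:
        codes = m.transform(patches)                 # codes under this basis
        return float(np.linalg.norm(patches - codes @ m.components_))
    return min(models, key=lambda font: error(models[font]))
```

Because each font's basis is fit independently, extending the classifier to a new font never requires retraining the others, which is the extensibility property the abstract highlights.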
Finally, we propose a novel method for language identification of printed and handwritten manuscripts based on patch representation and Non-negative Matrix Tri-Factorization (NMTF). The current methods for language identification are based either on textual data obtained by an OCR engine or on image data through coding and comparison with textual data. The OCR-based methods need a lot of processing, and the current image-based methods are not applicable to cursive scripts such as Arabic. In this work we introduce a new method for language identification of machine-printed and handwritten manuscripts based on patch representation and NMTF. The patch representation provides the components of the Arabic script (letters) that cannot be extracted simply by segmentation methods. NMTF is then used for dictionary learning and for generating codebooks that are used to represent a document image with a histogram. The proposed method is tested on two datasets of machine-printed and handwritten manuscripts and compared with n-gram features (text-based), texture features, and codebook features (image-based) to validate its performance.
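In sketch form, the codebook-histogram representation reduces to assigning each patch to its nearest codeword and counting; here the codebook is any array of codeword rows, standing in for the dictionary that the factorization would learn.

```python
import numpy as np

def codebook_histogram(patches: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Bag-of-patches descriptor: assign each patch to its nearest
    codeword and count. `codebook` is a (k, d) array of codeword rows."""
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()                     # normalized histogram
```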
The methods proposed above are robust against variation in handwriting, changes in font (handwriting style), and the presence of degradation, and they are flexible enough to be used at various levels of layout (from a text line to a paragraph). The methods in this research have been tested on datasets of handwritten and machine-printed manuscripts and compared with state-of-the-art methods. All of the evaluations show the efficiency, robustness, and flexibility of the proposed methods for the categorization of document images. As mentioned before, the proposed strategies provide a framework for efficient and flexible representation and feature extraction for document image categorization. This framework can be applied to different levels of layout, the information from different levels can be merged and mixed, and the framework can be extended to more complex situations and different tasks.
Towards robust real-world historical handwriting recognition
In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as extremely diverse collections of (pixel) images. Despite their great results, current deep learning methods are very data-greedy and time-consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse amount of labeled data. We present several approaches to dealing with these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.
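The ensemble voting step can be illustrated in a few lines: each of the networks (five, in the thesis) proposes a transcription for a word image, and the plurality hypothesis wins. The tie-breaking rule here (first hypothesis seen) is an assumption.

```python
from collections import Counter

def ensemble_vote(hypotheses: list) -> str:
    """Plurality vote over the word hypotheses of an ensemble.
    Ties resolve to the earliest hypothesis seen (an assumption)."""
    return Counter(hypotheses).most_common(1)[0][0]

# e.g. five networks reading the same word image:
print(ensemble_vote(["Commissie", "Commissie", "Comissie",
                     "Commissie", "Conmissie"]))   # -> Commissie
```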