43 research outputs found
OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts
Historical manuscript processing poses challenges like limited annotated
training data and novel class emergence. To address this, we propose a novel
One-shot learning-based Text Spotting (OTS) approach that accurately and
reliably spots novel characters with just one annotated support sample. Drawing
inspiration from cognitive research, we introduce a spatial alignment module
that finds, focuses on, and learns the most discriminative spatial regions in
the query image based on one support image. In particular, since the low-resource
spotting task often faces the problem of example imbalance, we propose a novel
loss function called torus loss, which makes the embedding space of the distance
metric more discriminative. Our approach is highly efficient and requires only
a few training samples while exhibiting the remarkable ability to handle novel
characters and symbols. To enhance dataset diversity, a new manuscript dataset
that contains the ancient Dongba hieroglyphics (DBH) is created. We conduct
experiments on the publicly available VML-HD, TKH, and NC datasets and the newly
proposed DBH dataset. The experimental results demonstrate that OTS outperforms
the state-of-the-art methods in one-shot text spotting. Overall, our proposed
method offers promising applications in the field of text spotting in
historical manuscripts.
Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples
This work focuses on the layout analysis of historical handwritten registers, in which local religious ceremonies were recorded. The aim of this work is to delimit each record in these registers. To this end, two approaches are proposed. Firstly, object detection networks are explored, as three state-of-the-art architectures are compared. Further experiments are then conducted on Mask R-CNN, as it yields the best performance. Secondly, we introduce and investigate Deep Syntax, a hybrid system that takes advantage of recurrent patterns to delimit each record, by combining U-shaped networks and logical rules. Finally, these two approaches are evaluated on 3708 French records (16th-18th centuries), as well as on the Esposalles public database, containing 253 Spanish records (17th century). While both systems perform well on homogeneous documents, we observe a significant drop in performance with Mask R-CNN on heterogeneous documents, especially when trained on a non-representative subset. By contrast, Deep Syntax relies on steady patterns, and is therefore able to process a wider range of documents with less training data. Not only does Deep Syntax produce 15% more match configurations and reduce the ZoneMap surface error metric by 30% when both systems are trained on 120 images, but it also outperforms Mask R-CNN when trained on a database three times smaller. As Deep Syntax generalizes better, we believe it can be used in the context of massive document processing, as collecting and annotating a sufficiently large and representative set of training data is not always achievable.
Machine learning for ancient languages: a survey
Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.
End-to-End Page-Level Assessment of Handwritten Text Recognition
The evaluation of Handwritten Text Recognition (HTR) systems has traditionally used metrics based on the edit distance between HTR and ground truth (GT) transcripts, at both the character and word levels. This is adequate when the experimental protocol assumes that both GT and HTR text lines are the same, which allows edit distances to be computed independently for each given line. Driven by recent advances in pattern recognition, HTR systems increasingly face the end-to-end page-level transcription of a document, where the precision of locating the different text lines and their corresponding reading order (RO) play a key role. In such a case, the standard metrics do not take into account the inconsistencies that might appear. In this paper, the problem of evaluating HTR systems at the page level is introduced in detail. We analyse the convenience of using a two-fold evaluation, where the transcription accuracy and the RO goodness are considered separately. Different alternatives are proposed, analysed and empirically compared both through partially simulated and through real, full end-to-end experiments. Results support the validity of the proposed two-fold evaluation approach. An important conclusion is that such an evaluation can be adequately achieved by just two simple and well-known metrics: the Word Error Rate (WER), which takes transcription sequentiality into account, and the Bag of Words Word Error Rate (bWER), re-formulated here, which ignores order. While the latter directly and very accurately assesses intrinsic word recognition errors, the difference between both metrics (ΔWER) gracefully correlates with the Normalised Spearman's Foot Rule Distance (NSFD), a metric which explicitly measures RO errors associated with layout analysis flaws. To arrive at these conclusions, we have introduced another metric called Hungarian Word Error Rate (hWER), based on a regularised version of the Hungarian Algorithm proposed here. This metric is shown to be always almost identical to bWER, and both bWER and hWER are also almost identical to WER whenever HTR transcripts and GT references are guaranteed to be in the same RO.
This paper is part of the I+D+i projects: PID2020-118447RA-I00 (MultiScore) and PID2020-116813RB-I00a (SimancasSearch), funded by MCIN/AEI/10.13039/501100011033. The first author's research was developed in part with the Valencian Graduate School and Research Network of Artificial Intelligence (valgrAI, co-funded by Generalitat Valenciana and the European Union). The second author is supported by a María Zambrano grant from the Spanish Ministerio de Universidades and the European Union NextGenerationEU/PRTR. The third author is supported by grant ACIF/2021/356 from the "Programa I+D+i de la Generalitat Valenciana".
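The contrast between an order-sensitive and an order-insensitive word error rate described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the bag-of-words variant uses one common formulation (the larger of the unmatched reference and hypothesis word counts, divided by the reference length), which is assumed here rather than taken from the paper.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Order-sensitive word error rate (edit distance / reference length)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def bag_wer(ref, hyp):
    """Order-insensitive word error rate (assumed formulation, see lead-in)."""
    ref_bag, hyp_bag = Counter(ref.split()), Counter(hyp.split())
    n_ref = sum(ref_bag.values())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    missing = sum((ref_bag - hyp_bag).values())  # reference words not matched
    extra = sum((hyp_bag - ref_bag).values())    # hypothesis words not matched
    return max(missing, extra) / n_ref


# Two text lines transcribed perfectly but returned in the wrong reading order.
reference = "the quick brown fox"
hypothesis = "brown fox the quick"
delta = wer(reference, hypothesis) - bag_wer(reference, hypothesis)
```

In this toy case every word is recognised correctly, so the bag-of-words rate is zero, while the order-sensitive rate is penalised by the swapped lines; the gap between the two is what the abstract relates to reading-order errors.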
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
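The two building blocks named in this abstract, planar homography and recursive Bayesian filtering, can be sketched generically in Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the translation-only homography, and the per-frame class likelihoods are all hypothetical.

```python
import math


def warp_point(H, x, y):
    """Map an image point through a 3x3 homography H (list of rows)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # back from homogeneous to pixel coordinates


def bayes_update(prior, likelihood):
    """One recursive Bayesian step over a discrete set of object classes."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("likelihood is incompatible with the prior")
    return [u / total for u in unnormalised]


def entropy(belief):
    """Shannon entropy (bits) of a categorical belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


# Hypothetical homography: a pure translation of the image plane.
H = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -2.0],
     [0.0, 0.0, 1.0]]
propagated_center = warp_point(H, 10.0, 10.0)

# Hypothetical detector scores for three classes, observed in two frames.
prior = [1 / 3, 1 / 3, 1 / 3]
belief = prior
for scores in ([0.6, 0.3, 0.1], [0.6, 0.3, 0.1]):
    belief = bayes_update(belief, scores)
```

Warping a previous detection gives a region of interest in the next frame, and repeated updates concentrate the class belief, which lowers its entropy; this is the kind of quantity the reported categorization entropy reduction refers to.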
Machine Learning for handwriting text recognition in historical documents
Olmos
ABSTRACT
In this thesis, we focus on the handwriting text recognition task over historical
documents that are difficult to read for any person that is not an expert in ancient
languages and writing style.
We aim to take advantage of and improve the neural network architectures and
techniques that other authors are proposing for handwriting text recognition in
modern handwritten documents. These models perform this task very precisely
when a large amount of data is available. However, the low availability of labeled
data is a widespread problem in historical documents. The type of writing is
singular, and it is pretty expensive to hire an expert to transcribe a large number
of pages.
After investigating and analyzing the state-of-the-art, we propose the efficient
application of methods such as transfer learning and data augmentation. We also
contribute an algorithm for purging mislabeled samples that affect the learning of
models. Finally, we develop a variational autoencoder method for generating
synthetic samples of handwritten text images for data augmentation.
Experiments are performed on various historical handwritten text databases to
validate the performance of the proposed algorithms. The various included
analyses focus on the evolution of the character and word error rate (CER and
WER) as we increase the training dataset.
One of the most important results is the participation in a contest for transcription
of historical handwritten text. The organizers provided us with a dataset of
documents to train the model, then just a few labeled pages of 5 new documents
were provided to adjust the solution further. Finally, the transcription of
non-labeled images was requested to evaluate the algorithm. Our method ranked
second in this contest.
Computer Vision and Architectural History at Eye Level: Mixed Methods for Linking Research in the Humanities and in Information Technology
Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuable insights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable for research.