117 research outputs found

    OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts

    Full text link
    Historical manuscript processing poses challenges like limited annotated training data and novel class emergence. To address this, we propose a novel One-shot learning-based Text Spotting (OTS) approach that accurately and reliably spots novel characters with just one annotated support sample. Drawing inspiration from cognitive research, we introduce a spatial alignment module that finds, focuses on, and learns the most discriminative spatial regions in the query image based on one support image. Especially, since the low-resource spotting task often faces the problem of example imbalance, we propose a novel loss function called torus loss which can make the embedding space of distance metric more discriminative. Our approach is highly efficient and requires only a few training samples while exhibiting the remarkable ability to handle novel characters, and symbols. To enhance dataset diversity, a new manuscript dataset that contains the ancient Dongba hieroglyphics (DBH) is created. We conduct experiments on publicly available VML-HD, TKH, NC datasets, and the new proposed DBH dataset. The experimental results demonstrate that OTS outperforms the state-of-the-art methods in one-shot text spotting. Overall, our proposed method offers promising applications in the field of text spotting in historical manuscripts

    Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples

    Get PDF
    International audienceThis work focuses on the layout analysis of historical handwritten registers, in which local religious ceremonies were recorded. The aim of this work is to delimit each record in these registers. To this end, two approaches are proposed. Firstly, object detection networks are explored, as three state-of-the-art architectures are compared. Further experiments are then conducted on Mask R-CNN, as it yields the best performance. Secondly, we introduce and investigate Deep Syntax, a hybrid system that takes advantages of recurrent patterns to delimit each record, by combining ushaped networks and logical rules. Finally, these two approaches are evaluated on 3708 French records (16-18th centuries), as well as on the Esposalles public database, containing 253 Spanish records (17th century). While both systems perform well on homogeneous documents, we observe a significant drop in performance with Mask R-CNN on heterogeneous documents, especially when trained on a non-representative subset. By contrast, Deep Syntax relies on steady patterns, and is therefore able to process a wider range of documents with less training data. Not only Deep Syntax produces 15% more match configurations and reduces the ZoneMap surface error metric by 30% when both systems are trained on 120 images, but it also outperforms Mask R-CNN when trained on a database three times smaller. As Deep Syntax generalizes better, we believe it can be used in the context of massive document processing, as collecting and annotating a sufficiently large and representative set of training data is not always achievable

    Buddhist and Islamic Orders in Southern Asia

    Get PDF
    This volume aims to foster interaction between scholars in the subfields of Islamic and Buddhist studies by increasing understanding of the circulation and localization of religious texts, institutional models, and ritual practices across Asia and beyond. Buddhist and Islamic Orders in Southern Asia scrutinizes religious orders (here referring to Sufi ?ar?qas and Buddhist monastic and other ritual lineages) that enabled far-flung local communities to be recognized and engaged as part of a broader world of co-religionists, while presenting their traditions and human representatives as attractive and authoritative to new devotees. Contributors to the volume direct their attention toward analogous developments mutually illuminating for both fields of study, drawing readers' attention to the fact that networked persons were not always strongly institutionalized and often moved through Southern Asia and developed local bases without the oversight of complex corporate organizations

    Buddhist and Islamic Orders in Southern Asia

    Get PDF
    This volume aims to foster interaction between scholars in the subfields of Islamic and Buddhist studies by increasing understanding of the circulation and localization of religious texts, institutional models, and ritual practices across Asia and beyond. Buddhist and Islamic Orders in Southern Asia scrutinizes religious orders (here referring to Sufi ?ar?qas and Buddhist monastic and other ritual lineages) that enabled far-flung local communities to be recognized and engaged as part of a broader world of co-religionists, while presenting their traditions and human representatives as attractive and authoritative to new devotees. Contributors to the volume direct their attention toward analogous developments mutually illuminating for both fields of study, drawing readers' attention to the fact that networked persons were not always strongly institutionalized and often moved through Southern Asia and developed local bases without the oversight of complex corporate organizations

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    Get PDF
    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography to propose regions of interest where to find objects, and recursive Bayesian filtering to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∌ 7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, as well as it achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).</p

    Article Segmentation in Digitised Newspapers

    Get PDF
    Digitisation projects preserve and make available vast quantities of historical text. Among these, newspapers are an invaluable resource for the study of human culture and history. Article segmentation identifies each region in a digitised newspaper page that contains an article. Digital humanities, information retrieval (IR), and natural language processing (NLP) applications over digitised archives improve access to text and allow automatic information extraction. The lack of article segmentation impedes these applications. We contribute a thorough review of the existing approaches to article segmentation. Our analysis reveals divergent interpretations of the task, and inconsistent and often ambiguously defined evaluation metrics, making comparisons between systems challenging. We solve these issues by contributing a detailed task definition that examines the nuances and intricacies of article segmentation that are not immediately apparent. We provide practical guidelines on handling borderline cases and devise a new evaluation framework that allows insightful comparison of existing and future approaches. Our review also reveals that the lack of large datasets hinders meaningful evaluation and limits machine learning approaches. We solve these problems by contributing a distant supervision method for generating large datasets for article segmentation. We manually annotate a portion of our dataset and show that our method produces article segmentations over characters nearly as well as costly human annotators. We reimplement the seminal textual approach to article segmentation (Aiello and Pegoretti, 2006) and show that it does not generalise well when evaluated on a large dataset. We contribute a framework for textual article segmentation that divides the task into two distinct phases: block representation and clustering. We propose several techniques for block representation and contribute a novel highly-compressed semantic representation called similarity embeddings. We evaluate and compare different clustering techniques, and innovatively apply label propagation (Zhu and Ghahramani, 2002) to spread headline labels to similar blocks. Our similarity embeddings and label propagation approach substantially outperforms Aiello and Pegoretti but still falls short of human performance. Exploring visual approaches to article segmentation, we reimplement and analyse the state-of-the-art Bansal et al. (2014) approach. We contribute an innovative 2D Markov model approach that captures reading order dependencies and reduces the structured labelling problem to a Markov chain that we decode with Viterbi (1967). Our approach substantially outperforms Bansal et al., achieves accuracy as good as human annotators, and establishes a new state of the art in article segmentation. Our task definition, evaluation framework, and distant supervision dataset will encourage progress in the task of article segmentation. Our state-of-the-art textual and visual approaches will allow sophisticated IR and NLP applications over digitised newspaper archives, supporting research in the digital humanities

    Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

    Get PDF
    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/

    Computer Vision and Architectural History at Eye Level:Mixed Methods for Linking Research in the Humanities and in Information Technology

    Get PDF
    Information on the history of architecture is embedded in our daily surroundings, in vernacular and heritage buildings and in physical objects, photographs and plans. Historians study these tangible and intangible artefacts and the communities that built and used them. Thus valuableinsights are gained into the past and the present as they also provide a foundation for designing the future. Given that our understanding of the past is limited by the inadequate availability of data, the article demonstrates that advanced computer tools can help gain more and well-linked data from the past. Computer vision can make a decisive contribution to the identification of image content in historical photographs. This application is particularly interesting for architectural history, where visual sources play an essential role in understanding the built environment of the past, yet lack of reliable metadata often hinders the use of materials. The automated recognition contributes to making a variety of image sources usable forresearch.<br/
    • 

    corecore