
    CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising

    Degraded images commonly exist in the general sources of character images, leading to unsatisfactory character recognition results. Existing methods have dedicated efforts to restoring degraded character images. However, the denoising results obtained by these methods do not appear to improve character recognition performance. This is mainly because current methods only focus on pixel-level information and ignore critical features of a character, such as its glyph, resulting in character-glyph damage during the denoising process. In this paper, we introduce a novel generic framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images without changing their inherent glyphs. Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone, which will maintain the consistency of character glyphs during character image denoising. Moreover, we utilize attention-based networks for global-local feature interaction, which will help to deal with blind denoising and enhance denoising performance. We compare CharFormer with state-of-the-art methods on multiple datasets. The experimental results show the superiority of CharFormer quantitatively and qualitatively. Comment: Accepted by ACM MM 202
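The parallel-branch idea described above can be illustrated with a minimal numpy sketch: an auxiliary glyph branch computes features from the same input and injects them into the denoising backbone before decoding. This is a toy illustration under assumed shapes, not CharFormer's actual architecture; `conv_like` and all weights are hypothetical stand-ins for real convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_like(x, w):
    """Stand-in for a conv layer: a per-pixel linear map over channels + ReLU."""
    # x: (H, W, C_in), w: (C_in, C_out)
    return np.maximum(x @ w, 0.0)

H, W = 8, 8
noisy = rng.normal(size=(H, W, 1))          # degraded character image

# Backbone branch: denoising features.
w_backbone = rng.normal(size=(1, 16))
feat_denoise = conv_like(noisy, w_backbone)  # (8, 8, 16)

# Parallel branch: trained on an auxiliary glyph target (e.g. glyph
# skeleton prediction); here simply another feature map from the input.
w_glyph = rng.normal(size=(1, 16))
feat_glyph = conv_like(noisy, w_glyph)       # (8, 8, 16)

# Injection: fuse glyph features into the backbone so the decoder sees
# glyph-consistency cues alongside pixel-level denoising features.
fused = feat_denoise + feat_glyph            # (8, 8, 16)

w_out = rng.normal(size=(16, 1))
denoised = fused @ w_out                     # (8, 8, 1)
print(denoised.shape)
```

In the real framework the fusion would happen at several backbone stages and be learned jointly with both task losses; the additive fusion here is only the simplest possible stand-in.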

    Extracting Maya Glyphs from Degraded Ancient Documents via Image Segmentation

    We present a system for automatically extracting hieroglyph strokes from images of degraded ancient Maya codices. Our system adopts a region-based image segmentation framework. Multi-resolution super-pixels are first extracted to represent each image. A Support Vector Machine (SVM) classifier is used to label each super-pixel region with a probability of belonging to foreground glyph strokes. Pixelwise probability maps from multiple super-pixel resolution scales are then aggregated to cope with various stroke widths and background noise. A fully connected Conditional Random Field model is then applied to improve the labeling consistency. Segmentation results show that our system preserves delicate local details of the historic Maya glyphs with various stroke widths and also reduces background noise. As an application, we conduct retrieval experiments using the extracted binary images. Experimental results show that our automatically extracted glyph strokes achieve comparable retrieval results to those obtained using glyphs manually segmented by epigraphers in our team.
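The multi-scale aggregation step in this pipeline can be sketched in a few lines of numpy: per-scale pixelwise foreground-probability maps (which in the paper come from the SVM over super-pixels) are averaged into one map, then thresholded. The threshold stands in for the CRF refinement; the toy maps and weights are hypothetical, not the paper's data.

```python
import numpy as np

def aggregate_scales(prob_maps, weights=None):
    """Combine pixelwise foreground-probability maps from several
    super-pixel scales into one map (the paper's aggregation step)."""
    stack = np.stack(prob_maps, axis=0)          # (n_scales, H, W)
    if weights is None:
        weights = np.full(len(prob_maps), 1.0 / len(prob_maps))
    return np.tensordot(weights, stack, axes=1)  # weighted mean -> (H, W)

# Toy probability maps from three hypothetical super-pixel scales.
coarse = np.array([[0.9, 0.2], [0.8, 0.1]])
medium = np.array([[0.7, 0.4], [0.9, 0.2]])
fine   = np.array([[0.8, 0.3], [0.7, 0.3]])

fused = aggregate_scales([coarse, medium, fine])
binary = (fused > 0.5).astype(np.uint8)  # stand-in for the CRF refinement
print(binary)   # 1 = glyph stroke, 0 = background
```

Because thin strokes are resolved only at fine scales and thick strokes only at coarse ones, averaging across scales lets both kinds survive the final threshold.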

    Efficient and effective OCR engine training

    We present an efficient and effective approach to train OCR engines using the Aletheia document analysis system. All components required for training are seamlessly integrated into Aletheia: training data preparation, the OCR engine’s training processes themselves, text recognition, and quantitative evaluation of the trained engine. Such a comprehensive training and evaluation system, guided through a GUI, allows for iterative incremental training to achieve best results. The widely used Tesseract OCR engine is used as a case study to demonstrate the efficiency and effectiveness of the proposed approach. Experimental results are presented validating the training approach with two different historical datasets, representative of recent significant digitisation projects. The impact of different training strategies and training data requirements is presented in detail.

    Handwritten Amharic Character Recognition Using a Convolutional Neural Network

    Amharic is the official language of the Federal Democratic Republic of Ethiopia. There are many historic Amharic and Ethiopic handwritten documents addressing various relevant issues including governance, science, religion, social rules, culture and art, which constitute very rich indigenous knowledge. The Amharic language has its own alphabet, derived from Ge'ez, which is currently the liturgical language in Ethiopia. Handwritten character recognition for non-Latin scripts like Amharic has received little attention, especially using the advantages of state-of-the-art techniques. This research work designs, for the first time, a model for Amharic handwritten character recognition using a convolutional neural network. The dataset was organized from collected sample handwritten documents, and data augmentation was applied for machine learning. The model was further enhanced using multi-task learning from the relationships of the characters. Promising results are observed from the latter model, which can further be applied to word prediction. Comment: ECDA2019 Conference Oral Presentation
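The data augmentation mentioned in this abstract can be sketched minimally: generating small translated variants of each labelled character image multiplies the training set without new annotation. This is a generic numpy sketch, not the authors' actual augmentation pipeline; the shift range is an assumption.

```python
import numpy as np

def shift(img, dy, dx):
    """Translate a character image by (dy, dx), padding with background 0."""
    out = np.zeros_like(img)
    h, w = img.shape
    ys_dst = slice(max(dy, 0), min(h, h + dy))
    xs_dst = slice(max(dx, 0), min(w, w + dx))
    ys_src = slice(max(-dy, 0), min(h, h - dy))
    xs_src = slice(max(-dx, 0), min(w, w - dx))
    out[ys_dst, xs_dst] = img[ys_src, xs_src]
    return out

def augment(img):
    """Yield small translated variants of one labelled sample."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yield shift(img, dy, dx)

glyph = np.eye(5)                # toy 5x5 "character" image
variants = list(augment(glyph))
print(len(variants))             # 9 variants per input sample
```

Real pipelines typically add small rotations, scaling, and elastic distortions as well; translation alone is the simplest case to show the idea.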

    The Mesoamerican Corpus of Formative Period Art and Writing

    This project explores the origins and development of the first writing in the New World by constructing a comprehensive database of Formative period (1500–400 BCE) iconography and a suite of database-driven digital tools. In collaboration with two of the largest repositories of Formative period Mesoamerican art in Mexico, the project integrates the work of archaeologists, art historians, and scientific computing specialists to plan and begin the production of a database, digital assets, and visual search software that permit the visualization of spatial, chronological, and contextual relationships among iconographic and archaeological datasets. These resources will eventually support mobile and web based applications that allow for the search, comparison, and analysis of a corpus of material currently only partially documented. The start-up phase will generate a functional prototype database, project website, wireframe user interfaces, and a report summarizing project development.

    Prehistoric Drawings in Mammoth Cave

    During a recent Earthwatch Institute survey of archaeological remains in Mammoth Cave, a project was begun to find and record prehistoric images on the cave walls. I chose to analyze petroglyphs and pictographs on three panels in Main Cave. This article offers a hypothesis for the circumstances surrounding the rock art’s production: the geometric and anthropomorphic figures in Mammoth Cave are representative of a series of visual percepts experienced cross-culturally and caused by various conditions (including sensory deprivation, fatigue, and psychoactive drug use) acting on the ocular anatomy and nervous system. That is, the glyphs might be visual representations of simple hallucinations experienced by early cavers. These forms, “entoptic phenomena,” frequently occur in cave images and other artwork around the world, and are often ethnographically linked to shamanistic visions and other activities involving altered states of consciousness. The images in Mammoth Cave appear to represent several of the entoptic forms, and conditions of prehistoric cave exploration would have been ideal for experiencing them. Given this evidence, and considering the frequent use of caves for ritual activities across cultures, it is likely that the Mammoth Cave rock art is linked to entoptic phenomena.

    My Text in Your Handwriting

    There are many scenarios where we wish to imitate a specific author’s pen-on-paper handwriting style. Rendering new text in someone’s handwriting is difficult because natural handwriting is highly variable, yet follows both intentional and involuntary structure that makes a person’s style self-consistent. The variability means that naive example-based texture synthesis can be conspicuously repetitive. We propose an algorithm that renders a desired input string in an author’s handwriting. An annotated sample of the author’s handwriting is required; the system is flexible enough that historical documents can usually be used with only a little extra effort. Experiments show that our glyph-centric approach, with learned parameters for spacing, line thickness, and pressure, produces novel images of handwriting that look hand-made to casual observers, even when printed on paper.

    Adaptive Algorithms for Automated Processing of Document Images

    Large scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters, and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance transform based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach is in its determination of the best approximation to the clutter-content boundary with text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multi-lingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum-based [O'Gorman1993] angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and recognize characters for any complex syllabic or non-syllabic script, using font models. This concept is based on the fact that font files contain all the information necessary to render text, and thus a model for how to decompose it. Instead of script-specific routines, this work is a step towards a generic character segmentation and recognition scheme for both Latin and non-Latin scripts.
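The clutter-removal idea above can be illustrated with a deliberately simplified sketch: label connected foreground components, then drop those whose size falls outside a text-like bound. This size filter is a crude stand-in for the thesis's distance-transform-based estimate of the clutter-content boundary; the threshold and toy image are assumptions for illustration only.

```python
import numpy as np
from collections import deque

def components(binary):
    """4-connected component labelling on a binary (0/1) image via BFS."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and not labels[y, x]:
                count += 1
                labels[y, x] = count
                q = deque([(y, x)])
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = count
                            q.append((ny, nx))
    return labels, count

def remove_clutter(binary, max_size):
    """Drop foreground components larger than max_size pixels -- a crude
    proxy for separating clutter from text-like structures."""
    labels, n = components(binary)
    out = binary.copy()
    for lab in range(1, n + 1):
        mask = labels == lab
        if mask.sum() > max_size:
            out[mask] = 0
    return out

img = np.zeros((6, 6), dtype=np.uint8)
img[0, 0:2] = 1          # small text-like stroke (2 px)
img[3:6, 3:6] = 1        # large clutter blob (9 px)
clean = remove_clutter(img, max_size=4)
print(int(clean.sum()))  # only the 2-px stroke survives
```

The actual technique is adaptive: instead of a fixed size threshold, it uses distance-transform statistics to estimate, per document, where text ends and clutter begins.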