Search CORE

19,200 research outputs found

Recognition and identification of form document layouts

Author: Luo Kai
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2003
Field of study

In this thesis, a hierarchical tree representation is introduced to represent the logical structure of a form document. But different forms might have the same logical structure, so the representation will be ambiguous. In this thesis, an improvement is proposed to solve the ambiguity problem by using the physical information of the blocks. To fulfill the application of hierarchical tree representation and extract the physical information of blocks, a pixel tracing approach is used to extract form layout structures from form documents. Compared with Hough transform, the pixel tracing algorithm requires less computation. This algorithm has been tested on 50 different table forms. It effectively extracts all the line information required for the hierarchical tree representation, represents the form by a hierarchical tree, and distinguishes the different forms. The algorithm applies to table form documents

University of Nevada, Las Vegas Repository

Recommended from our members

Music-reading expertise modulates the visual span for English letters but not Chinese characters.

Author: Chung Susana TL
Hsiao Janet H
Li Sara TK
Publication venue: eScholarship, University of California
Publication date: 01/04/2019
Field of study

Recent research has suggested that the visual span in stimulus identification can be enlarged through perceptual learning. Since both English and music reading involve left-to-right sequential symbol processing, music-reading experience may enhance symbol identification through perceptual learning particularly in the right visual field (RVF). In contrast, as Chinese can be read in all directions, and components of Chinese characters do not consistently form a left-right structure, this hypothesized RVF enhancement effect may be limited in Chinese character identification. To test these hypotheses, here we recruited musicians and nonmusicians who read Chinese as their first language (L1) and English as their second language (L2) to identify music notes, English letters, Chinese characters, and novel symbols (Tibetan letters) presented at different eccentricities and visual field locations on the screen while maintaining central fixation. We found that in English letter identification, significantly more musicians achieved above-chance performance in the center-RVF locations than nonmusicians. This effect was not observed in Chinese character or novel symbol identification. We also found that in music note identification, musicians outperformed nonmusicians in accuracy in the center-RVF condition, consistent with the RVF enhancement effect in the visual span observed in English-letter identification. These results suggest that the modulation of music-reading experience on the visual span for stimulus identification depends on the similarities in the perceptual processes involved

eScholarship - University of California

CloudScan - A configuration-free invoice analysis system using recurrent neural networks

Author: Laws Florian
Palm Rasmus Berg
Winther Ole
Publication venue
Publication date: 01/01/2017
Field of study

We present CloudScan; an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout, instead it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.Comment: Presented at ICDAR 201

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

A survey of comics research in computer science

Author: Augereau Olivier
Iwata Motoi
Kise Koichi
Publication venue
Publication date: 15/04/2018
Field of study

Graphical novels such as comics and mangas are well known all over the world. The digital transition started to change the way people are reading comics, more and more on smartphones and tablets and less and less on paper. In the recent years, a wide variety of research about comics has been proposed and might change the way comics are created, distributed and read in future years. Early work focuses on low level document image analysis: indeed comic books are complex, they contains text, drawings, balloon, panels, onomatopoeia, etc. Different fields of computer science covered research about user interaction and content generation such as multimedia, artificial intelligence, human-computer interaction, etc. with different sets of values. We propose in this paper to review the previous research about comics in computer science, to state what have been done and to give some insights about the main outlooks

arXiv.org e-Print Archive

Directory of Open Access Journals

XLIndy: interactive recognition and information extraction in spreadsheets

Author: Gonsior Julius
Koci Elvis
Kuban Dana
Lehner Wolfgang
Luetting Nico
Olwig Dominik
Romero Moral Óscar
Thiele Maik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Over the years, spreadsheets have established their presence in many domains, including business, government, and science. However, challenges arise due to spreadsheets being partially-structured and carrying implicit (visual and textual) information. This translates into a bottleneck, when it comes to automatic analysis and extraction of information. Therefore, we present XLIndy, a Microsoft Excel add-in with a machine learning back-end, written in Python. It showcases our novel methods for layout inference and table recognition in spreadsheets. For a selected task and method, users can visually inspect the results, change configurations, and compare different runs. This enables iterative fine-tuning. Additionally, users can manually revise the predicted layout and tables, and subsequently save them as annotations. The latter is used to measure performance and (re-)train classifiers. Finally, data in the recognized tables can be extracted for further processing. XLIndy supports several standard formats, such as CSV and JSON.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Identification of Technical Journals by Image Processing Techniques

Author: Wang Ling Ling
Wen Pei Chun
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 26/01/2012
Field of study

The emphasis of this study is put on developing an automatic approach to identifying a given unknown technical journal from its cover page. Since journal cover pages contain a great deal of information, determining the title of an unknown journal using optical character recognition techniques seems difficult. Comparing the layout structures of text blocks on the journal cover pages is an effective method for distinguishing one journal from the other. In order to achieve efficient layout-structure comparison, a left-to-right hidden Markov model (HMM) is used to represent the layout structure of text blocks for each kind of journal. Accordingly, title determination of an input unknown journal can be effectively achieved by comparing the layout structure of the unknown journal to each HMM in the database. Besides, from the layout structure of the best matched HMM, we can locate the text block of the issue date, which will be recognized by OCR techniques for accomplishing an automatic journal registration system. Experimental results show the feasibility of the proposed approach

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents

Author: Ali Tofik
Roy Partha Pratim
Publication venue
Publication date: 25/10/2023
Field of study

This paper introduces a deep learning model tailored for document information analysis, emphasizing document classification, entity relation extraction, and document visual question answering. The proposed model leverages transformer-based models to encode all the information present in a document image, including textual, visual, and layout information. The model is pre-trained and subsequently fine-tuned for various document image analysis tasks. The proposed model incorporates three additional tasks during the pre-training phase, including reading order identification of different layout segments in a document image, layout segments categorization as per PubLayNet, and generation of the text sequence within a given layout segment (text block). The model also incorporates a collective pre-training scheme where losses of all the tasks under consideration, including pre-training and fine-tuning tasks with all datasets, are considered. Additional encoder and decoder blocks are added to the RoBERTa network to generate results for all tasks. The proposed model achieved impressive results across all tasks, with an accuracy of 95.87% on the RVL-CDIP dataset for document classification, F1 scores of 0.9306, 0.9804, 0.9794, and 0.8742 on the FUNSD, CORD, SROIE, and Kleister-NDA datasets respectively for entity relation extraction, and an ANLS score of 0.8468 on the DocVQA dataset for visual question answering. The results highlight the effectiveness of the proposed model in understanding and interpreting complex document layouts and content, making it a promising tool for document analysis tasks

arXiv.org e-Print Archive

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Author: Barman Raphaël
Clematide Simon
Ehrmann Maud
Kaplan Frédéric
Oliveira Sofia Ares
Publication venue: 'Centre pour la Communication Scientifique Directe (CCSD)'
Publication date: 14/12/2020
Field of study

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis. If the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain with, among others, the use of finer-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among others, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to high material variance

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Episciences.org

Directory of Open Access Journals

Program documentation standards

Author: Frum C. A.
Parker D. A.
Publication venue
Publication date
Field of study

A style manual is presented to serve as a reference and guide for system and program documentation. It is intended to set standards for documentation, prescribing the procedures to be followed, format to be used, and information to be produced. The standards for program documentation specify the extent to which the programmer should support his efforts in writing. The first three sections of the manual (system, program, and operation descriptions) contain information of particular interest to management, operators, and program users, respectively. Each section was designed as a self-sufficient description from the management, operator, or user point of view

NASA Technical Reports Server