Historical Document Digitization through Layout Analysis and Deep Content Classification
Document layout segmentation and recognition is an important task in the creation of digitized document collections, especially when dealing with historical documents.
This paper presents a hybrid approach to layout segmentation, together with a strategy for classifying document regions, applied to the digitization of a historical encyclopedia. Our layout analysis method merges a classic top-down approach with a bottom-up classification process based on local geometrical features, while regions are classified using features extracted by a Convolutional Neural Network and fed into a Random Forest classifier. Experiments are conducted on the first volume of the ``Enciclopedia Treccani'', a large dataset containing 999 manually annotated pages from the historical Italian encyclopedia.
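The region-classification step described above (CNN features fed into a Random Forest) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: random vectors stand in for the CNN feature extractor, and all names and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classify_regions(train_features, train_labels, test_features):
    """Train a Random Forest on region feature vectors and predict classes.

    In the paper's setting, `train_features` would be fixed-length vectors
    extracted from segmented page regions by a pretrained CNN; here any
    numeric vectors work.
    """
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train_features, train_labels)
    return clf.predict(test_features)

# Stand-in data: 64-dim "CNN features" for regions of two classes
# (e.g. text vs. figure), drawn from well-separated distributions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 64)), rng.normal(3, 1, (20, 64))])
y = np.array([0] * 20 + [1] * 20)
pred = classify_regions(X, y, X[:5])
```

Any fixed-length embedding (a penultimate-layer activation, for instance) can be dropped in place of the synthetic features without changing the classifier side.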
Ensemble of Anchor-Free Models for Robust Bangla Document Layout Segmentation
In this research paper, we introduce a novel approach designed for the
purpose of segmenting the layout of Bangla documents. Our methodology involves
the utilization of a sophisticated ensemble of YOLOv8 models, which were
trained for the DL Sprint 2.0 - BUET CSE Fest 2023 Competition focused on
Bangla document layout segmentation. Our primary emphasis lies in enhancing
various aspects of the task, including techniques such as image augmentation,
model architecture, and the incorporation of model ensembles. We deliberately
reduce the quality of a subset of document images to enhance the resilience of
model training, thereby resulting in an improvement in our cross-validation
score. By employing Bayesian optimization, we determine the optimal confidence
and Intersection over Union (IoU) thresholds for our model ensemble. Through
our approach, we successfully demonstrate the effectiveness of anchor-free
models in achieving robust layout segmentation in Bangla documents.
Comment: 4 pages, 5 figures, 6 tables
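The threshold-tuning step above (choosing ensemble confidence and IoU cutoffs that maximize a validation score) can be sketched as follows. This is an illustrative stand-in: a simple grid search replaces the Bayesian optimization the paper uses, and the scoring function is a toy placeholder for running the YOLOv8 ensemble on a validation split.

```python
import itertools

def ensemble_score(conf_thr, iou_thr):
    """Toy stand-in for the cross-validation score of the detector ensemble
    at the given confidence/IoU thresholds; a real pipeline would evaluate
    the model ensemble on held-out annotated pages."""
    return -(conf_thr - 0.3) ** 2 - (iou_thr - 0.6) ** 2

def tune_thresholds(conf_grid, iou_grid):
    """Exhaustively search the threshold grid for the best-scoring pair."""
    return max(itertools.product(conf_grid, iou_grid),
               key=lambda p: ensemble_score(*p))

conf_grid = [0.1, 0.2, 0.3, 0.4, 0.5]
iou_grid = [0.4, 0.5, 0.6, 0.7]
best_conf, best_iou = tune_thresholds(conf_grid, iou_grid)
```

Swapping the grid search for a Bayesian optimizer only changes how candidate `(conf_thr, iou_thr)` pairs are proposed; the objective stays the same.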
ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
Transforming documents into machine-processable representations is a
challenging task due to their complex structures and variability in formats.
Recovering the layout structure and content from PDF files or scanned material
has remained a key problem for decades. ICDAR has a long tradition in hosting
competitions to benchmark the state-of-the-art and encourage the development of
novel solutions to document layout understanding. In this report, we present
the results of our \textit{ICDAR 2023 Competition on Robust Layout Segmentation
in Corporate Documents}, which posed the challenge to accurately segment the
page layout in a broad range of document styles and domains, including
corporate reports, technical literature and patents. To raise the bar over
previous competitions, we engineered a hard competition dataset and proposed
the recent DocLayNet dataset for training. We recorded 45 team registrations
and received official submissions from 21 teams. In the presented solutions, we
recognize interesting combinations of recent computer vision models, data
augmentation strategies and ensemble methods to achieve remarkable accuracy in
the task we posed. A clear trend towards adoption of vision-transformer based
methods is evident. The results demonstrate substantial progress towards
achieving robust and highly generalizing methods for document layout
understanding.
Comment: ICDAR 2023, 10 pages, 4 figures
Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images
Automatic table detection in PDF documents has achieved great success, but tabular data extraction remains challenging due to integrity and noise issues in detected table areas. Accurate data extraction is especially crucial in the finance domain. Motivated by this, this research aims at automated table detection and tabular data extraction from financial PDF documents. We propose a method that consists of three main processes: detecting table areas with a Faster R-CNN (Region-based Convolutional Neural Network) model with a Feature Pyramid Network (FPN) on each page image, extracting contents and structures with a compound layout segmentation technique based on optical character recognition (OCR), and formulating regular expression rules for table header separation. The tabular data extraction feature is embedded with rule-based filtering and restructuring functions that are highly scalable. We annotate a new Financial Documents dataset with table regions for the experiment. The detection model achieves excellent table detection performance on our customized dataset. The main contributions of this paper are the Financial Documents dataset with table-area annotations, the superior detection model, and the rule-based layout segmentation technique for tabular data extraction from PDF files.
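The regular-expression header separation mentioned above can be sketched as follows. The rule here is a hypothetical illustration, not the paper's actual rule set: it treats a row as a header while no data row has been seen and the row contains words but no decimal amounts.

```python
import re

# Hypothetical rule: header rows contain letters but no currency-style
# decimal numbers (e.g. "1,200.00"); data rows do.
HEADER_RE = re.compile(r"^(?=.*[A-Za-z])(?!.*\d[\d,.]*\.\d{2}\b).*$")

def split_header_rows(rows):
    """Separate leading header rows from data rows of an OCR'd table."""
    header, body = [], []
    for row in rows:
        line = " ".join(row)
        if not body and HEADER_RE.match(line):
            header.append(row)
        else:
            body.append(row)
    return header, body

rows = [
    ["Item", "FY2020", "FY2021"],
    ["Revenue", "1,200.00", "1,350.50"],
    ["Net income", "300.00", "410.25"],
]
header, body = split_header_rows(rows)
```

In practice such rules would be tuned per document family, since OCR noise and multi-row headers easily break any single pattern.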
Dynamic instance generation for few-shot handwritten document layout segmentation (short paper)
Historical handwritten document analysis is an important activity for retrieving information about our past. Given that this type of process is slow and time-consuming, the humanities community is searching for new techniques that could aid them in this activity. Document layout analysis is a branch of machine learning that aims to extract semantic information from digitised documents. Here we propose a new framework for handwritten document layout analysis that differs from the current state of the art in that it features few-shot learning, thus allowing for good results with little manually labelled data, together with a dynamic instance generation process. Our results were obtained using the DIVA-HisDB dataset.
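One simple way to generate training instances dynamically, in the spirit of the few-shot setting above, is to paste cropped labelled regions onto page backgrounds to synthesize new labelled samples. This is a minimal sketch under that assumption, not the paper's actual generation process.

```python
import numpy as np

def paste_instance(page, instance, top, left):
    """Paste a cropped instance (e.g. a handwritten text block) onto a page
    image, returning the synthesized sample and its layout bounding box."""
    h, w = instance.shape[:2]
    out = page.copy()  # leave the source page untouched
    out[top:top + h, left:left + w] = instance
    box = (left, top, left + w, top + h)  # (x0, y0, x1, y1) label
    return out, box

page = np.full((100, 80), 255, dtype=np.uint8)  # blank page background
block = np.zeros((20, 30), dtype=np.uint8)      # dark cropped text region
sample, box = paste_instance(page, block, 10, 5)
```

Randomizing the paste position, scale, and source crop at each training step turns a handful of annotated pages into an effectively unbounded stream of labelled instances.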
AI2D-RST : A multimodal corpus of 1000 primary school science diagrams
This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowdsourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.
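The three graph-based annotation layers described above can be sketched with plain data structures. The element identifiers and the relation label below are illustrative assumptions, not actual AI2D-RST annotations.

```python
# Layer 1: grouping of diagram elements into perceptual units.
grouping = {
    "G1": ["label-sun", "arrow-1", "blob-plant"],
}

# Layer 2: connections set up by diagrammatic elements (arrows, lines),
# as directed edges between element ids.
connectivity = [
    ("label-sun", "arrow-1"),
    ("arrow-1", "blob-plant"),
]

# Layer 3: discourse relations between elements, labelled with RST
# relation names.
discourse = [
    ("label-sun", "blob-plant", "cause"),
]

def relations_for(discourse, element):
    """Return the RST relations in which `element` participates."""
    return [(a, b, rel) for a, b, rel in discourse if element in (a, b)]
```

A graph library would add traversal and visualization on top, but the layered structure itself is just nodes (element ids) plus typed edges per layer.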
Introducing the diagrammatic semiotic mode
A semantic-based system for querying personal digital libraries
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-540-28640-0_4. Copyright @ Springer 2004.
The decreasing cost and increasing availability of new technologies are enabling people to create their own digital libraries. One of the main topics in personal digital libraries is allowing people to select interesting information from among the different digital formats available today (pdf, html, tiff, etc.). Moreover, the increasing availability of these on-line libraries, as well as the advent of the so-called Semantic Web [1], is raising the demand for converting paper documents into digital, possibly semantically annotated, documents. These motivations drove us to design a new system that enables the user to interact with and query documents independently of the digital formats in which they are represented. In order to achieve this independence from the format, we treat all the digital documents contained in a digital library as images. Our system automatically detects the layout of the digital documents and recognizes the geometric regions of interest. All the extracted information is then encoded with respect to a reference ontology, so that the user can query their digital library by typing free text or browsing the ontology.
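The ontology-backed querying described above can be sketched as follows. The concept names, `is_a` links, and region records are illustrative assumptions, not the paper's actual reference ontology.

```python
# Hypothetical reference ontology: region concepts with a single is_a parent.
ONTOLOGY = {
    "Title": {"is_a": "TextRegion"},
    "Paragraph": {"is_a": "TextRegion"},
    "Figure": {"is_a": "GraphicRegion"},
}

# Regions detected on page images, independent of the source file format.
REGIONS = [
    {"doc": "a.pdf", "concept": "Title", "text": "Semantic digital libraries"},
    {"doc": "b.tiff", "concept": "Figure", "text": ""},
]

def query(concept=None, free_text=None):
    """Query detected regions by ontology concept (matching the concept
    itself or its is_a parent) and/or by free-text substring."""
    hits = []
    for r in REGIONS:
        c = r["concept"]
        concept_ok = concept in (None, c, ONTOLOGY.get(c, {}).get("is_a"))
        text_ok = free_text is None or free_text.lower() in r["text"].lower()
        if concept_ok and text_ok:
            hits.append(r)
    return hits
```

Because every document is reduced to images plus ontology-annotated regions, the same query interface covers pdf, html, and tiff sources alike.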