123 research outputs found
Processing Camera-captured Document Images: Geometric Rectification, Mosaicing, and Layout Structure Recognition
This dissertation explores three topics: 1) geometric rectification of cameracaptured document images, 2) camera-captured document mosaicing, and 3) layout structure recognition. The first two topics pertain to camera-based document image analysis, a new trend within the OCR community. Compared to typical scanners,cameras offer convenient, flexible, portable, and non-contact image capture, which enables many new applications and breathes new life into existing ones. The third topic is related to the need for efficient metadata extraction methods, critical for managing digitized documents.
The kernel of our geometric rectification framework is a novel method for estimating document shape from a single camera-captured image. Our method uses texture flows detected in printed text areas and is insensitive to occlusion. Classification of planar versus curved documents is done automatically. For planar pages, we obtain full metric rectification. For curved pages, we estimate a planar-strip approximation based on properties of developable surfaces. Our method can process any planar or smoothly curved document captured from an arbitrary position without requiring 3D data, metric data, or camera calibration.
For the second topic, we design a novel registration method for document images, which produces good results in difficult situations including large displacements, severe projective distortion, small overlapping areas, and lack of distinguishable feature points. We implement a selective image composition method that outperforms conventional image blending methods in overlapping areas. It eliminates double images caused by mis-registration and preserves the sharpness in overlapping areas.
We solve the third topic with a graph-based model matching framework. Layout structures are modeled by graphs, which integrate local and global features and are extensible to new features in the future. Our model can handle large variation within a class and subtle differences between classes. Through graph matching, the layout structure of a document is discovered. Our layout structure recognition technique accomplishes document classification and logical component labeling at the same time. Our model learning method enables a model to adapt to changes in classes over time
Recommended from our members
Fast embedding for image classification & retrieval and its application to the hostel industry
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking
an unseen image input and extracting its features representing the input image. Then,
for the classification task, this mathematically measured input is categorized according
to established criteria in the server and consequently shows the output as a result. On
the other hand, for the retrieval task, the extracted features of an unseen query image
are sent to the server to search for the most visually similar images to a given image
and retrieve these images as a result. Despite image features could be represented
by classical features, artificial intelligence-based features, Convolutional Neural
Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless,
the high dimensional CNN features have been a challenge in particular for applications
on mobile or Internet of Things devices. Therefore, in this thesis, several fast
embeddings are explored and proposed to overcome the constraints of low memory,
bandwidth, and power. Furthermore, the first hostel image database is created with
three datasets, hostel image dataset containing 13,908 interior and exterior images of
hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing
972 images and 2,380 images, respectively, of 20 London hostel buildings. The results
demonstrate that the proposed fast embeddings such as the application of GHM-Rand
operator, GHM-Fix operator, and binary feature vectors are able to outperform or give
competitive results to those state-of-the-art methods with a lot less computational
resource. Additionally, the findings from a ten-year literature review of CBIR study in
the tourism industry could picturize the relevant research activities in the past decade
which are not only beneficial to the hostel industry or tourism sector but also to the
computer science and engineering research communities for the potential real-life
applications of the existing and developing technologies in the field
A survey of face detection, extraction and recognition
The goal of this paper is to present a critical survey of existing literatures on human face recognition over the last 4-5 years. Interest and research activities in face recognition have increased significantly over the past few years, especially after the American airliner tragedy on September 11 in 2001. While this growth largely is driven by growing application demands, such as static matching of controlled photographs as in mug shots matching, credit card verification to surveillance video images, identification for law enforcement and authentication for banking and security system access, advances in signal analysis techniques, such as wavelets and neural networks, are also important catalysts. As the number of proposed techniques increases, survey and evaluation becomes important
Knowledge based text indexing and retrieval utilizing case based reasoning
Information retrieval systems for documents normally rely on the use of keywords that describe the text in some fashion or another, or are contained in the text itself, for indexing and searching. These keywords may be associated with standard boolean operators, where presence or absence in the text or text description is used as the truth value, or other oper ators indicating their proximity to one another in the text. Another emerging approach is the use of content or knowledge based indexing and retrieval. In this approach the text is not represented or treated as a collection keywords, rather its meaning or semantic content is abstracted and the meaning is used to search for the text desired. This approach may have several advantages over the standard keyword approach. Both precision and recall of the search may be improved, increasing the likelihood that relevant texts will be found while decreasing the probability of finding irrelevant ones. The knowl edge based approach may also allow more sophisticated query techniques, for instance queries based on the purpose for which the text will be used. This thesis will explore the possibility and usefulness of applying case based reasoning to the problem of text search and retrieval. An easy-to-use expert system for information retrieval that utilizes case-based reasoning to improve, over time, its capability to find those items that are relevant and useful, and only those items that are relevant and useful will be implemented. It will support formulation of a search in an intuitive manner that avoids complicated command syntax and occult operators. It will present retrieved docu ments to the user in a logical, useful way and will allow the user to easily refine his search criteria based on a selection of documents from his original results that he has judged to be good examples of what he is searching for
- …