Processing Camera-captured Document Images: Geometric Rectification, Mosaicing, and Layout Structure Recognition

Author: Liang Jian
Publication date: 12/06/2006

This dissertation explores three topics: 1) geometric rectification of camera-captured document images, 2) camera-captured document mosaicing, and 3) layout structure recognition. The first two topics pertain to camera-based document image analysis, a new trend within the OCR community. Compared to typical scanners, cameras offer convenient, flexible, portable, and non-contact image capture, which enables many new applications and breathes new life into existing ones. The third topic is related to the need for efficient metadata extraction methods, critical for managing digitized documents. The kernel of our geometric rectification framework is a novel method for estimating document shape from a single camera-captured image. Our method uses texture flows detected in printed text areas and is insensitive to occlusion. Classification of planar versus curved documents is done automatically. For planar pages, we obtain full metric rectification. For curved pages, we estimate a planar-strip approximation based on properties of developable surfaces. Our method can process any planar or smoothly curved document captured from an arbitrary position without requiring 3D data, metric data, or camera calibration. For the second topic, we design a novel registration method for document images, which produces good results in difficult situations including large displacements, severe projective distortion, small overlapping areas, and lack of distinguishable feature points. We implement a selective image composition method that outperforms conventional image blending methods in overlapping areas. It eliminates double images caused by mis-registration and preserves the sharpness in overlapping areas. We solve the third topic with a graph-based model matching framework. Layout structures are modeled by graphs, which integrate local and global features and are extensible to new features in the future. Our model can handle large variation within a class and subtle differences between classes. Through graph matching, the layout structure of a document is discovered. Our layout structure recognition technique accomplishes document classification and logical component labeling at the same time. Our model learning method enables a model to adapt to changes in classes over time.

Digital Repository at the University of Maryland

Fast embedding for image classification & retrieval and its application to the hostel industry

Author: Ammatmanee Chanattra
Publication venue: Brunel University London
Publication date: 01/01/2022

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Content-based image classification and retrieval are the automatic processes of taking an unseen input image and extracting features that represent it. For the classification task, the measured input is then categorized according to established criteria on the server, and the category is returned as the result. For the retrieval task, the extracted features of an unseen query image are sent to the server, which searches for the images most visually similar to the query and returns them as the result. Although image features can be represented by classical features, artificial intelligence-based features, Convolutional Neural Network (CNN) features to be precise, have become powerful tools in the field. Nonetheless, high-dimensional CNN features remain a challenge, particularly for applications on mobile or Internet of Things devices. Therefore, in this thesis, several fast embeddings are explored and proposed to overcome the constraints of low memory, bandwidth, and power. Furthermore, the first hostel image database is created with three datasets: the hostel image dataset, containing 13,908 interior and exterior images of hostels across the world, and the Hostels-900 and Hostels-2K datasets, containing 972 images and 2,380 images, respectively, of 20 London hostel buildings. The results demonstrate that the proposed fast embeddings, such as the application of the GHM-Rand operator, the GHM-Fix operator, and binary feature vectors, are able to outperform or give results competitive with state-of-the-art methods while using far fewer computational resources. Additionally, the findings from a ten-year literature review of CBIR studies in the tourism industry give a picture of the relevant research activities of the past decade, which benefits not only the hostel industry and tourism sector but also the computer science and engineering research communities, by pointing to potential real-life applications of existing and developing technologies in the field.
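
To illustrate why binary feature vectors suit low-memory retrieval, the following is a minimal sketch, not the thesis's method. It assumes a simple threshold binarization (the GHM-Rand and GHM-Fix operators are not described in the abstract and are not reproduced here), and the file names and feature values are hypothetical.

```python
from typing import Dict, List, Sequence


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued feature vector into an int, one bit per dimension.

    Assumption: a plain threshold test stands in for whatever binarization
    the thesis actually uses.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: Sequence[float], index: Dict[str, int], k: int = 2) -> List[str]:
    """Return the k image names whose codes are closest to the query code."""
    q = binarize(query)
    return sorted(index, key=lambda name: (hamming(q, index[name]), name))[:k]


# Hypothetical 4-dimensional features; real CNN features have far more dimensions.
database = {
    "dorm_a.jpg": [0.9, -0.2, 0.4, -0.7],
    "dorm_b.jpg": [0.8, -0.1, 0.3, 0.6],
    "facade.jpg": [-0.5, 0.7, -0.3, 0.2],
}
index = {name: binarize(vec) for name, vec in database.items()}
results = retrieve([0.7, -0.3, 0.5, -0.9], index)
print(results)
```

Each stored image costs one bit per feature dimension, and comparison reduces to an XOR and a bit count, which is the kind of saving in memory and computation the abstract refers to.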

Brunel University Research Archive

A survey of face detection, extraction and recognition

Authors: Lu Yongzhong, Yu Shengsheng, Zhou Jingli
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/02/2012

The goal of this paper is to present a critical survey of the existing literature on human face recognition over the last 4-5 years. Interest and research activities in face recognition have increased significantly over the past few years, especially after the American airliner tragedy of September 11, 2001. While this growth is largely driven by growing application demands, ranging from static matching of controlled photographs, as in mug-shot matching and credit card verification, to surveillance video images, identification for law enforcement, and authentication for banking and security system access, advances in signal analysis techniques, such as wavelets and neural networks, are also important catalysts. As the number of proposed techniques increases, survey and evaluation become important.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Knowledge based text indexing and retrieval utilizing case based reasoning

Author: Mick Alan
Publication venue: RIT Scholar Works
Publication date: 01/01/1994

Information retrieval systems for documents normally rely on the use of keywords that describe the text in some fashion or another, or are contained in the text itself, for indexing and searching. These keywords may be associated with standard Boolean operators, where presence or absence in the text or text description is used as the truth value, or other operators indicating their proximity to one another in the text. Another emerging approach is the use of content or knowledge based indexing and retrieval. In this approach the text is not represented or treated as a collection of keywords; rather, its meaning or semantic content is abstracted and the meaning is used to search for the text desired. This approach may have several advantages over the standard keyword approach. Both precision and recall of the search may be improved, increasing the likelihood that relevant texts will be found while decreasing the probability of finding irrelevant ones. The knowledge based approach may also allow more sophisticated query techniques, for instance queries based on the purpose for which the text will be used. This thesis will explore the possibility and usefulness of applying case based reasoning to the problem of text search and retrieval. An easy-to-use expert system for information retrieval will be implemented that utilizes case-based reasoning to improve, over time, its capability to find those items that are relevant and useful, and only those items. It will support formulation of a search in an intuitive manner that avoids complicated command syntax and occult operators. It will present retrieved documents to the user in a logical, useful way and will allow the user to easily refine his search criteria based on a selection of documents from his original results that he has judged to be good examples of what he is searching for.
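
The keyword baseline and the two evaluation measures named in the abstract can be sketched as follows. This is an illustrative example only, not the thesis's system: the three documents, the query, and the relevance judgments are hypothetical, and only the Boolean AND operator is shown.

```python
from typing import Dict, Set, Tuple


def boolean_and(index: Dict[str, Set[str]], *terms: str) -> Set[str]:
    """Return the documents that contain every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection.
docs = {
    "d1": "case based reasoning for retrieval",
    "d2": "keyword indexing and retrieval",
    "d3": "reasoning about legal cases",
}

# Inverted index: keyword -> set of documents in which it is present.
index: Dict[str, Set[str]] = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

retrieved = boolean_and(index, "reasoning", "retrieval")
# Assumed relevance judgment: d1 and d3 are what the user actually wanted.
precision, recall = precision_recall(retrieved, relevant={"d1", "d3"})
print(retrieved, precision, recall)
```

In this toy case the query finds only d1, so precision is perfect but half of the relevant documents are missed. That gap between keyword presence and what the user meant is the motivation the abstract gives for knowledge based retrieval.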

RIT Scholar Works