Recovering Homography from Camera Captured Documents using Convolutional Neural Networks
Removing perspective distortion from hand-held camera-captured document
images is one of the primitive tasks in document analysis, but unfortunately,
no existing method can reliably remove the perspective distortion from
document images automatically. In this paper, we propose a convolutional
neural network based method for recovering homography from hand-held
camera-captured documents.
Our proposed method works independently of the document's underlying content
and is trained end-to-end in a fully automatic way. Specifically, this paper
makes the following three contributions: first, we introduce a large-scale
synthetic dataset for recovering homography from document images captured
under different geometric and photometric transformations; second, we show
that a generic convolutional neural network based architecture can be
successfully used for regressing the corner positions of documents captured
in the wild; third, we show that the L1 loss can be reliably used for corner
regression. Our proposed method gives state-of-the-art performance on the
tested datasets, and has the potential to become an integral part of the
document analysis pipeline.
Comment: 10 pages, 8 figures
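Once the four corner positions have been regressed, the rectifying homography
follows in closed form from the corner correspondences. Below is a minimal
sketch using the Direct Linear Transform; the corner coordinates are
hypothetical, not values from the paper's dataset:

```python
import numpy as np

def homography_from_corners(src, dst):
    """Estimate the 3x3 homography mapping src -> dst via the
    Direct Linear Transform, given four point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null-space vector of A, i.e. the right
    # singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pt):
    """Apply a homography to a 2-D point (homogeneous normalization)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Corners a network might regress for a skewed document photo
# (hypothetical values), mapped to an upright A4-proportioned page.
predicted = [(32, 21), (608, 60), (590, 820), (15, 790)]
target = [(0, 0), (595, 0), (595, 842), (0, 842)]
H = homography_from_corners(predicted, target)
```

In a full pipeline, the recovered H would then be handed to an image-warping
routine to produce the rectified document image.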
Image Projective Transformation Rectification with Synthetic Data for Smartphone-captured Chest X-ray Photos Classification
Classification on smartphone-captured chest X-ray (CXR) photos to detect
pathologies is challenging due to the projective transformation caused by the
non-ideal camera position. Recently, various rectification methods have been
proposed for different photo rectification tasks such as document photos,
license plate photos, etc. Unfortunately, we found that none of them is
suitable for CXR photos, due to their specific transformation type, image
appearance, annotation type, etc. In this paper, we propose an innovative deep
learning-based Projective Transformation Rectification Network (PTRN) to
automatically rectify CXR photos by predicting the projective transformation
matrix. To the best of our knowledge, it is the first work to predict the
projective transformation matrix as the learning goal for photo rectification.
Additionally, to avoid the expensive collection of natural data, synthetic CXR
photos are generated under the consideration of natural perturbations, extra
screens, etc. We evaluated the proposed approach in the CheXphoto
smartphone-captured CXR photo classification competition hosted by the
Stanford University Machine Learning Group, where our approach won first
place with a large performance improvement (AUC 0.850 vs. 0.762 for the
second-best entry). A deeper study demonstrates that the use of PTRN
successfully brings the classification performance on spatially transformed
CXR photos to the same level as on high-quality digital CXR images,
indicating that PTRN can eliminate all negative impacts of projective
transformation on CXR photos.
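Given the predicted 3x3 projective transformation matrix, rectification
amounts to mapping coordinates through its inverse with a perspective
divide. A minimal coordinate-level sketch; the matrix entries below are
hypothetical, not PTRN outputs:

```python
import numpy as np

def rectify_points(M, pts):
    """Map 2-D pixel coordinates through a 3x3 projective matrix,
    applying the perspective divide on the homogeneous result."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    out = pts_h @ M.T
    return out[:, :2] / out[:, 2:3]                   # perspective divide

# Hypothetical predicted transformation: a mild perspective tilt.
M = np.array([[1.0,  0.05,  3.0],
              [0.02, 1.0,  -2.0],
              [1e-4, 2e-4,  1.0]])
corners = np.array([[0.0, 0.0], [639.0, 0.0], [639.0, 479.0], [0.0, 479.0]])
# Rectify by warping with the inverse of the predicted matrix.
rectified = rectify_points(np.linalg.inv(M), corners)
```

In practice the same mapping would be applied densely by an image-warping
routine rather than only to corner points.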
Unfolder: Fast localization and image rectification of a document with a crease from folding in half
Folded documents are not uncommon in modern society. Digitizing such
documents by capturing them with a smartphone camera can be tricky, since a
crease can divide the document contents into separate planes. To unfold the
document, one could hold its edges, potentially obscuring them in the
captured image. While there are many geometrical rectification methods, they
were usually developed for arbitrary bends and folds. We consider such
algorithms and propose a novel approach Unfolder developed specifically for
images of documents with a crease from folding in half. Unfolder is robust to
projective distortions of the document image and does not fragment the image in
the vicinity of a crease after rectification. A new Folded Document Images
dataset was created to investigate the rectification accuracy of folded
(2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured
with the document placed on a table and held in hand. The Unfolder algorithm
achieved a recognition error rate of 0.33, better than the advanced neural
network methods DocTr (0.44) and DewarpNet (0.57). The average runtime for
Unfolder was only 0.25 s/image on an iPhone XR.
Comment: This is a preprint of the article accepted for publication in the
journal "Computer Optics".
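A simplified view of per-plane rectification, not the actual Unfolder
algorithm: split the image along the detected crease and apply a separate
homography to each half. The crease endpoints and homographies below are
hypothetical:

```python
import numpy as np

def crease_side(p, a, b):
    """Sign of the 2-D cross product: which side of the crease line
    a -> b the point p lies on."""
    return np.sign((b[0] - a[0]) * (p[1] - a[1])
                   - (b[1] - a[1]) * (p[0] - a[0]))

def rectify_point(p, crease_a, crease_b, H_left, H_right):
    """Pick the per-plane homography by the point's side of the crease,
    then apply it with homogeneous normalization."""
    H = H_left if crease_side(p, crease_a, crease_b) >= 0 else H_right
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return x / w, y / w

# Hypothetical homographies: identity for one plane, a pure horizontal
# translation for the other; a vertical crease at x = 0.
H_left = np.eye(3)
H_right = np.array([[1.0, 0.0, 100.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
crease_a, crease_b = (0.0, 0.0), (0.0, 10.0)
```

Choosing the homographies so that the two rectified halves share the crease
line is what avoids fragmenting the image in its vicinity.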
A Review of Recent Advances and Challenges in Grocery Label Detection and Recognition
When compared with traditional local shops, where the customer has a personalised service,
in large retail departments the client has to make purchase decisions independently, mostly
supported by the information available on the package. Additionally, people are becoming more
aware of the importance of food ingredients and more demanding about the type of products they buy
and the information provided on the package, despite it often being hard to interpret. Big shops such
as supermarkets have also introduced important challenges for the retailer due to the large number
of different products in the store, heterogeneous affluence, and the daily need for item repositioning.
In this scenario, the automatic detection and recognition of products on the shelves or off the shelves
has gained increased interest as the application of these technologies may improve the shopping
experience through self-assisted shopping apps and autonomous shopping, or even benefit stock
management with real-time inventory, automatic shelf monitoring and product tracking. These
solutions can also have an important impact on customers with visual impairments. Despite recent
developments in computer vision, automatic grocery product recognition is still very challenging,
with most works focusing on the detection or recognition of a small number of products, often under
controlled conditions. This paper discusses the challenges related to this problem and presents a
review of proposed methods for retail product label processing, with a special focus on assisted
analysis for customer support, including for the visually impaired. Moreover, it details the public
datasets used in this topic and identifies their limitations, and discusses future research directions of
related fields.
Single-Image Depth Prediction Makes Feature Matching Easier
Good local features improve the robustness of many 3D re-localization and
multi-view reconstruction pipelines. The problem is that viewing angle and
distance severely impact the recognizability of a local feature. Attempts to
improve appearance invariance by choosing better local feature points or by
leveraging outside information have come with prerequisites that made some of
them impractical. In this paper, we propose a surprisingly effective
enhancement to local feature extraction, which improves matching. We show that
CNN-based depths inferred from single RGB images are quite helpful, despite
their flaws. They allow us to pre-warp images and rectify perspective
distortions, to significantly enhance SIFT and BRISK features, enabling more
good matches, even when cameras are looking at the same scene but in opposite
directions.
Comment: 14 pages, 7 figures, accepted for publication at the European
Conference on Computer Vision (ECCV) 2020
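The pre-warping idea rests on the standard plane-induced homography
H = K (R - t n^T / d) K^-1, where the local plane normal n can be estimated
from the CNN-predicted depth. A sketch with hypothetical camera parameters;
this illustrates the textbook formula, not the paper's exact pipeline:

```python
import numpy as np

def plane_induced_homography(K, R, t, n, d):
    """Homography induced by the scene plane n . X = d between two views
    with relative pose (R, t) and shared intrinsics K."""
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

# Hypothetical values: pinhole intrinsics, a surface normal estimated
# from CNN-predicted depth, and a small sideways camera motion.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
n = np.array([0.2, 0.0, 0.98])   # local plane normal (roughly unit)
d = 2.0                          # plane distance from the camera
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
H = plane_induced_homography(K, R, t, n, d)
```

Warping a patch with such an H before extraction makes its appearance closer
to fronto-parallel, which is what enables the additional SIFT/BRISK matches.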
A Model for Automated Support of the Recognition, Extraction, Personalization, and Reconstruction of Static Charts
Data charts are widely used in our daily lives, being present in regular media,
such as newspapers, magazines, web pages, books, and many others. A
well-constructed data chart leads to an intuitive understanding of its
underlying data; conversely, when a data chart reflects poor design choices,
a redesign of the representation might be needed. However, in most cases, these
charts are shown as a static image, which means that the original data are not
usually available. Therefore, automatic methods could be applied to extract the
underlying data from the chart images to allow these changes. The task of
recognizing charts and extracting data from them is complex, largely due to the
variety of chart types and their visual characteristics.
Computer Vision techniques for image classification and object detection are
widely used for the problem of recognizing charts, but only in images without
any disturbance. Other features of real-world images that can make this task
difficult, such as photo distortions, noise, and misalignment, are absent
from most works in the literature. Two computer vision techniques that can
assist this task and have been little explored in this context are
perspective detection and correction. These methods transform a distorted,
noisy chart into a clean chart, with its type ready for data extraction or
other uses. The task of reconstructing data is straightforward: as long as
the data are available, the visualization can be reconstructed; however,
reconstructing it in the same context is complex.
Using a Visualization Grammar for this scenario is a key component, as these
grammars usually have extensions for interaction, chart layers, and multiple
views without requiring extra development effort.
This work presents a model for automated support of the custom recognition
and reconstruction of charts in images. The model automatically performs the
process steps, such as reverse engineering, turning a static chart back into its
data table for later reconstruction, while allowing the user to make modifications
in case of uncertainties. This work also features a model-based architecture
along with prototypes for various use cases. Validation is performed step by
step, with methods inspired by the literature. This work features three use
cases providing proof of concept and validation of the model.
The first use case features chart recognition methods focused on real-world
documents; the second focuses on the vocalization of charts, using a
visualization grammar to reconstruct a chart in audio format; and the third
presents an Augmented Reality application that recognizes and reconstructs
charts in the same context (a piece of paper), overlaying the new chart and
interaction widgets. The results showed that, with slight changes, chart
recognition and reconstruction methods are now ready for real-world charts
when taking time, accuracy, and precision into consideration.
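The reverse-engineering step, recovering a data table from a static chart
image, ultimately reduces to axis calibration: mapping pixel positions to
data values through detected axis ticks. A minimal sketch with hypothetical
tick positions and values:

```python
def axis_calibration(p0, v0, p1, v1):
    """Return a function mapping a pixel coordinate to a data value,
    given two axis ticks: pixel p0 labelled v0 and pixel p1 labelled v1
    (assumes a linear axis)."""
    scale = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + (p - p0) * scale

# Hypothetical y-axis: pixel row 400 is value 0, pixel row 100 is value 30
# (image rows grow downward, so the scale comes out negative).
to_value = axis_calibration(400, 0.0, 100, 30.0)
bar_top_px = 250
value = to_value(bar_top_px)   # bar height in data units
```

Once every bar top (or line vertex) has been located in pixels, applying the
calibrated mapping yields the data table that a visualization grammar can
then re-render.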
A Book Reader Design for Persons with Visual Impairment and Blindness
The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost-effective. This approach relies on the geometry of the design setup and provides the mathematical foundation for integrating, in a unique way, a 3-D space surface map from a low-resolution time-of-flight (ToF) device with a high-resolution image as a means to enhance the reading accuracy of images warped by the page curvature of bound books and other magazines. The merits of this low-cost but effective automated book reader design include: (1) a seamless registration process for the two imaging modalities, so that the low-resolution (160 x 120 pixels) height map, acquired by an Argos3D-P100 camera, accurately covers the entire book spread as captured by the high-resolution image (3072 x 2304 pixels) of a Canon G6 camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as the dewarping of the book spread images; and (3) an image correction performance comparison between the uniform and full height maps to determine which map provides the highest Optical Character Recognition (OCR) reading accuracy. The design concept could also be applied to the challenging process of book digitization. This method depends on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset consisting of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). Improvements to the character reading accuracy due to the correction steps were quantified by introducing the corrected images to an OCR engine and tabulating the number of misrecognized characters.
Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows), and 96.11% with the full height maps (i.e., each row has its own height values as obtained from the 3-D camera). When the rotational misalignments were taken into account, the results produced average accuracies of 90.63% and 94.75% for the same respective height maps, proving the added resilience of the full height map method to potential misalignments.
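The uniform height map described above (the central row replicated across all
rows) and the reading-accuracy measure can be sketched as follows; the array
values and character counts are toy numbers, not the dissertation's data:

```python
import numpy as np

def uniform_height_map(full_map):
    """Approximate the page surface by replicating the central row of
    the full 3-D height map across all rows."""
    central = full_map[full_map.shape[0] // 2]
    return np.tile(central, (full_map.shape[0], 1))

def reading_accuracy(total_chars, misrecognized):
    """OCR reading accuracy (%) from the misrecognized-character count."""
    return 100.0 * (total_chars - misrecognized) / total_chars

# Toy 4x3 height map (metres); the real maps are 160x120 from the ToF camera.
full = np.array([[0.50, 0.52, 0.50],
                 [0.48, 0.51, 0.49],
                 [0.47, 0.50, 0.48],
                 [0.46, 0.49, 0.47]])
uniform = uniform_height_map(full)
```

The trade-off measured in the experiments is exactly between these two maps:
the uniform map is cheaper to use, while the full map retains per-row
curvature and proved more resilient to rotational misalignment.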