Vanishing Point Detection with Direct and Transposed Fast Hough Transform inside the neural network
In this paper, we propose a new neural network architecture for vanishing
point detection in images. Its key element is the use of the direct and
transposed Fast Hough Transforms separated by blocks of convolutional layers
with standard activation functions. This design yields the network output in
the coordinate system of the input image, so the coordinates of the vanishing
point can be computed by simply selecting the maximum of the response. In
addition, we prove that the transposed Fast Hough Transform can be computed
using the direct one. The use of integral operators
enables the neural network to rely on global rectilinear features in the image,
and so it is ideal for detecting vanishing points. To demonstrate the
effectiveness of the proposed architecture, we evaluate it on a set of images
from a DVR and show its superiority over existing methods. Note also that the
proposed architecture essentially reproduces the direct and back projection
process used, for example, in computed tomography.
Comment: 9 pages, 9 figures, submitted to "Computer Optics"; extra experiment
added, new theorem proof added, references added; typos corrected.
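As a rough illustration of the transform at the heart of this architecture (not the authors' network or their implementation), the sketch below implements a Brady-style Fast Hough Transform for mostly-vertical lines in NumPy. The cyclic treatment of column shifts, the power-of-two height requirement, and the synthetic test line are simplifying assumptions of this sketch; in the described architecture such a layer, together with its transposed counterpart and intermediate convolutional blocks, lets the network output a response map in input-image coordinates whose maximum gives the vanishing point.

```python
import numpy as np

def fht_vertical(img):
    """Brady-style Fast Hough Transform for mostly-vertical lines.

    img: 2-D array of shape (h, w) with h a power of two.
    Returns H of shape (h, w), where H[s, x] is the sum of img along the
    dyadic line that enters the top row at column x and leaves the bottom
    row at column (x + s) mod w (columns are treated cyclically here).
    Cost is O(h * w * log h) instead of O(h^2 * w) for naive summation.
    """
    h, w = img.shape
    assert h > 0 and h & (h - 1) == 0, "height must be a power of two"
    # blocks[b, s, x]: partial sum inside row-block b (current height m)
    # along the dyadic pattern starting at column x with internal shift s
    blocks = img.astype(np.float64).reshape(h, 1, w)   # m = 1, only shift 0
    m = 1
    while m < h:
        top, bot = blocks[0::2], blocks[1::2]          # adjacent block pairs
        merged = np.empty((blocks.shape[0] // 2, 2 * m, w))
        for s in range(2 * m):
            half, offset = s // 2, (s + 1) // 2
            # join a top pattern with shift s//2 and a bottom pattern with
            # the same shift, displaced by ceil(s/2) columns (cyclically)
            merged[:, s, :] = top[:, half, :] + np.roll(bot[:, half, :],
                                                        -offset, axis=-1)
        blocks, m = merged, 2 * m
    return blocks[0]

if __name__ == "__main__":
    # a single discrete line from (row 0, col 10) to (row 63, col 30)
    img = np.zeros((64, 64))
    for y in range(64):
        img[y, 10 + round(20 * y / 63)] = 1.0
    H = fht_vertical(img)
    s, x = np.unravel_index(np.argmax(H), H.shape)
    print(s, x)   # the peak is expected near shift 20, column 10
```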
Advanced Hough-based method for on-device document localization
The demand for on-device document recognition systems increases in
conjunction with the emergence of stricter privacy and security requirements.
In such systems, no data is transferred from the end device to third-party
information-processing servers. The response time is vital to the
user experience of on-device document recognition. Combined with the
unavailability of discrete GPUs, powerful CPUs, or a large RAM capacity on
consumer-grade end devices such as smartphones, the time limitations put
significant constraints on the computational complexity of the applied
algorithms for on-device execution.
In this work, we consider document localization in an image without prior
knowledge of the document content or its internal structure. According to
published works, at least five systems offer solutions for on-device document
localization. All of these systems use a localization method that can be
considered
Hough-based. The precision of such systems seems to be lower than that of the
state-of-the-art solutions which were not designed to account for the limited
computational resources.
We propose an advanced Hough-based method. In contrast with other approaches,
it accounts for the geometric invariants of the central projection model and
combines both edge and color features for document boundary detection. The
proposed method achieved the second-best precision on the SmartDoc dataset,
surpassed only by a U-Net-like neural network. When evaluated on the more
challenging MIDV-500 dataset, the proposed algorithm achieved the best
precision among published methods. Our method remains applicable to on-device
computation.
Comment: This is a preprint of the article submitted for publication in the
journal "Computer Optics".
MIDV-2019: Challenges of the modern mobile-based document OCR
Recognition of identity documents using mobile devices has become a topic of
a wide range of computer vision research. The portfolio of methods and
algorithms for solving such tasks as face detection, document detection and
rectification, text field recognition, and others, is growing, and the scarcity
of datasets has become an important issue. One of the openly accessible
datasets for evaluating such methods is MIDV-500, containing video clips of 50
identity document types in various conditions. However, the variability of
capturing conditions in MIDV-500 did not address some of the key issues, mainly
significant projective distortions and different lighting conditions. In this
paper we present the MIDV-2019 dataset, containing video clips shot with
modern high-resolution mobile cameras under strong projective distortions and
low lighting conditions. We describe the added data and present experimental
baselines for text field recognition under different conditions. The
dataset is available for download at
ftp://smartengines.com/midv-500/extra/midv-2019/.
Comment: 6 pages, 3 figures, 3 tables, 18 references, submitted and accepted
to the 12th International Conference on Machine Vision (ICMV 2019).
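As a side note on how such recognition baselines are commonly scored, below is a minimal sketch of a per-field exact-match accuracy metric. The field names and the evaluation protocol are assumptions of this sketch, not a specification of the dataset tooling or of the baselines reported in the paper.

```python
def field_accuracy(predictions, ground_truth):
    """Fraction of text fields recognised exactly (strings compared verbatim).

    predictions / ground_truth: dicts mapping a field name (e.g. a
    hypothetical "date_of_birth") to the recognised / reference string
    for one document.
    """
    correct = sum(predictions.get(k) == ground_truth[k] for k in ground_truth)
    return correct / len(ground_truth)

# example: 2 of 3 fields match exactly -> accuracy ~0.67
print(round(field_accuracy(
    {"name": "IVANOVA", "number": "0123456789", "date_of_birth": "01.01.1990"},
    {"name": "IVANOVA", "number": "0123456788", "date_of_birth": "01.01.1990"}), 2))
```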
Unfolder: Fast localization and image rectification of a document with a crease from folding in half
Presenting folded documents is not an uncommon case in modern society.
Digitizing such documents by capturing them with a smartphone camera can be
tricky, since a crease can divide the document contents into separate planes.
To unfold the document, one could hold its edges, potentially obscuring it in the
captured image. While there are many geometrical rectification methods, they
were usually developed for arbitrary bends and folds. We consider such
algorithms and propose a novel approach, Unfolder, developed specifically for
images of documents with a crease from folding in half. Unfolder is robust to
projective distortions of the document image and does not fragment the image in
the vicinity of a crease after rectification. A new Folded Document Images
dataset was created to investigate the rectification accuracy of folded (2, 3,
4, and 8 folds) documents. The dataset includes 1600 images captured with the
document placed on a table and held in hand. The Unfolder algorithm achieved
a recognition error rate of 0.33, which is better than the advanced
neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime
for Unfolder was only 0.25 s/image on an iPhone XR.
Comment: This is a preprint of the article accepted for publication in the
journal "Computer Optics".