
    Vanishing Point Detection with Direct and Transposed Fast Hough Transform inside the neural network

    In this paper, we suggest a new neural network architecture for vanishing point detection in images. The key element is the use of the direct and transposed Fast Hough Transforms separated by convolutional layer blocks with standard activation functions. This allows the network to produce its answer in the coordinates of the input image, so the coordinates of the vanishing point can be computed by simply selecting the maximum of the output. In addition, we prove that the transposed Fast Hough Transform can be calculated using the direct one. The use of integral operators enables the neural network to rely on global rectilinear features in the image, and so it is ideally suited for detecting vanishing points. To demonstrate the effectiveness of the proposed architecture, we use a set of images from a DVR and show its superiority over existing methods. Note also that the proposed architecture essentially repeats the process of direct and back projection used, for example, in computed tomography.
    Comment: 9 pages, 9 figures, submitted to "Computer Optics"; extra experiment added, new theorem proof added, references added; typos corrected.
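    A minimal sketch of the readout step described above: since the network output is a response map in the coordinate system of the input image, the vanishing point is obtained by taking the argmax. The map shape and the helper name below are assumptions for illustration, not the authors' code.

```python
import numpy as np

def vanishing_point_from_response(response_map: np.ndarray) -> tuple[int, int]:
    """Pick the vanishing point as the argmax of a 2D response map.

    `response_map` is assumed to be the network output in the coordinate
    system of the input image (H x W); the exact output format of the
    authors' network may differ.
    """
    y, x = np.unravel_index(np.argmax(response_map), response_map.shape)
    return int(x), int(y)  # (x, y) in input-image pixel coordinates

# Usage with a synthetic response map peaked at (x=130, y=40):
demo = np.zeros((480, 640))
demo[40, 130] = 1.0
print(vanishing_point_from_response(demo))  # -> (130, 40)
```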

    Weighted combination of per-frame recognition results for text recognition in a video stream

    The scope of applications of automated document recognition has expanded and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of the input images. Unlike specialized scanners, mobile cameras allow using a video stream as input, thus obtaining several images of the recognized object captured with varying characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that the weighted combination can improve the quality of text recognition in the video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the analyzed datasets.
    This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03236 and 18-07-01387).
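    The following is a minimal sketch of a per-character weighted combination, assuming each frame yields a per-character probability distribution over a fixed alphabet for a field of fixed length, and that each frame has a scalar weight (e.g. a focus estimate). The function name, array shapes, and the equal-length assumption are illustrative; the paper's actual combination approaches and weighting criteria are described in the full text.

```python
import numpy as np

def combine_per_character(frame_probs: np.ndarray,
                          frame_weights: np.ndarray,
                          alphabet: str) -> str:
    """Weighted per-character combination of per-frame recognition results.

    frame_probs   : (n_frames, n_chars, len(alphabet)) per-character class
                    probabilities produced by the recognizer for each frame.
    frame_weights : (n_frames,) scalar weights, e.g. focus estimates;
                    higher weight -> more trusted frame.
    Assumes all frames produced results of equal length (an illustrative
    simplification; alignment of differing results is not covered here).
    """
    w = frame_weights / frame_weights.sum()
    combined = np.tensordot(w, frame_probs, axes=(0, 0))  # (n_chars, |alphabet|)
    return "".join(alphabet[i] for i in combined.argmax(axis=1))

# Usage: three frames recognizing a 2-character field over digits.
alphabet = "0123456789"
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(3, 2))  # random per-frame distributions
weights = np.array([0.9, 0.2, 0.4])              # e.g. focus scores per frame
print(combine_per_character(probs, weights, alphabet))
```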

    MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream

    A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets which are useful for associated subtasks, but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition, more specialized datasets are required. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips of 50 different identity document types with ground truth, which allows research to be performed on a wide range of document analysis problems. The paper presents the characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present the dataset; however, as a baseline, we also present evaluation results for existing methods of face detection, text line recognition, and document data extraction obtained using it.
    This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03170 and 17-29-03370). All source images for the MIDV-500 dataset are obtained from Wikimedia Commons. Author attributions for each source image are listed in the description table at ftp://smartengines.com/midv-500/documents.pdf
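    As an illustration of the kind of evaluation such baselines involve, here is a minimal per-field exact-match accuracy metric. The field names and dictionary layout are hypothetical and do not reflect the actual MIDV-500 annotation format, which is described in the paper itself.

```python
def field_accuracy(predictions: dict[str, str], ground_truth: dict[str, str]) -> float:
    """Fraction of document fields whose recognized value matches exactly.

    Both arguments map a field name to its string value; the field names
    used below are hypothetical, not the actual MIDV-500 annotation keys.
    """
    hits = sum(predictions.get(name, "") == value
               for name, value in ground_truth.items())
    return hits / len(ground_truth)

gt = {"name": "MARIA", "number": "0123456789"}    # hypothetical ground truth
pred = {"name": "MARIA", "number": "O123456789"}  # '0' misread as 'O'
print(field_accuracy(pred, gt))  # -> 0.5
```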

    MIDV-2019: Challenges of the modern mobile-based document OCR

    Recognition of identity documents using mobile devices has become a topic of a wide range of computer vision research. The portfolio of methods and algorithms for solving such tasks as face detection, document detection and rectification, text field recognition, and others is growing, and the scarcity of datasets has become an important issue. One of the openly accessible datasets for evaluating such methods is MIDV-500, containing video clips of 50 identity document types captured in various conditions. However, the range of capturing conditions in MIDV-500 does not cover some of the key issues, primarily significant projective distortions and low lighting. In this paper we present the MIDV-2019 dataset, containing video clips shot with modern high-resolution mobile cameras, with strong projective distortions and in low lighting conditions. We describe the added data and provide experimental baselines for text field recognition in the different conditions. The dataset is available for download at ftp://smartengines.com/midv-500/extra/midv-2019/.
    Comment: 6 pages, 3 figures, 3 tables, 18 references, submitted and accepted to the 12th International Conference on Machine Vision (ICMV 2019).
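    For intuition about the projective distortions the dataset targets, below is a small OpenCV sketch that warps a synthetic document image with a homography. The corner coordinates are arbitrary; this only illustrates the capture condition and is not part of the dataset tooling.

```python
import cv2
import numpy as np

# Render a synthetic "document" and apply a strong perspective (projective)
# distortion of the kind MIDV-2019 adds; corner points are arbitrary.
img = np.full((400, 600, 3), 255, dtype=np.uint8)
cv2.putText(img, "SAMPLE DOCUMENT", (40, 200),
            cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 0), 2)

h, w = img.shape[:2]
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])              # original corners
dst = np.float32([[80, 40], [w - 20, 0], [w, h], [0, h - 60]])  # skewed corners
H = cv2.getPerspectiveTransform(src, dst)
distorted = cv2.warpPerspective(img, H, (w, h))
cv2.imwrite("distorted.png", distorted)
```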

    X-ray tomography: the way from layer-by-layer radiography to computed tomography

    The methods of X-ray computed tomography allow us to study the internal morphological structure of objects in a non-destructive way. The evolution of these methods is similar in many respects to the evolution of photography, where complex optics were replaced by mobile phone cameras, and the computers built into the phone took over the functions of high-quality image generation. X-ray tomography originated as a method of hardware-based, non-invasive imaging of a particular internal cross-section of the human body. Today, thanks to advanced reconstruction algorithms, the method makes it possible to reconstruct a digital 3D image of an object with submicron resolution. In this article, we analyze the tasks that the software part of a tomographic complex has to solve in addition to managing the process of data collection. The issues that are still considered open are also discussed. The relationship between the spatial resolution of the method, its sensitivity, and the radiation load is reviewed. An innovative approach to the organization of tomographic imaging, called "reconstruction with monitoring", is described. This approach makes it possible to reduce the radiation load on the object by at least a factor of 2–3. In this work, we show that as X-ray computed tomography moves towards increasing spatial resolution and reducing radiation load, the software part of the method becomes increasingly important.
    This work was supported by the Russian Foundation for Basic Research (projects No. 18-29-26033 and 18-29-26020).
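    To make the projection/reconstruction pipeline mentioned above concrete, here is a textbook sketch using scikit-image: forward projection of a phantom into a sinogram, followed by filtered back-projection. This is the classic reconstruction scheme, not the "reconstruction with monitoring" approach described in the article.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

# Forward projection (direct) and filtered back-projection (inverse) on a
# standard test object; a generic illustration of the CT pipeline.
phantom = rescale(shepp_logan_phantom(), 0.5)           # ~200x200 test object
angles = np.linspace(0.0, 180.0, 180, endpoint=False)   # projection angles, degrees
sinogram = radon(phantom, theta=angles)                  # direct projection
reconstruction = iradon(sinogram, theta=angles)          # back projection, ramp filter
rms = np.sqrt(np.mean((reconstruction - phantom) ** 2))
print(f"RMS reconstruction error: {rms:.4f}")
```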

    HoughNet: neural network architecture for vanishing points detection

    In this paper we introduce a novel neural network architecture based on a Fast Hough Transform layer. A layer of this type allows our neural network to accumulate features from linear areas across the entire image instead of local areas. We demonstrate its potential by solving the problem of vanishing point detection in images of documents. This problem arises when dealing with camera shots of documents in uncontrolled conditions, where the document image can suffer several specific distortions, including projective transform. To train our model, we use the MIDV-500 dataset and provide testing results. The strong generalization ability of the suggested method is demonstrated by applying it to the completely different ICDAR 2011 dewarping contest dataset. In previously published papers considering this dataset, the authors measured the quality of vanishing point detection by counting the words correctly recognized with the open-source OCR engine Tesseract. To compare with them, we reproduce this experiment and show that our method outperforms the state-of-the-art result.
    Comment: 6 pages, 6 figures, 2 tables, 28 references, conference.
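    To show what "accumulating features from linear areas" means, here is a deliberately naive reference in NumPy that sums image values along mostly-vertical lines parameterized by slope and intercept. It is only a slow stand-in for the idea: the paper's layer uses the Fast Hough Transform, which computes dyadic approximations of such sums in O(n^2 log n) rather than O(n^3), and is not reproduced here.

```python
import numpy as np

def naive_hough_vertical(img: np.ndarray, n_slopes: int = 33) -> np.ndarray:
    """Accumulate image values along mostly-vertical lines.

    For each slope s (total horizontal drift across the image height, in
    pixels) and each intercept x0, sums img[y, x0 + round(s * y / h)].
    A slow reference for what a Hough-style layer aggregates; not the
    Fast Hough Transform used in the paper.
    """
    h, w = img.shape
    slopes = np.linspace(-w / 2, w / 2, n_slopes)
    acc = np.zeros((n_slopes, w))
    for si, s in enumerate(slopes):
        for y in range(h):
            xs = np.arange(w) + int(round(s * y / h))
            valid = (xs >= 0) & (xs < w)
            acc[si, np.arange(w)[valid]] += img[y, xs[valid]]
    return acc  # peaks correspond to prominent straight lines in img

# Usage: a single vertical line at x == 20 gives a sharp peak at zero slope.
img = np.zeros((64, 64))
img[:, 20] = 1.0
si, x0 = np.unravel_index(naive_hough_vertical(img).argmax(), (33, 64))
print(si, x0)  # -> 16 20  (middle slope index == zero slope, intercept 20)
```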