
    Recovering Homography from Camera Captured Documents using Convolutional Neural Networks

    Removing perspective distortion from hand-held camera-captured document images is one of the primitive tasks in document analysis, but unfortunately, no existing method can reliably remove such distortion from document images automatically. In this paper, we propose a convolutional neural network (CNN) based method for recovering homography from hand-held camera-captured documents. Our proposed method works independently of the document's underlying content and is trained end-to-end in a fully automatic way. Specifically, this paper makes the following three contributions: first, we introduce a large-scale synthetic dataset for recovering homography from document images captured under different geometric and photometric transformations; second, we show that a generic CNN-based architecture can successfully regress the corner positions of documents captured under wild settings; third, we show that the L1 loss can be reliably used for corner regression. Our proposed method gives state-of-the-art performance on the tested datasets and has the potential to become an integral part of the document analysis pipeline. Comment: 10 pages, 8 figures.
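Once the four corner positions have been regressed, the page homography follows directly: four point correspondences are enough to solve for the 3x3 matrix via the Direct Linear Transform. A minimal NumPy sketch, with illustrative corner values that are not from the paper's dataset:

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst (4 point pairs) via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)     # null vector of A = the homography entries
    return H / H[2, 2]           # normalize so H[2, 2] == 1

# Map a unit square (ideal page) to a perspective-distorted quadrilateral (captured page)
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0.1, 0.0), (0.9, 0.1), (1.0, 1.0), (0.0, 0.9)]
H = homography_from_corners(src, dst)

# Applying H to a source corner reproduces the destination corner
p = H @ np.array([1.0, 0.0, 1.0])
print(p[:2] / p[2])  # ≈ (0.9, 0.1)
```

Inverting H then rectifies the captured page back to its fronto-parallel view.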

    A Robust Algorithm for Emoji Detection in Smartphone Screenshot Images

    The increasing use of smartphones and social media apps for communication results in a massive number of screenshot images. These images enrich the written language through text and emojis. In this regard, several studies in the image analysis field have considered text; however, they ignored the use of emojis. In this study, a robust two-stage algorithm for detecting emojis in screenshot images is proposed. The first stage localizes the regions of candidate emojis by using the proposed RGB-channel analysis method, followed by a connected component method with a set of proposed rules. In the second, verification stage, emojis and non-emojis are classified by using the proposed features with a decision tree classifier. Experiments were conducted to evaluate each stage independently and to assess the performance of the proposed algorithm as a whole by using a self-collected dataset. The results showed that the proposed RGB-channel analysis method achieved better performance than the Niblack and Sauvola methods. Moreover, the proposed feature extraction method with a decision tree classifier achieved more satisfactory performance than LBP feature extraction with Bayesian network, perceptron neural network, and decision table classifiers. Overall, the proposed algorithm exhibited high efficiency in detecting emojis in screenshot images.
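The first-stage localization rests on standard connected-component labeling over a binary candidate mask. A minimal pure-Python sketch of 4-connected labeling with BFS; the mask and the minimum-size rule here are illustrative assumptions, not the paper's actual RGB-analysis rules:

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected regions of 1s in a binary 2D mask; return a list of pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                comp, queue = [], deque([(i, j)])
                seen[i][j] = True
                while queue:  # breadth-first flood fill from the seed pixel
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components

mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
regions = connected_components(mask)
# Keep only regions large enough to be emoji candidates (illustrative size rule)
candidates = [r for r in regions if len(r) >= 3]
print(len(regions), len(candidates))  # 2 2
```

Each surviving region's bounding box would then be passed to the second-stage classifier.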

    A Book Reader Design for Persons with Visual Impairment and Blindness

    The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost-effective. This approach relies on the geometry of the design setup and provides the mathematical foundation for integrating, in a unique way, a 3-D space surface map from a low-resolution time-of-flight (ToF) device with a high-resolution image as a means of enhancing the reading accuracy of images warped by the page curvature of bound books and other magazines. The merits of this low-cost but effective automated book reader design include: (1) a seamless registration process of the two imaging modalities, so that the low-resolution (160 x 120 pixels) height map, acquired by an Argos3D-P100 camera, accurately covers the entire book spread as captured by the high-resolution image (3072 x 2304 pixels) of a Canon G6 camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as dewarping of the book spread images; and (3) an image correction performance comparison between uniform and full height maps to determine which map provides the highest possible Optical Character Recognition (OCR) reading accuracy. The design concept could also be applied to address the challenging process of book digitization. This method depends on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset consisting of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). Improvements to the character reading accuracy due to the correction steps were quantified by introducing the corrected images to an OCR engine and tabulating the number of misrecognized characters.
Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows) and 96.11% with the full height maps (i.e., each row has its own height values as obtained from the 3-D camera). When the rotational misalignments were taken into account, the results produced average accuracies of 90.63% and 94.75% for the same respective height maps, proving the added resilience of the full height map method to potential misalignments.
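The uniform height map described above can be sketched in a few lines: the central row of the full 3-D map is replicated across all rows. A minimal NumPy illustration; the 160 x 120 resolution matches the Argos3D-P100, but the height values here are synthetic:

```python
import numpy as np

# Synthetic full height map at the ToF camera's 160 x 120 resolution (rows x cols)
rows, cols = 120, 160
full_map = np.random.default_rng(0).uniform(0.0, 5.0, size=(rows, cols))

# Uniform map: replicate the central row to approximate every other row
central_row = full_map[rows // 2]
uniform_map = np.tile(central_row, (rows, 1))

print(uniform_map.shape)  # (120, 160)
```

The full map keeps each row's own measured heights, which is why it tolerates rotational misalignment better: the page curvature is no longer assumed constant along the spine.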

    Error Modeling and Analysis of Star Cameras for a Class of 1U Spacecraft

    As spacecraft today become increasingly smaller, the demand for smaller components and sensors rises as well. The smartphone, a cutting-edge consumer technology, has an impressive collection of both sensors and processing capabilities and may have the potential to fill this demand in the spacecraft market. If the technologies of a smartphone can be used in space, the cost of building miniature satellites would drop significantly and give a boost to the aerospace and scientific communities. Concentrating on the problem of spacecraft orientation, this study sets out to determine the capabilities of a smartphone camera when acting as a star camera. Orientations determined from star images taken with a smartphone camera are compared to those of higher-quality cameras in order to determine the associated accuracies. The results of the study reveal the abilities of low-cost off-the-shelf imagers in space and give a starting point for future research in the field. The study began with a complete geometric calibration of each analyzed imager so that all comparisons start from the same base. After the cameras were calibrated, image processing techniques were introduced to correct for atmospheric, lens, and image sensor effects. Orientations for each test image are calculated by identifying the stars exposed on each image. Analyses of these orientations allow the overall errors of each camera to be defined and provide insight into the abilities of low-cost imagers.
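After geometric calibration, each star centroid on the sensor can be converted into a unit line-of-sight vector in the camera frame using the pinhole model; orientation then follows from matching these vectors against catalog directions. A minimal sketch of the centroid-to-vector step, with illustrative intrinsics that are not from the study's calibration:

```python
import numpy as np

def pixel_to_unit_vector(u, v, fx, fy, cx, cy):
    """Convert a star centroid (u, v) in pixels to a unit direction in the camera frame."""
    direction = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return direction / np.linalg.norm(direction)

# Illustrative pinhole intrinsics for a smartphone-class imager
fx = fy = 3000.0          # focal length in pixels
cx, cy = 2000.0, 1500.0   # principal point

vec = pixel_to_unit_vector(2000.0, 1500.0, fx, fy, cx, cy)
print(vec)  # a star at the principal point lies on the optical axis: [0. 0. 1.]
```

The angular error of an imager can then be quantified as the angle between such measured vectors and the corresponding catalog vectors.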

    Image Stitching

    Final degree project carried out in collaboration with the University of Limerick, Department of Electronic and Computer Engineering. Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image processing techniques involve treating the image as a two-dimensional signal and applying standard signal processing techniques to it. Specifically, image stitching comprises several stages that render two or more overlapping images into a seamless stitched image, from the detection of features to blending into a final image. In this process, the Scale Invariant Feature Transform (SIFT) algorithm can be applied to perform the control-point detection and matching step, owing to its good properties. Creating a fully automatic and effective stitching process requires analyzing different methods for each of the stitching stages. Several commercial and online software tools are available to perform the stitching process, offering diverse options in different situations. This analysis involves the creation of a script to deal with images and project data files. Once the whole script is generated, the stitching process achieves automatic execution, allowing good-quality results in the final composite image.
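The control-point matching stage described above is commonly implemented with Lowe's ratio test: a descriptor match is kept only when its nearest neighbor is sufficiently closer than the second-nearest. A pure-NumPy illustration with synthetic descriptors; real SIFT descriptors are 128-dimensional, and the 0.8 threshold follows common practice rather than this project's exact setting:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a against desc_b, keeping matches that pass Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)     # distance to every candidate
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:     # unambiguous match only
            matches.append((i, int(nearest)))
    return matches

rng = np.random.default_rng(1)
desc_b = rng.normal(size=(10, 128))                         # descriptors from image B
desc_a = desc_b[[3, 7]] + 0.01 * rng.normal(size=(2, 128))  # noisy copies seen in image A

print(ratio_test_matches(desc_a, desc_b))  # [(0, 3), (1, 7)]
```

The surviving correspondences then feed the homography estimation and blending stages of the stitching pipeline.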

    Single-Image Depth Prediction Makes Feature Matching Easier

    Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information have come with prerequisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, significantly enhancing SIFT and BRISK features and enabling more good matches, even when cameras are looking at the same scene but in opposite directions. Comment: 14 pages, 7 figures, accepted for publication at the European Conference on Computer Vision (ECCV) 2020.

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensic capabilities relating to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students with a holistic view of the field.

    Orbit Determination with Event-Based Cameras to Improve Space Domain Awareness

    The objective of this research is to assess the utility of a commercial off-the-shelf (COTS) event-based camera (EBC) for space domain awareness (SDA) applications by evaluating its ability to produce data for orbit updates of resident space objects (RSOs). Unlike the pixels of traditional frame-based imaging sensors, the pixels of an EBC activate independently when a change in brightness is detected, producing a continuous data flow on a per-pixel basis. This unique functionality provides much higher temporal resolution than traditional frame-based sensors, such that an EBC can generate far more data points from a single observation than a frame-based sensor. However, current COTS EBCs have lower spatial resolution than current COTS frame-based sensors, and no research has yet investigated whether the increased volume of data from an EBC can compensate for the lower spatial resolution of each data point. Using a beamsplitter to provide equal data to an EBC and a frame-based sensor for observations of multiple RSOs, this research found that the volume of data produced by an EBC can compensate for the EBC's reduced spatial resolution, generating orbit updates of comparable accuracy to those produced by data from a frame-based sensor. This is especially true for single-pass orbit updates, where the EBC provided a more accurate update than the frame-based sensor in 13 out of 14 cases.
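An EBC's per-pixel event stream is commonly aggregated into a conventional image for centroiding by accumulating events within a time window. A minimal sketch; the event tuple layout (t, x, y, polarity) and the window bounds are illustrative assumptions, not a specific camera's data format:

```python
import numpy as np

def accumulate_events(events, width, height, t_start, t_end):
    """Sum event polarities per pixel over [t_start, t_end) to form a frame."""
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, polarity in events:
        if t_start <= t < t_end:
            frame[y, x] += 1 if polarity else -1  # +1 brightness-up, -1 brightness-down
    return frame

# A streak of positive events along a moving object's path, 1 ms apart
events = [(0.001 * i, 10 + i, 20, True) for i in range(5)]
frame = accumulate_events(events, width=64, height=48, t_start=0.0, t_end=0.004)

print(frame.sum())  # 4 — only the events inside the window contribute
```

Shrinking the window exploits the EBC's temporal resolution: many short-window frames can be centroided per second, which is the source of the larger data volume discussed above.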