33 research outputs found

    Deep learning for inverse problems in remote sensing: super-resolution and SAR despeckling

    The abstract is in the attachment.

    Detecção de vivacidade de impressões digitais baseada em software (Software-based fingerprint liveness detection)

    Advisor: Roberto de Alencar Lotufo. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: With the growing use of biometric authentication systems in recent years, spoof fingerprint detection has become increasingly important. In this work, we implemented and compared various techniques for software-based fingerprint liveness detection. We use as feature extractors Convolutional Networks with random weights, which are applied for the first time to this task, and Local Binary Patterns (LBP). The techniques were used in conjunction with dimensionality reduction through Principal Component Analysis (PCA) and a Support Vector Machine (SVM) classifier. Dataset augmentation was successfully used to increase the classifier's performance. We tested a variety of preprocessing operations such as frequency filtering, contrast equalization, and region of interest filtering. An automatic and extensive search for the best combination of preprocessing operations, architectures, and hyper-parameters was made possible by the fast computers available as cloud services. The experiments were run on the datasets used in the Liveness Detection Competition of 2009, 2011, and 2013, which together comprise almost 50,000 real and fake fingerprint images. Our best method achieves an overall rate of 95.2% of correctly classified samples, an improvement of 59% in test error when compared with the best previously published results. Degree: Master's (Mestre em Engenharia Elétrica, Master in Electrical Engineering). Area: Electrical Energy.

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, or computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision.

    Banknote Authentication and Medical Image Diagnosis Using Feature Descriptors and Deep Learning Methods

    Banknote recognition and medical image analysis have been the foci of image processing and pattern recognition research. Because counterfeiters have taken advantage of innovations in print media technologies to reproduce fake money, there is a need to design systems that can reassure citizens of the authenticity of banknotes in circulation. Similarly, many physicians must interpret medical images, but image analysis by humans is susceptible to error due to wide variations across interpreters, lethargy, and human subjectivity. Computer-aided diagnosis is vital to improvements in medical analysis, as it facilitates the identification of findings that need treatment and assists the expert's workflow. This thesis is therefore organized around three problems related to Banknote Authentication and Medical Image Diagnosis. In our first research problem, we proposed a new banknote recognition approach that classifies the principal components of extracted HOG features. We further experimented with computing HOG descriptors from cells created from image patch vertices of SURF points and designed a feature reduction approach based on a high correlation and low variance filter. In our second research problem, we developed a mobile app for banknote identification and counterfeit detection using the Unity 3D software and evaluated its performance based on a Cascaded Ensemble approach. The algorithm was then extended to a client-server architecture using SIFT and SURF features reduced by Bag of Words and high correlation-based HOG vectors. In our third research problem, experiments were conducted on a pre-trained mobile app for medical image diagnosis using three convolutional layers with an Ensemble Classifier comprising PCA and bagging of five base learners. We also implemented a Bidirectional Generative Adversarial Network to mitigate the effect of the Binary Cross Entropy loss, based on a Deep Convolutional Generative Adversarial Network as the generator and encoder with a Capsule Network as the discriminator, while experimenting on images with random composition and translation inferences. Lastly, we proposed a variant of Single Image Super-resolution for medical analysis by redesigning the Super Resolution Generative Adversarial Network to increase the Peak Signal to Noise Ratio during image reconstruction, incorporating a loss function based on the mean square error of pixel space and Super Resolution Convolutional Neural Network layers.

    Event-Based Algorithms For Geometric Computer Vision

    Event cameras are novel bio-inspired sensors which mimic the function of the human retina. Rather than directly capturing intensities to form synchronous images as in traditional cameras, event cameras asynchronously detect changes in log image intensity. When such a change is detected at a given pixel, the change is immediately sent to the host computer, where each event consists of the x,y pixel position of the change, a timestamp, accurate to tens of microseconds, and a polarity, indicating whether the pixel got brighter or darker. These cameras provide a number of useful benefits over traditional cameras, including the ability to track extremely fast motions, high dynamic range, and low power consumption. However, with a new sensing modality comes the need to develop novel algorithms. As these cameras do not capture photometric intensities, novel loss functions must be developed to replace the photoconsistency assumption which serves as the backbone of many classical computer vision algorithms. In addition, the relative novelty of these sensors means that there does not exist the wealth of data available for traditional images with which we can train learning based methods such as deep neural networks. In this work, we address both of these issues with two foundational principles. First, we show that the motion blur induced when the events are projected into the 2D image plane can be used as a suitable substitute for the classical photometric loss function. Second, we develop self-supervised learning methods which allow us to train convolutional neural networks to estimate motion without any labeled training data. We apply these principles to solve classical perception problems such as feature tracking, visual inertial odometry, optical flow and stereo depth estimation, as well as recognition tasks such as object detection and human pose estimation. 
We show that these solutions are able to utilize the benefits of event cameras, allowing us to operate in fast-moving scenes with challenging lighting, which would be incredibly difficult for traditional cameras.

    Model-based Optical Flow: Layers, Learning, and Geometry

    The estimation of motion in video sequences establishes temporal correspondences between pixels and surfaces and allows reasoning about a scene using multiple frames. Despite being a focus of research for over three decades, computing motion, or optical flow, remains challenging due to a number of difficulties, including the treatment of motion discontinuities and occluded regions, and the integration of information from more than two frames. One reason for these issues is that most optical flow algorithms only reason about the motion of pixels on the image plane, while not taking the image formation pipeline or the 3D structure of the world into account. One approach to address this uses layered models, which represent the occlusion structure of a scene and provide an approximation to the geometry. The goal of this dissertation is to show ways to inject additional knowledge about the scene into layered methods, making them more robust, faster, and more accurate. First, this thesis demonstrates the modeling power of layers using the example of motion blur in videos, which is caused by fast motion relative to the exposure time of the camera. Layers segment the scene into regions that move coherently while preserving their occlusion relationships. The motion of each layer therefore directly determines its motion blur. At the same time, the layered model captures complex blur overlap effects at motion discontinuities. Using layers, we can thus formulate a generative model for blurred video sequences, and use this model to simultaneously deblur a video and compute accurate optical flow for highly dynamic scenes containing motion blur. Next, we consider the representation of the motion within layers. Since, in a layered model, important motion discontinuities are captured by the segmentation into layers, the flow within each layer varies smoothly and can be approximated using a low dimensional subspace. 
We show how this subspace can be learned from training data using principal component analysis (PCA), and that flow estimation using this subspace is computationally efficient. The combination of the layered model and the low-dimensional subspace gives the best of both worlds: sharp motion discontinuities from the layers and computational efficiency from the subspace. Lastly, we show how layered methods can be dramatically improved using simple semantics. Instead of treating all layers equally, a semantic segmentation divides the scene into its static parts and moving objects. Static parts of the scene constitute a large majority of what is shown in typical video sequences; yet, in such regions optical flow is fully constrained by the depth structure of the scene and the camera motion. After segmenting out moving objects, we consider only static regions, and explicitly reason about the structure of the scene and the camera motion, yielding much better optical flow estimates. Furthermore, computing the structure of the scene makes it possible to better combine information from multiple frames, resulting in high accuracy even in occluded regions. For moving regions, we compute the flow using a generic optical flow method, and combine it with the flow computed for the static regions to obtain a full optical flow field. By combining layered models of the scene with reasoning about the dynamic behavior of the real, three-dimensional world, the methods presented herein push the envelope of optical flow computation in terms of robustness, speed, and accuracy, giving state-of-the-art results on benchmarks and pointing to important future research directions for the estimation of motion in natural scenes.

    UNCOVERING PATTERNS IN COMPLEX DATA WITH RESERVOIR COMPUTING AND NETWORK ANALYTICS: A DYNAMICAL SYSTEMS APPROACH

    In this thesis, we explore methods of uncovering underlying patterns in complex data, and making predictions, through machine learning and network science. With the availability of more data, machine learning for data analysis has advanced rapidly. However, there is a general lack of approaches that might allow us to 'open the black box'. In the machine learning part of this thesis, we primarily use an architecture called Reservoir Computing for time-series prediction and image classification, while exploring how information is encoded in the reservoir dynamics. First, we investigate the ways in which a Reservoir Computer (RC) learns concepts such as 'similar' and 'different', and relationships such as 'blurring', 'rotation' etc. between image pairs, and generalizes these concepts to different classes unseen during training. We observe that the high dimensional reservoir dynamics display different patterns for different relationships. This clustering allows RCs to perform significantly better in generalization with limited training compared with state-of-the-art pair-based convolutional/deep Siamese Neural Networks. Second, we demonstrate the utility of an RC in the separation of superimposed chaotic signals. We assume no knowledge of the dynamical equations that produce the signals, and require only that the training data consist of finite time samples of the component signals. We find that our method significantly outperforms the optimal linear solution to the separation problem, the Wiener filter. To understand how representations of signals are encoded in an RC during learning, we study its dynamical properties when trained to predict chaotic Lorenz signals. We do so by using a novel, mathematical fixed-point-finding technique called directional fibers. 
We find that, after training, the high dimensional RC dynamics includes fixed points that map to the known Lorenz fixed points, but the RC also has spurious fixed points, which are relevant to how its predictions break down. While machine learning is a useful data processing tool, its success often relies on a useful representation of the system's information. In contrast, systems with a large number of interacting components may be better analyzed by modeling them as networks. While numerous advances in network science have helped us analyze such systems, tools that identify properties on networks modeling multi-variate time-evolving data (such as disease data) are limited. We close this gap by introducing a novel data-driven, network-based Trajectory Profile Clustering (TPC) algorithm for 1) identification of disease subtypes and 2) early prediction of subtype/disease progression patterns. TPC identifies subtypes by clustering patients with similar disease trajectory profiles derived from bipartite patient-variable networks. Applying TPC to a Parkinson’s dataset, we identify 3 distinct subtypes. Additionally, we show that TPC predicts disease subtype 4 years in advance with 74% accuracy.

    Generative Models for Inverse Imaging Problems
