19 research outputs found

    Fast and efficient palmprint identification of a small sample within a full image.

    In some fields, such as forensic research, experts need to match a found sample from an individual against its full counterpart stored in a database. The found sample may present several characteristics that make this matching difficult, such as distortion and, most importantly, a very small size. Several solutions have been proposed to solve this problem; however, they either require a large computational effort or yield a low recognition rate. In this paper, we present a fast, simple, and efficient method to relate a small sample of a partial palmprint to a full one using elementary optimization processes and a voting mechanism. Experiments show that our method achieves a higher recognition rate than the state-of-the-art method when identifying palmprint samples with a radius as small as 2.64 cm.
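
    As a toy illustration of the voting idea (not the paper's actual optimization processes), the sketch below slides a small sample over each full palmprint in a hypothetical database and lets every candidate placement vote via normalized cross-correlation; the function names, stride, and scoring rule are assumptions for the example.

```python
# Minimal sketch of voting-based partial-to-full matching on grayscale
# numpy arrays. Illustrative only; not the method from the paper.
import numpy as np

def ncc(patch, window):
    """Normalized cross-correlation between two equally sized arrays."""
    p = patch - patch.mean()
    w = window - window.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(w)
    return float((p * w).sum() / denom) if denom > 0 else 0.0

def vote_for_identity(sample, database, stride=8):
    """Every candidate placement of the sample inside a full palmprint
    casts a vote weighted by its NCC score; the identity accumulating
    the best score wins."""
    votes = {}
    h, w = sample.shape
    for identity, full in database.items():
        best = 0.0
        for y in range(0, full.shape[0] - h + 1, stride):
            for x in range(0, full.shape[1] - w + 1, stride):
                best = max(best, ncc(sample, full[y:y+h, x:x+w]))
        votes[identity] = best
    return max(votes, key=votes.get)

# Usage with random stand-in palmprints:
rng = np.random.default_rng(0)
db = {"person_A": rng.random((128, 128)), "person_B": rng.random((128, 128))}
sample = db["person_A"][40:72, 40:72].copy()   # small crop of a full print
print(vote_for_identity(sample, db))           # expected: person_A
```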

    Evaluation of the one-step Lumicyano™ used in the visualisation of fingermarks on fabrics

    This study comprised three parts evaluating the performance of Lumicyano™ on a variety of fabrics. The first part assessed the impact of dye percentage (8%, 9% and 10%) on the visualisation of fingermark detail and luminescent brightness in split grab marks; a 9% dye produced the highest-quality detail of grab impressions with the least interference from background fluorescence. The second part investigated the optimal relative humidity (RH, 75-84%) for certain fabric types using Lumicyano on split, six-series depletion fingermarks; it concluded that the recommended RH of 80% remained the ideal cyanoacrylate fuming environment. The third and final part determined the impact of the sequential addition of Basic Yellow 40 (BY40) to Lumicyano, compared with traditional cyanoacrylate (CA) followed by BY40 application. The results demonstrated that Lumicyano on its own developed fingermarks of superior quality to Lumicyano with sequential addition of BY40, or to traditional CA followed by BY40. Future studies should include more fabrics, more donors and longer ageing periods to determine which frameworks are best suited to particular fabric types.

    Learning the Consensus of Multiple Correspondences between Data Structures

    In this work, we present a framework to learn the consensus given multiple correspondences. It is assumed that the several parties involved have generated these correspondences separately, and our system acts as a mechanism that gauges several characteristics and considers different parameters to learn the best mappings and thus form a correspondence with the highest possible accuracy at the expense of a reasonable computational cost. The consensus framework is presented in gradual form, starting from the most basic approaches, which used exclusively well-known concepts or only two correspondences, up to the final model, which is able to consider multiple correspondences and automatically learn some weighting parameters. Each step of the framework is evaluated using databases of varied nature to demonstrate that it can address different matching scenarios. In addition, two supplementary advances related to correspondences are presented in this work. First, a new distance metric for correspondences has been developed, which led to a new strategy for the weighted mean correspondence search. Second, a framework specifically designed for correspondence generation in the image registration field has been established, in which one of the images is considered a full image and the other a small sample of it. The conclusion presents insights into how our consensus framework can be enhanced, and how these two parallel developments can converge with it.
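
    As a toy illustration of the consensus idea (not the learned model from the thesis), the sketch below merges several hand-made correspondences by weighted per-element voting; the dictionaries and the fixed weights are assumptions for the example, whereas the thesis learns such weighting parameters.

```python
# Minimal consensus-of-correspondences sketch via weighted voting.
from collections import defaultdict

def consensus(correspondences, weights=None):
    """correspondences: list of dicts mapping source -> target element.
    For each source element, pick the target receiving the largest
    total weight across all proposed correspondences."""
    if weights is None:
        weights = [1.0] * len(correspondences)
    tally = defaultdict(lambda: defaultdict(float))
    for mapping, w in zip(correspondences, weights):
        for src, dst in mapping.items():
            tally[src][dst] += w
    return {src: max(dsts, key=dsts.get) for src, dsts in tally.items()}

# Three parties propose partially conflicting correspondences:
f1 = {"a": 1, "b": 2, "c": 3}
f2 = {"a": 1, "b": 4, "c": 3}
f3 = {"a": 1, "b": 2, "c": 5}
print(consensus([f1, f2, f3]))  # -> {'a': 1, 'b': 2, 'c': 3}
```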

    Learning Multimodal Structures in Computer Vision

    A phenomenon or event can be observed by various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Due to the relations between the modalities of a multimodal phenomenon, a single modality cannot fully describe the event of interest. The fact that several modalities report on the same event also introduces new challenges compared to exploiting each modality separately. We are interested in designing new algorithmic tools that apply sensor fusion techniques within the signal representation of sparse coding, a popular methodology in signal processing, machine learning and statistics for representing data. This coding scheme is based on a machine learning technique and has been shown capable of representing many modalities, such as natural images. We consider situations where we are interested not only in the support of the model being sparse, but also in reflecting a priori knowledge about the application at hand. Our goal is to extract a discriminative representation of the multimodal data that makes it easy to find its essential characteristics in the subsequent analysis step, e.g., regression and classification. More precisely, sparse coding represents signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes the intrinsic properties of the multimodal data in a decomposition coefficient vector with maximal discriminatory power. We carefully design a multimodal representation framework that learns discriminative feature representations by fully exploiting both the modality-shared information, i.e., the information shared by the various modalities, and the modality-specific information, i.e., the information content of each modality individually. Moreover, it automatically learns the weights for the various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities while leveraging the modality-specific character of each modality, adapting their corresponding weights for different parts of the feature during recognition.
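
    To make the representation model concrete, the sketch below implements plain orthogonal matching pursuit, which expresses a signal as a linear combination of a few dictionary atoms; it illustrates sparse coding only, not the multimodal dictionary learning developed in this work, and the dictionary here is random rather than learned.

```python
# Minimal sparse-coding sketch: orthogonal matching pursuit (OMP)
# greedily selects a few dictionary atoms whose linear combination
# approximates a signal.
import numpy as np

def omp(D, y, k):
    """D: (dim, n_atoms) dictionary with unit-norm columns,
    y: signal, k: sparsity level. Returns the coefficient vector."""
    residual, support = y.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Re-fit the coefficients on the selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
y = 2.0 * D[:, 10] - 1.5 * D[:, 42]     # a truly 2-sparse signal
x = omp(D, y, k=2)
print(np.nonzero(x)[0])                 # -> [10 42]
```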

    Multimodal Learning and Its Application to Mobile Active Authentication

    Mobile devices are becoming increasingly popular due to their flexibility and convenience in managing personal information such as bank accounts, profiles and passwords. With the increasing use of mobile devices comes the issue of security, as the loss of a smartphone would compromise the personal information of the user. Traditional methods for authenticating users on mobile devices are based on passwords or fingerprints. As long as mobile devices remain active, they do not incorporate any mechanism for verifying that the user originally authenticated is still the user in control of the device. Thus, unauthorized individuals may improperly obtain access to the user's personal information if a password is compromised or if the user does not exercise adequate vigilance after initial authentication. To deal with this problem, active authentication systems have been proposed, in which users are continuously monitored after initial access to the mobile device. Active authentication systems can capture user data (facial images, screen touch data, motion data, etc.) through sensors (camera, touch screen, accelerometer, etc.), extract features from the different sensors' data, build classification models and authenticate users by comparing additional sensor data against the models. Mobile active authentication can be viewed as one application of a more general problem, namely multimodal classification. The idea of multimodal classification is to utilize multiple sources (modalities) measuring the same instance to improve the overall performance compared to using a single source (modality). Multimodal classification also arises in many computer vision tasks such as image classification, RGBD object classification and scene recognition. In this dissertation, we not only present methods and algorithms for active authentication problems, but also propose multimodal recognition algorithms based on low-rank and joint sparse representations, as well as a multimodal metric learning algorithm, to improve multimodal classification performance. The multimodal learning algorithms proposed in this dissertation make no assumption about the feature type or application, so they can be applied to various recognition tasks such as mobile active authentication, image classification and RGBD recognition. First, we study the mobile active authentication problem using a dataset consisting of 50 users' faces captured by the phone's front camera and screen touch data sensed by the screen, collected for evaluating the active authentication algorithms developed in this research. The dataset is named the UMD Active Authentication (UMDAA) dataset. Details of data preprocessing and feature extraction for the touch and face data are described. Second, we present an approach for active user authentication using screen touch gestures by building linear and kernelized dictionaries based on sparse representations and associated classifiers. Experiments using the screen touch component of the UMDAA dataset, as well as two other publicly available screen touch datasets, show that the dictionary-based classification method compares favorably to those discussed in the literature. Experiments on screen touch data collected in three different sessions show a drop in performance when the training and test data come from different sessions. This suggests a need for domain adaptation methods to further improve the performance of the classifiers.
    Third, we propose a domain adaptive sparse representation-based classification method that learns projections of data into a space where the sparsity of the data is maintained. We provide an efficient iterative procedure for solving the proposed optimization problem. One of the key features of the proposed method is that it is computationally efficient, as learning is done in the lower-dimensional space. Various experiments on the UMDAA dataset show that our method is able to capture the meaningful structure of the data and can perform significantly better than many competitive domain adaptation algorithms. Fourth, we propose low-rank and joint sparse representation-based multimodal recognition. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse regression, where sparse and low-rank representations across all the modalities are imposed. One of our methods takes into account coupling information within the different modalities simultaneously by enforcing a common low-rank and joint sparse representation among each modality's observations. We also modify our formulations by including an occlusion term that is assumed to be sparse. The alternating direction method of multipliers is used to efficiently solve the proposed optimization problems. Extensive experiments on the UMDAA dataset, the WVU multimodal biometrics dataset and the Pascal-Sentence image classification dataset show that our methods provide better recognition performance than other feature-level fusion methods. Finally, we propose a hierarchical multimodal metric learning algorithm for multimodal data in order to improve multimodal classification performance. We design the metric for each modality as a product of two matrices: one matrix is modality-specific, and the other is enforced to be shared by all the modalities. The modality-specific projection matrices capture the varying characteristics exhibited by the multiple modalities, and the common projection matrix establishes the relationship among the distance metrics corresponding to the multiple modalities. The learned metrics significantly improve classification accuracy; experimental results on a tagged image classification problem, as well as various RGBD recognition problems, show that the proposed algorithm outperforms existing learning algorithms based on multiple metrics as well as other state-of-the-art approaches tested on these datasets. Furthermore, we make the proposed multimodal metric learning algorithm non-linear by using kernel methods.
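
    The factored metric structure described above can be sketched as follows; the matrices here are random stand-ins, whereas the dissertation learns them from labeled data, and the summed fusion of per-modality distances is an assumption made for the example.

```python
# Sketch of a per-modality distance under a factored metric: each
# modality's projection is the product of a shared matrix V and a
# modality-specific matrix W_m, i.e. M_m = (V W_m)^T (V W_m).
import numpy as np

rng = np.random.default_rng(2)
dim, proj_dim, shared_dim = 32, 8, 16

V = rng.standard_normal((proj_dim, shared_dim))      # shared across modalities
W = {m: rng.standard_normal((shared_dim, dim))       # modality-specific
     for m in ("face", "touch")}

def modality_distance(m, x, y):
    """Mahalanobis-style distance under the factored metric of modality m."""
    return float(np.linalg.norm(V @ W[m] @ (x - y)))

x, y = rng.standard_normal(dim), rng.standard_normal(dim)
total = sum(modality_distance(m, x, y) for m in W)   # naive fusion: sum distances
print(total)
```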

    Facial Analysis: Looking at Biometric Recognition and Genome-Wide Association


    Subspace Representations and Learning for Visual Recognition

    Pervasive and affordable sensor and storage technology enables the acquisition of an ever-rising amount of visual data. The ability to extract semantic information by interpreting, indexing and searching visual data is impacting domains such as surveillance, robotics, intelligence, human-computer interaction, navigation, healthcare, and several others. This further stimulates the investigation of automated extraction techniques that are more efficient and robust against the many sources of noise affecting the already complex visual data, which carries the semantic information of interest. We address the problem by designing novel visual data representations, based on learning data subspace decompositions that are invariant against noise while being informative for the task at hand. We use this guiding principle to tackle several visual recognition problems, including detection and recognition of human interactions from surveillance video, face recognition in unconstrained environments, and domain generalization for object recognition. By interpreting visual data with a simple additive noise model, we consider the subspaces spanned by the model portion (model subspace) and the noise portion (variation subspace). We observe that decomposing the variation subspace against the model subspace gives rise to the so-called parity subspace. Decomposing the model subspace against the variation subspace instead gives rise to what we name the invariant subspace. We extend the use of kernel techniques for the parity subspace. This enables modeling the highly non-linear temporal trajectories describing human behavior, and performing detection and recognition of human interactions. In addition, we introduce supervised low-rank matrix decomposition techniques for learning the invariant subspace for two other tasks. We learn invariant representations for face recognition from grossly corrupted images, and we learn object recognition classifiers that are invariant to the so-called domain bias. Extensive experiments using the publicly available benchmark datasets for each of the three tasks show that learning representations based on subspace decompositions invariant to the sources of noise leads to results comparable to or better than the state of the art.
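
    The additive-model decomposition can be illustrated with a plain SVD: estimate a low-rank model subspace from data, then split a new sample into its projection onto that subspace and the orthogonal remainder. The rank, data, and projector are illustrative assumptions, not the supervised decompositions developed in the dissertation.

```python
# Minimal subspace-decomposition sketch: model part vs. variation part.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 30))         # 100 samples, 30-dim features
rank = 5                                   # assumed model-subspace dimension

# Orthonormal basis of the model subspace from the top singular vectors.
U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
B = Vt[:rank].T                            # (30, rank), orthonormal columns

x = rng.standard_normal(30)
x_centered = x - X.mean(axis=0)
model_part = B @ (B.T @ x_centered)        # projection onto the model subspace
variation = x_centered - model_part        # orthogonal "variation" remainder

# The two parts are orthogonal and sum back to the centered sample.
print(abs(model_part @ variation) < 1e-9)
print(np.linalg.norm(model_part), np.linalg.norm(variation))
```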

    SPARSE REPRESENTATION, DISCRIMINATIVE DICTIONARIES AND PROJECTIONS FOR VISUAL CLASSIFICATION

    Developments in sensing and communication technologies have led to an explosion in the availability of visual data from multiple sources and modalities. Millions of cameras installed in buildings, streets, and airports around the world are capable of capturing multimodal information such as light, depth, and heat. These data are potentially a tremendous resource for building robust visual detectors and classifiers. However, the data are often large, mostly unlabeled and increasingly of mixed modality. To extract useful information from such heterogeneous data, one needs to exploit the underlying physical, geometrical or statistical structure across data modalities. For instance, in computer vision, the number of pixels in an image can be rather large, but most inference or representation models use only a few parameters to describe the appearance, geometry, and dynamics of a scene. This has motivated researchers to develop a number of techniques for finding a low-dimensional representation of a high-dimensional dataset. The dominant methodology for modeling and exploiting the low-dimensional structure in high-dimensional data is sparse dictionary-based modeling. While discriminative dictionary learning has demonstrated tremendous success in computer vision applications, its performance is often limited by the amount and type of labeled data available for training. In this dissertation, we extend the sparse dictionary learning framework to weakly supervised learning problems such as semi-supervised learning, ambiguously labeled learning and Multiple Instance Learning (MIL). Furthermore, we present nonlinear extensions of these methods using the kernel trick. We also address the problem of choosing the optimal kernel for sparse representation-based classification using Multiple Kernel Learning (MKL) methods. Finally, in order to deal with heterogeneous multimodal data, we present a feature-level fusion method based on quadratic programming. The dissertation is divided into the following four parts: 1) In the first part, we develop a discriminative non-linear dictionary learning technique that utilizes both labeled and unlabeled data for learning dictionaries. We compute a probability distribution over class labels for all the unlabeled samples, which is updated together with the dictionary and sparse coefficients. The algorithm is also extended to ambiguously labeled data, where part of the data contains multiple labels for a training sample. 2) Using non-linear dictionaries, we present a multi-class Multiple Instance Learning (MIL) algorithm where the data are given in the form of bags. Each bag contains multiple samples, called instances, of which at least one belongs to the class of the bag. We propose a noisy-OR model and a generalized mean-based optimization framework for learning the dictionaries in the feature space. The proposed method can be viewed as a generalized dictionary learning algorithm, since it reduces to a novel discriminative dictionary learning framework when there is only one instance in each bag. 3) We propose a Multiple Kernel Learning (MKL) algorithm based on the Sparse Representation-based Classification (SRC) method. Taking advantage of non-linear kernel SRC's efficiency in representing the non-linearities of the high-dimensional feature space, we propose an MKL method based on the kernel alignment criterion. Our method uses a two-step training procedure to learn the kernel weights and the sparse codes.
    At each iteration, the sparse codes are updated first while fixing the kernel mixing coefficients, and then the kernel mixing coefficients are updated while fixing the sparse codes. These two steps are repeated until a stopping criterion is met. 4) Finally, using a linear classification model, we study the problem of fusing information from multiple modalities. Many current recognition algorithms combine different modalities based on training accuracy, but do not consider the possibility of noise at test time. We describe an algorithm that perturbs test features so that all modalities predict the same class. We enforce this perturbation to be as small as possible via a quadratic program (QP) for continuous features, and a mixed integer program (MIP) for binary features. To efficiently solve the MIP, we provide a greedy algorithm and empirically show that its solution is very close to that of a state-of-the-art MIP solver.
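
    One half of the alternation from part 3, the kernel-weight update, can be sketched with the kernel alignment criterion: each base kernel is weighted by its Frobenius alignment with the ideal label kernel yy^T (the sparse-code update is only indicated in a comment). The RBF kernel bank, data, and normalization are assumptions for the example, not the dissertation's exact solver.

```python
# Sketch of kernel-alignment-based weighting of a bank of base kernels.
import numpy as np

def alignment(K1, K2):
    """Frobenius kernel alignment: <K1,K2>_F / (||K1||_F ||K2||_F)."""
    return (K1 * K2).sum() / (np.linalg.norm(K1) * np.linalg.norm(K2))

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 5))                 # 40 training samples
y = np.sign(rng.standard_normal(40))             # binary labels in {-1, +1}
ideal = np.outer(y, y)                           # ideal label kernel y y^T

sq = ((X[:, None] - X[None]) ** 2).sum(-1)       # pairwise squared distances
kernels = [np.exp(-sq / (2 * s**2)) for s in (0.5, 1.0, 2.0)]  # RBF bank

w = np.array([max(alignment(K, ideal), 0.0) for K in kernels])
w /= w.sum()                                     # kernel mixing coefficients
K_mix = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(w)  # next, the sparse codes would be re-fit under K_mix, and so on
```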

    Computer Vision from Spatial-Multiplexing Cameras at Low Measurement Rates

    In applications such as UAVs and parking-lot monitoring, it is typical to first collect an enormous number of pixels using conventional imagers, then employ expensive methods to compress the data by discarding redundancy, and finally transmit the compressed data to a ground station. The past decade has seen the emergence of novel imagers called spatial-multiplexing cameras, which offer compression at the sensing level itself by providing arbitrary linear measurements of the scene instead of pixel-based sampling. In this dissertation, I discuss various approaches for effective information extraction from spatial-multiplexing measurements and present the trade-offs between reliability of performance and the computational/storage load of the system. In the first part, I present a reconstruction-free approach to high-level inference in computer vision, wherein I consider the specific case of activity analysis and show that, using correlation filters, one can perform effective action recognition and localization directly from a class of spatial-multiplexing cameras, called compressive cameras, even at very low measurement rates of 1%. In the second part, I outline a deep learning based, non-iterative, real-time algorithm to reconstruct images from compressively sensed (CS) measurements, which can outperform traditional iterative CS reconstruction algorithms in terms of reconstruction quality and time complexity, especially at low measurement rates. To overcome the limitations of compressive cameras, which operate with random measurements and are not tuned to any particular task, in the third part of the dissertation I propose a method to design spatial-multiplexing measurements that are tuned to facilitate the easy extraction of features useful in computer vision tasks like object tracking. The work presented in this dissertation provides sufficient evidence that high-level inference in computer vision is possible at extremely low measurement rates, and hence allows us to consider the possibility of revamping current-day computer systems.
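
    The spatial-multiplexing measurement model can be sketched in a few lines: the camera records a small number of random linear projections y = Φx instead of every pixel. The minimum-norm reconstruction below is only a baseline that illustrates how underdetermined the 1% regime is; it is not the reconstruction-free inference or the deep-learning reconstruction proposed in the dissertation, and the scene is a random stand-in.

```python
# Minimal sketch of compressive (spatial-multiplexing) sensing: y = Phi @ x.
import numpy as np

rng = np.random.default_rng(5)
n = 32 * 32                       # pixels in the (flattened) scene
rate = 0.01                       # 1% measurement rate, as in the abstract
m = max(1, int(rate * n))         # -> 10 measurements instead of 1024 pixels

x = rng.random(n)                 # stand-in scene
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
y = Phi @ x                       # what the compressive camera records

x_hat = np.linalg.pinv(Phi) @ y   # minimum-norm estimate (badly underdetermined)
print(m, np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # large relative error
```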

    Roadmap on signal processing for next generation measurement systems

    Signal processing is a fundamental component of almost any sensor-enabled system, with a wide range of applications across different scientific disciplines. Time series data, images, and video sequences comprise representative forms of signals that can be enhanced and analysed for information extraction and quantification. Recent advances in artificial intelligence and machine learning are shifting research attention towards intelligent, data-driven signal processing. This roadmap presents a critical overview of state-of-the-art methods and applications, aiming to highlight future challenges and research opportunities towards next generation measurement systems. It covers a broad spectrum of topics ranging from basic to industrial research, organized in concise thematic sections that reflect the trends and impacts of current and future developments per research field. Furthermore, it offers guidance to researchers and funding agencies in identifying new prospects.