    Learning Visual Classifiers From Limited Labeled Images

    Recognizing humans and their activities from images and video is one of the key goals of computer vision. While supervised learning algorithms like Support Vector Machines and Boosting have offered robust solutions, they require large amounts of labeled data for good performance. It is often difficult to acquire large labeled datasets due to the significant human effort involved in data annotation. However, it is considerably easier to collect unlabeled data due to the availability of inexpensive cameras and large public databases like Flickr and YouTube. In this dissertation, we develop efficient machine learning techniques for visual classification from a small amount of labeled training data by utilizing the structure in the testing data, labeled data in a different domain, and unlabeled data. This dissertation has three main parts. In the first part of the dissertation, we consider how multiple noisy samples available during testing can be utilized to perform accurate visual classification. Such multiple samples are easily available in video-based recognition problems, which are commonly encountered in visual surveillance. Specifically, we study the problem of unconstrained human recognition from iris images. We develop a Sparse Representation-based selection and recognition scheme, which learns the underlying structure of clean images. This learned structure is utilized to develop a quality measure, and a quality-based fusion scheme is proposed to combine the varying evidence. Furthermore, we extend the method to incorporate privacy, an important requirement in practical biometric applications, without significantly affecting the recognition performance. In the second part, we analyze the problem of utilizing labeled data in a different domain to aid visual classification. We consider the problem of shifts in acquisition conditions during training and testing, which is very common in iris biometrics. 
In particular, we study the sensor mismatch problem, where the training samples are acquired using a sensor much older than the one used for testing. We provide one of the first solutions to this problem, a kernel learning framework to adapt iris data collected from one sensor to another. Extensive evaluations on iris data from multiple sensors demonstrate that the proposed method leads to considerable improvement in cross-sensor recognition accuracy. Furthermore, since the proposed technique requires minimal changes to the iris recognition pipeline, it can easily be incorporated into existing iris recognition systems. In the last part of the dissertation, we analyze how unlabeled data available during training can assist visual classification applications. Here, we consider still image-based vision applications involving humans, where explicit motion cues are not available. A human pose often conveys not only the configuration of the body parts, but also implicit predictive information about the ensuing motion. We propose a probabilistic framework to infer this dynamic information associated with a human pose, using unlabeled and unsegmented videos available during training. The inference problem is posed as a non-parametric density estimation problem on non-Euclidean manifolds. Since direct modeling is intractable, we develop a data-driven approach, estimating the density for the test sample under consideration. Statistical inference on the estimated density provides us with quantities of interest like the most probable future motion of the human and the amount of motion informatio

    Generic multimodal biometric fusion

    Biometric systems utilize physiological or behavioral traits to automatically identify individuals. A unimodal biometric system utilizes only one source of biometric information and suffers from a variety of problems such as noisy data, intra-class variations, restricted degrees of freedom, non-universality, spoof attacks and unacceptable error rates. Multimodal biometrics refers to a system which utilizes multiple biometric information sources and can overcome some of the limitations of a unimodal system. Biometric information can be combined at four different levels: (i) raw data level; (ii) feature level; (iii) match-score level; and (iv) decision level. Match-score fusion and decision fusion have received significant attention due to their convenient information representation, whereas raw data fusion is extremely challenging due to the large diversity of representations. Feature level fusion provides a good trade-off between fusion complexity and loss of information due to subsequent processing. This work presents generic feature information fusion techniques for most of the commonly used feature representation schemes. A novel concept of Local Distance Kernels is introduced to transform the available information into an arbitrary common distance space where it can be easily fused. Also, a new dynamic, learnable noise removal scheme based on thresholding is used to remove shot noise in the distance vectors. Finally, we propose the use of AdaBoost and Support Vector Machines for learning the fusion rules to obtain highly reliable final matching scores from the transformed local distance vectors. The integration of the proposed methods leads to a large performance improvement over match-score or decision level fusion.

    IRDO: Iris Recognition by Fusion of DTCWT and OLBP

    The iris is a physiological biometric trait of human beings. In this paper, we propose Iris Recognition using Fusion of Dual Tree Complex Wavelet Transform (DTCWT) and Over Lapping Local Binary Pattern (OLBP) features (IRDO). An eye image is preprocessed to extract the iris part and obtain the Region of Interest (ROI) from the iris. The complex wavelet features are extracted from the iris region using DTCWT. OLBP is further applied on the ROI to generate features of magnitude coefficients. The resultant features are generated by fusing the DTCWT and OLBP features using arithmetic addition. The Euclidean Distance (ED) is used to compare the test iris features with the database iris features to identify a person. It is observed that the values of Total Success Rate (TSR) and Equal Error Rate (EER) are better in the case of the proposed IRDO compared to the state-of-the-art technique.
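
The fusion and matching steps described in this abstract (arithmetic addition of two feature vectors, then Euclidean-distance comparison against a database) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_by_addition`, `euclidean_distance` and `identify` are hypothetical, and the short numeric lists are made-up stand-ins for real DTCWT and OLBP features, which the abstract does not specify.

```python
import math


def fuse_by_addition(feat_a, feat_b):
    """Fuse two equal-length feature vectors by element-wise addition."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return [a + b for a, b in zip(feat_a, feat_b)]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(test_features, database):
    """Return the enrolled identity whose fused features are nearest."""
    return min(
        database,
        key=lambda name: euclidean_distance(test_features, database[name]),
    )


# Toy stand-ins for (DTCWT, OLBP) feature vectors; the numbers are invented.
database = {
    "person_a": fuse_by_addition([0.1, 0.4, 0.3], [1.0, 0.0, 2.0]),
    "person_b": fuse_by_addition([0.9, 0.2, 0.5], [0.0, 3.0, 1.0]),
}
probe = fuse_by_addition([0.2, 0.4, 0.2], [1.0, 0.0, 2.0])
best_match = identify(probe, database)
```

Addition only makes sense when both feature vectors have the same length and comparable scale, which is why the sketch rejects mismatched lengths; how the paper aligns the two representations is not stated in the abstract.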

    Face recognition using statistical adapted local binary patterns.

    Biometrics is the study of methods of recognizing humans based on their behavioral and physical characteristics or traits. Face recognition is one of the biometric modalities that has received a great amount of attention from many researchers during the past few decades because of its potential applications in a variety of security domains. Face recognition, however, is not only concerned with recognizing human faces, but also with recognizing faces of non-biological entities or avatars. Fortunately, the need for secure and affordable virtual worlds is attracting the attention of many researchers who seek to find fast, automatic and reliable ways to identify virtual worlds’ avatars. In this work, I propose new techniques for recognizing avatar faces, which can also be applied to recognize human faces. The proposed methods are based mainly on a well-known and efficient local texture descriptor, Local Binary Pattern (LBP). I apply different versions of LBP, such as Hierarchical Multi-scale Local Binary Patterns and Adaptive Local Binary Pattern with Directional Statistical Features, in the wavelet space and discuss the effect of this application on the performance of each LBP version. In addition, I use a new version of LBP called Local Difference Pattern (LDP) with other well-known descriptors and classifiers to differentiate between human and avatar face images. The original LBP achieves a high recognition rate if the tested images are free of noise, but its performance gets worse if these images are corrupted by noise. To deal with this problem, I propose a new definition of the original LBP in which the descriptor does not threshold all the neighborhood pixels against the central pixel value. A weight is computed for each pixel in the neighborhood, a new value is calculated for each pixel, and simple statistical operations are then used to compute the new threshold, which changes automatically based on the pixels’ values. 
This threshold can be applied with the original LBP or any other version of LBP and can be extended to work with Local Ternary Pattern (LTP) or any version of LTP to produce different versions of LTP for recognizing noisy avatar and human face images.
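
To make the thresholding idea concrete, the sketch below computes a standard 8-neighbor LBP code and contrasts it with a variant whose threshold comes from a neighborhood statistic. The abstract does not give the weighting scheme or the exact statistics, so the patch mean used in `mean_threshold` is purely an assumption for illustration, not the author's definition; the function names and the toy 3×3 patches are also hypothetical.

```python
# Clockwise neighbor offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col, threshold=None):
    """8-neighbor LBP code at (row, col).

    With threshold=None the central pixel value is used, as in original LBP.
    """
    if threshold is None:
        threshold = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= threshold:
            code |= 1 << bit
    return code


def mean_threshold(image, row, col):
    """Assumed statistical threshold: the mean of the 3x3 patch."""
    values = [image[row + dr][col + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    return sum(values) / len(values)


clean = [
    [10, 10, 10],
    [90, 50, 90],
    [90, 90, 90],
]
noisy = [row[:] for row in clean]
noisy[1][1] = 95  # simulate noise corrupting only the central pixel

original_clean = lbp_code(clean, 1, 1)
original_noisy = lbp_code(noisy, 1, 1)
adaptive_clean = lbp_code(clean, 1, 1, mean_threshold(clean, 1, 1))
adaptive_noisy = lbp_code(noisy, 1, 1, mean_threshold(noisy, 1, 1))
```

In this toy case the original code changes from 248 to 0 when the central pixel is corrupted, while the mean-thresholded code stays at 248 for both patches, which is the kind of robustness the abstract is after.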