8 research outputs found

    Investigation of robust gait recognition for different appearances and camera view angles

    Get PDF
    A gait recognition framework is proposed to tackle the challenge of unknown camera view angles as well as appearance changes in gait recognition. In the framework, camera view angles are firstly identified before gait recognition. Two compact images, gait energy image (GEI) and gait modified Gaussian image (GMGI), are used as the base gait feature images. Histogram of oriented gradients (HOG) is applied to the base gait feature images to generate feature descriptors, and then a final feature map after principal component analysis (PCA) operations on the descriptors are used to train support vector machine (SVM) models for individuals. A set of experiments are conducted on CASIA gait database B to investigate how appearance changes and unknown view angles affect the gait recognition accuracy under the proposed framework. The experimental results have shown that the framework is robust in dealing with unknown camera view angles, as well as appearance changes in gait recognition. In the unknown view angle testing, the recognition accuracy matches that of identical view angle testing in gait recognition. The proposed framework is specifically applicable in personal identification by gait in a small company/organization, where unintrusive personal identification is needed

    Dimensionality reduction and sparse representations in computer vision

    Get PDF
    The proliferation of camera equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between the users and their surroundings. However, the massive amounts of data and the high computational cost associated with them, encumbers the transfer of sophisticated vision algorithms to real life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach for tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space where image data reside in order to allow resource constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low dimensional models. In addition to dimensionality reduction, we study a novel approach in representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks including low level image modeling and higher level semantic information extraction. Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, produced by abstracting information from low-level features and offer compact modeling of high dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low level features and mid level structures. This innovative paradigm offers revolutionary possibilities including recognizing the category of an object from purely textual information without providing any explicit visual example

    Contribution to supervised representation learning: algorithms and applications.

    Get PDF
    278 p.In this thesis, we focus on supervised learning methods for pattern categorization. In this context, itremains a major challenge to establish efficient relationships between the discriminant properties of theextracted features and the inter-class sparsity structure.Our first attempt to address this problem was to develop a method called "Robust Discriminant Analysiswith Feature Selection and Inter-class Sparsity" (RDA_FSIS). This method performs feature selectionand extraction simultaneously. The targeted projection transformation focuses on the most discriminativeoriginal features while guaranteeing that the extracted (or transformed) features belonging to the sameclass share a common sparse structure, which contributes to small intra-class distances.In a further study on this approach, some improvements have been introduced in terms of theoptimization criterion and the applied optimization process. In fact, we proposed an improved version ofthe original RDA_FSIS called "Enhanced Discriminant Analysis with Class Sparsity using GradientMethod" (EDA_CS). The basic improvement is twofold: on the first hand, in the alternatingoptimization, we update the linear transformation and tune it with the gradient descent method, resultingin a more efficient and less complex solution than the closed form adopted in RDA_FSIS.On the other hand, the method could be used as a fine-tuning technique for many feature extractionmethods. The main feature of this approach lies in the fact that it is a gradient descent based refinementapplied to a closed form solution. This makes it suitable for combining several extraction methods andcan thus improve the performance of the classification process.In accordance with the above methods, we proposed a hybrid linear feature extraction scheme called"feature extraction using gradient descent with hybrid initialization" (FE_GD_HI). This method, basedon a unified criterion, was able to take advantage of several powerful linear discriminant methods. Thelinear transformation is computed using a descent gradient method. The strength of this approach is thatit is generic in the sense that it allows fine tuning of the hybrid solution provided by different methods.Finally, we proposed a new efficient ensemble learning approach that aims to estimate an improved datarepresentation. The proposed method is called "ICS Based Ensemble Learning for Image Classification"(EM_ICS). Instead of using multiple classifiers on the transformed features, we aim to estimate multipleextracted feature subsets. These were obtained by multiple learned linear embeddings. Multiple featuresubsets were used to estimate the transformations, which were ranked using multiple feature selectiontechniques. The derived extracted feature subsets were concatenated into a single data representationvector with strong discriminative properties.Experiments conducted on various benchmark datasets ranging from face images, handwritten digitimages, object images to text datasets showed promising results that outperformed the existing state-ofthe-art and competing methods

    Learning Visual Classifiers From Limited Labeled Images

    Get PDF
    Recognizing humans and their activities from images and video is one of the key goals of computer vision. While supervised learning algorithms like Support Vector Machines and Boosting have offered robust solutions, they require large amount of labeled data for good performance. It is often difficult to acquire large labeled datasets due to the significant human effort involved in data annotation. However, it is considerably easier to collect unlabeled data due to the availability of inexpensive cameras and large public databases like Flickr and YouTube. In this dissertation, we develop efficient machine learning techniques for visual classification from small amount of labeled training data by utilizing the structure in the testing data, labeled data in a different domain and unlabeled data. This dissertation has three main parts. In the first part of the dissertation, we consider how multiple noisy samples available during testing can be utilized to perform accurate visual classification. Such multiple samples are easily available in video-based recognition problem, which is commonly encountered in visual surveillance. Specifically, we study the problem of unconstrained human recognition from iris images. We develop a Sparse Representation-based selection and recognition scheme, which learns the underlying structure of clean images. This learned structure is utilized to develop a quality measure, and a quality-based fusion scheme is proposed to combine the varying evidence. Furthermore, we extend the method to incorporate privacy, an important requirement inpractical biometric applications, without significantly affecting the recognition performance. In the second part, we analyze the problem of utilizing labeled data in a different domain to aid visual classification. We consider the problem of shifts in acquisition conditions during training and testing, which is very common in iris biometrics. In particular, we study the sensor mismatch problem, where the training samples are acquired using a sensor much older than the one used for testing. We provide one of the first solutions to this problem, a kernel learning framework to adapt iris data collected from one sensor to another. Extensive evaluations on iris data from multiple sensors demonstrate that the proposed method leads to considerable improvement in cross sensor recognition accuracy. Furthermore, since the proposed technique requires minimal changes to the iris recognition pipeline, it can easily be incorporated into existing iris recognition systems. In the last part of the dissertation, we analyze how unlabeled data available during training can assist visual classification applications. Here, we consider still image-based vision applications involving humans, where explicit motion cues are not available. A human pose often conveys not only the configuration of the body parts, but also implicit predictive information about the ensuing motion. We propose a probabilistic framework to infer this dynamic information associated with a human pose, using unlabeled and unsegmented videos available during training. The inference problem is posed as a non-parametric density estimation problem on non-Euclidean manifolds. Since direct modeling is intractable, we develop a data driven approach, estimating the density for the test sample under consideration. Statistical inference on the estimated density provides us with quantities of interest like the most probable future motion of the human and the amount of motion informatio

    Efficient and Robust Methods for Audio and Video Signal Analysis

    Get PDF
    This thesis presents my research concerning audio and video signal processing and machine learning. Specifically, the topics of my research include computationally efficient classifier compounds, automatic speech recognition (ASR), music dereverberation, video cut point detection and video classification.Computational efficacy of information retrieval based on multiple measurement modalities has been considered in this thesis. Specifically, a cascade processing framework, including a training algorithm to set its parameters has been developed for combining multiple detectors or binary classifiers in computationally efficient way. The developed cascade processing framework has been applied on video information retrieval tasks of video cut point detection and video classification. The results in video classification, compared to others found in the literature, indicate that the developed framework is capable of both accurate and computationally efficient classification. The idea of cascade processing has been additionally adapted for the ASR task. A procedure for combining multiple speech state likelihood estimation methods within an ASR framework in cascaded manner has been developed. The results obtained clearly show that without impairing the transcription accuracy the computational load of ASR can be reduced using the cascaded speech state likelihood estimation process.Additionally, this thesis presents my work on noise robustness of ASR using a nonnegative matrix factorization (NMF) -based approach. Specifically, methods for transformation of sparse NMF-features into speech state likelihoods has been explored. The results reveal that learned transformations from NMF activations to speech state likelihoods provide better ASR transcription accuracy than dictionary label -based transformations. The results, compared to others in a noisy speech recognition -challenge show that NMF-based processing is an efficient strategy for noise robustness in ASR.The thesis also presents my work on audio signal enhancement, specifically, on removing the detrimental effect of reverberation from music audio. In the work, a linear prediction -based dereverberation algorithm, which has originally been developed for speech signal enhancement, was applied for music. The results obtained show that the algorithm performs well in conjunction with music signals and indicate that dynamic compression of music does not impair the dereverberation performance

    Proceedings of the 7th Sound and Music Computing Conference

    Get PDF
    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
    corecore