97 research outputs found

    Vision-based techniques for gait recognition

    Full text link
    Global security concerns have raised a proliferation of video surveillance devices. Intelligent surveillance systems seek to discover possible threats automatically and raise alerts. Being able to identify the surveyed object can help determine its threat level. The current generation of devices provide digital video data to be analysed for time varying features to assist in the identification process. Commonly, people queue up to access a facility and approach a video camera in full frontal view. In this environment, a variety of biometrics are available - for example, gait which includes temporal features like stride period. Gait can be measured unobtrusively at a distance. The video data will also include face features, which are short-range biometrics. In this way, one can combine biometrics naturally using one set of data. In this paper we survey current techniques of gait recognition and modelling with the environment in which the research was conducted. We also discuss in detail the issues arising from deriving gait data, such as perspective and occlusion effects, together with the associated computer vision challenges of reliable tracking of human movement. Then, after highlighting these issues and challenges related to gait processing, we proceed to discuss the frameworks combining gait with other biometrics. We then provide motivations for a novel paradigm in biometrics-based human recognition, i.e. the use of the fronto-normal view of gait as a far-range biometrics combined with biometrics operating at a near distance

    Improving Bags-of-Words model for object categorization

    Get PDF
    In the past decade, Bags-of-Words (BOW) models have become popular for the task of object recognition, owing to their good performance and simplicity. Some of the most effective recent methods for computer-based object recognition work by detecting and extracting local image features, before quantizing them according to a codebook rule such as k-means clustering, and classifying these with conventional classifiers such as Support Vector Machines and Naive Bayes. In this thesis, a Spatial Object Recognition Framework is presented that consists of the four main contributions of the research. The first contribution, frequent keypoint pattern discovery, works by combining pairs and triplets of frequent keypoints in order to discover intermediate representations for object classes. Based on the same frequent keypoints principle, algorithms for locating the region-of-interest in training images is then discussed. Extensions to the successful Spatial Pyramid Matching scheme, in order to better capture spatial relationships, are then proposed. The pairs frequency histogram and shapes frequency histogram work by capturing more redefined spatial information between local image features. Finally, alternative techniques to Spatial Pyramid Matching for capturing spatial information are presented. The proposed techniques, variations of binned log-polar histograms, divides the image into grids of different scale and different orientation. Thus captures the distribution of image features both in distance and orientation explicitly. Evaluations on the framework are focused on several recent and popular datasets, including image retrieval, object recognition, and object categorization. Overall, while the effectiveness of the framework is limited in some of the datasets, the proposed contributions are nevertheless powerful improvements of the BOW model

    Investigation of new feature descriptors for image search and classification

    Get PDF
    Content-based image search, classification and retrieval is an active and important research area due to its broad applications as well as the complexity of the problem. Understanding the semantics and contents of images for recognition remains one of the most difficult and prevailing problems in the machine intelligence and computer vision community. With large variations in size, pose, illumination and occlusions, image classification is a very challenging task. A good classification framework should address the key issues of discriminatory feature extraction as well as efficient and accurate classification. Towards that end, this dissertation focuses on exploring new image descriptors by incorporating cues from the human visual system, and integrating local, texture, shape as well as color information to construct robust and effective feature representations for advancing content-based image search and classification. Based on the Gabor wavelet transformation, whose kernels are similar to the 2D receptive field profiles of the mammalian cortical simple cells, a series of new image descriptors is developed. Specifically, first, a new color Gabor-HOG (GHOG) descriptor is introduced by concatenating the Histograms of Oriented Gradients (HOG) of the component images produced by applying Gabor filters in multiple scales and orientations to encode shape information. Second, the GHOG descriptor is analyzed in six different color spaces and grayscale to propose different color GHOG descriptors, which are further combined to present a new Fused Color GHOG (FC-GHOG) descriptor. Third, a novel GaborPHOG (GPHOG) descriptor is proposed which improves upon the Pyramid Histograms of Oriented Gradients (PHOG) descriptor, and subsequently a new FC-GPHOG descriptor is constructed by combining the multiple color GPHOG descriptors and employing the Principal Component Analysis (PCA). Next, the Gabor-LBP (GLBP) is derived by accumulating the Local Binary Patterns (LBP) histograms of the local Gabor filtered images to encode texture and local information of an image. Furthermore, a novel Gabor-LBPPHOG (GLP) image descriptor is proposed which integrates the GLBP and the GPHOG descriptors as a feature set and an innovative Fused Color Gabor-LBP-PHOG (FC-GLP) is constructed by fusing the GLP from multiple color spaces. Subsequently, The GLBP and the GHOG descriptors are then combined to produce the Gabor-LBP-HOG (GLH) feature vector which performs well on different object and scene image categories. The six color GLH vectors are further concatenated to form the Fused Color GLH (FC-GLH) descriptor. Finally, the Wigner based Local Binary Patterns (WLBP) descriptor is proposed that combines multi-neighborhood LBP, Pseudo-Wigner distribution of images and the popular bag of words model to effectively classify scene images. To assess the feasibility of the proposed new image descriptors, two classification methods are used: one method applies the PCA and the Enhanced Fisher Model (EFM) for feature extraction and the nearest neighbor rule for classification, while the other method employs the Support Vector Machine (SVM). The classification performance of the proposed descriptors is tested on several publicly available popular image datasets. The experimental results show that the proposed new image descriptors achieve image search and classification results better than or at par with other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT), the Pyramid Histograms of visual Words (PHOW), the Pyramid Histograms of Oriented Gradients (PHOG), the Spatial Envelope (SE), the Color SIFT four Concentric Circles (C4CC), the Object Bank (OB), the Context Aware Topic Model (CA-TM), the Hierarchical Matching Pursuit (HMP), the Kernel Spatial Pyramid Matching (KSPM), the SIFT Sparse-coded Spatial Pyramid Matching (Sc-SPM), the Kernel Codebook (KC) and the LBP

    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Full text link
    Designing discriminative powerful texture features robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distribution of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The d facto practice when learning these CNN models is to use RGB patches as input with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories and the recently introduced large scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Our final combination outperforms the state-of-the-art without employing fine-tuning or ensemble of RGB network architectures.Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensin

    A Better Looking Brain: Image Pre-Processing Approaches for fMRI Data

    Get PDF
    Researchers in the field of functional neuroimaging have faced a long standing problem in pre-processing low spatial resolution data without losing meaningful details within. Commonly, the brain function is recorded by a technique known as echo-planar imaging that represents the measure of blood flow (BOLD signal) through a particular location in the brain as an array of intensity values changing over time. This approach to record a movie of blood flow in the brain is known as fMRI. The neural activity is then studied from the temporal correlation patterns existing within the fMRI time series. However, the resulting images are noisy and contain low spatial detail, thus making it imperative to pre-process them appropriately to derive meaningful activation patterns. Two of the several standard preprocessing steps employed just before the analysis stage are denoising and normalization. Fundamentally, it is difficult to perfectly remove noise from an image without making assumptions about signal and noise distributions. A convenient and commonly used alternative is to smooth the image with a Gaussian filter, but this method suffers from various obvious drawbacks, primarily loss of spatial detail. A greater challenge arises when we attempt to derive average activation patterns from fMRI images acquired from a group of individuals. The brain of one individual differs from others in a structural sense as well as in a functional sense. Commonly, the inter-individual differences in anatomical structures are compensated for by co-registering each subject\u27s data to a common normalization space, known as spatial normalization. However, there are no existing methods to compensate for the differences in functional organization of the brain. This work presents first steps towards data-driven robust algorithms for fMRI image denoising and multi-subject image normalization by utilizing inherent information within fMRI data. In addition, a new validation approach based on spatial shape of the activation regions is presented to quantify the effects of preprocessing and also as a tool to record the differences in activation patterns between individual subjects or within two groups such as healthy controls and patients with mental illness. Qualititative and quantitative results of the proposed framework compare favorably against existing and widely used model-driven approaches such as Gaussian smoothing and structure-based spatial normalization. This work is intended to provide neuroscience researchers tools to derive more meaningful activation patterns to accurately identify imaging biomarkers for various neurodevelopmental diseases and also maximize the specificity of a diagnosis

    Statistical Medial Model dor Cardiac Segmentation and Morphometry

    Get PDF
    In biomedical image analysis, shape information can be utilized for many purposes. For example, irregular shape features can help identify diseases; shape features can help match different instances of anatomical structures for statistical comparison; and prior knowledge of the mean and possible variation of an anatomical structure\u27s shape can help segment a new example of this structure in noisy, low-contrast images. A good shape representation helps to improve the performance of the above techniques. The overall goal of the proposed research is to develop and evaluate methods for representing shapes of anatomical structures. The medial model is a shape representation method that models a 3D object by explicitly defining its skeleton (medial axis) and deriving the object\u27s boundary via inverse-skeletonization . This model represents shape compactly, and naturally expresses descriptive global shape features like thickening , bending , and elongation . However, its application in biomedical image analysis has been limited, and it has not yet been applied to the heart, which has a complex shape. In this thesis, I focus on developing efficient methods to construct the medial model, and apply it to solve biomedical image analysis problems. I propose a new 3D medial model which can be efficiently applied to complex shapes. The proposed medial model closely approximates the medial geometry along medial edge curves and medial branching curves by soft-penalty optimization and local correction. I further develop a scheme to perform model-based segmentation using a statistical medial model which incorporates prior shape and appearance information. The proposed medial models are applied to a series of image analysis tasks. The 2D medial model is applied to the corpus callosum which results in an improved alignment of the patterns of commissural connectivity compared to a volumetric registration method. The 3D medial model is used to describe the myocardium of the left and right ventricles, which provides detailed thickness maps characterizing different disease states. The model-based myocardium segmentation scheme is tested in a heterogeneous adult MRI dataset. Our segmentation experiments demonstrate that the statistical medial model can accurately segment the ventricular myocardium and provide useful parameters to characterize heart function

    Improving Multi-view Facial Expression Recognition in Unconstrained Environments

    Get PDF
    Facial expression and emotion-related research has been a longstanding activity in psychology while computerized/automatic facial expression recognition of emotion is a relative recent and still emerging but active research area. Although many automatic computer systems have been proposed to address facial expression recognition problems, the majority of them fail to cope with the requirements of many practical application scenarios arising from either environmental factors or unexpected behavioural bias introduced by the users, such as illumination conditions and large head pose variation to the camera. In this thesis, two of the most influential and common issues raised in practical application scenarios when applying automatic facial expression recognition system are comprehensively explored and investigated. Through a series of experiments carried out under a proposed texture-based system framework for multi-view facial expression recognition, several novel texture feature representations are introduced for implementing multi-view facial expression recognition systems in practical environments, for which the state-of-the-art performance is achieved. In addition, a variety of novel categorization schemes for the configurations of an automatic multi-view facial expression recognition system is presented to address the impractical discrete categorization of facial expression of emotions in real-world scenarios. A significant improvement is observed when using the proposed categorizations in the proposed system framework using a novel implementation of the block based local ternary pattern approach

    Recognizing Emotions Conveyed through Facial Expressions

    Get PDF
    Emotional communication is a key element of habilitation care of persons with dementia. It is, therefore, highly preferable for assistive robots that are used to supplement human care provided to persons with dementia, to possess the ability to recognize and respond to emotions expressed by those who are being cared-for. Facial expressions are one of the key modalities through which emotions are conveyed. This work focuses on computer vision-based recognition of facial expressions of emotions conveyed by the elderly. Although there has been much work on automatic facial expression recognition, the algorithms have been experimentally validated primarily on young faces. The facial expressions on older faces has been totally excluded. This is due to the fact that the facial expression databases that were available and that have been used in facial expression recognition research so far do not contain images of facial expressions of people above the age of 65 years. To overcome this problem, we adopt a recently published database, namely, the FACES database, which was developed to address exactly the same problem in the area of human behavioural research. The FACES database contains 2052 images of six different facial expressions, with almost identical and systematic representation of the young, middle-aged and older age-groups. In this work, we evaluate and compare the performance of two of the existing imagebased approaches for facial expression recognition, over a broad spectrum of age ranging from 19 to 80 years. The evaluated systems use Gabor filters and uniform local binary patterns (LBP) for feature extraction, and AdaBoost.MH with multi-threshold stump learner for expression classification. We have experimentally validated the hypotheses that facial expression recognition systems trained only on young faces perform poorly on middle-aged and older faces, and that such systems confuse ageing-related facial features on neutral faces with other expressions of emotions. We also identified that, among the three age-groups, the middle-aged group provides the best generalization performance across the entire age spectrum. The performance of the systems was also compared to the performance of humans in recognizing facial expressions of emotions. Some similarities were observed, such as, difficulty in recognizing the expressions on older faces, and difficulty in recognizing the expression of sadness. The findings of our work establish the need for developing approaches for facial expression recognition that are robust to the effects of ageing on the face. The scientific results of our work can be used as a basis to guide future research in this direction
    corecore