
    Integration of BLCM and FLBP in Low-Resolution Face Recognition

    Face recognition from face images has been a fast-growing topic in the biometrics research community, and a sizeable number of face recognition techniques based on texture analysis have been developed in the past few years. These techniques work well on grayscale and colour images, but very few deal with binary and low-resolution images. With binary images becoming the preferred format for low-resolution face analysis, further studies are needed to provide a complete image-based face recognition system with higher accuracy. To overcome the limitation of existing techniques in extracting distinctive features from low-resolution images, caused by the contrast between the face and the background, we propose a statistical feature analysis technique to fill this gap. The proposed technique, named BLCM-FLBP, integrates the Binary Level Occurrence Matrix (BLCM) and the Fuzzy Local Binary Pattern (FLBP) to extract global and local face features from low-resolution face images. The purpose of BLCM-FLBP is to improve edge sharpness between black and white pixels in the binary image and to extract significant data relating to face pattern features. Experimental results on the Yale and FEI datasets validate the superiority of the proposed technique over other top-performing feature analysis techniques using two classifiers, a neural network (NN) and a random forest (RF). The proposed technique achieved an accuracy of 93.16% (RF) and 95.27% (NN) on the FEI dataset, and 94.54% (RF) and 93.61% (NN) on the Yale B dataset. Hence, the proposed technique outperforms techniques such as the Gray Level Co-Occurrence Matrix (GLCM), Bag of Words (BoW), Fuzzy Local Binary Pattern (FLBP), and Binary Level Occurrence Matrix (BLCM)
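
    As a concrete illustration of the texture-descriptor family this technique builds on, here is a minimal sketch of plain local binary patterns (LBP); the fuzzy variant (FLBP) replaces the hard threshold below with fuzzy membership degrees, and BLCM is not reproduced here. The function name and parameters are illustrative, not the paper's implementation.

```python
# Minimal 8-neighbour LBP histogram; FLBP would soften the >= test.
import numpy as np

def lbp_histogram(img: np.ndarray, n_bins: int = 256) -> np.ndarray:
    """Compute an LBP code per pixel and return its normalised histogram."""
    img = img.astype(np.float64)
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int32)
    # Offsets of the 8 neighbours, ordered clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy: img.shape[0] - 1 + dy,
                        1 + dx: img.shape[1] - 1 + dx]
        # Each neighbour contributes one bit of the binary pattern.
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()  # feature vector, e.g. for an NN or RF
```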

    Modelling facial dynamics change as people age

    In recent years, increased research activity in the area of facial ageing modelling has been recorded. This interest is attributed to the potential of facial ageing modelling techniques for a number of different applications, including age estimation, prediction of the current appearance of missing persons, age-specific human-computer interaction, computer graphics, forensic applications, and medical applications. This thesis describes a general AAM model for modelling 4D (3D dynamic) ageing, and specific models to map facial dynamics as people age. A fully automatic and robust pre-processing pipeline is used, along with an approach for tracking and inter-subject registration of 3D sequences (4D data). A 4D database of 3D videos of individuals has been assembled to achieve this goal; the database is the first of its kind in the world. Various techniques were deployed to build this database and to overcome problems due to noise and missing data. A two-factor (age group and gender) multivariate analysis of variance (MANOVA) was performed on the dataset, and the groups were then compared to assess the effects of age separately for each gender. The results show that smiles alter with age and have different characteristics between males and females. We analysed the rich sources of information present in the 3D dynamic features of smiles to provide more insight into the patterns of smile dynamics. The sources of temporal information investigated include the varying dynamics of lip movements, which are analysed to extract descriptive features. We evaluated the dynamic features of closed-mouth smiles among 80 subjects of both genders. Multilevel Principal Components Analysis (mPCA) is used to analyse the effect of naturally occurring groups in a population of individuals on smile dynamics data. A concise overview of the formal aspects of mPCA is given, and we demonstrate that mPCA offers a way to model the variations at different levels of structure in the data (between-group and within-group levels)
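
    A minimal two-level sketch of the mPCA idea is given below, assuming each smile is already coded as a fixed-length feature vector (e.g. concatenated lip-landmark trajectories); the function and variable names are illustrative, not the thesis implementation.

```python
# Two-level mPCA: between-group PCA on group means, within-group PCA
# on residuals about each sample's own group mean.
import numpy as np

def mpca(X: np.ndarray, groups: np.ndarray, k_between: int, k_within: int):
    """X: (n_samples, n_features); groups: integer group label per sample."""
    grand_mean = X.mean(axis=0)
    labels = np.unique(groups)
    group_means = np.stack([X[groups == g].mean(axis=0) for g in labels])

    # Level 1: variation *between* groups (e.g. age bands or sexes),
    # from a PCA of the group means about the grand mean.
    _, _, Vb = np.linalg.svd(group_means - grand_mean, full_matrices=False)

    # Level 2: variation *within* groups, from residuals about each
    # sample's own group mean.
    residuals = X - group_means[np.searchsorted(labels, groups)]
    _, _, Vw = np.linalg.svd(residuals, full_matrices=False)

    return grand_mean, Vb[:k_between], Vw[:k_within]
```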

    Contribution to Graph-based Multi-view Clustering: Algorithms and Applications

    In this thesis, we study unsupervised learning, specifically clustering methods for dividing data into meaningful groups. One major challenge is to find an efficient algorithm with low computational complexity that can deal with different types and sizes of datasets. For this purpose, we propose two approaches. The first is named "Multi-view Clustering via Kernelized Graph and Nonnegative Embedding" (MKGNE), and the second is called "Multi-view Clustering via Consensus Graph Learning and Nonnegative Embedding" (MVCGE). These two approaches jointly solve four tasks: they estimate the unified similarity matrix over all views using the kernel trick, the unified spectral projection of the data, the cluster-indicator matrix, and the weight of each view, without additional parameters. With these two approaches, no post-processing such as k-means clustering is needed. In a further study, we propose a method named "Multi-view Spectral Clustering via Constrained Nonnegative Embedding" (CNESE). This method overcomes a drawback of spectral clustering approaches: they only provide a nonlinear projection of the data, on which an additional clustering step is required, and this step can degrade the quality of the final clustering due to factors such as the initialization process or outliers. These drawbacks are overcome by introducing a nonnegative embedding matrix that directly gives the final clustering assignment, and by adding constraints on the targeted matrix to enhance clustering performance. In line with the above methods, a new method called "Multi-view Spectral Clustering with a Self-taught Robust Graph Learning" (MCSRGL) has been developed. Different from other approaches, this method integrates two main ideas into a one-step multi-view clustering model. First, we construct an additional graph using the cluster-label space, in addition to the graphs associated with the data space. Second, a smoothness constraint is exploited to make the cluster-label matrix more consistent with the data views and the label view. Moreover, we propose two unified frameworks for multi-view clustering in Chapter 9. In these frameworks, we jointly determine the view-based graphs, the consensus graph, the consensus spectral representation, and the soft clustering assignments. These methods retain the main advantages of the aforementioned methods and integrate the concepts of consensus and unified matrices. By using unified matrices, we enforce the matrices of different views to be similar, which reduces noise and inconsistency between views. Extensive experiments were conducted on several public datasets of different types and sizes, varying from face image datasets to document, handwritten and synthetic datasets. We provide several analyses of the proposed algorithms, including ablation studies, hyper-parameter sensitivity analyses, and computational costs. The experimental results show that the algorithms developed in this thesis are relevant and outperform several competing methods
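
    The consensus-graph idea can be illustrated with a minimal sketch: build one similarity graph per view, combine them into a consensus graph, then take a spectral embedding. The thesis methods learn the consensus graph, the per-view weights and a nonnegative embedding jointly; the fixed uniform average below is only the simplest instance, and rbf_graph and k are assumed names.

```python
# Uniform-average consensus graph plus normalised-Laplacian embedding.
import numpy as np

def rbf_graph(X: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Dense RBF similarity graph over the rows of one view."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def consensus_spectral_embedding(views: list, k: int) -> np.ndarray:
    # Consensus graph: learned per-view weights would replace 1/len(views).
    S = sum(rbf_graph(V) for V in views) / len(views)
    d = S.sum(axis=1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(d, d))  # normalised Laplacian
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]  # rows can then be clustered, e.g. with k-means
```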

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, defined here as transforming a video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, or by a 3D graph with pixels as nodes. This dissertation contributes a set of models to localize, track, segment, recognize and assess actions, including (1) image-set models that aggregate subset features given by regularizing normalized CNNs, (2) image-set models based on inter-frame principal recovery and sparse coding of residual actions, (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with an added motion model, (4) spatiotemporal models (3D graphs and 3D CNNs) that treat time as a spatial dimension, and (5) supervised hashing that jointly learns embedding and quantization. State-of-the-art performance is achieved for tasks such as quantifying facial pain and human diving. The primary conclusions of this dissertation are: (i) image sets can capture facial actions as a collective representation; (ii) sparse and low-rank representations can untangle expression, identity and pose cues, and can be learned via an image-set model as well as a linear model; (iii) norm is related to recognizability, and similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the particle filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) segmenting an object locally makes it amenable to shape priors, and it is feasible to learn such priors online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time, and 3D CNNs work effectively when given temporally meaningful clips; (vii) richly labeled images and videos help to learn better hash functions, after learning binary embedded codes, than random projections. In addition, the models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not included in this dissertation
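
    The image-set view of a video can be illustrated with a minimal sketch: per-frame CNN features are aggregated into one order-invariant set descriptor. The dissertation's models learn the aggregation (and regularise the CNN); the mean/max pooling below is only the simplest stand-in, and frame_features is an assumed (n_frames, d) array.

```python
# Order-invariant set-level descriptor from per-frame features.
import numpy as np

def image_set_descriptor(frame_features: np.ndarray) -> np.ndarray:
    """Aggregate per-frame features into a single set-level vector."""
    mean_pool = frame_features.mean(axis=0)   # collective appearance
    max_pool = frame_features.max(axis=0)     # most salient responses
    return np.concatenate([mean_pool, max_pool])
```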

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis belongs to the area of Geometric Deep Learning, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and that conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces, in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse its information-theoretic principles and its connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism. We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees, which can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter- and sample-efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces
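
    A minimal sketch of the substructure-encoding idea, in the spirit of Graph Substructure Networks: per-node triangle counts break symmetries that limit conventional message passing, and are appended to the node features before an ordinary GNN layer. The layer below is a generic mean-aggregation layer, not the thesis architecture, and all names are illustrative.

```python
# Substructure encoding (triangle counts) plus one message-passing layer.
import numpy as np

def triangle_counts(A: np.ndarray) -> np.ndarray:
    """Number of triangles each node participates in (A: 0/1 adjacency)."""
    # diag(A^3) counts closed 3-walks; each triangle is traversed twice.
    return np.diag(A @ A @ A) / 2.0

def gnn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One mean-aggregation message-passing layer with ReLU."""
    A_hat = A + np.eye(len(A))                     # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(D_inv * (A_hat @ H @ W), 0.0)

# Usage sketch: augment raw node features X with the structural encoding.
# X_aug = np.hstack([X, triangle_counts(A)[:, None]])
```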

    An automatic wearable multi-sensor based gait analysis system for older adults.

    Gait abnormalities in older adults are very common in clinical practice. They lead to serious adverse consequences such as falls and injuries, resulting in increased care costs; there is therefore a national imperative to address this challenge. Currently, gait assessment is done using standardized clinical tools that depend on subjective evaluation. More objective, gold-standard methods for analysing gait (motion capture systems such as Qualisys and Vicon) rely on access to expensive, complex equipment based in gait laboratories. These are not widely available for several reasons, including a scarcity of equipment, the need for technical staff, the need for patients to attend in person, complicated and time-consuming procedures, and overall expense. To broaden the use of accurate quantitative gait monitoring and assessment, the major goal of this thesis is to develop an affordable automatic gait analysis system that provides comprehensive gait information and can be used in clinic or at home. The system should also quantify and visualize gait parameters, identify gait variables and changes, and monitor abnormal gait patterns of older people in order to reduce the potential for falling and support falls risk management. A research programme based on experiments with volunteers was developed in collaboration with other researchers at Bournemouth University, The Royal Bournemouth Hospital and care homes. This thesis consists of five studies toward this goal. The first study examines the effect on sensor output of attaching an Inertial Measurement Unit (IMU) to different anatomical foot locations; placing the IMU over the bony prominence of the first cuboid bone delivers the most accurate data. The second presents an automatic extraction method for spatiotemporal gait features, showing that such features can be extracted automatically outside a gait laboratory. The third proposes user-friendly, easy-to-interpret visualization approaches for real-time spatiotemporal gait information; the four proposed approaches have the potential to help professionals detect and interpret gait asymmetry. The fourth is a validation study comparing IMU-extracted spatiotemporal features against gold-standard motion capture system and treadmill measurements in young and older adults; the results obtained under three experimental conditions demonstrate that our IMU-extracted features are highly valid for spatiotemporal gait variables in both groups. In the last study, an evaluation system using Procrustes and Euclidean distance matrix analysis is proposed to provide a comprehensive interpretation of shape and form differences between individual gaits; the results show that older gaits are distinguishable from young gaits. A pictorial and numerical system is proposed that indicates whether an assessed gait is normal or abnormal depending on its total feature values. This offers several advantages: 1) it is user friendly and easy to set up and implement; 2) it does not require complex equipment or segmentation of body parts; 3) it is relatively inexpensive, which increases affordability and decreases health inequality; and 4) its versatility increases its usability at home, supporting inclusivity of patients who are home-bound.
A digital transformation strategy framework is proposed in which stakeholders such as patients, healthcare professionals and industry partners can collaborate, through the development of new technologies, value creation, structural change, affordability and sustainability, to improve the diagnosis and treatment of gait abnormalities
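
    A minimal sketch of the kind of spatiotemporal feature extraction described in the second study is given below, assuming heel strikes appear as prominent peaks in the vertical acceleration of a foot-worn IMU. The peak thresholds, sampling rate and signal layout are assumptions for illustration, not the thesis pipeline.

```python
# Stride times and a simple left/right asymmetry ratio from IMU data.
import numpy as np
from scipy.signal import find_peaks

def stride_times(acc_vertical: np.ndarray, fs: float) -> np.ndarray:
    """Stride durations (s) from one foot's vertical acceleration."""
    # Assume heel strikes are prominent peaks at least 0.4 s apart.
    peaks, _ = find_peaks(acc_vertical, distance=int(0.4 * fs),
                          prominence=1.0)
    return np.diff(peaks) / fs

def asymmetry(left: np.ndarray, right: np.ndarray) -> float:
    """Percent difference between mean left and right stride times."""
    l, r = left.mean(), right.mean()
    return 100.0 * abs(l - r) / ((l + r) / 2.0)
```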

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments, which means images may be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and it has hence attracted wide interest within the computer vision community. We propose a transformation step that aims to enhance the generalization ability of CNN models in the presence of noise unseen during training. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST dataset with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free test images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise
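
    To convey the push-pull intuition, here is a heavily simplified stand-in that uses on/off centre-surround responses from a difference of Gaussians; the actual CORF operator uses oriented model neurons and is substantially more elaborate. The noise-suppression effect comes from the off response inhibiting the on response: noise excites both, so it is cancelled, while genuine edges survive.

```python
# Simplified push-pull inhibition on a grayscale image (not CORF itself).
import numpy as np
from scipy.ndimage import gaussian_filter

def push_pull(img: np.ndarray, sigma: float = 1.0,
              alpha: float = 0.8) -> np.ndarray:
    """Edge map in which the off response suppresses the on response."""
    img = img.astype(np.float64)
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, 2 * sigma)
    push = np.maximum(dog, 0.0)    # response to preferred contrast
    pull = np.maximum(-dog, 0.0)   # response to opposite contrast
    # Inhibition from nearby opposite-contrast responses (strength alpha).
    out = push - alpha * gaussian_filter(pull, sigma)
    return np.maximum(out, 0.0)
```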

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model makes it possible to quantitatively evaluate the quality of the interaction using statistical tools that measure how effective the recognition phase is. In this paper, we cast this theory in the setting where one of the interactants is a robot; in this case, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal considered
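
    The statistical reading of the Brunswick (lens) model can be sketched as correlation analysis: ecological validities correlate each expressed cue (e.g. gaze direction) with the sender's actual state, while cue utilisations correlate the same cues with the perceiver's judgement. The data layout and names below are assumptions for illustration, not the paper's evaluation code.

```python
# Lens-model correlations: validity, utilisation and overall achievement.
import numpy as np

def lens_model(cues: np.ndarray, state: np.ndarray, judgement: np.ndarray):
    """cues: (n_trials, n_cues); state, judgement: (n_trials,)."""
    validity = np.array([np.corrcoef(cues[:, j], state)[0, 1]
                         for j in range(cues.shape[1])])
    utilisation = np.array([np.corrcoef(cues[:, j], judgement)[0, 1]
                            for j in range(cues.shape[1])])
    achievement = np.corrcoef(state, judgement)[0, 1]  # overall accuracy
    return validity, utilisation, achievement
```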