78 research outputs found
Clique descriptor of affine invariant regions for robust wide baseline image matching
Assuming that the image distortion between corresponding regions of a wide-baseline stereo image pair can be approximated as an affine transformation if the regions are reasonably small, recent image matching algorithms have focused on affine invariant region (IR) detection and description to increase the robustness of matching. However, the distinctiveness of an intensity-based region descriptor tends to deteriorate when an image includes homogeneous texture or repetitive patterns. To address this problem, we investigate the geometry of a local IR cluster (also called a clique) and propose a new clique-based image matching method. In the proposed method, the clique of an IR is estimated by Delaunay triangulation in a local affine frame, and the Hausdorff distance is adopted for matching sets of descriptor vectors of unequal size. We also introduce two adaptively weighted clique distances, in which the neighbour distances within a clique are weighted according to the characteristics of the local feature distribution. Experimental results show that the clique-based matching method produces more tentative correspondences than variants of the SIFT-based method.
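As an illustration of the matching step, here is a minimal sketch of the symmetric Hausdorff distance between two cliques' descriptor sets, assuming each clique is simply an array of descriptor vectors (the function and variable names are ours, not the paper's):

```python
import numpy as np

def hausdorff_distance(A, B):
    """Symmetric Hausdorff distance between two sets of descriptor
    vectors A (m x d) and B (n x d), using Euclidean point distances."""
    # Pairwise Euclidean distances between every descriptor in A and B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Directed distances: worst-case nearest-neighbour distance each way.
    d_ab = d.min(axis=1).max()   # sup over A of inf over B
    d_ba = d.min(axis=0).max()   # sup over B of inf over A
    return max(d_ab, d_ba)

# Toy usage: two cliques with different numbers of 128-D descriptors.
rng = np.random.default_rng(0)
clique_a = rng.normal(size=(5, 128))
clique_b = rng.normal(size=(7, 128))
print(hausdorff_distance(clique_a, clique_b))
```

Because the distance is defined between sets, it tolerates cliques of unequal size, which is exactly why the paper adopts it.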
Image resolution enhancement using dual-tree complex wavelet transform
In this letter, a complex wavelet-domain image resolution enhancement algorithm based on the estimation of wavelet coefficients is proposed. The method uses a forward and inverse dual-tree complex wavelet transform (DT-CWT) to construct a high-resolution (HR) image from a given low-resolution (LR) image. The HR image is reconstructed from the LR image, together with a set of wavelet coefficients, using the inverse DT-CWT. The set of wavelet coefficients is estimated from the DT-CWT decomposition of a rough estimate of the HR image. Results on very high-resolution QuickBird data are presented and discussed through comparisons with state-of-the-art resolution enhancement methods.
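A rough sketch of this reconstruction idea, assuming the Python dtcwt package and an even-sized grayscale image; the lowpass gain factor is an assumption that would need tuning against the actual method:

```python
# A sketch only; requires `pip install dtcwt scipy numpy`.
import numpy as np
import dtcwt
from scipy import ndimage

def enhance_resolution(lr):
    """Double the resolution of a grayscale image `lr` (even-sized array)."""
    rough = ndimage.zoom(lr, 2, order=3)  # rough HR estimate by interpolation
    t = dtcwt.Transform2d()
    pyr = t.forward(rough, nlevels=1)     # detail coefficients from the rough HR image
    # The LR image stands in for the level-1 lowpass band; the gain of 2.0
    # compensating for the analysis filters' DC gain is a guess.
    pyr.lowpass = lr.astype(float) * 2.0
    return t.inverse(pyr)                 # HR image via the inverse DT-CWT

hr = enhance_resolution(np.random.rand(64, 64))
print(hr.shape)  # (128, 128)
```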
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have wide-ranging application prospects. However, their reliance on unstructured image and video data limits their performance; for example, they are easily affected by multiple views, occlusion, clothing, and object-carrying conditions. This paper addresses these problems using realistic 3-dimensional (3D) human structural data and a sequential pattern learning framework with a top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameter estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by gait semantic folding, the estimated body parameters are encoded into a sparse 2D matrix to construct the structural gait semantic image. To achieve time-based gait recognition, an HTM network is constructed to obtain sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions, including multiple views, by refining the SL-GSDRs according to prior knowledge. The proposed gait learning model not only helps gait recognition tasks overcome the difficulties of real application scenarios but also provides structured gait semantic images for visual cognition. Experimental analyses on the CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness.
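To make the "semantic folding" encoding concrete, the sketch below turns a vector of normalised body parameters into a sparse binary matrix in the spirit of an HTM scalar encoder; the layout and parameters are illustrative, not the paper's:

```python
import numpy as np

def fold_parameters(params, n_cols=128, w=9):
    """Encode each body parameter (assumed normalised to [0, 1]) as a row
    of a sparse binary matrix: a run of `w` active bits whose position
    along the row encodes the parameter's value."""
    img = np.zeros((len(params), n_cols), dtype=np.uint8)
    for row, v in enumerate(params):
        start = int(round(np.clip(v, 0.0, 1.0) * (n_cols - w)))
        img[row, start:start + w] = 1
    return img

# Toy usage: 24 pose/shape parameters -> a 24 x 128 sparse semantic image.
rng = np.random.default_rng(1)
semantic_image = fold_parameters(rng.random(24))
print(semantic_image.shape, semantic_image.mean())  # sparsity is about w / n_cols
```

Nearby parameter values produce overlapping bit runs, which is the property sparse distributed representations rely on.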
Multi-set canonical correlation analysis for 3D abnormal gait behaviour recognition based on virtual sample generation
Small sample datasets and two-dimensional (2D) approaches are challenges for vision-based abnormal gait behaviour recognition (AGBR). The lack of a three-dimensional (3D) structure of the human body limits 2D-based methods in abnormal gait virtual sample generation (VSG). In this paper, 3D AGBR based on VSG and multi-set canonical correlation analysis (3D-AGRBMCCA) is proposed. First, unstructured point cloud data of gait are obtained using a structured light sensor. A 3D parametric body model is then deformed to fit the point cloud data in both shape and posture, converting the point cloud features into a high-level structured representation of the body. The parametric body model is used for VSG based on the estimated body pose and shape data: symmetry virtual samples, pose-perturbation virtual samples, and various body-shape virtual samples with multiple views are generated to extend the training set. The spatio-temporal features of the abnormal gait behaviour from different views, body poses, and shape parameters are then extracted by a convolutional neural network based long short-term memory (CNN-LSTM) network and projected onto a uniform pattern space using deep learning based multi-set canonical correlation analysis. Experiments on four publicly available datasets show that the proposed system performs well under various conditions.
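A minimal sketch of the virtual sample generation step, assuming the pose is an array of per-joint parameters; the left/right joint index pairs and the noise scale are placeholders for whatever the paper's body model uses, and a full mirror would also negate lateral components:

```python
import numpy as np

LEFT, RIGHT = [1, 4, 7, 10], [2, 5, 8, 11]   # illustrative joint index pairs

def virtual_samples(pose, n_perturb=5, sigma=0.02, rng=None):
    """Generate virtual gait samples from one estimated pose:
    a left/right mirrored copy plus Gaussian pose perturbations."""
    rng = rng or np.random.default_rng(0)
    samples = [pose]
    # Symmetry sample: exchange left and right joint parameters.
    mirrored = pose.copy()
    mirrored[LEFT], mirrored[RIGHT] = pose[RIGHT], pose[LEFT]
    samples.append(mirrored)
    # Pose-perturbation samples: small Gaussian noise on the parameters.
    for _ in range(n_perturb):
        samples.append(pose + rng.normal(0.0, sigma, size=pose.shape))
    return np.stack(samples)

# Toy usage: 12 joints with 3 parameters each -> 7 virtual samples.
pose = np.random.default_rng(2).normal(size=(12, 3))
print(virtual_samples(pose).shape)  # (7, 12, 3)
```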
Silhouette-based gait recognition using Procrustes shape analysis and elliptic Fourier descriptors
This paper presents a gait recognition method that combines the spatio-temporal motion characteristics, statistical parameters, and physical parameters of a human subject (referred to as STM-SPP), classifying the subject by analysing the shape of its silhouette contours with Procrustes shape analysis (PSA) and elliptic Fourier descriptors (EFDs). STM-SPP uses spatio-temporal gait characteristics and physical parameters of the human body to resolve similar dissimilarity scores between probe and gallery sequences obtained by PSA. A part-based shape analysis using EFDs is also introduced to achieve robustness against carrying conditions. The classification results of PSA and EFDs are combined, with ties in ranking resolved by contour matching based on Hu moments. Experimental results show that STM-SPP outperforms several silhouette-based gait recognition methods.
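As a flavour of the PSA step, the sketch below computes a Procrustes disparity between two silhouette contours after arc-length resampling; a real implementation would also align the contour starting points, which this toy version omits:

```python
import numpy as np
from scipy.spatial import procrustes

def resample_contour(contour, n=64):
    """Resample a closed 2-D contour to n points equally spaced by arc length."""
    closed = np.vstack([contour, contour[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n, endpoint=False)
    return np.column_stack([np.interp(t, s, closed[:, 0]),
                            np.interp(t, s, closed[:, 1])])

def shape_distance(c1, c2, n=64):
    """Procrustes disparity between two contours (translation, scale and
    rotation are removed by scipy's procrustes)."""
    _, _, disparity = procrustes(resample_contour(c1, n), resample_contour(c2, n))
    return disparity

# Toy usage: two ellipses differing in position and scale -> small disparity.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
c1 = np.column_stack([2.0 * np.cos(theta), np.sin(theta)])
c2 = np.column_stack([2.2 * np.cos(theta) + 5, 1.1 * np.sin(theta) - 3])
print(shape_distance(c1, c2))
```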
A dynamic framework based on local Zernike Moment and motion history image for facial expression recognition
A dynamic descriptor facilitates robust recognition of facial expressions in video sequences. The two main current approaches to such recognition are basic emotion recognition and recognition based on facial action coding system (FACS) action units. In this paper we focus on basic emotion recognition and propose a spatio-temporal feature based on the local Zernike moment in the spatial domain and motion change frequency. We also design a dynamic feature comprising the motion history image and entropy. To recognise a facial expression, a weighting strategy based on the latter feature and a sub-division of the image frame is applied to the former to enhance the dynamic information of the facial expression, followed by classification with a classical support vector machine. Experiments on the CK+ and MMI datasets using a leave-one-out cross-validation scheme demonstrate that the integrated framework achieves better performance than either descriptor used separately. The proposed framework also outperforms six state-of-the-art methods.
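The motion history image component is straightforward to sketch; below is one common formulation over a grayscale frame sequence, with the decay constant and motion threshold as illustrative parameters that may differ from the paper's:

```python
import numpy as np

def motion_history_image(frames, tau=15, thresh=30):
    """Motion history image of a grayscale frame sequence: pixels that
    moved recently are bright; older motion decays linearly."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    prev = frames[0].astype(np.int16)
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        moving = np.abs(cur - prev) > thresh            # binary motion mask
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
        prev = cur
    return mhi / tau                                    # normalise to [0, 1]

# Toy usage: 20 random 48 x 48 frames.
rng = np.random.default_rng(4)
frames = [rng.integers(0, 256, size=(48, 48), dtype=np.uint8) for _ in range(20)]
print(motion_history_image(frames).max())
```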
View and clothing invariant gait recognition via 3D human semantic folding
A novel 3-dimensional (3D) human semantic folding method is introduced to provide a robust and efficient gait recognition method that is invariant to camera view and clothing style. The proposed method comprises three modules: (1) a 3D body pose, shape and viewing data estimation network (3D-BPSVeNet); (2) a gait semantic parameter folding model; and (3) a gait semantic feature refining network. First, 3D-BPSVeNet is constructed based on a convolutional gated recurrent unit (ConvGRU) to extract 2-dimensional (2D) to 3D body pose and shape semantic descriptors (2D-3D-BPSDs) from a sequence of parsed RGB gait images. A 3D gait model with virtual dressing is then constructed by morphing a template 3D body model using the estimated 2D-3D-BPSDs and the recognised clothing styles. More accurate 2D-3D-BPSDs without clothing are then obtained by using a silhouette similarity function when updating the 3D body model to fit the 2D gait. Second, the intrinsic 2D-3D-BPSDs, free of interference from clothing, are encoded by sparse distributed representation (SDR) to obtain a sparsely distributed binary gait semantic image (SD-BGSI) in a topographical semantic space. By averaging the SD-BGSIs over a gait cycle, a gait semantic folding image (GSFI) is obtained as a high-level representation of gait. Third, a gait semantic feature refining network is trained to refine the semantic features extracted directly from the GSFI using three types of prior knowledge: viewing angle, clothing style and carrying condition. Experimental analyses on the CMU MoBo, CASIA B, KY4D, OU-MVLP and OU-ISIR datasets show a significant gain in gait recognition accuracy and robustness.
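The cycle-averaging step is simple enough to sketch: each cell of the GSFI holds the fraction of the gait cycle for which the corresponding semantic bit was active (the array shapes are illustrative):

```python
import numpy as np

def gait_semantic_folding_image(sd_bgsi_frames):
    """Average per-frame binary semantic images over one gait cycle."""
    return np.stack(sd_bgsi_frames).astype(np.float32).mean(axis=0)

# Toy usage: 30 frames of 64 x 64 binary semantic images -> one GSFI.
frames = [(np.random.rand(64, 64) > 0.95).astype(np.uint8) for _ in range(30)]
print(gait_semantic_folding_image(frames).shape)  # (64, 64)
```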
Fusing dynamic deep learned features and handcrafted features for facial expression recognition
The automated recognition of facial expressions has been actively researched due to its wide-ranging applications. Recent advances in deep learning have improved the performance of facial expression recognition (FER) methods. In this paper, we propose a framework that combines discriminative features learned using convolutional neural networks with handcrafted features, including shape- and appearance-based features, to further improve the robustness and accuracy of FER. In addition, texture information is extracted from facial patches to enhance the discriminative power of the extracted features. By encoding shape, appearance, and deep dynamic information, the proposed framework achieves high performance and outperforms state-of-the-art FER methods on the CK+ dataset.
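A minimal sketch of feature-level fusion followed by an SVM, assuming the deep and handcrafted features have already been extracted elsewhere; the dimensions and the scikit-learn pipeline are ours, not necessarily the paper's fusion scheme:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-ins: 100 samples, 256-D deep features and 64-D handcrafted ones.
rng = np.random.default_rng(2)
deep = rng.normal(size=(100, 256))        # e.g. CNN activations
handcrafted = rng.normal(size=(100, 64))  # e.g. shape/appearance descriptors
y = rng.integers(0, 7, size=100)          # 7 basic expression classes

X = np.hstack([deep, handcrafted])        # feature-level fusion by concatenation
clf = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
print(clf.score(X, y))
```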
A mutual information based adaptive windowing of informative EEG for emotion recognition
Emotion recognition from brain wave signals involves high-dimensional electroencephalogram (EEG) data. In this paper, a window selection method based on mutual information is introduced to select an appropriate signal window and thereby reduce the length of the signals. The motivation for the windowing method is that EEG emotion recognition is computationally costly and the data have a low signal-to-noise ratio. The aim of the windowing method is to find a reduced signal where the emotions are strongest; this paper suggests that using only the signal section that best describes emotions improves their classification. This is achieved by iteratively comparing EEG signals of different lengths at different time locations, using the mutual information between the reduced signal and the emotion labels as the criterion. The reduced signal with the highest mutual information is used for extracting the features for emotion classification. In addition, a viable framework for emotion recognition is introduced. Experimental results on the publicly available DEAP and MAHNOB-HCI datasets show a significant improvement in emotion recognition accuracy.
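The window search is easy to sketch: scan candidate windows, score each by the mutual information between simple window features and the labels, and keep the best. The per-channel log-variance feature below is a stand-in for whatever features the paper actually extracts:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_window(eeg, labels, lengths=(128, 256), step=64):
    """Return the (start, length) of the window whose features carry the
    most mutual information about the emotion labels.
    eeg: array of shape (trials, channels, samples)."""
    best_mi, best_window = -np.inf, None
    n = eeg.shape[2]
    for length in lengths:
        for start in range(0, n - length + 1, step):
            seg = eeg[:, :, start:start + length]
            feats = np.log(seg.var(axis=2) + 1e-12)   # (trials, channels)
            mi = mutual_info_classif(feats, labels, random_state=0).sum()
            if mi > best_mi:
                best_mi, best_window = mi, (start, length)
    return best_window

# Toy usage: 40 trials, 4 channels, 512 samples, binary valence labels.
rng = np.random.default_rng(3)
eeg = rng.normal(size=(40, 4, 512))
labels = rng.integers(0, 2, size=40)
print(select_window(eeg, labels))
```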
Robust contactless pulse transit time estimation based on signal quality metric
The pulse transit time (PTT) can provide valuable insight into cardiovascular health, specifically regarding arterial stiffness and blood pressure. Traditionally, PTT is derived by calculating the time difference between two photoplethysmography (PPG) measurements, which requires a set of body-worn sensors attached to the skin. Recently, remote photoplethysmography (rPPG) has been proposed as a contactless monitoring alternative. The main problem with rPPG-based PTT estimation is that motion artifacts distort the waveform shape, leading to shifted or spuriously detected peaks, which reduces the accuracy of the PTT. To overcome this problem, this paper presents a robust pulse-by-pulse PTT estimation framework based on a signal quality metric. By exploiting local temporal information and global periodic characteristics, the metric automatically assesses signal quality on a pulse-by-pulse basis and calculates the probability that each detected peak is an actual pulse peak. Furthermore, to cope with spurious and shifted pulse peaks, a Kalman filter complemented by the proposed signal quality metric is used to adaptively adjust the peaks based on the estimated probabilities. The refined peaks are finally used for pulse-by-pulse PTT estimation. The experimental results are promising, suggesting that the proposed framework provides a robust and more accurate PTT estimation in real applications.
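One way the quality-weighted Kalman step could look: a scalar filter over inter-beat intervals in which each measurement's noise is inflated in proportion to its low quality score, so unreliable peaks are pulled toward the running estimate. The state model and noise values below are our assumptions, not the paper's:

```python
import numpy as np

def refine_peaks(peak_times, quality, q=0.005, r=0.05):
    """Refine detected pulse peak times with a scalar Kalman filter over
    inter-beat intervals; `quality` gives a 0..1 score per peak."""
    intervals = np.diff(peak_times)
    x, p = intervals[0], 1.0                  # state: current interval estimate
    refined = [peak_times[0], peak_times[0] + x]
    for z, w in zip(intervals[1:], quality[2:]):
        p += q                                # predict (constant-interval model)
        r_eff = r / max(w, 1e-3)              # low quality -> inflated meas. noise
        k = p / (p + r_eff)                   # Kalman gain
        x += k * (z - x)                      # update interval estimate
        p *= 1.0 - k
        refined.append(refined[-1] + x)
    return np.array(refined)

# Toy usage: 20 peaks at roughly 1 s intervals with jitter.
rng = np.random.default_rng(5)
peaks = np.cumsum(1.0 + rng.normal(0, 0.05, size=20))
quality = rng.uniform(0.3, 1.0, size=20)      # per-peak quality scores
print(refine_peaks(peaks, quality)[:5])
```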