3 research outputs found

    A novel shape descriptor based on salient keypoints detection for binary image matching and retrieval

    We introduce a shape descriptor that extracts keypoints from binary images and automatically detects the salient ones among them. The proposed descriptor operates as follows: first, the contours of the image are detected and an image transformation is used to generate background information. Next, pixels of the transformed image with specific characteristics in their local areas are used to extract keypoints. The most salient keypoints are then detected automatically by filtering out redundant and sensitive ones. Finally, a feature vector is calculated for each keypoint from the distribution of contour points in its local area. The proposed descriptor is evaluated on public datasets of silhouette images, handwritten mathematical expressions, hand-drawn diagram sketches, and noisy scanned logos. Experimental results show that it compares strongly against state-of-the-art methods and remains reliable on challenging images such as fluctuating handwriting and noisy scans. Furthermore, we integrate our descriptor…
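    The contour-and-keypoint pipeline above can be illustrated with a minimal sketch. This is not the authors' implementation: the contour test, the neighbourhood radius, and the log-polar bin layout are all assumptions. Here a contour pixel is any foreground pixel with a background 4-neighbour, and each keypoint's feature is a normalised log-polar histogram of nearby contour points, in the spirit of shape contexts:

    ```python
    import numpy as np

    def contour_points(binary):
        """Contour pixels: foreground pixels with at least one background 4-neighbour."""
        padded = np.pad(binary, 1).astype(bool)
        core = padded[1:-1, 1:-1]
        neigh_bg = (~padded[:-2, 1:-1] | ~padded[2:, 1:-1]
                    | ~padded[1:-1, :-2] | ~padded[1:-1, 2:])
        ys, xs = np.nonzero(core & neigh_bg)
        return np.stack([ys, xs], axis=1)

    def keypoint_feature(kp, contour, n_r=3, n_theta=8, radius=16.0):
        """Log-polar histogram of contour points around a keypoint (illustrative bins)."""
        d = contour - kp
        r = np.hypot(d[:, 0], d[:, 1])
        mask = (r > 0) & (r <= radius)          # exclude the keypoint itself
        r, theta = r[mask], np.arctan2(d[mask, 0], d[mask, 1])
        r_bin = np.minimum((np.log1p(r) / np.log1p(radius) * n_r).astype(int), n_r - 1)
        t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
        hist = np.zeros((n_r, n_theta))
        np.add.at(hist, (r_bin, t_bin), 1)
        total = hist.sum()
        return (hist / total).ravel() if total else hist.ravel()
    ```

    For a filled 10x10 square, `contour_points` returns the 36 perimeter pixels, and each keypoint yields a 24-dimensional feature vector that sums to one.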

    Spatio-temporal framework on facial expression recognition.

    This thesis presents an investigation into two topics that are important in facial expression recognition: how to employ the dynamic information in facial expression image sequences, and how to efficiently extract context and other relevant information from different facial regions. This involves the development of spatio-temporal frameworks for recognising facial expressions. The thesis proposes three novel frameworks. The first uses sparse representation to extract features from patches of a face to improve recognition performance, applying part-based methods that are robust to image misalignment. In addition, sparse representation reduces the dimensionality of the features and represents a face image more compactly and with clearer semantic meaning. Since a facial expression is a dynamic process, and that process describes the expression more effectively than any single frame, it is important to capture such dynamic information so as to recognise facial expressions over an entire video sequence. Thus, the second framework uses two types of dynamic information to enhance recognition: a novel spatio-temporal descriptor based on PHOG (pyramid histogram of oriented gradients) to represent changes in facial shape, and dense optical flow to estimate the movement (displacement) of facial landmarks. The framework views an image sequence as a spatio-temporal volume, and uses temporal information to represent the dynamic movement of facial landmarks associated with an expression. Specifically, a spatial descriptor representing local shape is extended to the spatio-temporal domain to capture changes in the local shape of facial sub-regions (forehead, mouth, eyebrow and nose) along the temporal dimension, giving 3D facial component sub-regions; an optical-flow descriptor is also employed to extract temporal information.
    The fusion of these two descriptors enhances the dynamic information and achieves better performance than either descriptor alone. The third framework also focuses on analysing the dynamics of facial expression sequences to represent spatio-temporal dynamic information (i.e., velocity). Two types of features are generated: a spatio-temporal shape representation that enhances local spatial and dynamic information, and a dynamic appearance representation. In addition, an entropy-based method is introduced to capture the spatial relationships between different parts of a face by computing the entropy of its sub-regions.
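    The entropy-based spatial cue in the third framework might be sketched as follows. This is a toy version, not the thesis's formulation: the 4x4 grid, the 16 intensity bins, and intensities in [0, 1] are all assumptions made for illustration:

    ```python
    import numpy as np

    def region_entropy(patch, bins=16):
        """Shannon entropy (bits) of a grayscale patch's intensity histogram."""
        hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
        p = hist / hist.sum()
        p = p[p > 0]                       # drop empty bins (0 * log 0 := 0)
        return float(-(p * np.log2(p)).sum())

    def subregion_entropies(face, grid=(4, 4)):
        """Entropy of each cell in a grid over the face image."""
        h, w = face.shape
        gh, gw = grid
        return np.array([[region_entropy(face[i*h//gh:(i+1)*h//gh,
                                              j*w//gw:(j+1)*w//gw])
                          for j in range(gw)] for i in range(gh)])
    ```

    A uniform patch yields zero entropy, while textured regions (eyes, mouth) score higher, so the grid of entropies gives a coarse map of where the informative parts of the face lie.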

    Spatio-temporal Representation and Analysis of Facial Expressions with Varying Intensities

    Facial expressions convey a wealth of information about our feelings, personality and mental state. In this thesis we seek efficient ways of representing and analysing facial expressions of varying intensities. Firstly, we analyse state-of-the-art systems by decomposing them into their fundamental components, in an effort to understand which practices are common to successful systems. Secondly, we address the problem of sequence registration, which emerged as an open issue in our analysis. Encoding the (non-rigid) motions generated by facial expressions is facilitated when the rigid motions caused by irrelevant factors, such as camera movement, are eliminated. We propose a sequence registration framework based on pre-trained regressors of Gabor motion energy. Comprehensive experiments show that the proposed method achieves very high registration accuracy even under difficult illumination variations. Finally, we propose an unsupervised representation-learning framework for encoding the spatio-temporal evolution of facial expressions. The framework is inspired by the Facial Action Coding System (FACS), which predates computer-based analysis. FACS encodes an expression in terms of localised facial movements and assigns an intensity score to each movement. The framework we propose mimics these two properties of FACS: specifically, we learn from data a linear transformation that approximates the facial expression variation in a sequence as a weighted sum of localised basis functions, where the weight of each basis function relates to movement intensity. We show that the proposed framework provides a plausible description of facial expressions and leads to state-of-the-art performance in recognising expressions across intensities, from fully blown expressions to micro-expressions.
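    The "weighted sum of localised basis functions" idea can be sketched in a simplified form. This is illustrative only: the Gaussian bumps below stand in for the learned localised basis functions, and plain least squares stands in for the learned linear transformation described in the abstract:

    ```python
    import numpy as np

    def localized_basis(n_points, n_basis, width=5.0):
        """Toy localised basis: Gaussian bumps at evenly spaced positions,
        each row normalised to unit length."""
        centres = np.linspace(0, n_points - 1, n_basis)
        idx = np.arange(n_points)
        B = np.exp(-0.5 * ((idx[None, :] - centres[:, None]) / width) ** 2)
        return B / np.linalg.norm(B, axis=1, keepdims=True)

    def expression_weights(delta, B):
        """Least-squares weights w such that delta ~= w @ B; each weight plays
        the role of a movement-intensity score for one localised basis."""
        w, *_ = np.linalg.lstsq(B.T, delta, rcond=None)
        return w
    ```

    Given a frame-to-frame variation vector `delta` that lies in the span of the basis, the recovered weights reconstruct it exactly; larger weights correspond to stronger activity in that localised region, mirroring the FACS-style intensity scores.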