A novel shape descriptor based on salient keypoints detection for binary image matching and retrieval
We introduce a shape descriptor that extracts keypoints from binary images and
automatically detects the salient ones among them. The proposed descriptor operates as
follows: First, the contours of the image are detected and an image transformation is used to
generate background information. Next, pixels of the transformed image that have specific
characteristics in their local areas are used to extract keypoints. Afterwards, the most salient
keypoints are automatically detected by filtering out redundant and sensitive ones. Finally,
a feature vector is calculated for each keypoint by using the distribution of contour points
in its local area. The proposed descriptor is evaluated using public datasets of silhouette
images, handwritten math expressions, hand-drawn diagram sketches, and noisy scanned
logos. Experimental results show that the proposed descriptor compares strongly against
state of the art methods, and that it is reliable when applied on challenging images such as
fluctuated handwriting and noisy scanned images. Furthermore, we integrate our descripto
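The saliency step described above — filtering out redundant keypoints — can be sketched in a few lines. This is a minimal illustration of one plausible redundancy filter; `filter_salient` and `min_dist` are hypothetical names, and the paper's actual saliency criterion (which also removes sensitive keypoints) is more involved:

```python
import math

def filter_salient(keypoints, min_dist=5.0):
    """Keep a keypoint only if it is not within min_dist of an
    already-accepted one (a simple redundancy filter; illustrative,
    not the paper's exact criterion)."""
    salient = []
    for p in keypoints:
        if all(math.dist(p, q) >= min_dist for q in salient):
            salient.append(p)
    return salient

pts = [(0, 0), (1, 1), (10, 10), (10.5, 10.2), (30, 5)]
print(filter_salient(pts))  # near-duplicate keypoints collapse to one
```

A greedy pass like this keeps the first keypoint of each cluster; a real implementation would typically rank keypoints by a saliency score before filtering.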
Spatio-temporal framework on facial expression recognition
This thesis presents an investigation into two topics that are important in facial expression recognition: how to employ the dynamic information from facial expression image sequences and how to efficiently extract context and other relevant information of different facial regions. This involves the development of spatio-temporal frameworks for recognising facial expression.
The thesis proposes three novel frameworks for recognising facial expressions. The first framework uses sparse representation to extract features from facial patches, applying part-based methods that are robust to image misalignment, to improve recognition performance. In addition, sparse representation reduces the dimensionality of the features and yields a more compact, semantically meaningful representation of the face image.
Since a facial expression is a dynamic process, and that process describes the expression more effectively than any single frame, it is important to capture dynamic information so that expressions can be recognised over an entire video sequence. The second framework therefore uses two types of dynamic information to enhance recognition: a novel spatio-temporal descriptor based on PHOG (pyramid histogram of oriented gradients) to represent changes in facial shape, and dense optical flow to estimate the movement (displacement) of facial landmarks. The framework treats an image sequence as a spatio-temporal volume and uses temporal information to represent the dynamic movement of facial landmarks associated with an expression. Specifically, a spatial descriptor of local shape is extended to the spatio-temporal domain to capture changes in the local shape of facial sub-regions (forehead, mouth, eyebrows and nose) along the temporal dimension, giving 3D facial component sub-regions. An optical-flow descriptor is also employed to extract temporal information. Fusing these two descriptors enriches the dynamic information and achieves better performance than either descriptor alone.
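The landmark-movement idea above can be illustrated with a toy displacement computation. This is a crude stand-in for the dense-optical-flow estimate the thesis uses; `landmark_displacements` and the sample coordinates are purely illustrative:

```python
def landmark_displacements(tracks):
    """tracks: list of frames, each a list of (x, y) landmark positions.
    Returns per-frame displacement vectors between consecutive frames
    (a simple proxy for the optical-flow-based movement estimate)."""
    feats = []
    for prev, cur in zip(tracks, tracks[1:]):
        feats.append([(cx - px, cy - py)
                      for (px, py), (cx, cy) in zip(prev, cur)])
    return feats

# two landmarks tracked over three frames
frames = [[(0, 0), (5, 5)], [(1, 0), (5, 6)], [(2, 1), (6, 6)]]
print(landmark_displacements(frames))
# → [[(1, 0), (0, 1)], [(1, 1), (1, 0)]]
```

Concatenating such displacement vectors over the sequence gives a basic temporal feature; the thesis instead fuses dense optical flow with the spatio-temporal PHOG descriptor.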
The third framework also analyses the dynamics of facial expression sequences to represent spatio-temporal dynamic information (i.e., velocity). Two types of features are generated: a spatio-temporal shape representation that enhances local spatial and dynamic information, and a dynamic appearance representation. In addition, an entropy-based method is introduced to capture the spatial relationships between different parts of the face by computing the entropy of different facial sub-regions.
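The entropy computation mentioned above reduces to standard Shannon entropy over pixel values in each sub-region. A minimal sketch, assuming grey-level histograms per region (`region_entropy` is an illustrative name, not the thesis's API):

```python
import math
from collections import Counter

def region_entropy(pixels):
    """Shannon entropy (bits) of the grey-level values in one
    face sub-region: higher entropy means more texture/variation."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

uniform_patch = [128] * 16            # flat region: carries no information
textured_patch = list(range(16))      # all values distinct: maximal entropy
print(region_entropy(uniform_patch))   # 0.0
print(region_entropy(textured_patch))  # 4.0 (= log2(16))
```

Comparing entropy values across sub-regions (e.g. mouth vs. forehead) gives a simple measure of where the informative variation in a face lies.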
Spatio-temporal Representation and Analysis of Facial Expressions with Varying Intensities
Facial expressions convey a wealth of information about our feelings, personality and mental
state. In this thesis we seek efficient ways of representing and analysing facial expressions of
varying intensities. Firstly, we analyse state-of-the-art systems by decomposing them into their
fundamental components, in an effort to understand which useful practices are common to
successful systems. Secondly, we address the problem of sequence registration, which emerged
as an open issue in our analysis. The encoding of the (non-rigid) motions generated by facial expressions
is facilitated when the rigid motions caused by irrelevant factors, such as camera movement,
are eliminated. We propose a sequence registration framework that is based on pre-trained
regressors of Gabor motion energy. Comprehensive experiments show that the proposed method
achieves very high registration accuracy even under difficult illumination variations. Finally,
we propose an unsupervised representation learning framework for encoding the spatio-temporal
evolution of facial expressions. The proposed framework is inspired by the Facial Action Coding
System (FACS), which predates computer-based analysis. FACS encodes an expression in terms
of localised facial movements and assigns an intensity score for each movement. The framework
we propose mimics those two properties of FACS. Specifically, we propose to learn from
data a linear transformation that approximates the facial expression variation in a sequence as
a weighted sum of localised basis functions, where the weight of each basis function relates to
movement intensity. We show that the proposed framework provides a plausible description of
facial expressions, and leads to state-of-the-art performance in recognising expressions across
intensities, from fully blown expressions to micro-expressions.
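The "weighted sum of localised basis functions" idea above can be illustrated with a toy least-squares fit. This sketch assumes the localised bases have disjoint support so each weight decouples; `fit_weights` and the example vectors are hypothetical, not the thesis's learned transformation:

```python
def fit_weights(delta, bases):
    """Least-squares weights w such that delta ≈ sum_i w_i * bases[i].
    With disjoint-support (localised, non-overlapping) bases, each
    weight is an independent projection: w_i = <delta, b_i> / <b_i, b_i>."""
    weights = []
    for b in bases:
        num = sum(d * v for d, v in zip(delta, b))
        den = sum(v * v for v in b)
        weights.append(num / den)
    return weights

# two localised "movements": one on the left half, one on the right
b1 = [1, 1, 0, 0]
b2 = [0, 0, 1, 1]
delta = [2, 2, 0.5, 0.5]   # expression change: strong left, weak right
print(fit_weights(delta, [b1, b2]))  # → [2.0, 0.5]
```

The recovered weights play the role of FACS-like intensity scores: each one measures how strongly its localised movement contributes to the observed expression change.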