Discriminative Hessian Eigenmaps for face recognition
Dimension reduction algorithms have attracted much attention in face recognition because they can select a subset of effective and efficient discriminative features from face images. However, most dimension reduction algorithms cannot model the intra-class geometry and the inter-class discrimination simultaneously. In this paper, we introduce Discriminative Hessian Eigenmaps (DHE), a novel dimension reduction algorithm that addresses this problem. DHE encodes the geometric and the discriminative information in a local patch by improved Hessian Eigenmaps and margin maximization, respectively. Empirical studies on public face databases demonstrate that DHE is superior to popular dimension reduction algorithms, e.g., FLDA, LPP, MFA and DLA. © 2010 IEEE. The 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Dallas, TX, 14-19 March 2010. In IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2010, p. 5586-558
Cross-domain web image annotation
In recent years, cross-domain learning algorithms have attracted much attention as a way to address the problem of insufficient labeled data. However, these cross-domain learning algorithms cannot be applied to subspace learning, which plays a key role in multimedia tasks, e.g., web image annotation. This paper envisions cross-domain discriminative subspace learning and provides an effective solution: the cross-domain discriminative Hessian Eigenmaps, or CDHE for short. CDHE connects the training and the testing samples by minimizing the quadratic distance between the distribution of the training samples and that of the testing samples, so a common subspace for data representation can be preserved. We expect the discriminative information that separates the concepts in the training set to also separate the concepts in the testing set, and thus we have a chance to address the cross-domain problem. Margin maximization is adopted in CDHE so that the discriminative information for separating different classes is well preserved. Finally, CDHE encodes the local geometry of each training class in the local tangent space, which is locally isometric to the data manifold, and thus can preserve the intra-class local geometry. Experimental evidence on real-world image datasets demonstrates the effectiveness of CDHE for cross-domain web image annotation. © 2009 IEEE. The IEEE International Conference on Data Mining Workshops (ICDMW) 2009, Miami, FL, 6 December 2009. In Proceedings of the IEEE International Conference on Data Mining, 2009, p. 184-18
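CDHE's domain alignment minimizes a quadratic distance between the training-sample and testing-sample distributions in the learned subspace. As a minimal sketch, the snippet below uses the squared distance between empirical means after projection (a linear-kernel maximum mean discrepancy); the paper's exact quadratic form is not given in the abstract, so this particular choice is an assumption.

```python
import numpy as np

def mean_distance(X_train, X_test, W):
    """Squared distance between the empirical means of two domains after
    projection by W -- a linear-kernel MMD, used here as a hypothetical
    stand-in for CDHE's quadratic distribution distance."""
    mu_s = (X_train @ W).mean(axis=0)   # projected source (training) mean
    mu_t = (X_test @ W).mean(axis=0)    # projected target (testing) mean
    return float(np.sum((mu_s - mu_t) ** 2))

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (50, 4))          # training domain
Xt = rng.normal(0.0, 1.0, (60, 4)) + 2.0    # shifted testing domain
W = np.eye(4)[:, :2]                        # project onto first 2 dims
print(mean_distance(Xs, Xt, W))             # large when the domains differ
print(mean_distance(Xs, Xs, W))             # exactly zero for identical samples
```

Minimizing such a term over W encourages a subspace in which the two domains look alike, which is the intuition behind sharing discriminative information across domains.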
Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction
It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least squares problem, and thus the least angle regression (LARS) (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect routes or impose strict settings, which can be inconvenient in applications. In this paper, we propose the manifold elastic net, or MEN for short. MEN incorporates the merits of both manifold learning based and sparse learning based dimensionality reduction. Through a series of equivalent transformations, we show that MEN is equivalent to a lasso penalized least squares problem, so LARS can be adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved in the low dimensional data representation; 2) both margin maximization and classification error minimization are considered in the sparse projection calculation; 3) the projection matrix of MEN improves parsimony in computation; 4) the elastic net penalty reduces over-fitting; and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top-level dimensionality reduction algorithms.
Comment: 33 pages, 12 figures
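The key step in MEN is rewriting an elastic-net-penalized problem as a plain lasso problem so that a LARS-style solver applies. A generic illustration of that reduction is the standard data-augmentation trick of Zou and Hastie (append sqrt(lam2)·I rows to the design matrix and zeros to the response); MEN's own chain of equivalent transformations is more involved, so treat this as a sketch of the idea only. A tiny coordinate-descent lasso stands in for LARS:

```python
import numpy as np

def augment_for_lasso(X, y, lam2):
    """Elastic net -> lasso by data augmentation (Zou & Hastie):
    append sqrt(lam2)*I rows to X and zeros to y, then rescale."""
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug

def lasso_cd(X, y, lam1, n_iter=200):
    """Tiny coordinate-descent lasso solver (a stand-in for LARS)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # residual excluding j
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            beta[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / z
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0])
y = X @ beta_true + 0.05 * rng.normal(size=40)
X_aug, y_aug = augment_for_lasso(X, y, lam2=0.1)
beta = lasso_cd(X_aug, y_aug, lam1=1.0)
print(np.round(beta, 2))   # sparse: zeros on the irrelevant coordinates
```

The augmented problem is an ordinary lasso, so any lasso path algorithm (LARS included) recovers the elastic-net solution, which is exactly the property MEN exploits.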
Parametric face alignment: generative and discriminative approaches
Doctoral thesis in Electrical and Computer Engineering, presented to the Faculdade de Ciências e Tecnologia da Universidade de Coimbra.
This thesis addresses the matching of deformable human face models to 2D images.
Two different approaches are detailed: generative and discriminative methods. Generative
or holistic methods model the appearance/texture of all image pixels describing
the face by synthesizing the expected appearance (building synthetic versions of the target
face). Discriminative or patch-based methods model the local correlations between
pixel values; such approaches use an ensemble of local feature detectors connected
by a shape regularization model. Typically, generative approaches can achieve higher
fitting accuracy, but discriminative methods perform considerably better on unseen images.
The Active Appearance Models (AAMs) are probably the most widely used generative
technique. AAMs match parametric models of shape and appearance to new
images by solving a nonlinear optimization that minimizes the difference between a
synthetic template and the real appearance. The first part of this thesis describes the
2.5D AAM, an extension of the original 2D AAM that deals with a full perspective
projection model. The 2.5D AAM uses a 3D Point Distribution Model (PDM) and a
2D appearance model whose control points are defined by a perspective projection of
the PDM. Two model fitting algorithms and their computationally efficient approximations
are proposed: the Simultaneous Forwards Additive (SFA) and the Normalization
Forwards Additive (NFA). Robust solutions for the SFA and NFA are also proposed to
take into account self-occlusion and/or partial occlusion of the face. Extensive
results are shown, covering fitting convergence, fitting performance on unseen data,
robustness to occlusion, tracking performance and pose estimation.
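The defining step of the 2.5D AAM is obtaining 2D appearance control points by perspective projection of the 3D PDM. A minimal pinhole-projection sketch of that step follows; the pose (R, t) and focal length f used here are illustrative values, not parameters from the thesis.

```python
import numpy as np

def project_perspective(points3d, R, t, f):
    """Pinhole perspective projection of 3D PDM points: transform into
    camera coordinates, then divide by depth and scale by the focal
    length.  R, t and f are hypothetical camera parameters."""
    Xc = points3d @ R.T + t              # camera coordinates
    return f * Xc[:, :2] / Xc[:, 2:3]    # divide x, y by depth z

# three toy 3D model points (not a real face PDM)
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.5]])
R = np.eye(3)                            # frontal pose
t = np.array([0.0, 0.0, 10.0])           # model placed 10 units from camera
uv = project_perspective(pts, R, t, f=800.0)
print(uv)
```

Unlike the weak-perspective model of the original 2D AAM, the depth division here makes nearer points project with larger displacements, which is what a full perspective projection model captures.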
The second main part of this thesis concerns discriminative methods such as
the Constrained Local Models (CLM) or the Active Shape Models (ASM), where an ensemble of local feature detectors is constrained to lie within the subspace spanned
by a PDM. Fitting such a model to an image typically involves two steps: (1) a local
search using a detector, obtaining response maps for each landmark, and (2) a global
optimization that finds the shape parameters that jointly maximize all the detection responses.
This work proposes Discriminative Bayesian Active Shape Models (DBASM),
a new global optimization strategy using a Bayesian approach, in which the posterior distribution
of the shape parameters is inferred in a maximum a posteriori (MAP) sense
by means of a Linear Dynamical System (LDS). The DBASM approach models the covariance
of the latent variables, i.e., it uses 2nd order statistics of the shape (and pose)
parameters. Later, the Bayesian Active Shape Models (BASM) are presented. BASM is an
extension of the previous DBASM formulation in which the prior distribution is explicitly
modeled by means of recursive Bayesian estimation. Extensive results are presented,
evaluating the DBASM and BASM global optimization strategies, local face part detectors
and tracking performance on several standard datasets. Qualitative results
from the challenging Labeled Faces in the Wild (LFW) dataset are also shown.
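The global optimization step of such discriminative fitting can be sketched, under strong simplifying assumptions, as a single Gaussian MAP update: approximate each landmark's response map by a Gaussian (mean mu, inverse covariance Sigma_inv), model the shape linearly as s = s0 + Phi·b with a Gaussian prior on b, and solve for b in closed form. This is only a generic Gaussian MAP step for illustration; DBASM's LDS-based inference and BASM's recursive prior are considerably richer.

```python
import numpy as np

def map_shape_params(mu, Sigma_inv, s0, Phi, Lambda_inv):
    """MAP estimate of PDM shape parameters b for a linear shape model
    s = s0 + Phi b, Gaussian response-map likelihood N(mu, Sigma) and
    Gaussian prior N(0, Lambda) on b.  A hypothetical simplification of
    the global optimization step, not the DBASM/BASM formulation."""
    A = Phi.T @ Sigma_inv @ Phi + Lambda_inv
    return np.linalg.solve(A, Phi.T @ Sigma_inv @ (mu - s0))

# toy model: 4 landmark coordinates, 2 shape modes (hypothetical numbers)
s0 = np.zeros(4)
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0],
                [1.0, -1.0]])
mu = Phi @ np.array([0.5, -0.2])      # detector peaks from a known true b
Sigma_inv = np.eye(4)                 # uniform detector confidence
Lambda_inv = 0.01 * np.eye(2)         # weak shape prior
b = map_shape_params(mu, Sigma_inv, s0, Phi, Lambda_inv)
print(b)   # close to the generating parameters [0.5, -0.2]
```

The prior term keeps the fitted shape inside the PDM subspace even when individual response maps are noisy, which is the constraint the CLM/ASM family relies on.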
Finally, the last part of this thesis addresses identity and facial expression
recognition. Face geometry is extracted from input images using the AAM, and low
dimensional manifolds are then derived using Laplacian Eigenmaps (LE), resulting in
two types of manifolds: one representing identity and the other person-specific
facial expression. The identity and facial expression recognition system uses a two
stage approach: first, a Support Vector Machine (SVM) is used to establish identity
across expression changes; then the second stage deals with person-specific expression
recognition using a network of Hidden Markov Models (HMMs). Results from
people exhibiting the six basic expressions (happiness, sadness, anger, fear, surprise
and disgust) plus the neutral emotion are shown.
Motor Imagery Classification Based on Bilinear Sub-Manifold Learning of Symmetric Positive-Definite Matrices
In motor imagery brain-computer interfaces (BCIs), the symmetric positive-definite (SPD) covariance matrices of electroencephalogram (EEG) signals carry important discriminative information. In this paper, we classify motor imagery EEG signals by exploiting the fact that the space of SPD matrices endowed with a Riemannian distance is a high-dimensional Riemannian manifold. To alleviate the overfitting and heavy computation problems associated with conventional classification methods on a high-dimensional manifold, we propose a framework for intrinsic sub-manifold learning from a high-dimensional Riemannian manifold. Considering a special case of the SPD space, a simple yet efficient bilinear sub-manifold learning (BSML) algorithm is derived to learn the intrinsic sub-manifold by identifying a bilinear mapping that maximizes the preservation of the local geometry and global structure of the original manifold. Two BSML-based classification algorithms are further proposed to classify the data on the learned intrinsic sub-manifold. Experimental evaluation on EEG classification revealed that the BSML method extracts the intrinsic sub-manifold approximately 5× faster and with higher classification accuracy than competing algorithms. BSML also exhibited strong robustness to small training datasets, which often occur in BCI studies.
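The geometry this paper builds on is the SPD manifold with a Riemannian distance. The standard affine-invariant choice is d(A, B) = ||logm(A^{-1/2} B A^{-1/2})||_F, which the sketch below computes with symmetric eigendecompositions; whether the paper uses exactly this metric is an assumption on our part.

```python
import numpy as np

def spd_riemannian_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    ||logm(A^{-1/2} B A^{-1/2})||_F, via symmetric eigendecompositions.
    The standard SPD-manifold metric; the paper's exact metric choice
    is assumed to match."""
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = A_inv_sqrt @ B @ A_inv_sqrt
    ev = np.linalg.eigvalsh(M)        # M is SPD, so eigenvalues > 0
    return float(np.sqrt(np.sum(np.log(ev) ** 2)))

# two toy 2x2 SPD "covariance" matrices (illustrative values only)
A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.5, -0.2], [-0.2, 2.0]])
print(spd_riemannian_distance(A, A))   # ~0: distance to itself
print(spd_riemannian_distance(A, B))   # > 0 for distinct matrices
```

Distances like this one are what make the SPD space a curved manifold rather than a vector space, which is why sub-manifold learning (rather than plain linear projection) is needed for EEG covariance features.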
Supervised local descriptor learning for human action recognition
Local features have been widely used in computer vision tasks, e.g., human action recognition, but dealing with large-scale local features of high dimensionality and redundant information is extremely challenging. In this paper, we propose a novel fully supervised local descriptor learning algorithm, a discriminative embedding method based on the image-to-class distance (I2CDDE), to learn compact but highly discriminative local feature descriptors for more accurate and efficient action recognition. By leveraging the advantages of the I2C distance, the proposed I2CDDE incorporates class labels to enable fully supervised learning of local feature descriptors, which yields highly discriminative yet compact local descriptors. The objective of I2CDDE is to minimize the I2C distances from samples to their corresponding classes while maximizing the I2C distances to the other classes in the low-dimensional space. To further improve performance, we incorporate a manifold regularization based on the graph Laplacian into the objective function, which enhances the smoothness of the embedding by extracting the local intrinsic geometrical structure. The proposed I2CDDE achieves, for the first time, fully supervised learning of local feature descriptors. It significantly improves the performance of I2C-based methods by increasing the discriminative ability of local features while greatly reducing the computational burden through dimensionality reduction on large-scale data. We apply the proposed I2CDDE algorithm to human action recognition on four widely used benchmark datasets. The results show that I2CDDE significantly improves I2C-based classifiers and achieves state-of-the-art performance.
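The image-to-class (I2C) distance that I2CDDE builds on is, in its usual NBNN-style form, the sum over a sample's local descriptors of each descriptor's distance to its nearest neighbour in the class's pooled descriptor set. A minimal sketch under that assumption:

```python
import numpy as np

def i2c_distance(descriptors, class_pool):
    """Image-to-class (I2C) distance: for each local descriptor of a
    sample, find its nearest neighbour in the class's pooled descriptor
    set, and sum the squared distances (NBNN-style; a sketch of the
    distance I2CDDE builds on, not the paper's exact definition)."""
    d2 = ((descriptors[:, None, :] - class_pool[None, :, :]) ** 2).sum(-1)
    return float(d2.min(axis=1).sum())

rng = np.random.default_rng(2)
pool_a = rng.normal(0.0, 1.0, (100, 8))   # descriptors pooled from class A
pool_b = rng.normal(5.0, 1.0, (100, 8))   # class B lives far away
sample = rng.normal(0.0, 1.0, (15, 8))    # local descriptors of one sample
print(i2c_distance(sample, pool_a))       # small: sample drawn near class A
print(i2c_distance(sample, pool_b))       # much larger
```

I2CDDE's objective then learns a projection that shrinks this distance to the correct class and enlarges it to the others, which is cheap to evaluate once descriptors are low-dimensional.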
Automatic handwriter identification using advanced machine learning
Handwriter identification is a challenging problem, especially for forensic investigation. The topic has received significant attention from the research community, and several handwriter identification systems have been developed for applications including forensic science, document analysis and the investigation of historical documents. This work is part of an investigation to develop new tools and methods for Arabic palaeography, which is the study of handwritten material, particularly ancient manuscripts with missing writers, dates, and/or places. In particular, the main aim of this research project is to investigate and develop new techniques and algorithms for the classification and analysis of ancient handwritten documents to support palaeographic studies.
Three contributions are proposed in this research. The first concerns the development of a text line extraction algorithm for colour and greyscale historical manuscripts. The idea uses a modified bilateral filtering approach to adaptively smooth the images while still preserving the edges through a nonlinear combination of neighbouring image values. The proposed algorithm computes a median and a separating seam, and has been validated on both greyscale and colour historical documents using different datasets. The results obtained suggest that the proposed technique compares favourably against several similar algorithms.
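The edge-preserving smoothing underlying that first contribution can be illustrated with a plain brute-force bilateral filter: each output pixel is a weighted mean of its neighbourhood, with weights that combine spatial closeness and intensity similarity, so smoothing stops at edges. The parameters below are illustrative; the thesis's modified filter and its seam computation are not reproduced here.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Minimal brute-force bilateral filter on a float grayscale image.
    spatial: Gaussian on pixel offsets; rng_w: Gaussian on intensity
    difference to the centre pixel, which suppresses averaging across
    edges.  Illustrative parameters, not the thesis's settings."""
    H, W = img.shape
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rng_w = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            w = spatial * rng_w
            out[i, j] = (w * patch).sum() / w.sum()
    return out

# noisy step edge: the filter smooths each flat side but keeps the edge
rng = np.random.default_rng(3)
img = np.hstack([np.zeros((8, 8)), np.ones((8, 8))])
noisy = img + 0.05 * rng.normal(size=img.shape)
smoothed = bilateral_filter(noisy)
```

Because the intensity kernel assigns near-zero weight across a strong step, ink/background boundaries survive the smoothing, which is what makes the filter useful before seam-based text line extraction.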
The second contribution deploys a combination of Oriented Basic Image features and the concept of a grapheme codebook to improve recognition performance. The proposed algorithm effectively extracts the most distinguishing patterns of a handwriter. The idea consists of judiciously combining multiscale feature extraction with the concept of graphemes to allow the extraction of several discriminating features, such as handwriting curvature, direction, wrinkliness and various edge-based features. The technique was validated for identifying handwriters of both Arabic and English writing captured as scanned images, using the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The results obtained clearly demonstrate the effectiveness of the proposed method when compared against similar techniques.
The third contribution concerns an offline handwriter identification approach based on convolutional neural networks. In the first stage, the AlexNet architecture is employed to learn image features from handwritten scripts, with features taken from the fully connected layers of the model. A support vector machine classifier is then deployed to classify the writing styles of the various handwriters; in this way, test scripts can be classified using the trained CNN model. The proposed approach was evaluated on Arabic historical datasets: the Islamic Heritage Project (IHP) and the Qatar National Library (QNL). The results obtained demonstrate that the proposed model achieves superior performance when compared to similar methods.