Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings
The recovery of the intrinsic geometric structures of data collections is an
important problem in data analysis. Supervised extensions of several manifold
learning approaches have been proposed in recent years. However, existing
methods primarily focus on embedding the training data, while the
generalization of the embedding to initially unseen test data has received
little attention. In this work, we build on recent theoretical results on the
generalization performance of supervised manifold learning algorithms.
Motivated by these performance bounds, we propose a supervised manifold
learning method that computes a nonlinear embedding while constructing a smooth
and regular interpolation function that extends the embedding to the whole data
space in order to achieve satisfactory generalization. The embedding and the
interpolator are jointly learnt such that the Lipschitz regularity of the
interpolator is imposed while ensuring the separation between different
classes. Experimental results on several image data sets show that the proposed
method outperforms traditional classifiers and the compared supervised
dimensionality reduction algorithms in terms of classification accuracy in most
settings.
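To make the Lipschitz-regularity idea concrete, the sketch below fits a Gaussian RBF interpolator to fixed embedding targets and estimates its Lipschitz constant from sample pairs. It is not the authors' joint optimization: the kernel width `sigma`, the `ridge` term, the toy data and the -1/+1 targets are all hypothetical, and the estimate is only a lower bound on the true constant, because it takes the maximum over finitely many pairs.

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    """Gaussian kernel matrix between the rows of X and the rows of centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_interpolator(X_train, Y_train, sigma=1.0, ridge=1e-6):
    """Return f with f(X_train) ~= Y_train, via (Phi + ridge*I) C = Y_train."""
    Phi = gaussian_rbf(X_train, X_train, sigma)
    C = np.linalg.solve(Phi + ridge * np.eye(len(X_train)), Y_train)
    return lambda X: gaussian_rbf(X, X_train, sigma) @ C

def empirical_lipschitz(f, X):
    """Largest ||f(x_i) - f(x_j)|| / ||x_i - x_j|| over distinct sample pairs.
    This is a lower bound on the true Lipschitz constant of f."""
    FX = f(X)
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    df = np.linalg.norm(FX[:, None, :] - FX[None, :, :], axis=-1)
    mask = dx > 0
    return float((df[mask] / dx[mask]).max())

# Toy data: two classes mapped to hypothetical 1-D targets -1 and +1
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
labels = np.repeat([0, 1], 10)
Y = np.where(labels[:, None] == 0, -1.0, 1.0)
f = fit_interpolator(X, Y, sigma=1.0)
L_est = empirical_lipschitz(f, X)
print(L_est)
```

Two nearby points from different classes whose targets lie 2 apart force a ratio of at least 2 divided by their distance, which illustrates the tension the abstract addresses: pushing classes apart in the embedding raises the Lipschitz constant the interpolator needs.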
Out-of-sample generalizations for supervised manifold learning for classification
Supervised manifold learning methods for data classification map data samples
residing in a high-dimensional ambient space to a lower-dimensional domain in a
structure-preserving way, while enhancing the separation between different
classes in the learned embedding. Most nonlinear supervised manifold learning
methods compute the embedding of the manifolds only at the initially available
training points, whereas the generalization of the embedding to novel points,
known as the out-of-sample extension problem in manifold learning, is
especially important in classification applications. In this work, we propose a
semi-supervised method for building an interpolation function that provides an
out-of-sample extension for general supervised manifold learning algorithms
studied in the context of classification. The proposed algorithm computes a
radial basis function (RBF) interpolator that minimizes an objective function
consisting of the total embedding error of unlabeled test samples, defined as
their distance to the embeddings of the manifolds of their own class, as well
as a regularization term that controls the smoothness of the interpolation
function in a direction-dependent way. The class labels of test data and the
interpolation function parameters are estimated jointly with a progressive
procedure. Experimental results on face and object images demonstrate the
potential of the proposed out-of-sample extension algorithm for the
classification of manifold-modeled data sets.
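A minimal sketch of the out-of-sample idea follows. It assumes a Gaussian RBF interpolator with an isotropic ridge penalty in place of the paper's direction-dependent regularizer, and a single pass in place of its progressive joint estimation of labels and parameters. Each test point is mapped into the embedding, its embedding error for a class is taken as the distance to the nearest embedded training point of that class, and the class with the smallest error is assigned. All function names, the toy data and the parameters are hypothetical.

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    """Gaussian kernel matrix between the rows of X and the rows of centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def extend_embedding(X_train, Y_train, X_test, sigma=1.0, ridge=1e-3):
    """Map test points into the embedding space with a ridge-regularized RBF interpolator."""
    Phi = gaussian_rbf(X_train, X_train, sigma)
    C = np.linalg.solve(Phi + ridge * np.eye(len(X_train)), Y_train)
    return gaussian_rbf(X_test, X_train, sigma) @ C

def classify_by_embedding_error(Y_train, labels, Y_test):
    """Assign each test embedding to the class whose embedded training points are closest."""
    classes = np.unique(labels)
    # errors[i, k]: distance from test embedding i to the nearest training embedding of class k
    errors = np.stack([
        np.linalg.norm(Y_test[:, None, :] - Y_train[labels == c][None, :, :], axis=-1).min(axis=1)
        for c in classes
    ], axis=1)
    return classes[errors.argmin(axis=1)], errors.min(axis=1)

# Toy setup: two 3x3 grids of training points, embedded at hypothetical 1-D targets -1 and +1
offsets = np.array([(dx, dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)])
X_train = np.vstack([offsets + (-3.0, -3.0), offsets + (3.0, 3.0)])
labels = np.repeat([0, 1], 9)
Y_train = np.where(labels[:, None] == 0, -1.0, 1.0)
X_test = np.array([[-2.8, -3.1], [3.2, 2.9]])

Y_test = extend_embedding(X_train, Y_train, X_test)
pred, err = classify_by_embedding_error(Y_train, labels, Y_test)
print(pred)
```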
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for all those interested in the subject.
Semi-supervised Learning with Deterministic Labeling and Large Margin Projection
The centrality and diversity of the labeled data strongly influence the
performance of semi-supervised learning (SSL), but most SSL models select the
labeled data randomly. This study first constructs a leading forest that forms a
partially ordered topological space in an unsupervised way, and selects a group
of the most representative samples to label in one shot (which differs
essentially from active learning) using the property of homeomorphism. Then a
kernelized large margin metric is efficiently learned for the selected data to
classify the remaining unlabeled samples. The optimal leading forest (OLF) has
been observed to have the advantage of revealing how differences evolve along a
path within a subtree. Therefore, we formulate an optimization problem based on
OLF to select the samples. OLF also facilitates the learning of multiple local
metrics to address the multi-modal and mix-modal problems in SSL, especially
when the number of classes is large. Owing to this novel design, the stability
and accuracy of the performance are significantly improved compared with
state-of-the-art graph SSL methods. Extensive experimental studies have
shown that the proposed method achieves encouraging accuracy and efficiency.
Code has been made available at https://github.com/alanxuji/DeLaLA.
Comment: 12 pages, ready to submit to a journal
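The leading-tree idea can be sketched under one assumption: that the tree follows the density-peak construction, in which every point is led by its nearest neighbor of higher local density. The sketch scores points by `rho * delta` (density times distance to the leader) and labels the top `k` in one shot, then spreads labels down the tree. That last step is a deliberately simple substitute for the kernelized large margin metric and the OLF-based optimization described above. The function names, the cutoff `dc` and the toy data are hypothetical.

```python
import numpy as np

def leading_tree(X, dc=1.0):
    """Density-peak leading tree: each point's parent is its nearest denser point."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0   # Gaussian local density, self term removed
    order = np.argsort(-rho, kind="stable")           # indices by decreasing density
    parent = np.full(n, -1)
    delta = np.zeros(n)
    delta[order[0]] = D[order[0]].max()               # root: densest point, no parent
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(D[i, denser])]
        parent[i], delta[i] = j, D[i, j]
    return rho, delta, parent

def select_representatives(rho, delta, k):
    """Pick the k points with the largest rho * delta to be labeled in one shot."""
    return np.argsort(-(rho * delta))[:k]

def propagate_labels(parent, reps, rep_labels):
    """Give every point the label of its nearest labeled ancestor (-1 if none)."""
    labels = np.full(len(parent), -1)
    labels[reps] = rep_labels
    for i in range(len(parent)):
        j = i
        while labels[j] == -1 and parent[j] != -1:
            j = parent[j]
        labels[i] = labels[j]
    return labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(10.0, 0.3, (10, 2))])
y_true = np.repeat([0, 1], 10)

rho, delta, parent = leading_tree(X)
reps = select_representatives(rho, delta, k=2)
y_pred = propagate_labels(parent, reps, y_true[reps])  # oracle labels only at the reps
print(sorted(y_true[reps].tolist()), (y_pred == y_true).mean())
```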
Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform
In this research, an off-line handwriting recognition system for the Arabic alphabet is
introduced. The system contains three main stages: preprocessing, segmentation and
recognition. In the preprocessing stage, the Radon transform was used in the design
of algorithms for page, line and word skew correction as well as for word slant
correction. In the segmentation stage, a Hough transform approach was used for line
extraction. For line-to-word and word-to-character segmentation, a statistical method
based on a mathematical representation of the binary images of lines and words was used.
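The segmentation stage can be approximated with projection profiles, a simpler and commonly used technique than the Hough-transform and statistical methods named above, assumed here only for illustration. Rows with no ink separate text lines; within a line, column gaps at least `min_gap` wide separate words. The function names, the `min_gap` threshold and the toy page are hypothetical, and the page is assumed to be already binarized (ink = 1) and deskewed.

```python
import numpy as np

def ink_runs(profile, min_gap=1):
    """(start, end) pairs, end exclusive, of runs where profile > 0.
    Runs separated by fewer than min_gap empty positions are merged."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(profile)])
    merged = []
    for r in runs:
        if merged and r[0] - merged[-1][1] < min_gap:
            merged[-1][1] = r[1]
        else:
            merged.append(r)
    return [tuple(r) for r in merged]

def segment_lines(page):
    # Horizontal projection profile: ink pixels per row
    return ink_runs(page.sum(axis=1))

def segment_words(line, min_gap=3):
    # Vertical projection profile: ink pixels per column; gaps narrower than
    # min_gap are treated as intra-word spacing (hypothetical threshold)
    return ink_runs(line.sum(axis=0), min_gap=min_gap)

page = np.zeros((7, 12), dtype=int)
page[1:3, 0:3] = 1    # line 1, first word
page[1:3, 6:9] = 1    # line 1, second word
page[4:6, 2:10] = 1   # line 2, one word

lines = segment_lines(page)
r0, r1 = lines[0]
words = segment_words(page[r0:r1])
print(lines, words)   # [(1, 3), (4, 6)] [(0, 3), (6, 9)]
```

Because Arabic is written right to left, the word spans would be read in reverse order of their column indices.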
Unlike most current handwriting recognition systems, our system simulates the
human mechanism for image recognition, where images are encoded and saved in
memory as groups according to their similarity to each other. Characters are
decomposed into coefficient vectors using the fast wavelet transform; then the
vectors that represent a character in its different possible shapes are saved as
groups, with one representative for each group. Recognition is achieved by
comparing the vector of the character to be recognized with the group representatives.
Experiments showed that the proposed system achieves the recognition task
with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a
single character in a text of 15 lines, where each line has 10 words on average.
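The recognition scheme (wavelet coefficient vectors, one representative per group, nearest-representative matching) can be sketched as follows. Several details are assumptions, not the authors' choices: the Haar wavelet, keeping only the approximation band, the mean vector as the group representative, Euclidean distance, and the toy 8x8 glyphs. All function names are hypothetical.

```python
import numpy as np

def haar_approx(img, levels=1):
    """Approximation band of a `levels`-deep 2-D Haar transform (pairwise averages).
    Image height and width must be divisible by 2**levels."""
    x = np.asarray(img, dtype=float)
    for _ in range(levels):
        x = (x[0::2, :] + x[1::2, :]) / 2.0   # low-pass along rows
        x = (x[:, 0::2] + x[:, 1::2]) / 2.0   # low-pass along columns
    return x.ravel()

def build_representatives(groups, levels=1):
    """One representative per group: the mean coefficient vector of its shape variants."""
    return {name: np.mean([haar_approx(s, levels) for s in shapes], axis=0)
            for name, shapes in groups.items()}

def recognize(img, reps, levels=1):
    """Return the group whose representative is nearest in Euclidean distance."""
    v = haar_approx(img, levels)
    return min(reps, key=lambda name: np.linalg.norm(v - reps[name]))

def bar(rows=None, cols=None, size=8):
    # Toy 8x8 binary glyph: a horizontal band of rows or a vertical band of cols
    g = np.zeros((size, size))
    if rows is not None:
        g[rows, :] = 1
    if cols is not None:
        g[:, cols] = 1
    return g

groups = {
    "h": [bar(rows=slice(3, 5)), bar(rows=slice(2, 4))],
    "v": [bar(cols=slice(3, 5)), bar(cols=slice(4, 6))],
}
reps = build_representatives(groups)

query = bar(rows=slice(3, 5))
query[0, 0] = 1  # one stray pixel of noise
print(recognize(query, reps))  # h
```

The averaging form of the Haar step differs from the orthonormal one only by a constant scale factor per level, which does not change which representative is nearest.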