65,964 research outputs found
A Multi-Feature Selection Approach for Gender Identification of Handwriting based on Kernel Mutual Information
This paper presents a new flexible approach to predict the gender of the writers from their handwriting samples. Handwriting features like slant, curvature, line separation, chain code, character shapes, and more, can be extracted from different methods. Therefore, the multi-feature sets are irrelevant and redundant. The conflict of the features exists in the sets, which affects the accuracy of classification and the computing cost. This paper proposes an approach, named Kernel Mutual Information (KMI), that focuses on feature selection. The KMI approach can decrease redundancies and conflicts. In addition, it extracts an optimal subset of features from the writing samples produced by male and female writers. To ensure that KMI can apply the various features, this paper describes the handwriting segmentation and handwritten text recognition technology used. The classification is carried out using a Support Vector Machine (SVM) on two databases. The first database comes from the ICDAR 2013 competition on gender prediction, which provides the samples in both Arabic and English. The other database contains the Registration-Document-Form (RDF) database in Chinese. The proposed and compared methods were evaluated on both databases. Results from the methods highlight the importance of feature selection for gender prediction from handwriting
An investigation of genetic algorithm-based feature selection techniques applied to keystroke dynamics biometrics
Due to the continuous use of social networks, users can be vulnerable to online situations such as paedophilia treats. One of the ways to do the
investigation of an alleged pedophile is to verify the legitimacy of the genre that
it claims. One possible technique to adopt is keystroke dynamics analysis. However, this technique can extract many attributes, causing a negative impact on
the accuracy of the classifier due to the presence of redundant and irrelevant
attributes. Thus, this work using the wrapper approach in features selection using genetic algorithms and as KNN, SVM and Naive Bayes classifiers. Bringing
as best result the SVM classifier with 90% accuracy, identifying what is most
suitable for both bases
A Multiple-Expert Binarization Framework for Multispectral Images
In this work, a multiple-expert binarization framework for multispectral
images is proposed. The framework is based on a constrained subspace selection
limited to the spectral bands combined with state-of-the-art gray-level
binarization methods. The framework uses a binarization wrapper to enhance the
performance of the gray-level binarization. Nonlinear preprocessing of the
individual spectral bands is used to enhance the textual information. An
evolutionary optimizer is considered to obtain the optimal and some suboptimal
3-band subspaces from which an ensemble of experts is then formed. The
framework is applied to a ground truth multispectral dataset with promising
results. In addition, a generalization to the cross-validation approach is
developed that not only evaluates generalizability of the framework, it also
provides a practical instance of the selected experts that could be then
applied to unseen inputs despite the small size of the given ground truth
dataset.Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1
Recommended from our members
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded on automatic textual analysis of subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve the accuracy of classification performance
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
- …