Embedding Feature Selection for Large-scale Hierarchical Classification
Large-scale Hierarchical Classification (HC) involves datasets consisting of
thousands of classes and millions of training instances with high-dimensional
features posing several big data challenges. Feature selection that aims to
select the subset of discriminant features is an effective strategy to deal
with the large-scale HC problem. It speeds up the training process, reduces the
prediction time and minimizes the memory requirements by compressing the total
size of learned model weight vectors. The majority of studies have also shown
feature selection to be competent and successful in improving the
classification accuracy by removing irrelevant features. In this work, we
investigate various filter-based feature selection methods for dimensionality
reduction to solve the large-scale HC problem. Our experimental evaluation on
text and image datasets with varying distribution of features, classes and
instances shows up to 3x order of speed-up on massive datasets and up to 45% less
memory requirements for storing the weight vectors of the learned model without any
significant loss (improvement for some datasets) in the classification
accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection.
Comment: IEEE International Conference on Big Data (IEEE BigData 2016)
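As a rough illustration of what a filter-based feature selection step looks like, the sketch below scores each feature independently of any classifier and keeps the top k. It assumes information gain on discrete features as the scoring criterion; the abstract does not say which filter criteria were evaluated, and the names `entropy`, `information_gain` and `select_top_k` are hypothetical, not taken from the linked source code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting the labels on one discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (assumed): feature 0 determines the class, feature 1 is noise,
# feature 2 is only partly informative.
X = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
y = ["a", "a", "b", "b"]

selected = select_top_k(X, y, k=2)
X_reduced = [[row[j] for j in selected] for row in X]
```

Because the scores are computed once before training, the classifier only ever sees the reduced columns, which is where the savings in training time and weight-vector storage would come from.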
An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition
Traditionally, the performance of OCR algorithms and systems is based on the
recognition of isolated characters. When a system classifies an individual
character, its output is typically a character label or a reject marker that
corresponds to an unrecognized character. By comparing output labels with the
correct labels, the numbers of correct recognitions, substitution errors
(misrecognized characters), and rejects (unrecognized characters) are determined.
Nowadays, although recognition of printed isolated characters is performed with
high accuracy, recognition of handwritten characters still remains an open
problem in the research arena. The ability to identify machine printed
characters in an automated or a semi-automated manner has obvious applications
in numerous fields. Since creating an algorithm with a one hundred percent
correct recognition rate is quite probably impossible in our world of noise and
different font styles, it is important to design character recognition
algorithms with these failures in mind so that when mistakes are inevitably
made, they will at least be understandable and predictable to the person
working with the
Comment: 6 pages, 5 figures
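The evaluation procedure described above (comparing output labels with the correct labels to count correct recognitions, substitution errors and rejects) can be sketched as follows. This is a minimal illustration only: using `None` as the reject marker and the function name `score_recognition` are assumptions, and the sample labels are made up for the example.

```python
REJECT = None  # assumed reject marker for an unrecognized character


def score_recognition(predicted, truth):
    """Count correct recognitions, substitution errors and rejects."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    counts = {"correct": 0, "substitution": 0, "reject": 0}
    for p, t in zip(predicted, truth):
        if p is REJECT:
            counts["reject"] += 1          # system declined to label the character
        elif p == t:
            counts["correct"] += 1         # output label matches the correct label
        else:
            counts["substitution"] += 1    # a wrong label was emitted
    return counts


# Illustrative labels for five isolated numerals
truth = ["3", "7", "1", "9", "0"]
predicted = ["3", "1", REJECT, "9", "0"]

counts = score_recognition(predicted, truth)
recognition_rate = counts["correct"] / len(truth)
```

Keeping rejects separate from substitutions matters because a reject is a visible failure the operator can act on, whereas a substitution silently produces a wrong character.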
Autoencoding the Retrieval Relevance of Medical Images
Content-based image retrieval (CBIR) of medical images is a crucial task that
can contribute to a more reliable diagnosis if applied to big data. Recent
advances in feature extraction and classification have enormously improved CBIR
results for digital images. However, considering the increasing accessibility
of big data in medical imaging, we are still in need of reducing both memory
requirements and computational expenses of image retrieval systems. This work
proposes to exclude the features of image blocks that exhibit a low encoding
error when learned by an autoencoder. We examine the
histogram of autoencoding errors of image blocks for each image class to
facilitate the decision of which image regions, or roughly what percentage of an
image perhaps, shall be declared relevant for the retrieval task. This leads to
reduction of feature dimensionality and speeds up the retrieval process. To
validate the proposed scheme, we employ local binary patterns (LBP) and support
vector machines (SVM), which are both well-established approaches in the CBIR
research community. We also use the IRMA dataset with 14,410 x-ray images as
test data. The results show that the dimensionality of annotated feature
vectors can be reduced by up to 50%, resulting in speedups greater than 27% at
the expense of less than 1% decrease in the accuracy of retrieval when validating
the precision and recall of the top 20 hits.
Comment: To appear in proceedings of The 5th International Conference on Image
Processing Theory, Tools and Applications (IPTA'15), Nov 10-13, 2015,
Orleans, France
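The block-exclusion idea in this abstract can be sketched as below: blocks that an autoencoder reconstructs with low error are dropped, and only the features of the remaining blocks are concatenated. The sketch assumes the per-block reconstruction errors and per-block features (for example, LBP histograms) have already been computed; the `keep_fraction` parameter and both function names are hypothetical, and the paper's actual thresholding rule, derived from per-class error histograms, is not given here.

```python
def select_relevant_blocks(errors, keep_fraction):
    """Indices of the blocks with the highest reconstruction error, in image order."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(keep_fraction * len(errors)))
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(ranked[:n_keep])


def reduced_feature_vector(block_features, kept):
    """Concatenate the features of the kept blocks only."""
    return [value for i in kept for value in block_features[i]]


# Illustrative values: four blocks, two features per block
errors = [0.02, 0.40, 0.05, 0.31]            # assumed autoencoder errors per block
block_features = [[1, 2], [3, 4], [5, 6], [7, 8]]

kept = select_relevant_blocks(errors, keep_fraction=0.5)
vector = reduced_feature_vector(block_features, kept)
```

With `keep_fraction=0.5` the feature vector is half its original length, which mirrors the kind of dimensionality reduction the abstract reports, although the numbers here are illustrative only.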