5,883 research outputs found
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: noisy, with missing entries, with imbalance in classes of
interests, leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for
development of specialized techniques of data-preprocessing and classification.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expected maximization imputation
method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition
We study large-scale kernel methods for acoustic modeling and compare to DNNs
on performance metrics related to both acoustic modeling and recognition.
Measuring perplexity and frame-level classification accuracy, kernel-based
acoustic models are as effective as their DNN counterparts. However, on
token-error-rates DNN models can be significantly better. We have discovered
that this might be attributed to DNN's unique strength in reducing both the
perplexity and the entropy of the predicted posterior probabilities. Motivated
by our findings, we propose a new technique, entropy regularized perplexity,
for model selection. This technique can noticeably improve the recognition
performance of both types of models, and reduces the gap between them. While
effective on Broadcast News, this technique could be also applicable to other
tasks.Comment: arXiv admin note: text overlap with arXiv:1411.400
Non-convex regularization in remote sensing
In this paper, we study the effect of different regularizers and their
implications in high dimensional image classification and sparse linear
unmixing. Although kernelization or sparse methods are globally accepted
solutions for processing data in high dimensions, we present here a study on
the impact of the form of regularization used and its parametrization. We
consider regularization via traditional squared (2) and sparsity-promoting (1)
norms, as well as more unconventional nonconvex regularizers (p and Log Sum
Penalty). We compare their properties and advantages on several classification
and linear unmixing tasks and provide advices on the choice of the best
regularizer for the problem at hand. Finally, we also provide a fully
functional toolbox for the community.Comment: 11 pages, 11 figure
DISCRIMINANT STEPWISE PROCEDURE
Stepwise procedure is now probably the most popular tool for automatic feature selection. In the most cases it represents model selection approach which evaluates various feature subsets (so called wrapper). In fact it is heuristic search technique which examines the space of all possible feature subsets. This method is known in the literature under different names and variants. We organize the concepts and terminology, and show several variants of stepwise feature selection from a search strategy point of view. Short review of implementations in R will be given
- …