62,273 research outputs found
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: noisy, with missing entries, with imbalance in classes of
interests, leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for
development of specialized techniques of data-preprocessing and classification.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expected maximization imputation
method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
Parametric Local Metric Learning for Nearest Neighbor Classification
We study the problem of learning local metrics for nearest neighbor
classification. Most previous works on local metric learning learn a number of
local unrelated metrics. While this "independence" approach delivers an
increased flexibility its downside is the considerable risk of overfitting. We
present a new parametric local metric learning method in which we learn a
smooth metric matrix function over the data manifold. Using an approximation
error bound of the metric matrix function we learn local metrics as linear
combinations of basis metrics defined on anchor points over different regions
of the instance space. We constrain the metric matrix function by imposing on
the linear combinations manifold regularization which makes the learned metric
matrix function vary smoothly along the geodesics of the data manifold. Our
metric learning method has excellent performance both in terms of predictive
power and scalability. We experimented with several large-scale classification
problems, tens of thousands of instances, and compared it with several state of
the art metric learning methods, both global and local, as well as to SVM with
automatic kernel selection, all of which it outperforms in a significant
manner
- …