29,512 research outputs found
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: noisy, with missing entries, with imbalance in classes of
interests, leading to serious bias in predictive modeling. Since standard data
mining methods often produce poor performance measures, we argue for
development of specialized techniques of data-preprocessing and classification.
In this paper, we propose a new method to simultaneously classify large
datasets and reduce the effects of missing values. It is based on a multilevel
framework of the cost-sensitive SVM and the expected maximization imputation
method for missing values, which relies on iterated regression analyses. We
compare classification results of multilevel SVM-based algorithms on public
benchmark datasets with imbalanced classes and missing values as well as real
data in health applications, and show that our multilevel SVM-based method
produces fast, and more accurate and robust classification results.Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
K-nearest Neighbor Search by Random Projection Forests
K-nearest neighbor (kNN) search has wide applications in many areas,
including data mining, machine learning, statistics and many applied domains.
Inspired by the success of ensemble methods and the flexibility of tree-based
methodology, we propose random projection forests (rpForests), for kNN search.
rpForests finds kNNs by aggregating results from an ensemble of random
projection trees with each constructed recursively through a series of
carefully chosen random projections. rpForests achieves a remarkable accuracy
in terms of fast decay in the missing rate of kNNs and that of discrepancy in
the kNN distances. rpForests has a very low computational complexity. The
ensemble nature of rpForests makes it easily run in parallel on multicore or
clustered computers; the running time is expected to be nearly inversely
proportional to the number of cores or machines. We give theoretical insights
by showing the exponential decay of the probability that neighboring points
would be separated by ensemble random projection trees when the ensemble size
increases. Our theory can be used to refine the choice of random projections in
the growth of trees, and experiments show that the effect is remarkable.Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conferenc
Modeling Stroke Diagnosis with the Use of Intelligent Techniques
The purpose of this work is to test the efficiency of specific intelligent classification algorithms when dealing with the domain of stroke medical diagnosis. The dataset consists of patient records of the ”Acute Stroke Unit”, Alexandra Hospital, Athens, Greece, describing patients suffering one of 5 different stroke types diagnosed by 127 diagnostic attributes / symptoms collected during the first hours of the emergency stroke situation as well as during the hospitalization and recovery phase of the patients. Prior to the application of the intelligent classifier the dimensionality of the dataset is further reduced using a variety of classic and state of the art dimensionality reductions techniques so as to capture the intrinsic dimensionality of the data. The results obtained indicate that the proposed methodology achieves prediction accuracy levels that are comparable to those obtained by intelligent classifiers trained on the original feature space
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre is
described and illustrated through application on a dataset on the prediction of
depression. Furthermore, accuracy and sparsity of PREs is compared with that of
single trees, random forest and lasso regression in four benchmark datasets.
Results indicate that pre derives ensembles with predictive accuracy comparable
to that of random forests, while using a smaller number of variables for
prediction
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial
- …