1,626 research outputs found
Effective classifiers for detecting objects
Several state-of-the-art machine learning classifiers are compared for the purposes of object detection in complex images, using global image features derived from the Ohta color space and Local Binary Patterns. Image complexity in this sense refers to the degree to which the target objects are occluded and/or non-dominant (i.e. not in the foreground) in the image, and also the degree to which the images are cluttered with non-target objects. The results indicate that a voting ensemble of Support Vector Machines, Random Forests, and Boosted Decision Trees provide the best performance with AUC values of up to 0.92 and Equal Error Rate accuracies of up to 85.7% in stratified 10-fold cross validation experiments on the GRAZ02 complex image dataset
Improvements on coronal hole detection in SDO/AIA images using supervised classification
We demonstrate the use of machine learning algorithms in combination with
segmentation techniques in order to distinguish coronal holes and filaments in
SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques
(intensity-based thresholding, SPoCA), we prepared data sets of manually
labeled coronal hole and filament channel regions present on the Sun during the
time range 2011 - 2013. By mapping the extracted regions from EUV observations
onto HMI line-of-sight magnetograms we also include their magnetic
characteristics. We computed shape measures from the segmented binary maps as
well as first order and second order texture statistics from the segmented
regions in the EUV images and magnetograms. These attributes were used for data
mining investigations to identify the most performant rule to differentiate
between coronal holes and filament channels. We applied several classifiers,
namely Support Vector Machine, Linear Support Vector Machine, Decision Tree,
and Random Forest and found that all classification rules achieve good results
in general, with linear SVM providing the best performances (with a true skill
statistic of ~0.90). Additional information from magnetic field data
systematically improves the performance across all four classifiers for the
SPoCA detection. Since the calculation is inexpensive in computing time, this
approach is well suited for applications on real-time data. This study
demonstrates how a machine learning approach may help improve upon an
unsupervised feature extraction method.Comment: in press for SWS
The Unbalanced Classification Problem: Detecting Breaches in Security
This research proposes several methods designed to improve solutions for security classification problems. The security classification problem involves unbalanced, high-dimensional, binary classification problems that are prevalent today. The imbalance within this data involves a significant majority of the negative class and a minority positive class. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security must address this problem. These breaches in security are considered instances of the positive class. Given numerical data that represent observations or instances which require classification, state of the art machine learning algorithms can be applied. However, the unbalanced and high-dimensional structure of the data must be considered prior to applying these learning methods. High-dimensional data poses a “curse of dimensionality” which can be overcome through the analysis of subspaces. Exploration of intelligent subspace modeling and the fusion of subspace models is proposed. Detailed analysis of the one-class support vector machine, as well as its weaknesses and proposals to overcome these shortcomings are included. A fundamental method for evaluation of the binary classification model is the receiver operating characteristic (ROC) curve and the area under the curve (AUC). This work details the underlying statistics involved with ROC curves, contributing a comprehensive review of ROC curve construction and analysis techniques to include a novel graphic for illustrating the connection between ROC curves and classifier decision values. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to defend against computer intrusion detection. The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed. Empirical studies also include military tactical scenarios
Dont Just Divide; Polarize and Conquer!
In data containing heterogeneous subpopulations, classification performance
benefits from incorporating the knowledge of cluster structure in the
classifier. Previous methods for such combined clustering and classification
are either 1) classifier-specific and not generic, or 2) independently perform
clustering and classifier training, which may not form clusters that can
potentially benefit classifier performance. The question of how to perform
clustering to improve the performance of classifiers trained on the clusters
has received scant attention in previous literature, despite its importance in
several real-world applications. In this paper, we design a simple and
efficient classification algorithm called Clustering Aware Classification
(CAC), to find clusters that are well suited for being used as training
datasets by classifiers for each underlying subpopulation. Our experiments on
synthetic and real benchmark datasets demonstrate the efficacy of CAC over
previous methods for combined clustering and classification.Comment: 19 Pages, 5 figure
- …