The Unbalanced Classification Problem: Detecting Breaches in Security
This research proposes several methods designed to improve solutions for security classification problems. The security classification problem is an unbalanced, high-dimensional, binary classification problem of a kind that is prevalent today. The imbalance consists of a large majority of negative-class instances and a small minority of positive-class instances. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security must address this problem. These breaches in security are considered instances of the positive class. Given numerical data that represent observations or instances which require classification, state-of-the-art machine learning algorithms can be applied. However, the unbalanced and high-dimensional structure of the data must be considered prior to applying these learning methods. High-dimensional data poses a “curse of dimensionality” which can be overcome through the analysis of subspaces. Exploration of intelligent subspace modeling and the fusion of subspace models is proposed. A detailed analysis of the one-class support vector machine is included, along with its weaknesses and proposals to overcome them. A fundamental method for evaluating a binary classification model is the receiver operating characteristic (ROC) curve and the area under the curve (AUC). This work details the underlying statistics involved with ROC curves, contributing a comprehensive review of ROC curve construction and analysis techniques, including a novel graphic for illustrating the connection between ROC curves and classifier decision values. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to computer intrusion detection.
The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed. Empirical studies also include military tactical scenarios.
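The link between classifier decision values and the ROC curve described in this abstract can be illustrated with a short sketch. It is not the author's code: the function names `roc_points` and `auc` and the toy scores are assumptions made for illustration. It sweeps a threshold over the decision values from highest to lowest and integrates the resulting curve with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) by sweeping a threshold over decision values.

    scores: decision values, higher means "more likely positive".
    labels: 1 for the positive (breach) class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    # Visit instances from the highest decision value to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Instances with tied decision values cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy, made-up decision values for six instances (three positives, three negatives).
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
print(auc(roc_points(example_scores, example_labels)))
```

Because the AUC depends only on the ordering of the decision values, it is insensitive to the class ratio, which is one reason it is a common choice for unbalanced problems such as this one.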
The Effectiveness of Using Diversity to Select Multiple Classifier Systems with Varying Classification Thresholds
In classification applications, the goal of fusion techniques is to exploit complementary approaches and merge the information provided by these methods to provide a solution superior to any single method. Associated with choosing a methodology to fuse pattern recognition algorithms is the choice of algorithm or algorithms to fuse. Historically, classifier ensemble accuracy has been used to select which pattern recognition algorithms are included in a multiple classifier system. More recently, research has focused on creating and evaluating diversity metrics to more effectively select ensemble members. Using a wide range of classification data sets, methodologies, and fusion techniques, current diversity research is extended by expanding classifier domains before employing fusion methodologies. The expansion is made possible with a unique classification score algorithm developed for this purpose. Correlation and linear regression techniques reveal that the relationship between diversity metrics and accuracy is tenuous and that optimal ensemble selection should be based on ensemble accuracy. The strengths and weaknesses of popular diversity metrics are examined in the context of the information they provide with respect to changing classification thresholds and accuracies.
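The abstract does not say which diversity metrics were studied. As an illustration only, the sketch below computes two pairwise measures that are widely used in the ensemble literature, the disagreement measure and Yule's Q statistic, from per-instance correctness vectors. The function names and the toy data are assumptions.

```python
def pair_counts(correct_a, correct_b):
    """Count joint outcomes for two classifiers over the same instances.

    correct_a, correct_b: sequences of 1 (classified correctly) or 0 (misclassified).
    Returns (n11, n10, n01, n00), where the first digit refers to classifier A.
    """
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a and not b:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def disagreement(correct_a, correct_b):
    """Fraction of instances on which exactly one classifier is correct."""
    n11, n10, n01, n00 = pair_counts(correct_a, correct_b)
    return (n10 + n01) / (n11 + n10 + n01 + n00)


def q_statistic(correct_a, correct_b):
    """Yule's Q: +1 when errors coincide, 0 when independent, negative when complementary."""
    n11, n10, n01, n00 = pair_counts(correct_a, correct_b)
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 0.0  # Assumption: treat the undefined case as "no association".
    return (n11 * n00 - n01 * n10) / denom


# Made-up correctness vectors for two classifiers on six instances.
a = [1, 1, 0, 1, 0, 1]
b = [1, 0, 1, 1, 0, 0]
print(disagreement(a, b), q_statistic(a, b))
```

Both measures depend on the classification threshold, since moving the threshold changes which instances each classifier gets right.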
Different approaches for the detection of SSH anomalous connections
The Secure Shell Protocol (SSH) is a well-known standard protocol, mainly used for remotely accessing shell accounts on Unix-like operating systems to perform administrative tasks. As a result, the SSH service has been an appealing target for attackers, who aim to guess root passwords by performing dictionary attacks or to exploit the service itself directly. To identify such situations, this article addresses the detection of anomalous SSH connections from an intrusion detection perspective. The main idea is to compare several strategies and approaches for better detection of SSH-based attacks. To test the classification performance of different classifiers and combinations of them, SSH data coming from a real-world honeynet are gathered and analysed. For comparison purposes and to draw conclusions about data collection, both packet-based and flow data are analysed. A wide range of classifiers and ensembles are applied to these data, as well as different validation schemes for better analysis of the obtained results. The high classification rates obtained lead to positive conclusions about the identification of malicious SSH connections.
Competitive Learning Neural Network Ensemble Weighted by Predicted Performance
Ensemble approaches have been shown to enhance classification by combining the outputs from a set of voting classifiers. Diversity in error patterns among base classifiers promotes ensemble performance. Multi-task learning is an important characteristic of Neural Network classifiers. Introducing a secondary output unit that receives different training signals for base networks in an ensemble can effectively promote diversity and improve ensemble performance. Here a Competitive Learning Neural Network Ensemble is proposed where a secondary output unit predicts the classification performance of the primary output unit in each base network. The networks compete with each other on the basis of classification performance and partition the stimulus space. The secondary units adaptively receive different training signals depending on the competition. As a result, each base network develops a “preference” over different regions of the stimulus space as indicated by its secondary unit outputs. To form an ensemble decision, all base networks’ primary unit outputs are combined and weighted according to the secondary unit outputs. The effectiveness of the proposed approach is demonstrated with experiments on one real-world and four artificial classification problems.
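The abstract states that primary outputs are weighted by secondary outputs but gives no formula. The sketch below assumes the simplest reading, a weighted average normalized by the sum of the secondary outputs; the function name `weighted_ensemble` and the numbers are hypothetical, and the paper's actual combination rule may differ.

```python
def weighted_ensemble(primary_outputs, secondary_outputs):
    """Combine per-network class scores using predicted-performance weights.

    primary_outputs: one list of class scores per base network.
    secondary_outputs: one non-negative scalar per base network, read here as the
        network's own estimate of how well it handles the current input.
    Assumption: weights are the secondary outputs normalized to sum to 1.
    """
    if len(primary_outputs) != len(secondary_outputs):
        raise ValueError("one secondary output is needed per base network")
    total = sum(secondary_outputs)
    if total == 0:
        # No network claims competence, so fall back to a plain average.
        weights = [1.0 / len(secondary_outputs)] * len(secondary_outputs)
    else:
        weights = [s / total for s in secondary_outputs]
    n_classes = len(primary_outputs[0])
    return [sum(w * p[c] for w, p in zip(weights, primary_outputs))
            for c in range(n_classes)]


# Made-up outputs of three base networks for one input and two classes.
primary = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]
secondary = [0.8, 0.1, 0.1]  # The first network "prefers" this region of the input space.
combined = weighted_ensemble(primary, secondary)
decision = max(range(len(combined)), key=lambda c: combined[c])
print(combined, decision)
```

In this toy case a plain majority vote would choose the second class, while the weighted rule follows the network that predicts high performance for itself on this input.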
Deep Neural Ensemble for Retinal Vessel Segmentation in Fundus Images towards Achieving Label-free Angiography
Automated segmentation of retinal blood vessels in label-free fundus images
plays a pivotal role in computer-aided diagnosis of ophthalmic pathologies,
viz., diabetic retinopathy, hypertensive disorders and cardiovascular diseases.
The challenge remains active in medical image analysis research due to the
varied distribution of blood vessels, which vary in their dimensions and
physical appearance against a noisy background.
In this paper we formulate the segmentation challenge as a classification
task. Specifically, we employ unsupervised hierarchical feature learning using
an ensemble of two levels of sparsely trained stacked denoising autoencoders.
First-level training with bootstrap samples ensures decoupling, and the
second-level ensemble formed by different network architectures ensures
architectural revision. We show that ensemble training of autoencoders fosters
diversity in learning a dictionary of visual kernels for vessel segmentation. A
softmax classifier is used for fine-tuning each member autoencoder, and multiple
strategies are explored for 2-level fusion of ensemble members. On the DRIVE
dataset, we achieve a maximum average accuracy of 95.33% with a
low standard deviation of 0.003 and a Kappa agreement coefficient of 0.708.
Comparison with other major algorithms substantiates the high efficacy of our
model. Comment: Accepted as a conference paper at IEEE EMBC, 201
Fair comparison of skin detection approaches on publicly available datasets
Skin detection is the process of discriminating skin and non-skin regions in
a digital image, and it is widely used in several applications ranging from hand
gesture analysis to tracking body parts and face detection. Skin detection is a
challenging problem which has drawn extensive attention from the research
community; nevertheless, a fair comparison among approaches is very difficult
due to the lack of a common benchmark and a unified testing protocol. In this
work, we investigate the most recent research in this field and we propose a
fair comparison among approaches using several different datasets. The major
contributions of this work are an exhaustive literature review of skin color
detection approaches, a framework to evaluate and combine different skin
detector approaches, whose source code is made freely available for future
research, and an extensive experimental comparison among several recent methods
which have also been used to define an ensemble that works well in many
different problems. Experiments are carried out on 10 different datasets
including more than 10000 labelled images: experimental results confirm that
the best method proposed here performs very well with respect to
other stand-alone approaches, without requiring ad hoc parameter tuning. A
MATLAB version of the framework for testing and of the methods proposed in this
paper will be freely available from https://github.com/LorisNann
Improving ECG Classification Accuracy Using an Ensemble of Neural Network Modules
This paper illustrates the use of a combined neural network model based on the Stacked Generalization method for classification of electrocardiogram (ECG) beats. In the conventional Stacked Generalization method, the combiner learns to map the base classifiers' outputs to the target data. We claim that adding the input pattern to the base classifiers' outputs helps the combiner to obtain knowledge about the input space and, as a result, to perform better on the same task. Experimental results support our claim that the additional knowledge about the input space improves the performance of the proposed method, which is called Modified Stacked Generalization. In particular, for classification of 14966 ECG beats that were not previously seen during the training phase, the Modified Stacked Generalization method reduced the error rate by 12.41% in comparison with the best of ten popular classifier fusion methods including Max, Min, Average, Product, Majority Voting, Borda Count, Decision Templates, Weighted Averaging based on Particle Swarm Optimization and Stacked Generalization.
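The difference between conventional and Modified Stacked Generalization, as described in this abstract, lies in what the combiner receives as input. The sketch below shows only that feature construction, not the paper's neural network modules. The function `combiner_input`, the flag `include_input`, and the two toy base classifiers are hypothetical names introduced for illustration.

```python
def combiner_input(x, base_classifiers, include_input=True):
    """Build the feature vector that is passed to the combiner.

    x: the raw input pattern (list of floats).
    base_classifiers: callables that map x to a list of class scores.
    include_input: False gives the conventional scheme (base outputs only);
        True appends the input pattern itself, as in the modified scheme.
    """
    features = []
    for clf in base_classifiers:
        features.extend(clf(x))  # Level-0 outputs, concatenated in a fixed order.
    if include_input:
        features.extend(x)       # Extra knowledge about the input space.
    return features


# Two toy base classifiers (assumptions, standing in for trained networks).
def clf_a(x):
    return [0.7, 0.3]


def clf_b(x):
    return [0.4, 0.6]


pattern = [0.12, -0.5, 1.3]
print(combiner_input(pattern, [clf_a, clf_b], include_input=False))
print(combiner_input(pattern, [clf_a, clf_b], include_input=True))
```

The combiner is then trained on these vectors against the target labels, ideally using base-classifier outputs produced on data the base classifiers were not trained on.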