31 research outputs found

    Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing

    Widespread adoption of cloud computing has made such services more attractive to cybercriminals. Distributed denial of service (DDoS) attacks, which target the cloud's bandwidth, services and resources to render the cloud unavailable to both providers and users, are a common form of attack. Recently, feature selection has been identified as a pre-processing phase in cloud DDoS attack defence that can potentially increase classification accuracy and reduce computational complexity by identifying important features from the original dataset during supervised learning. In this work, we propose an ensemble-based multi-filter feature selection method that combines the output of four filter methods to achieve an optimum selection. We then perform an extensive experimental evaluation of our proposed method using the NSL-KDD intrusion detection benchmark dataset and a decision tree classifier. The findings show that our proposed method can effectively reduce the number of features from 41 to 13 and achieves a high detection rate and classification accuracy when compared to other classification techniques.
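
The idea of combining several filter outputs can be sketched as a simple vote over per-filter top-k lists. This is only an illustration under stated assumptions: the abstract does not name the four filters or the combination rule, so the filter names, feature names, scores, `k` and `min_votes` below are all hypothetical.

```python
from collections import Counter

def top_k(scores, k):
    """Return the names of the k highest-scoring features for one filter."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def ensemble_select(filter_scores, k, min_votes):
    """Keep features that land in the top-k of at least min_votes filters."""
    votes = Counter()
    for scores in filter_scores.values():
        votes.update(top_k(scores, k))
    return sorted(f for f, n in votes.items() if n >= min_votes)

# Toy scores; filter names and feature names are hypothetical placeholders.
filter_scores = {
    "info_gain":  {"duration": 0.9, "src_bytes": 0.8, "flag": 0.2, "urgent": 0.1},
    "gain_ratio": {"duration": 0.7, "src_bytes": 0.9, "flag": 0.3, "urgent": 0.1},
    "chi_square": {"duration": 0.2, "src_bytes": 0.9, "flag": 0.8, "urgent": 0.1},
    "relief":     {"duration": 0.8, "src_bytes": 0.6, "flag": 0.7, "urgent": 0.2},
}
selected = ensemble_select(filter_scores, k=2, min_votes=3)
print(selected)  # ['duration', 'src_bytes']
```

Voting on ranks rather than averaging raw scores avoids having to put scores from different filters on a common scale.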

    Ensemble Models for Intrusion Detection System Classification

    Using data analytics in Intrusion Detection and Prevention Systems (IDS/IPS) is a continuing research problem because of the evolving nature of the problem and changes in the major influencing factors. The main challenges in this area are designing rules that can predict malware in unknown territories, dealing with the complexity of the problem, and reconciling the conflicting requirements of high detection accuracy and high efficiency. In this scope, we evaluated the use of state-of-the-art ensemble learning models to improve the performance and efficiency of IDS/IPS. We compared our approaches with existing approaches using popular open-source datasets available in this area.

    A Feature Selection Method for an Intrusion Detection System Using an Ensemble Approach and Fuzzy Logic

    The study proposes a new method of constructing a set of important features for solving classification problems. The method is based on an ensemble of feature-importance estimators whose outputs are summarized into a final ensemble result using fuzzy logic algorithms. The estimators used were statistical criteria (chi2, f_classif, correlation coefficient), mean decrease in impurity (MDI), and the mutual information criterion (mutual_info_classif). Reducing the number of features on all data sets affects the accuracy of the assessment according to the criterion of the average reduction of classification errors. As long as the training feature set contains the top-ranked features with the greatest influence, model accuracy stays at its initial level; once at least one high-impact feature is excluded, accuracy drops noticeably. The best classification results for all studied data sets were provided by classifiers based on trees or nearest neighbors: DecisionTreeClassifier, ExtraTreeClassifier, KNeighborsClassifier. Excluding non-essential features from the model yields a noticeable increase in learning speed (up to 60-70%). Ensemble learning was used to increase the accuracy of the assessment. The VotingClassifier, built from the algorithms with the maximum learning speed, provided the best learning speed indicators. Future work aims to further improve the proposed IDS model by improving the selection of classifiers to obtain optimal results, tuning the parameters of the selected classifiers, and improving the strategy for generalizing the results of individual classifiers. For the proposed model, the ability to detect individual types of attacks with multi-class prediction is of significant interest.

    An Ensemble Model for Multiclass Classification and Outlier Detection Method in Data Mining

    Real-world datasets often exhibit a multiclass classification structure characterized by imbalanced classes. Minority classes are treated as outlier classes. The study used the cross-industry process for data mining methodology. A heterogeneous multiclass ensemble was developed by combining several strategies and ensemble techniques. The datasets used were drawn from the UCI machine learning repository. Experiments for validating the model were conducted and presented in the form of tables and figures. An ensemble filter selection method was developed and used for preprocessing the datasets. Point outliers were filtered using the interquartile range filter algorithm. Datasets were resampled using the synthetic minority oversampling technique (SMOTE) algorithm. Multiclass datasets were transformed to binary classes using the One-vs-One decomposition technique. An ensemble model was developed using the AdaBoost and random subspace algorithms with random forest as the base classifier. The classifiers built were combined using voting. The model was validated with classification and outlier performance measures such as Recall, Precision, F-measure and AUC-ROC values. The classifiers were evaluated using 10-fold stratified cross-validation. The model showed better performance in terms of outlier detection and classification prediction for the multiclass problem. The model outperformed other well-known classification and outlier detection algorithms such as Naïve Bayes, KNN, Bagging, JRipper, Decision trees, RandomTree and Random forest. The study findings established that ensemble techniques, dataset resampling and multiclass decomposition result in improved detection of minority outlier (rare) classes. Keywords: Multiclass, Outlier, Ensemble, Model, Classification DOI: 10.7176/JIEA/9-2-04 Publication date: April 30th 2019
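
The interquartile range filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional 1.5 × IQR fences and a single numeric attribute; the abstract does not state the factor or the exact filter implementation the authors used, and the sample values are made up.

```python
import statistics

def iqr_filter(values, factor=1.5):
    """Drop values lying outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]

# Made-up readings with one obvious point outlier.
readings = [10, 12, 11, 13, 12, 11, 100]
kept = iqr_filter(readings)
print(kept)  # [10, 12, 11, 13, 12, 11]
```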

    An ensemble based approach for effective intrusion detection using majority voting

    Network security research has recently taken center stage, given the vulnerability of the computing ecosystem as networked systems increasingly fall to hackers. In network security, an intrusion detection system (IDS) is an essential tool for the timely detection of cyber-attacks. A designated set of reliable safety measures has been put in place to prevent severe damage to the network and the user base. Machine learning (ML) is frequently used to detect intrusions owing to its usefulness in helping intrusion detection systems minimize security threats. However, single classifiers have limitations that pose challenges to the development of an effective IDS. Against this backdrop, the current work proposes an ensemble approach to tackle the issues of single classifiers: a highly scalable, majority voting-based ensemble model that can be employed in real time to scrutinize network traffic and proactively warn about possible attacks. Taking into consideration the properties of existing machine learning algorithms, an effective model was developed, and accuracies of 99%, 97.2%, 97.2%, and 93.2% were obtained for DoS, Probe, R2L, and U2R attacks, respectively. The proposed model is thus effective for identifying intrusions.
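
Majority voting itself is straightforward to illustrate. The sketch below is a generic hard-voting combiner, not the authors' model: the abstract does not say which base classifiers were used, so the three prediction lists are invented for demonstration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    Ties go to the label seen first among the tied ones (Counter ordering).
    """
    combined = []
    for sample_votes in zip(*predictions):
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined

# Invented outputs of three hypothetical base classifiers on four samples.
clf_a = ["DoS", "Probe",  "normal", "R2L"]
clf_b = ["DoS", "normal", "normal", "U2R"]
clf_c = ["Probe", "Probe", "normal", "R2L"]

final = majority_vote([clf_a, clf_b, clf_c])
print(final)  # ['DoS', 'Probe', 'normal', 'R2L']
```

Using an odd number of voters avoids ties in the two-class case; with more classes a tie-breaking rule such as the one noted in the docstring is still needed.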

    Comparative Analysis of Selected Filtered Feature Rankers Evaluators for Cyber Attacks Detection

    An increase in global connectivity and the rapid expansion of computer usage and computer networks have made the security of computer systems an important issue, with industries and cyber communities facing new kinds of attacks daily. The high complexity of cyberattacks poses a great challenge to the protection of cyberinfrastructure and to the confidentiality, integrity and availability of the sensitive information stored on it. Intrusion detection systems monitor network traffic for suspicious, intrusive activity and issue alerts when such activity is detected. Building an intrusion detection system that is computationally efficient and effective requires the use of relevant features of the network traffic packets, identified by feature selection algorithms. This paper implemented K-Nearest Neighbor and Naïve Bayes intrusion detection models using relevant features of the UNSW-NB15 intrusion detection dataset selected by the Gain Ratio, Information Gain, ReliefF and Correlation ranker feature selection techniques.
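
Of the rankers listed, information gain is the simplest to show by hand. The sketch below computes it for categorical features as the class entropy minus the class entropy conditioned on the feature. The feature names and values are hypothetical toy data, not drawn from UNSW-NB15, and numeric attributes would first need discretizing.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: one fully informative and one uninformative feature.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "service": ["http", "http", "dns", "dns"],
    "state":   ["up", "down", "up", "down"],
}
gains = {name: information_gain(col, labels) for name, col in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)  # ['service', 'state']
```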

    Feature selection for multiple water quality status: Integrated bootstrapping and SMOTE approach in imbalance classes

    STORET is a method for classifying river water quality into four classes (very good, good, medium and bad) based on water data for each attribute or feature. The success of a pattern recognition model depends heavily on the quality of the data. This research addresses two issues: the data are distributed disproportionately among the classes (class imbalance), and the attributes contain noise. The research therefore integrates the SMOTE technique and bootstrapping to handle the class imbalance problem, and conducts an experiment to eliminate attribute noise using several filter-based feature selection algorithms (information gain, rule, derivation, correlation and chi square). The research proceeds in the following stages: data understanding, pre-processing, imbalance class, feature selection, classification and performance evaluation. Testing with 10-fold cross-validation shows that the SMOTE-bootstrapping technique increases accuracy from 83.3% to 98.8%. Eliminating noise on the data attributes further increases accuracy to 99.5% (using the feature subset produced by the information gain algorithm and the decision tree classification algorithm).
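
The core of SMOTE is interpolation between a minority sample and one of its nearest minority neighbours. The sketch below shows only that step, under assumptions: the function name `smote_like`, the neighbour count `k`, the seed and the sample points are all illustrative, and the bootstrapping stage that the study combines with SMOTE is not reproduced.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Made-up minority-class samples with two numeric attributes.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.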

    A Comparative Analysis for Filter-Based Feature Selection Techniques with Tree-based Classification

    Feature selection is an essential pre-processing method used in research areas such as data mining, text mining, and image processing. Raw datasets for machine learning comprise large numbers of multidimensional attributes and are used for making predictions. When such datasets are used for classification, the many inconsistent and redundant features consume more time and resources, produce incorrect results, and degrade the classification. To improve the efficiency and performance of classification, these features have to be eliminated. A variety of feature subset selection methods have been presented to find and eliminate as many redundant and useless features as feasible. This research work presents a comparative analysis of filter-based feature selection techniques with tree-based classification. Several feature selection techniques and classifiers are applied to different datasets using the Weka tool. In this comparative analysis, we evaluated the performance of six different feature selection techniques and their effects on decision tree classifiers using 10-fold cross-validation on three datasets. The results show that the feature selection method ChiSquaredAttributeEval + Ranker search with the Random Forest classifier beats the other methods for effective and efficient evaluation, and it is applicable to numerous real datasets in several application domains.
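
Chi-squared ranking of the kind named in the abstract scores each feature by how far its joint distribution with the class departs from independence. The sketch below is a plain contingency-table computation for categorical features, written for illustration only; it is not Weka's ChiSquaredAttributeEval implementation, and the feature names and values are hypothetical.

```python
from collections import Counter

def chi_square(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and the class."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for value, f_total in feature_totals.items():
        for label, l_total in label_totals.items():
            expected = f_total * l_total / n  # count expected under independence
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score

# Hypothetical toy data: "protocol" determines the class, "weekday" does not.
labels = [0, 0, 1, 1]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp"],
    "weekday":  ["mon", "tue", "mon", "tue"],
}
scores = {name: chi_square(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['protocol', 'weekday']
```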