
    Classification hardness for supervised learners on 20 years of intrusion detection data

    This article consolidates the analysis of an established intrusion detection dataset (NSL-KDD) and newer ones (ISCXIDS2012, CICIDS2017, CICIDS2018) using supervised machine learning (ML) algorithms. The uniform analysis procedure makes the results directly comparable and provides a stronger foundation for conclusions about the efficacy of supervised learners on the main classification task in network security. This research is motivated in part by the limited adoption of these modern datasets. The broad initial scope, covering classification by algorithms from different families on both the established and the new datasets, expands the existing foundation and reveals the most promising avenues for further inquiry. After obtaining baseline results, the classification task was made harder by reducing the available training data both horizontally and vertically. This data reduction serves as a stress test of whether the very high baseline results hold up under increasingly harsh constraints. Ultimately, this work contains the most comprehensive set of results to date on intrusion detection through supervised machine learning. Researchers working on algorithmic improvements can compare their results to this collection, knowing that all results reported here were gathered through a uniform framework. The main contributions are the outstanding classification results on the current state-of-the-art intrusion detection datasets and the conclusion that these methods remain remarkably resilient in classification performance even when the amount of training data is aggressively reduced.
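
    A minimal sketch of the stress-test idea described above (not the paper's actual framework): train an off-the-shelf supervised classifier on an intrusion detection dataset, then shrink the training data horizontally and vertically and re-evaluate. The CSV path, the "Label" column, and the reading of horizontal as fewer rows and vertical as fewer features are assumptions made for illustration.

```python
# Hypothetical stress-test sketch; the file name and "Label" column are placeholders,
# and RandomForest is used only as a representative supervised learner.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("cicids2017_preprocessed.csv")            # assumed preprocessed dataset
X, y = df.drop(columns=["Label"]), df["Label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

def evaluate(X_tr, y_tr, feature_subset=None):
    """Fit on a (possibly reduced) training set and report macro F1 on the fixed test set."""
    cols = list(feature_subset) if feature_subset is not None else list(X_tr.columns)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr[cols], y_tr)
    return f1_score(y_test, clf.predict(X_test[cols]), average="macro")

print("baseline          :", evaluate(X_train, y_train))

# Horizontal reduction: progressively fewer training rows.
for frac in (0.5, 0.1, 0.01):
    X_sub = X_train.sample(frac=frac, random_state=0)
    print(f"{frac:>4.0%} of the rows :", evaluate(X_sub, y_train.loc[X_sub.index]))

# Vertical reduction: only the first half of the feature columns.
half = list(X_train.columns[: len(X_train.columns) // 2])
print("half the features :", evaluate(X_train, y_train, half))
```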

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general OCC problem through a taxonomy of study based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques, and methodologies with a focus on their significance, limitations, and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures.
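
    To make the OCC setting concrete, here is a minimal sketch using one common technique, a one-class SVM fitted on positive examples only; the survey covers many other approaches, and the data here are synthetic.

```python
# Synthetic illustration of one-class classification with a one-class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
positives = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # only the positive class is observed
outliers = rng.uniform(low=-6.0, high=6.0, size=(50, 2))    # never seen during training

# The class boundary is learned from positive examples alone;
# nu bounds the fraction of training points treated as outliers.
occ = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(positives)

print("positives accepted:", np.mean(occ.predict(positives) == 1))
print("outliers rejected :", np.mean(occ.predict(outliers) == -1))
```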

    Severity Analysis of Large Truck Crashes: A Comparison Between Regression Modeling Methods and Machine Learning Methods.

    According to the Texas Department of Transportation’s Texas Motor Vehicle Crash Statistics, Texas has had the highest number of severe crashes involving large trucks in the US. As defined by the US Department of Transportation, a large truck is any vehicle with a gross vehicle weight rating greater than 10,000 pounds. Large trucks generally need more time and much more space to accelerate, slow down, and stop, and they have large blind spots when making wide turns. Therefore, when an unexpected traffic situation arises, it is more difficult for large trucks than for regular vehicles to take evasive action and avoid a collision. Because of their large size and heavy weight, large truck crashes often result in huge economic and social costs. Predicting the severity of a reported large truck crash whose severity is not yet known, or of crashes expected to occur in the future, is useful: it can help prevent crashes or help rescue teams and hospitals provide proper medical care as fast as possible. To identify appropriate modeling approaches for predicting the severity of large truck crashes, this research selected four representative tree-based ML classification models (Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Random Forest (RF), and Gradient Boosting Decision Tree (GBDT)), two non-tree-based ML models (Support Vector Machines (SVM) and k-Nearest Neighbors (kNN)), and an LR model. The results indicate that the GBDT model performs best among all seven models.
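
    A minimal sketch of this kind of model comparison using scikit-learn stand-ins: the CSV path and column names are hypothetical, the data are assumed to be already numerically encoded, and sklearn's GradientBoostingClassifier and AdaBoostClassifier stand in for the study's exact XGBoost/GBDT/AdaBoost configurations.

```python
# Hypothetical comparison script; "large_truck_crashes.csv" and the "severity"
# column are placeholders, and sklearn estimators stand in for the study's models.
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("large_truck_crashes.csv")          # assumed: already numerically encoded
X, y = df.drop(columns=["severity"]), df["severity"]

models = {
    "RF":       RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GBDT":     GradientBoostingClassifier(random_state=0),
    # Distance/margin-based models benefit from feature scaling.
    "SVM":      make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "kNN":      make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "LR":       make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# 5-fold cross-validated macro F1 as a simple, class-balanced comparison metric.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro")
    print(f"{name:8s} macro-F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```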

    Representing complex data using localized principal components with application to astronomical data

    Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms: ``nonlinear'', ``branched'', ``disconnected'', ``bended'', ``curved'', ``heterogeneous'', or, more generally, ``complex''. In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia. Comment: 25 pages. In "Principal Manifolds for Data Visualization and Dimension Reduction", A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (eds), Lecture Notes in Computational Science and Engineering, Springer, 2007, pp. 180--204, http://www.springer.com/dal/home/generic/search/results?SGWID=1-40109-22-173750210-
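
    A minimal sketch of the local-partitioning flavour of localized PCA on toy "curved" data: partition the points (here with k-means, an assumed choice) and fit an ordinary PCA within each partition. The local principal curves and localized PLS discussed in the paper are not reproduced here.

```python
# Toy illustration: k-means partitions followed by per-partition PCA.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# A noisy arc: "curved" data that a single global PCA would summarize poorly.
t = rng.uniform(0.0, np.pi, 1000)
X = np.column_stack([np.cos(t), np.sin(t)]) + rng.normal(scale=0.05, size=(1000, 2))

k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

for j in range(k):
    local = X[labels == j]
    pca = PCA(n_components=1).fit(local)
    # Each partition gets its own first principal direction, so the arc is
    # approximated piecewise rather than by one global axis.
    print(f"partition {j}: direction = {np.round(pca.components_[0], 2)}, "
          f"explained variance ratio = {pca.explained_variance_ratio_[0]:.2f}")
```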