2,533 research outputs found

    Optimized Naïve Bayesian Algorithm for Efficient Performance

    The naïve Bayesian algorithm is a data mining algorithm that models relationships between data objects probabilistically. Classification with it is usually done by finding the class with the highest probability value. Data mining is a popular research area that comprises algorithm development and pattern extraction from databases using different algorithms. Classification is one of the major tasks of data mining; it aims to build a model (classifier) that can be used to predict unknown class labels. There are many algorithms for classification, such as decision tree classifiers, neural networks, rule induction and naïve Bayesian. This paper focuses on the naïve Bayesian algorithm, a classical algorithm for classifying categorical data that easily converges to local optima. The Particle Swarm Optimization (PSO) algorithm has gained recognition in many fields of human endeavour and has been applied to enhance efficiency and accuracy in different problem domains. This paper proposes an optimized naïve Bayesian classifier that uses particle swarm optimization to overcome the problem of premature convergence and to improve the efficiency of the naïve Bayesian algorithm. Compared with the traditional algorithm, the optimized naïve Bayesian classifier showed better classification performance.
    Keywords: Data Mining, Classification, Particle Swarm Optimization, Naïve Bayesian
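
    The "highest probability value" rule in the abstract above can be illustrated with a minimal sketch of a plain categorical naïve Bayes classifier. This is not the paper's PSO-optimized method. The function names, the Laplace smoothing parameter `alpha`, and the toy weather data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # feature_values[feature_index] -> set of distinct values seen in training
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return the class with the highest (log) posterior score."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log prior plus the sum of log likelihoods (the "naive" independence step)
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))
```

    Working in log space avoids numeric underflow when many feature likelihoods are multiplied, and the smoothing term keeps an unseen feature value from forcing a class probability to zero.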

    Machine learning approach for detection of nonTor traffic

    Intrusion detection has attracted considerable interest from researchers and industry. After many years of research, the community still faces the problem of building reliable and efficient intrusion detection systems (IDS) capable of handling large quantities of data with changing patterns in real-time situations. The Tor network is popular for providing privacy and security to end users by anonymizing the identity of internet users who connect through a series of tunnels and nodes. This work addresses two problems using the UNB-CIC Tor Network Traffic dataset: distinguishing Tor traffic from nonTor traffic, in order to expose activities within Tor traffic that reduce the protection of users, and classifying the Tor traffic flows in the network. This paper proposes a hybrid classifier that combines an Artificial Neural Network (ANN) with the Correlation feature selection (CFS) algorithm for dimensionality reduction and improved classification performance. The reliability and efficiency of the proposed hybrid classifier are compared with those of Support Vector Machine and naïve Bayes classifiers in detecting nonTor traffic in the UNB-CIC Tor Network Traffic dataset. Experimental results show that the hybrid classifier, ANN-CFS, is the better classifier for detecting nonTor traffic and classifying the Tor traffic flow in the UNB-CIC Tor Network Traffic dataset.
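
    As a rough illustration of correlation-based feature selection, the sketch below computes the subset "merit" score from Hall's formulation of CFS. It is an assumption that the abstract's algorithm follows this formulation; the function name and the example numbers are hypothetical.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a k-feature subset: high when features correlate with the
    class but not with each other (Hall's CFS heuristic, assumed here)."""
    if k <= 0:
        raise ValueError("k must be a positive number of features")
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Same class relevance, but the second subset is fully redundant.
print(cfs_merit(4, 0.5, 0.0))
print(cfs_merit(4, 0.5, 1.0))
```

    A search procedure would evaluate this score over candidate subsets and keep the best one; only the scoring step is shown here.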

    Applying machine learning to categorize distinct categories of network traffic

    The recent rapid growth of data science has given all fields the opportunity to leverage machine learning. Computer network traffic classification has traditionally been performed using static, pre-written rules that are easily made ineffective if changes, legitimate or not, are made to the applications or protocols underlying a particular category of network traffic. This paper explores the problem of network traffic classification and analyzes the viability of performing it with a range of classical machine learning techniques that rely on statistical similarities between classes of network traffic rather than on traditional static traffic identifiers. To accomplish this, network data was captured, processed, and evaluated for 10 application labels under the categories of video conferencing, video streaming, video gaming, and web browsing, as described later in Table 1. Flow-based statistical features for the dataset were derived from the network captures in accordance with the “Flow Data Feature Creation” section and were analyzed with nearest centroid, k-nearest neighbors, Gaussian naïve Bayes, support vector machine, decision tree, random forest, and multi-layer perceptron classifiers. Tools and techniques broadly available to organizations and enthusiasts were used. Observations were made on working with network data in a machine learning context, the strengths and weaknesses of different models on such data, and the overall efficacy of the tested models. Ultimately, it was found that simple, freely available models can achieve high accuracy, recall, and F1 scores in network traffic classification. The best-performing model, random forest, had 89% accuracy, a macro average F1 score of 0.77, and a macro average recall of 76%, and the features most commonly associated with successful classification were related to maximum packet sizes in a network flow.
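
    The abstract reports accuracy alongside macro-averaged recall and F1. The sketch below shows how those three metrics are computed and why macro averages can sit well below accuracy when some classes are rare. The function name and the toy labels are assumptions for illustration and are not the thesis's data.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for a multi-class task."""
    labels = sorted(set(y_true) | set(y_pred))
    recalls, f1_scores = [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1_scores.append(f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro averaging weights every class equally, regardless of its size.
    return accuracy, sum(recalls) / len(labels), sum(f1_scores) / len(labels)


# Hypothetical traffic labels: the single "game" flow is misclassified.
y_true = ["web", "web", "web", "video", "video", "game"]
y_pred = ["web", "web", "video", "video", "video", "web"]
accuracy, macro_recall, macro_f1 = macro_scores(y_true, y_pred)
print(round(accuracy, 3), round(macro_recall, 3), round(macro_f1, 3))
```

    In this toy case accuracy is about 0.667 while macro recall is about 0.556 and macro F1 about 0.489, because the one missed minority class counts as a full third of each macro average.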

    A MACHINE LEARNING APPROACH FOR CLASSIFYING JAVASCRIPT USING STATIC CODE ANALYSIS

    This thesis develops a machine learning approach to classify normal and anomalous JavaScript based on a static analysis of select features derived from the top 30 000 webpages on the internet. A dataset of 136 features was extracted from 100 000 raw JavaScript files. Nine test groups were created and tested using 10 subsets of features. K-means clustering was used to group the data, and the clusters were manually translated into a binary classification. The results from the K-means clustering show moderate performance, with distortions less than 1.0 from elbow plot analysis and average silhouette scores between 0.3 and 0.8 from silhouette analysis of the clustering. The classification of each JavaScript file was then examined using the naïve Bayes algorithm to re-create and examine the performance of the highest performing classifiers using a less processing-intensive method. Naïve Bayes was not a good model for re-creating the K-means classifier. The best performing classifiers had a Matthews correlation coefficient of 0.75 when examining small JavaScript, and less than 0.38 when examining medium or large JavaScript. The results show that most JavaScript files were small in file size, and file size was the only defining feature. No features tested other than file size effectively categorize the vast majority of JavaScript. Further research is needed to find features that more accurately encompass the majority of JavaScript in order to define normal JavaScript.
    National Security Agency, Ft. Meade, MD, 20755
    Lieutenant, United States Navy
    Approved for public release. Distribution is unlimited.
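
    For readers unfamiliar with the Matthews correlation coefficient cited above, the following sketch computes it from binary confusion-matrix counts. The function name and the counts are hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here rather than something taken from the thesis.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC ranges from -1 (total disagreement) through 0 (chance-level)
    to +1 (perfect prediction)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 90 of 100 files classified correctly, balanced classes.
print(matthews_corrcoef_from_counts(tp=45, tn=45, fp=5, fn=5))
```

    Unlike accuracy, the coefficient uses all four cells of the confusion matrix, so it stays informative when one class (for example, small files) dominates the dataset.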