15,058 research outputs found
Recommended from our members
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded on automatic textual analysis of subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve the accuracy of classification performance
Intrusion Detection System using Bayesian Network Modeling
Computer Network Security has become a critical and important issue due to ever increasing cyber-crimes. Cybercrimes are spanning from simple piracy crimes to information theft in international terrorism. Defence security agencies and other militarily related organizations are highly concerned about the confidentiality and access control of the stored data. Therefore, it is really important to investigate on Intrusion Detection System (IDS) to detect and prevent cybercrimes to protect these systems. This research proposes a novel distributed IDS to detect and prevent attacks such as denial service, probes, user to root and remote to user attacks. In this work, we propose an IDS based on Bayesian network classification modelling technique. Bayesian networks are popular for adaptive learning, modelling diversity network traffic data for meaningful classification details. The proposed model has an anomaly based IDS with an adaptive learning process. Therefore, Bayesian networks have been applied to build a robust and accurate IDS. The proposed IDS has been evaluated against the KDD DAPRA dataset which was designed for network IDS evaluation. The research methodology consists of four different Bayesian networks as classification models, where each of these classifier models are interconnected and communicated to predict on incoming network traffic data. Each designed Bayesian network model is capable of detecting a major category of attack such as denial of service (DoS). However, all four Bayesian networks work together to pass the information of the classification model to calibrate the IDS system. The proposed IDS shows the ability of detecting novel attacks by continuing learning with different datasets. The testing dataset constructed by sampling the original KDD dataset to contain balance number of attacks and normal connections. The experiments show that the proposed system is effective in detecting attacks in the test dataset and is highly accurate in detecting all major attacks recorded in DARPA dataset. The proposed IDS consists with a promising approach for anomaly based intrusion detection in distributed systems. Furthermore, the practical implementation of the proposed IDS system can be utilized to train and detect attacks in live network traffi
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in IT industry, this idea has been developed to design
a Traffic Classification Method using Data Mining techniques at the intersection of Machine
Learning Algorithm, Which will classify the normal and malicious traffic. This classification will
help to learn about the unknown attacks faced by IT industry. The notion of traffic classification
is not a new concept; plenty of work has been done to classify the network traffic for
heterogeneous application nowadays. Existing techniques such as (payload based, port based
and statistical based) have their own pros and cons which will be discussed in this
literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now
Fame for sale: efficient detection of fake Twitter followers
are those Twitter accounts specifically created to
inflate the number of followers of a target account. Fake followers are
dangerous for the social platform and beyond, since they may alter concepts
like popularity and influence in the Twittersphere - hence impacting on
economy, politics, and society. In this paper, we contribute along different
dimensions. First, we review some of the most relevant existing features and
rules (proposed by Academia and Media) for anomalous Twitter accounts
detection. Second, we create a baseline dataset of verified human and fake
follower accounts. Such baseline dataset is publicly available to the
scientific community. Then, we exploit the baseline dataset to train a set of
machine-learning classifiers built over the reviewed rules and features. Our
results show that most of the rules proposed by Media provide unsatisfactory
performance in revealing fake followers, while features proposed in the past by
Academia for spam detection provide good results. Building on the most
promising features, we revise the classifiers both in terms of reduction of
overfitting and cost for gathering the data needed to compute the features. The
final result is a novel classifier, general enough to thwart
overfitting, lightweight thanks to the usage of the less costly features, and
still able to correctly classify more than 95% of the accounts of the original
training set. We ultimately perform an information fusion-based sensitivity
analysis, to assess the global sensitivity of each of the features employed by
the classifier. The findings reported in this paper, other than being supported
by a thorough experimental methodology and interesting on their own, also pave
the way for further investigation on the novel issue of fake Twitter followers
Using online linear classifiers to filter spam Emails
The performance of two online linear classifiers - the Perceptron and Littlestoneâs Winnow â is explored for two anti-spam filtering benchmark corpora - PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard NaĂŻve Bayes method. The theoretical and implementation computational complexity of these two classifiers are very low, and they are very easily adaptively updated. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering
- âŠ