102 research outputs found

    Sentiment Classification of Customer Reviews about Automobiles in Roman Urdu

    Full text link
    Text mining is a broad field having sentiment mining as its important constituent in which we try to deduce the behavior of people towards a specific item, merchandise, politics, sports, social media comments, review sites etc. Out of many issues in sentiment mining, analysis and classification, one major issue is that the reviews and comments can be in different languages like English, Arabic, Urdu etc. Handling each language according to its rules is a difficult task. A lot of research work has been done in English Language for sentiment analysis and classification but limited sentiment analysis work is being carried out on other regional languages like Arabic, Urdu and Hindi. In this paper, Waikato Environment for Knowledge Analysis (WEKA) is used as a platform to execute different classification models for text classification of Roman Urdu text. Reviews dataset has been scrapped from different automobiles sites. These extracted Roman Urdu reviews, containing 1000 positive and 1000 negative reviews, are then saved in WEKA attribute-relation file format (arff) as labeled examples. Training is done on 80% of this data and rest of it is used for testing purpose which is done using different models and results are analyzed in each case. The results show that Multinomial Naive Bayes outperformed Bagging, Deep Neural Network, Decision Tree, Random Forest, AdaBoost, k-NN and SVM Classifiers in terms of more accuracy, precision, recall and F-measure.Comment: This is a pre-print of a contribution published in Advances in Intelligent Systems and Computing (editors: Kohei Arai, Supriya Kapoor and Rahul Bhatia) published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-03405-4_4

    A Review on Features’ Robustness in High Diversity Mobile Traffic Classifications

    Get PDF
    Mobile traffics are becoming more dominant due to growing usage of mobile devices and proliferation of IoT. The influx of mobile traffics introduce some new challenges in traffic classifications; namely the diversity complexity and behavioral dynamism complexity. Existing traffic classifications methods are designed for classifying standard protocols and user applications with more deterministic behaviors in small diversity. Currently, flow statistics, payload signature and heuristic traffic attributes are some of the most effective features used to discriminate traffic classes. In this paper, we investigate the correlations of these features to the less-deterministic user application traffic classes based on corresponding classification accuracy. Then, we evaluate the impact of large-scale classification on feature's robustness based on sign of diminishing accuracy. Our experimental results consolidate the needs for unsupervised feature learning to address the dynamism of mobile application behavioral traits for accurate classification on rapidly growing mobile traffics

    Hybrid intelligent approach for network intrusion detection

    Get PDF
    In recent years, computer networks are broadly used, and they have become very complicated. A lot of sensitive information passes through various kinds of computer devices, ranging from minicomputers to servers and mobile devices. These occurring changes have led to draw the conclusion that the number of attacks on important information over the network systems is increasing with every year. Intrusion is the main threat to the network. It is defined as a series of activities aimed for exposing the security of network systems in terms of confidentiality, integrity and availability, as a result; intrusion detection is extremely important as a part of the defense. Hence, there must be substantial improvement in network intrusion detection techniques and systems. Due to the prevailing limitations of finding novel attacks, high false detection, and accuracy in previous intrusion detection approaches, this study has proposed a hybrid intelligent approach for network intrusion detection based on k-means clustering algorithm and support vector machine classification algorithm. The aim of this study is to reduce the rate of false alarm and also to improve the detection rate, comparing with the existing intrusion detection approaches. In the present study, NSL-KDD intrusion dataset has been used for training and testing the proposed approach. In order to improve classification performance, some steps have been taken beforehand. The first one is about unifying the types and filtering the dataset by data transformation. Then, a features selection algorithm is applied to remove irrelevant and noisy features for the purpose of intrusion. Feature selection has decreased the features from 41 to 21 features for intrusion detection and later normalization method is employed to perform and reduce the differences among the data. Clustering is the last step of processing before classification has been performed, using k-means algorithm. Under the purpose of classification, support vector machine have been used. After training and testing the proposed hybrid intelligent approach, the results of performance evaluation have shown that the proposed network intrusion detection has achieved high accuracy and low false detection rate. The accuracy is 96.025 percent and the false alarm is 3.715 percent

    A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks

    Get PDF
    With increasing reliance on Internet of Things (IoT) devices and services, the capability to detect intrusions and malicious activities within IoT networks is critical for resilience of the network infrastructure. In this paper, we present a novel model for intrusion detection based on two-layer dimension reduction and two-tier classification module, designed to detect malicious activities such as User to Root (U2R) and Remote to Local (R2L) attacks. The proposed model is using component analysis and linear discriminate analysis of dimension reduction module to spate the high dimensional dataset to a lower one with lesser features. We then apply a two-tier classification module utilizing NaĂŻve Bayes and Certainty Factor version of K-Nearest Neighbor to identify suspicious behaviors. The experiment results using NSL-KDD dataset shows that our model outperforms previous models designed to detect U2R and R2L attacks

    An Efficient Intrusion Detection Approach Utilizing Various WEKA Classifiers

    Get PDF
    Detection of Intrusion is an essential expertise business segment as well as a dynamic area of study and expansion caused by its requirement. Modern day intrusion detection systems still have these limitations of time sensitivity. The main requirement is to develop a system which is able of handling large volume of network data to detect attacks more accurately and proactively. Research conducted by on the KDDCUP99 dataset resulted in a various set of attributes for each of the four major attack types. Without reducing the number of features, detecting attack patterns within the data is more difficult for rule generation, forecasting, or classification. The goal of this research is to present a new method that Compare results of appropriately categorized and inaccurately categorized as proportions and the features chosen. In this research paper we explained our approach “An Efficient Intrusion Detection Approach Utilizing Various WEKA Classifiers” which is proposed to enhance the competence of recognition of intrusion employing different WEKA classifiers on processed KDDCUP99 dataset. During the experiment we employed Adaboost, J48, JRip, NaiveBayes and Random Tree classifiers to categorize the different attacks from the processed KDDCUP99. Keywords: Classifier, Data Mining, IDS, Network Security, Attacks, Cyber Securit

    Machine Learning Models for Network Intrusion Detection and Authentication of Smart Phone Users

    Get PDF
    A thesis presented to the faculty of the Elmer R. Smith College of Business and Technology at Morehead State University in partial fulfillment of the requirements for the Degree of Master of Science by S. Sareh Ahmadi on November 18, 2019

    ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease

    Get PDF
    Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled samples for efficient training. The labeled data scarcity and high labeling costs can be mitigated by active learning. This is achieved through selective query of challenging samples for labeling. To the best of our knowledge, active learning has not been used for CAD diagnosis yet. An Active Learning with Ensemble of Classifiers (ALEC) method is proposed for CAD diagnosis, consisting of four classifiers. Three of these classifiers determine whether a patient’s three main coronary arteries are stenotic or not. The fourth classifier predicts whether the patient has CAD or not. ALEC is first trained using labeled samples. For each unlabeled sample, if the outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. The training is performed once more using the samples labeled so far. The interleaved phases of labeling and training are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is justified mathematically as well. We also comprehensively analyze the CAD dataset used in this paper. As part of dataset analysis, features pairwise correlation is computed. The top 15 features contributing to CAD and stenosis of the three main coronary arteries are determined. The relationship between stenosis of the main arteries is presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample discrimination is investigated. The discrimination power over dataset samples is visualized, assuming each of the three main coronary arteries as a sample label and considering the two remaining arteries as sample features

    Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine

    Get PDF
    Cyberttacks are becoming increasingly sophisticated, necessitating the efficient intrusion detection mechanisms to monitor computer resources and generate reports on anomalous or suspicious activities. Many Intrusion Detection Systems (IDSs) use a single classifier for identifying intrusions. Single classifier IDSs are unable to achieve high accuracy and low false alarm rates due to polymorphic, metamorphic, and zero-day behaviors of malware. In this paper, a Hybrid IDS (HIDS) is proposed by combining the C5 decision tree classifier and One Class Support Vector Machine (OC-SVM). HIDS combines the strengths of SIDS) and Anomaly-based Intrusion Detection System (AIDS). The SIDS was developed based on the C5.0 Decision tree classifier and AIDS was developed based on the one-class Support Vector Machine (SVM). This framework aims to identify both the well-known intrusions and zero-day attacks with high detection accuracy and low false-alarm rates. The proposed HIDS is evaluated using the benchmark datasets, namely, Network Security Laboratory-Knowledge Discovery in Databases (NSL-KDD) and Australian Defence Force Academy (ADFA) datasets. Studies show that the performance of HIDS is enhanced, compared to SIDS and AIDS in terms of detection rate and low false-alarm rates. © 2020 by the authors. Licensee MDPI, Basel, Switzerland
    • …
    corecore