18 research outputs found

    A simultaneous spam and phishing attack detection framework for short message service based on text mining approach

    Short Messaging Service (SMS) is one of many communication media used by scammers to send persuasive messages that attract unwary recipients. In Malaysia, most sectors, such as telecommunication, banking, government, healthcare, and the private sector, have taken the initiative to educate their clients about SMS scams. Unfortunately, many people still fall victim. Within the field of SMS detection, only frameworks for detecting a single attack, Spam, have been studied; Phishing has never been studied. Existing detection frameworks are not suited to detecting SMS Phishing because these attacks have their own specific behaviour and characteristic words. This gives rise to the need for a framework that can detect both attacks at the same time. This thesis addresses the development of an SMS Spam and Phishing attack detection framework. The framework has three modules: Data Collection, Attack Profiling and Text Mining. For Module 1, the data sets used in this research are from the UCI Machine Learning Repository, the Dublin Institute of Technology (DIT), British English SMS and Malay SMS. The Phishing Rule-Based algorithm is used to extract SMS Phishing. For Module 2, the SMS Attack Profiling algorithm is used to produce SMS Spam and Phishing words. The Text Mining module consists of several phases: Tokenization, Lemmatization, Feature Selection and Classifier. These phases are carried out with the Rapidminer and Weka data mining tools. Three (3) types of features are used in this framework: Generic Features, Payload Features and Hybrid Features. All of these features are examined, and the performance metrics used to compare the results are the True Positive (TP) rate and Accuracy (A). Four (4) sets of results were obtained from this research. 
The first result shows that extracting SMS Phishing from the SMS Spam class yields four (4) enhanced datasets: the UCI Machine Learning Repository, the Dublin Institute of Technology (DIT), British English SMS and Malay SMS. The second result is the SMS Spam and Phishing attack profiling from the enhanced UCI Machine Learning Repository, Dublin Institute of Technology (DIT), British English SMS and Malay SMS datasets. The third and fourth results are obtained from the Feature Selection and Classifier phases, where eighty (80) experiments were done to examine the Generic Features, Payload Features and Hybrid Features. Five (5) classification techniques are used: Naive Bayes, K-NN, Decision Tree, Random Tree and Decision Stump. Using Rapidminer, the Hybrid Feature accuracy is 77.47% for Naive Bayes, 78.56% for K-NN, 57.16% for Decision Tree, 57.24% for Random Tree and 57.16% for Decision Stump. Using Weka, the accuracy is 71.45% for Naive Bayes, 81.64% for K-NN, 57.10% for Decision Tree, 70.64% for Random Tree and 60.19% for Decision Stump. The experiments were done using both the Rapidminer and Weka data mining tools because this is the first survey to detect SMS Spam and Phishing attacks at the same time, and the results are acceptable. Additionally, the proposed framework can detect both attacks simultaneously using text mining approaches
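
The two metrics named in this abstract, True Positive rate and Accuracy, can be illustrated with a minimal sketch. The labels and predictions below are made-up toy data, not results from the thesis, and the function names are hypothetical; the TP rate is computed per class in a one-vs-rest fashion, which is an assumption about how a three-class (ham, spam, phishing) problem would be scored.

```python
def tp_rate(y_true, y_pred, positive):
    """True Positive rate for one class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy data (hypothetical): six messages with true and predicted classes.
y_true = ["spam", "phish", "ham", "ham", "phish", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "phish", "phish"]

print("TP rate (phish):", tp_rate(y_true, y_pred, "phish"))
print("Accuracy:", accuracy(y_true, y_pred))
```

Reporting the TP rate per class alongside overall accuracy shows whether a classifier that looks acceptable overall is still missing most of the rarer phishing messages.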

    Customer profiling using classification approach for bank telemarketing

    Telemarketing is a type of direct marketing in which a salesperson contacts customers to sell products or services over the phone. The database of prospective customers comes from a direct marketing database. It is important for the company to predict the set of customers with the highest probability of accepting the sale or offer based on their personal characteristics or shopping behaviour. Recently, companies have started to resort to data mining approaches for customer profiling. This project focuses on helping banks increase the accuracy of their customer profiling through classification, as well as identifying a group of customers who have a high probability of subscribing to a long-term deposit. In the experiments, three classification algorithms are used: Naïve Bayes, Random Forest, and Decision Tree. The experiments measured accuracy, precision and recall, and showed that classification is useful for predicting customer profiles and increasing telemarketing sales

    Machine Learning-Based Distributed Denial of Service Attack Detection on Intrusion Detection System Regarding to Feature Selection

    Distributed Denial of Service (DDoS) is a type of network attack that increases in volume and intensity each year. DDoS attacks are also among the major types of cyber security threats. Early detection plays a key role in avoiding the catastrophic effects of DDoS attacks on server infrastructure. Detection techniques in the traditional Intrusion Detection System (IDS) lag behind the modern techniques and tools used by attackers, because the traditional IDS uses only signature-based or anomaly-based detection models and raises many false positives, since the flow of computer network data packets has complex properties in terms of both size and source. To address this deficiency in the ordinary IDS, this study aims to detect DDoS attacks by using machine learning techniques to enhance IDS policy development. According to the experiment, feature selection plays an important role in the precision of the detection results and in the performance of machine learning on classification problems. The combination of seven key selected dataset features, used as input to a neural network classifier in this study, provides the highest accuracy, 97.76%

    You tube spam comment detection using support vector machine and k–nearest neighbor

    Social networking sites such as YouTube and Facebook are very popular nowadays. One of the best things about YouTube is that users can subscribe and give their opinions in the comment section. However, this attracts spammers, who spam the comments on videos. Thus, this study develops a YouTube spam comment detection framework using Support Vector Machine (SVM) and K-Nearest Neighbor (k-NN). The research involves five (5) phases: Data Collection, Pre-processing, Feature Selection, Classification and Detection. The experiments are done using Weka and RapidMiner. Both SVM and k-NN show good accuracy with both machine learning tools. Another way to avoid spam attacks is not to click links in comments
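
As a rough illustration of the k-NN classification step, the sketch below classifies a comment by majority vote among its nearest labelled neighbours. It is not the study's Weka or RapidMiner setup: the two numeric features (number of links, number of exclamation marks), the training points and the function name `knn_predict` are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features per comment: (link count, exclamation-mark count).
train = [
    ((0, 1), "ham"),
    ((0, 2), "ham"),
    ((1, 1), "ham"),
    ((3, 9), "spam"),
    ((4, 8), "spam"),
]

print(knn_predict(train, (3, 8)))
print(knn_predict(train, (0, 0)))
```

An odd `k` is a common choice for two-class problems because it avoids tied votes.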

    Feature Selection to Enhance Phishing Website Detection Based On URL Using Machine Learning Techniques

    The detection of phishing websites based on machine learning has gained much attention due to its ability to detect newly generated phishing URLs. To detect phishing websites, most techniques combine URLs, web page content, and external features. However, the content of the web page and external features are time-consuming, require large computing power, and are not suitable for resource-constrained devices. To overcome this problem, this study applies feature selection techniques based on the URL to improve the detection process. The methodology for this study consists of seven stages, including data preparation, preprocessing, splitting the dataset into training and validation, feature selection, 10-fold cross-validation, validating the model, and finally performance evaluation. Two public datasets were used to validate the method. TreeSHAP and Information Gain were used to rank features and select the top 10, 15, and 20. These features are fed into three machine learning classifiers which are Naïve Bayes, Random Forest, and XGBoost. Their performance is evaluated based on accuracy, precision, and recall. As a result, the features ranked by TreeSHAP contributed most to improving detection accuracy. The highest accuracy of 98.59 percent was achieved by XGBoost for the first dataset with 15 features. For the second dataset, the highest accuracy is 90.21 percent using 20 features and Random Forest. As for Naïve Bayes, the highest accuracy recorded is 98.49 percent using the first dataset
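
The Information Gain ranking and top-k selection described above can be sketched as follows. This covers only the Information Gain half (TreeSHAP is not shown), assumes discrete feature values, and uses made-up URL feature names and toy data; none of it comes from the study's datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(features, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(
        features,
        key=lambda name: information_gain(features[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0]
features = {
    "has_ip_in_host": [1, 1, 0, 0],  # perfectly separates the classes
    "has_https": [1, 0, 1, 0],       # carries no information here
    "has_at_symbol": [1, 1, 1, 0],   # partially informative
}

print(top_k_features(features, labels, 2))
```

Only the selected columns would then be passed to the classifiers, which is what keeps a URL-only approach cheap on resource-constrained devices.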

    Feature Selection of Distributed Denial of Service (DDos) IoT Bot Attack Detection Using Machine Learning Techniques

    Distributed Denial of Service (DDoS) attacks can be carried out through numerous media and have become one of the biggest threats to computer security. One of the most effective approaches is to develop a detection algorithm using Machine Learning (ML). However, DDoS detection suffers from low accuracy, owing to the feature selection and classifier used, and from time-consuming detection. This research focuses on feature selection for DDoS IoT bot attack detection using ML techniques. Two NetFlow datasets, NF_ToN_IoT and NF_BoT_IoT, are processed with two attribute selection methods, Information Gain and Gain Ratio, and ranked using the Ranker algorithm. These datasets are then tested using four different algorithms: Naïve Bayes (NB), K-Nearest Neighbor (KNN), Decision Table (DT) and Random Forest (RF). The results are then compared using the confusion-matrix-based measures Accuracy, True Positive, True Negative, Precision and Recall. Results for the two datasets are reported for the Top 4, Top 8 and Top 12 selected features. The best overall classifier is Naïve Bayes, with accuracies of 97.506% and 90.67% for the NF_ToN_IoT and NF_BoT_IoT datasets
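
Gain Ratio, one of the two attribute selection measures named above, divides information gain by the entropy of the feature's own value distribution. The sketch below is a plain-Python illustration on toy data with discrete values, not a reproduction of the Weka evaluators the study presumably relied on; the function names and sample values are assumptions.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a feature divided by its split information."""
    n = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the feature values themselves
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): 1 = attack flow, 0 = benign flow.
labels = [1, 1, 0, 0]
binary_flag = [1, 1, 0, 0]           # two values, separates the classes
unique_id = ["a", "b", "c", "d"]     # one distinct value per row

print(gain_ratio(binary_flag, labels))
print(gain_ratio(unique_id, labels))
```

Both toy features have the same information gain, but the one with a distinct value per row scores lower under Gain Ratio, which is the reason the measure is often used next to plain Information Gain.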

    Classification of metamorphic virus using n-grams signatures

    A metamorphic virus can change, translate, and rewrite its own code once it has infected a system in order to bypass detection. The computer system can then be seriously damaged by the undetected metamorphic virus. It is therefore vital to design a metamorphic virus classification model that can detect this virus. This paper focuses on the detection of metamorphic viruses using the Term Frequency Inverse Document Frequency (TF-IDF) technique. The research was conducted using the Second Generation virus dataset. In the first step, the classification model clusters the metamorphic viruses using the TF-IDF technique. Then, the virus clusters are evaluated with the Naïve Bayes algorithm in terms of accuracy. The virus classes and features are extracted from bi-grams of assembly language. The results show that the proposed model was able to classify metamorphic viruses using TF-IDF with an optimal number of virus classes and an average accuracy of 94.2%
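
To make the bi-gram TF-IDF idea concrete, the following sketch weights opcode bi-grams across a few toy instruction sequences. The opcode sequences are invented, the function names are hypothetical, and the weighting uses the plain `tf * ln(N / df)` variant, which may differ from the exact formula used in the paper.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Consecutive pairs of tokens, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Return one {bigram: weight} dict per document.

    tf  = bigram count / total bigrams in the document
    idf = ln(number of documents / documents containing the bigram)
    """
    counts = [Counter(bigrams(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(gram for c in counts for gram in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {g: (cnt / total) * math.log(n_docs / doc_freq[g]) for g, cnt in c.items()}
        )
    return vectors


# Toy opcode sequences (hypothetical), one list per disassembled sample.
samples = [
    ["mov", "add", "mov", "add"],
    ["mov", "add", "jmp"],
    ["push", "pop", "ret"],
]

for vector in tf_idf(samples):
    print(vector)
```

Bi-grams shared by many samples receive low weights, so the resulting vectors emphasise instruction pairs that distinguish one group of samples from another before they are passed to a clustering step or a classifier.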

    Network monitoring system to detect unauthorized connection

    The Network Monitoring System to Detect Unauthorized Connection is a network analytic tool used to review local area network usage. The main purpose of the application is to monitor Internet Protocol traffic between the local area network and the Internet. In addition, the system aims to detect unauthorized Internet Protocol addresses that are inside the network range. It can also block network intruders from connecting to the Local Area Network (LAN). It is a computerized system designed with the elements of confidentiality, integrity and availability. The system was built using the waterfall methodology, which begins with system analysis and continues through design, implementation, testing, installation and maintenance. The system was developed using Visual Studio 2013 with SQL Server for server operations. There are ten modules in this system: user main page, register admin module, register staff module, login admin module, login staff module, admin menu module, staff menu module, scan view module, status view module and report module. About 30 respondents agreed that they were satisfied with the system. As a result, this system was successfully built to detect and block unauthorized access in the network
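
The core check described here, flagging addresses that fall inside the LAN range but are not registered, might look like the sketch below. It is written in Python for illustration rather than in the Visual Studio stack the system was built with, and the function name, the CIDR range and the address lists are all assumptions.

```python
import ipaddress


def find_unauthorized(observed, lan_cidr, authorized):
    """Return observed addresses that are inside the LAN but not authorized."""
    lan = ipaddress.ip_network(lan_cidr)
    allowed = {ipaddress.ip_address(a) for a in authorized}
    flagged = []
    for raw in observed:
        addr = ipaddress.ip_address(raw)
        # Only hosts within the LAN range are candidates for "unauthorized".
        if addr in lan and addr not in allowed:
            flagged.append(raw)
    return flagged


# Hypothetical scan result and registered hosts.
observed = ["192.168.1.10", "192.168.1.77", "8.8.8.8"]
registered = ["192.168.1.10"]

print(find_unauthorized(observed, "192.168.1.0/24", registered))
```

In a full system the registered list would presumably come from the database of admin and staff devices, and a flagged address would feed the scan view and report modules.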

    IMPROVING SECURITY AND PRIVACY REQUIREMENT FOR BUSINESS REGISTRATION SYSTEM (BRS)

    This paper explains the security and privacy requirements of the Business Registration System (BRS) hosted at a government institution. These requirements will be used to improve the system’s security and privacy

    Data wiping tool: ByteEditor Technique

    This Wiping Tool is an anti-forensic tool built to wipe data permanently from a laptop’s storage. The tool is able to prevent the data from being recovered with any recovery tool. The objective of building this wiping tool is to protect the confidentiality and integrity of the data against unauthorized access. People tend to delete files in the normal way; however, such files risk being recovered, so the integrity and confidentiality of the deleted files cannot be protected. With wiping tools, files are overwritten with random strings so that they are no longer readable, which protects the integrity and confidentiality of the files. Nowadays, many wiping tools face issues such as data breaches because they are unable to delete data permanently from devices. This undermines their main function and poses a threat to their users. Hence, a new wiping tool is developed to overcome the problem. The new tool, named Data Wiping tool, applies two wiping techniques. The first technique is Randomized Data, while the second is an enhanced wiping technique known as ByteEditor. ByteEditor is a combination of two different techniques, byte editing and byte deletion. The tool is built following an Object-Oriented methodology, which consists of analysis, design, implementation and testing. The tool is analyzed and compared with other wiping tools before design starts. Once the design is done, the implementation phase takes place. The code of the tool is written in C# using Visual Studio 2010, and its functionality is tested to ensure that the developed tool meets the objectives of the project. The tool is believed to contribute to the development of wiping tools and to help solve problems found in other wiping tools
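
The general idea of overwriting a file with random data before deleting it can be sketched as below. This is a generic illustration in Python, not the tool's C# code and not the ByteEditor technique, whose details the abstract does not give; the function names and parameters are hypothetical. It also assumes that an in-place write reaches the same storage blocks, which some file systems and solid-state drives do not guarantee.

```python
import os


def overwrite_with_random(path, passes=1, chunk_size=64 * 1024):
    """Overwrite the file's existing bytes in place with random data."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                handle.write(os.urandom(n))
                remaining -= n
            # Push the new bytes out of Python and OS buffers to the device.
            handle.flush()
            os.fsync(handle.fileno())


def wipe_file(path, passes=1):
    """Overwrite the file with random data, then remove it."""
    overwrite_with_random(path, passes=passes)
    os.remove(path)
```

Calling `wipe_file("secret.txt")` on a hypothetical file would replace its contents with random bytes and then delete the directory entry, in contrast to a normal delete, which leaves the original bytes on disk until they happen to be reused.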