47 research outputs found

    Improved Malware detection model with Apriori Association rule and particle swarm optimization

    The persistent damage that malware causes on mobile devices has made malware detection an indispensable and ongoing field of research. Different matching/mismatching approaches have been adopted for malware detection, including anomaly detection, misuse detection, and hybrid detection techniques. To improve the detection rate of malicious applications on the Android platform, a novel knowledge-based database discovery model is proposed that improves the association rule mining of the Apriori algorithm with Particle Swarm Optimization (PSO). PSO is used to optimize the random generation of candidate detectors and the parameters associated with the Apriori algorithm (AA) for feature selection. In this method, the candidate detectors generated by PSO form rules using Apriori association rule mining. These rule models are used together with an extraction algorithm to classify and detect malicious Android applications. Using a number of rule detectors, the true positive rate of detecting malicious code is maximized, while the false positive rate is minimized. The experimental results show that the proposed Apriori association rule model with PSO improves markedly on existing contemporary detection models. © 2019 Olawale Surajudeen Adebayo and Normaziah Abdul Aziz
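
As a rough illustration of the level-wise frequent-itemset search that Apriori performs, the sketch below mines a toy set of Android permission lists. It is not the authors' model: the PSO step is omitted entirely, and the function name `apriori`, the permission sets in `apps`, and the 0.5 support threshold are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        # Count support for each candidate and keep the frequent ones.
        level = {}
        for candidate in current:
            s = sum(1 for t in transactions if candidate <= t) / n
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical permission sets requested by four apps (illustrative only).
apps = [
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    {"SEND_SMS", "READ_CONTACTS"},
    {"SEND_SMS", "INTERNET"},
    {"INTERNET"},
]

frequent = apriori(apps, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: a three-permission candidate is discarded without counting as soon as one of its pairs is known to be infrequent.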

    Formulation Of Association Rule Mining (ARM) For An Effective Cyber Attack Attribution In Cyber Threat Intelligence (CTI)

    In recent years, adversaries have improved their Tactics, Techniques and Procedures (TTPs) for launching cyberattacks, making the attacks less predictable, more persistent, more resourceful and better funded. Many organisations have therefore adopted Cyber Threat Intelligence (CTI) in their security posture to attribute cyberattacks effectively. However, to fully leverage the massive amount of data in CTI for threat attribution, an organisation needs to focus more on discovering the hidden knowledge behind the voluminous data in order to produce an effective cyberattack attribution. Hence, this paper focuses on association analysis in the CTI process for cyberattack attribution. The aim of this paper is to formulate an association ruleset to perform the attribution process in CTI. The Apriori algorithm is used to formulate the association ruleset in the association analysis process; the result is known as the CTI Association Ruleset (CTI-AR). The interestingness measures support (s), confidence (c) and lift (l) are used to assess the practicality and validity of the CTI-AR and to filter it. The results show that CTI-AR effectively identifies the attributes, the relationships between attributes and the attribution level group of cyberattacks in CTI. This research has high potential to be extended to the cyber threat hunting process to provide a more proactive cybersecurity environment.
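
To make the three interestingness measures concrete, here is a minimal sketch that computes support, confidence and lift for a single rule using their standard definitions. The incident records, attribute names such as `group_A`, and the example rule are hypothetical and are not taken from the paper's CTI-AR.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent) -> float:
    """Confidence divided by the consequent's baseline support."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical incident records: each set lists attributes observed in one incident.
incidents = [
    {"phishing", "powershell", "group_A"},
    {"phishing", "powershell", "group_A"},
    {"phishing", "macro", "group_B"},
    {"exploit", "powershell", "group_A"},
]

# Hypothetical rule: {phishing, powershell} -> {group_A}
rule_if, rule_then = {"phishing", "powershell"}, {"group_A"}
rule_support = support(incidents, rule_if | rule_then)
rule_confidence = confidence(incidents, rule_if, rule_then)
rule_lift = lift(incidents, rule_if, rule_then)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

A lift above 1 means the antecedent and consequent co-occur more often than they would if independent, which is why lift is commonly used to filter out rules that are confident only because the consequent is common.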

    Security in Data Mining- A Comprehensive Survey

    Data mining techniques, while allowing individuals to extract hidden knowledge on the one hand, introduce a number of privacy threats on the other. In this paper, we study some of these issues along with a detailed discussion of the applications of various data mining techniques for providing security. An efficient classification technique, when used properly, would allow a user to differentiate between a phishing website and a normal website, to classify users as normal users or criminals based on their activities on social networks (crime profiling), and to prevent users from executing malicious code by labelling it as malicious. One of the most important applications of data mining is intrusion detection, where different data mining techniques can be applied to detect an intrusion effectively and report it in real time so that the necessary actions are taken to thwart the intruder's attempts. Privacy preservation, outlier detection, anomaly detection and phishing website classification are discussed in this paper.

    Explainable machine learning for malware detection on Android applications

    The presence of malicious software (malware), for example, in Android applications (apps), has harmful or irreparable consequences for the user and/or the device. Despite the protections app stores provide to avoid malware, it keeps growing in sophistication and diffusion. In this paper, we explore the use of machine learning (ML) techniques to detect malware in Android apps. The focus is on the study of different data pre-processing, dimensionality reduction, and classification techniques, assessing the generalization ability of the learned models using public domain datasets and specifically developed apps. We find that the classifiers that achieve better performance for this task are support vector machines (SVM) and random forests (RF). We emphasize the use of feature selection (FS) techniques to reduce the data dimensionality and to identify the most relevant features in Android malware classification, leading to explainability on this task. Our approach can identify the most relevant features to classify an app as malware. Namely, we conclude that permissions play a prominent role in Android malware detection. The proposed approach reduces the data dimensionality while achieving high accuracy in identifying malware in Android apps.

    On the Use of Artificial Malicious Patterns for Android Malware Detection

    Malware programs currently represent the most serious threat to computer information systems. Despite researchers' efforts in this field, detection tools still have limitations for one main reason: malware developers usually use obfuscation techniques, consisting of a set of transformations that make the code and/or its execution difficult to analyze by hindering both manual and automated inspections. These techniques allow the malware to escape the detection tools, and hence to be seen as a benign program. To solve the obfuscation issue, many researchers have proposed extracting frequent Application Programming Interface (API) call sequences from previously encountered malware programs using pattern mining techniques, and hence building a base of fraudulent behaviors. With this process, the performance of detection depends heavily on the base of examples of malware behaviors, also called malware patterns. To deal with this shortcoming, a dynamic detection method called Artificial Malware-based Detection (AMD) is proposed in this paper. AMD makes use of not only extracted malware patterns but also artificially generated ones. The artificial malware patterns are generated using an evolutionary (genetic) algorithm, which evolves a population of API call sequences with the aim of finding new malware behaviors following a set of well-defined evolution rules. The artificial fraudulent behaviors are subsequently inserted into the base of examples in order to enrich it with unseen malware patterns. The main motivation behind the proposed AMD approach is to diversify the base of malware examples in order to maximize the detection rate. AMD has been tested on different Android malware data sets and compared against recent prominent works using commonly employed performance metrics. The performance analysis of the obtained results shows the merits of our novel AMD approach.

    Feature Selection on Permissions, Intents and APIs for Android Malware Detection

    Malicious applications pose an enormous security threat to mobile computing devices. Currently 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android is a platform that hosts roughly 99% of known malware to date, and is the focus of most research efforts in mobile malware detection due to its open source nature. One of the main tools used in this effort is supervised machine learning. While a decade of work has made a lot of progress in detection accuracy, there is an obstacle that each stream of research is forced to overcome: feature selection, i.e., determining which attributes of Android are most effective as inputs into machine learning models. This dissertation aims to address that problem by providing the community with an exhaustive analysis of the three primary types of Android features used by researchers: Permissions, Intents and API Calls. The intent of the report is not to describe a best performing feature set or a best performing machine learning model, nor to explain why certain Permissions, Intents or API Calls get selected above others, but rather to provide a holistic methodology to help guide feature selection for Android malware detection. The experiments used eleven different feature selection techniques covering filter methods, wrapper methods and embedded methods. Each feature selection technique was applied to seven different datasets based on the seven available combinations of Permissions, Intents and API Calls. Each of those seven datasets is drawn from a base set of 119k Android apps. All of the result sets were then validated against three different machine learning models (Random Forest, SVM and a Neural Net) to test applicability across algorithm types.
The experiments show that using a combination of Permissions, Intents and API Calls produced higher accuracy than using any of those alone or in any other combination, and that feature selection should be performed on the combined dataset rather than by feature type with the results combined afterwards. The data also show that, in general, a feature set size of 200 or more attributes is required for optimal results. Finally, the feature selection methods Relief, Correlation-based Feature Selection (CFS) and Recursive Feature Elimination (RFE) using a Neural Net are not satisfactory approaches for Android malware detection work. Based on the proposed methodology and experiments, this research provides insights into feature selection, a significant but often overlooked issue in Android malware detection. We believe the results reported herein are an important step for effective feature evaluation and selection in assisting malware detection, especially for datasets with a large number of features. The methodology also has the potential to be applied to similar malware detection tasks or even in broader domains such as pattern recognition.
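
The idea of running a filter method over the combined feature set, rather than per feature type, can be sketched as follows. Information gain is used here only as a familiar example of a filter method; the abstract does not say whether it is one of the eleven techniques evaluated. The feature names, the tiny dataset and the helper names (`entropy`, `information_gain`, `rank_features`) are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_column, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature_column):
        subset = [y for x, y in zip(feature_column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def rank_features(rows, labels, names):
    """Score every column of the combined feature matrix and sort, best first."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical combined feature set: one permission, one intent, one API call.
names = ["perm:SEND_SMS", "intent:BOOT_COMPLETED", "api:getDeviceId"]
rows = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
labels = [1, 1, 0, 0]  # 1 = malware, 0 = benign (toy labels)

ranked = rank_features(rows, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because all three feature types are scored in one pass, a weak permission competes directly with a strong API call, which is the situation the combined-dataset recommendation is about.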

    Machine Learning for Software Engineering: A Tertiary Study

    Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities. We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies. The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML. We propose a number of ML for SE research challenges and actions including: conducting further empirical validation and industrial studies on ML; reconsidering deficient SE methods; documenting and automating data collection and pipeline processes; reexamining how industrial practitioners distribute their proprietary data; and implementing incremental ML approaches.

    Optimization Modeling and Machine Learning Techniques Towards Smarter Systems and Processes

    The continued penetration of technology in our daily lives has led to the emergence of the concept of Internet-of-Things (IoT) systems and networks. An increasing number of enterprises and businesses are adopting IoT-based initiatives, expecting that they will result in a higher return on investment (ROI) [1]. However, adopting such technologies poses many challenges. One challenge is improving the performance and efficiency of such systems by properly allocating the available and scarce resources [2, 3]. A second challenge is making use of the massive amount of data generated to help make smarter and more informed decisions [4]. A third challenge is protecting such devices and systems given the surge in security breaches and attacks in recent times [5]. To that end, this thesis proposes the use of various optimization modeling and machine learning techniques in three different systems, namely wireless communication systems, learning management systems (LMSs), and computer network systems. In particular, the first part of the thesis posits optimization modeling techniques to improve the aggregate throughput and power efficiency of a wireless communication network. The second part of the thesis proposes the use of unsupervised machine learning clustering techniques to be integrated into LMSs to identify unengaged students based on their engagement with material in an e-learning environment. Lastly, the third part of the thesis suggests the use of exploratory data analytics, unsupervised machine learning clustering, and supervised machine learning classification techniques to identify malicious/suspicious domain names in a computer network setting. The main contributions of this thesis can be divided into three broad parts.
The first is developing optimal and heuristic scheduling algorithms that improve the performance of wireless systems in terms of throughput and power by combining wireless resource virtualization with device-to-device and machine-to-machine communications. The second is using unsupervised machine learning clustering and association algorithms to determine an appropriate engagement level model for blended e-learning environments and study the relationship between engagement and academic performance in such environments. The third is developing a supervised ensemble learning classifier to detect malicious/suspicious domain names that achieves high accuracy and precision