266 research outputs found

    JavaScript Metamorphic Malware Detection Using Machine Learning Techniques

    Various factors, such as defects in the operating system, email attachments from unknown sources, and downloading and installing software from non-trusted sites, make computers vulnerable to malware attacks. Current antivirus techniques lack the ability to detect metamorphic viruses, which vary the internal structure of the original malware code across versions while keeping exactly the same behavior. Antivirus software typically relies on signature detection to identify a virus, but code morphing evades signature detection quite effectively. JavaScript can be used to generate metamorphic malware by changing the code’s Abstract Syntax Tree without changing its functionality, which makes the malware very difficult for antivirus software to detect. Because JavaScript is prevalent almost everywhere, it is an ideal candidate language for spreading malware. This research aims to detect metamorphic malware using machine learning models such as K Nearest Neighbors, Random Forest, Support Vector Machine, and Naïve Bayes. It also aims to test the effectiveness of various morphing techniques that can be used to reduce the accuracy of the classification model. The work thus involves improvement on both fronts, generation and detection of malware, helping antivirus software detect morphed code with better accuracy. In this research, a JavaScript-based metamorphic engine reduces the accuracy of a trained malware detector. While N-gram frequency-based feature vectors give good accuracy for classifying metamorphic malware, HMM feature vectors provide the best results.
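
    As a rough illustration of the N-gram frequency feature vectors mentioned in this abstract, the sketch below counts token n-grams and normalizes them into relative frequencies. The token sequence, the function names `ngram_frequencies` and `to_vector`, and the fixed vocabulary are assumptions made for the example; the paper's actual tokenization and feature pipeline are not described here.

```python
from collections import Counter


def ngram_frequencies(tokens, n=2):
    """Return the relative frequency of every n-gram in a token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def to_vector(freqs, vocabulary):
    """Project the frequencies onto a fixed n-gram vocabulary (0.0 if absent)."""
    return [freqs.get(gram, 0.0) for gram in vocabulary]


# Hypothetical token stream, e.g. from lexing a small JavaScript statement.
tokens = ["var", "id", "=", "id", "+", "id"]
freqs = ngram_frequencies(tokens, n=2)
vector = to_vector(freqs, [("var", "id"), ("id", "="), ("x", "y")])
```

    A fixed vocabulary keeps every sample's vector the same length, which is what classifiers such as K Nearest Neighbors or a Support Vector Machine expect.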

    Review on Malware and Malware Detection Using Data Mining Techniques

    Malicious software (malware for short) is any type of software or code whose aim is to steal private information or data from a computer system, to interfere with computer operations, or simply to carry out the illegitimate goals of its author on the computer system, without the permission of the computer's users. Malware detection has become one of the biggest problems in the computer security field because current communication infrastructures are vulnerable to penetration by many types of malware infection strategies and attacks. Moreover, malware is diverse in volume and type, which completely undermines the effectiveness of traditional defense methods such as the signature approach, which is unable to detect new malware. This vulnerability leads to successful penetration of (and attacks on) computer systems, as well as to the success of more advanced attacks such as distributed denial of service (DDoS) attacks. Data mining methods can be used to overcome the limitations of signature-based techniques and detect zero-day malware. This paper provides an overview of malware and of malware detection systems that use modern techniques, such as data mining, to detect known and unknown malware samples.

    Applications in security and evasions in machine learning : a survey

    In recent years, machine learning (ML) has become an important means of providing security and privacy in various applications. ML is used to address serious issues such as real-time attack detection, data leakage vulnerability assessment, and many more. ML supports the demanding security and privacy requirements of current applications across a range of areas, such as real-time decision-making, big data processing, reduced cycle time for learning, cost-efficiency, and error-free processing. In this paper, we therefore review state-of-the-art approaches in which ML can be applied more effectively to fulfill current real-world security requirements. We examine different security applications in which ML models play an essential role and compare their accuracy results along several dimensions. This analysis of ML algorithms in security applications provides a blueprint for an interdisciplinary research area. Even with current sophisticated technology and tools, attackers can evade ML models by mounting adversarial attacks. It is therefore necessary to assess the vulnerability of ML models at development time so that they can cope with adversarial attacks. Accordingly, we also analyze the different types of adversarial attacks on ML models. To visualize the security properties, we present the threat model and defense strategies against adversarial attack methods. Moreover, we classify adversarial attacks by the attacker's knowledge of the model and identify the points in the model at which attacks may be mounted. Finally, we also investigate different properties of adversarial attacks.

    Performance Assessment of some Phishing predictive models based on Minimal Feature corpus

    Phishing is currently one of the most severe cybersecurity challenges facing the emerging online community. With damages running into millions of dollars in financial and brand losses, phishing activity continues unabated. This has led to an arms race between the con artists and the online security community, which demands constant investigation. In this paper, a new approach to phishing detection is investigated, based on applying a minimal feature set to selected machine learning algorithms. The goal is to determine the most efficient machine learning methodology without the unduly high computational requirements usually caused by a non-minimal feature corpus. Using a frequency analysis approach, a 13-dimensional feature set was generated, consisting of 85% URL-based features and 15% non-URL-based features. This split was chosen because URL-based features are observed to be exploited more regularly by phishers in most zero-day attacks. The proposed minimal feature set is then used to train a number of classifiers: Random Tree, Decision Tree, Artificial Neural Network, Support Vector Machine, and Naïve Bayes. The approach was evaluated using 10-fold cross-validation on a dataset of 10000 phishing instances. The results indicate that Random Tree outperforms the other classifiers, with an accuracy of 96.1% and a receiver operating characteristic (ROC) value of 98.7%. The approach thus provides performance metrics for several state-of-the-art machine learning approaches that are popular in phishing detection, which can stimulate further research into evaluating other ML techniques with the minimal feature set approach.
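
    The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a minimal, library-free illustration; the function name `k_fold_indices` is hypothetical, and the sketch does no shuffling or stratification, which a real evaluation would normally add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Each sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Split 25 hypothetical samples into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

    A classifier would be trained on each `train` split and scored on the matching `test` split, with the ten scores averaged into the reported accuracy.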

    Obfuscated computer virus detection using machine learning algorithm

    Computer virus attacks are becoming very advanced. Obfuscated computer viruses automatically generate a new form of themselves on every iteration and download. These constantly evolving viruses pose a significant threat to the information security of computer users, organizations, and even governments. The signature-based detection technique used by conventional antivirus software on the market fails to identify them because signatures are unavailable. This research proposes an alternative to the traditional signature-based detection method and investigates the use of machine learning for obfuscated computer virus detection. In this work, text strings extracted from virus program code are used as features to generate a classifier model that can correctly classify obfuscated virus files. Text string features are used because they are informative and potentially require only a small amount of memory. Results show that unknown files can be correctly classified with 99.5% accuracy using an SMO classifier model. It is therefore believed that current computer virus defenses can be strengthened through a machine learning approach.
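
    One common way to obtain text string features from a program file is to pull out runs of printable ASCII characters, much like the Unix `strings` utility. The sketch below assumes that approach; the abstract does not say how the strings were extracted, and the function name `extract_strings` and the minimum length of 4 are illustrative choices.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return every run of at least min_len printable ASCII bytes as text."""
    # \x20-\x7e covers space through tilde, the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical binary blob with two readable strings and one too-short run.
blob = b"\x00\x01CreateFileA\x00ab\x00kernel32.dll\xff"
strings_found = extract_strings(blob)
```

    The extracted strings could then be turned into a bag-of-words style feature vector before training a classifier.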

    A Machine Learning based Empirical Evaluation of Cyber Threat Actors High Level Attack Patterns over Low level Attack Patterns in Attributing Attacks

    Cyber threat attribution is the process of identifying the actor behind an attack incident in cyberspace. Accurate and timely threat attribution plays an important role in deterring future attacks by enabling appropriate and timely defense mechanisms. Manual analysis of attack patterns gathered by honeypot deployments, intrusion detection systems, firewalls, and trace-back procedures is still the preferred method of security analysts for cyber threat attribution. Such attack patterns are low-level Indicators of Compromise (IOC). They represent Tactics, Techniques, Procedures (TTP), and software tools used by the adversaries in their campaigns. The adversaries rarely re-use them. They can also be manipulated, resulting in false and unfair attribution. To empirically evaluate and compare the effectiveness of both kinds of IOC, two problems need to be addressed. The first problem is that recent research has discussed the ineffectiveness of low-level IOC for cyber threat attribution only intuitively; an empirical evaluation of the effectiveness of low-level IOC on a real-world dataset is missing. The second problem is that the available dataset for high-level IOC has a single instance for each predictive class label, so it cannot be used directly for training machine learning models. To address these problems, we empirically evaluate the effectiveness of low-level IOC on a real-world dataset that is specifically built for comparative analysis with high-level IOC. The experimental results show that models trained on high-level IOC attribute cyberattacks with an accuracy of 95%, compared with an accuracy of 40% for models trained on low-level IOC.

    ReP-ETD: A Repetitive Preprocessing technique for Embedded Text Detection from images in spam emails

    Email is a convenient and powerful communication tool. As the internet continues to grow, the type of information available to users has shifted from text only to multimedia-enriched content. Embedded text in multimedia content is one of the prevalent means of delivering messages to content viewers. With the increasing importance of email and the incursions of internet marketers, spam has become a major problem. Spammers are continuously adopting new techniques to evade detection. Image spam is one such technique, wherein text embedded in images carries the main information of the spam message instead of text in the message body. Image spam is currently estimated to be roughly 50% of all spam traffic and is still on the rise, which makes it a serious research issue. Filtering is one of the popular approaches used to block spam mail. This work proposes a new model, ReP-ETD (Repetitive Pre-processing technique for Embedded Text Detection), for efficiently and accurately detecting spam in email images. The performance of the proposed ReP-ETD model has been evaluated across the identified parameters and compared with other existing models. The simulation results demonstrate the effectiveness of the proposed model.

    Classifying malicious windows executables using anomaly based detection

    A malicious executable is broadly defined as any program or piece of code designed to cause damage to a system or the information it contains, or to prevent the system from being used in a normal manner. A generic term for any kind of malicious software is malware, which includes viruses, worms, Trojans, backdoors, rootkits, spyware, and exploits. Anomaly detection is a technique that builds a statistical profile of normal and malicious data and classifies unseen data based on these two profiles. An anomaly-based detection system that focuses on the Windows® platform is presented here. Several file infection techniques were studied to understand which features of the executable binary are more susceptible to being used for malicious code propagation. A framework is presented for collecting data for both static (non-execution-based) and dynamic (execution-based) analysis of malicious executables. Two features extracted using static analysis are explained in detail: Windows API calls (from the Import Address Table of the Portable Executable header) and the hex byte frequency count (collected using the Hexdump utility). The dynamic analysis features that were extracted are briefly mentioned, and the major challenges faced in using this data are explained. Classification results using Support Vector Machines for anomaly detection are shown for the two static analysis features. Experiments yielded classification accuracy of up to 94% for new, previously unseen executables.
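
    The hex byte frequency count described in this abstract can be approximated with a 256-bin histogram over a file's raw bytes. The sketch below is an assumption about what such a feature might look like; it reads bytes directly rather than parsing Hexdump output, and the function name `byte_histogram` and the normalization step are illustrative.

```python
from collections import Counter


def byte_histogram(data: bytes):
    """Return a 256-element list of relative frequencies, one per byte value."""
    total = len(data)
    if total == 0:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    return [counts.get(value, 0) / total for value in range(256)]


# Hypothetical four-byte sample: two 0x00 bytes, one 0xFF, one 0x41 ('A').
histogram = byte_histogram(b"\x00\x00\xff\x41")
```

    Normalizing by file length makes histograms from executables of different sizes comparable before they are passed to a classifier such as a Support Vector Machine.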