    Real time detection of malicious webpages using machine learning techniques

    On today's Internet, online content, and webpages in particular, has grown exponentially, and the number of users has likewise risen considerably over the past two decades. Responsible institutions such as banks and governments follow specific rules and regulations regarding conduct and security, but most websites are designed and developed with few restrictions on these issues, which is why it is important to protect users from harmful webpages. Previous research has looked at detecting harmful webpages by running machine learning models on a remote website. The problem with this approach is that detection is slow because of the need to handle a large number of webpages. There is a gap in knowledge about which machine learning algorithms are capable of detecting harmful web applications in real time on a local machine. The conventional method of detecting malicious webpages is to consult a blacklist and check whether a webpage is listed. A blacklist is a list of webpages classified as malicious from a user's point of view. These blacklists are created by trusted organisations and volunteers, and are then used by modern web browsers such as Chrome, Firefox and Internet Explorer. However, blacklisting is ineffective because of the frequently changing nature of webpages, the growing number of webpages, which poses scalability issues, and crawlers' inability to visit intranet webpages that require computer operators to log in as authenticated users. The thesis proposes various machine learning algorithms, both supervised and unsupervised, to categorise webpages by parsing features such as content (which played the most important role in this thesis), URL information, URL links and screenshots of webpages. The features were then converted into a format understandable by machine learning algorithms, which analysed them to make one important decision: whether a given webpage is malicious or not, using commonly available software and hardware. Prototype tools were developed to compare and analyse the efficiency of these machine learning techniques. The supervised algorithms include Support Vector Machine, Naïve Bayes, Random Forest, Linear Discriminant Analysis, Quadratic Discriminant Analysis and Decision Tree; the unsupervised techniques are Self-Organising Map, Affinity Propagation and K-Means. Self-Organising Map was used instead of Neural Networks, and the research suggests that their modern successor, Deep Learning, would be well suited to this problem. The supervised algorithms performed better than the unsupervised ones, and the best of all these techniques was SVM, which achieved 98% accuracy. This result was validated by a Chrome extension that used the classifier in real time. The unsupervised algorithms came close to the supervised ones, which is surprising given that they do not have access to class information beforehand.
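
    To make the pipeline above concrete, here is a minimal sketch of the supervised-versus-unsupervised comparison, assuming scikit-learn, TF-IDF features over page content, and a synthetic toy corpus; none of these specifics come from the thesis itself.

```python
# Hypothetical sketch of the supervised-vs-unsupervised comparison described
# above. The toy documents, TF-IDF features and hyperparameters are
# illustrative assumptions, not the thesis's actual data or setup.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Stand-in corpus: page content is the feature source, as in the thesis.
pages = [f"quarterly report for shareholders issue {i}" for i in range(60)] + \
        [f"free prize winner {i} click here to download now" for i in range(60)]
labels = np.array([0] * 60 + [1] * 60)          # 0 = benign, 1 = malicious

X = TfidfVectorizer().fit_transform(pages)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=42)

# Supervised: SVM, the best performer reported above.
svm = SVC(kernel="linear").fit(X_tr, y_tr)
print("SVM accuracy:", accuracy_score(y_te, svm.predict(X_te)))

# Unsupervised: K-Means never sees the labels; cluster ids are matched to
# classes only afterwards, which is why its accuracy is worth comparing.
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_tr)
pred = km.predict(X_te)
print("K-Means accuracy:", max(accuracy_score(y_te, pred), accuracy_score(y_te, 1 - pred)))
```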

    Malicious Web Sites Detection using C4.5 Decision Tree

    Advances in technology have opened the door for cybercriminals to carry out various online criminal acts, such as identity theft, extortion of money or simply the spreading of viruses and worms. The common aim of online criminals is to attract visitors to a Web site, which can be easily accessed by clicking on its URL. Blacklisting does not appear to be a successful way of marking Web sites with “bad” content, considering that many malicious Web sites are never blacklisted. The aim of this paper is to evaluate the ability of the C4.5 decision tree classifier to detect malicious Web sites based on features that characterize URLs. The classifier is evaluated against several performance criteria, namely accuracy, sensitivity, specificity and area under the ROC curve. The C4.5 decision tree classifier achieved significant success in malicious Web site detection on all four criteria (accuracy 96.5%, sensitivity 96.4%, specificity 96.5% and area under the curve 0.958).
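
    As an illustration of the four evaluation criteria, the sketch below computes them for an entropy-based decision tree; scikit-learn ships no C4.5, so this tree is a stand-in rather than the paper's classifier, and synthetic data replaces the paper's URL features.

```python
# Hedged sketch: an entropy-criterion decision tree as a C4.5 stand-in,
# evaluated on the paper's four criteria. Data and splits are assumptions.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)
y_pred = tree.predict(X_te)

# The four criteria, from the confusion matrix and predicted probabilities.
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
print("accuracy   :", accuracy_score(y_te, y_pred))
print("sensitivity:", tp / (tp + fn))    # true positive rate
print("specificity:", tn / (tn + fp))    # true negative rate
print("AUC        :", roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]))
```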

    I see EK: A lightweight technique to reveal exploit kit family by overall URL patterns of infection chains

    The prevalence and ever-evolving technical sophistication of exploit kits (EKs) is one of the most challenging shifts in the modern cybercrime landscape. Over the last few years, malware infections via drive-by download attacks have been orchestrated through EK infrastructures. Malicious advertisements and compromised websites redirect victim browsers to web-based EK families that are assembled to exploit client-side vulnerabilities and finally deliver malicious payloads. A key observation is that while webpage contents differ drastically between distinct intrusions executed through the same EK, the patterns in URL addresses stay similar, because URLs autogenerated by EK platforms follow specific templates. This practice enables the development of an efficient system capable of classifying the responsible EK instances. This paper proposes novel URL features and a new technique to quickly categorize EK families with high accuracy using machine learning algorithms. Rather than analyzing each URL individually, the proposed overall-URL-patterns approach automatically examines all URLs associated with an EK infection. The method has been evaluated on a popular, publicly available dataset containing 240 different real-world infection cases involving over 2250 URLs, the incidents being linked to the 4 major EK flavors observed throughout 2016. The system achieves up to 100% classification accuracy with the tested estimators.
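
    A rough sketch of the overall-URL-patterns idea follows: every URL in an infection chain is reduced to a few aggregate statistics that together form one feature vector per chain. The features and the random-forest classifier here are illustrative assumptions, not the paper's actual feature set or estimator.

```python
# Illustrative sketch: one fixed-length feature vector per infection chain,
# built from aggregate statistics over all of the chain's URLs.
from statistics import mean
from urllib.parse import urlparse
from sklearn.ensemble import RandomForestClassifier

def chain_features(urls):
    """Summarize every URL of an infection chain into one vector."""
    lengths = [len(u) for u in urls]
    digit_ratios = [sum(c.isdigit() for c in u) / len(u) for u in urls]
    depths = [urlparse(u).path.count("/") for u in urls]
    return [len(urls), mean(lengths), max(lengths), mean(digit_ratios), mean(depths)]

def train_ek_classifier(chains):
    """chains: iterable of (list_of_urls, ek_family_label) pairs."""
    X = [chain_features(urls) for urls, _ in chains]
    y = [label for _, label in chains]
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

    A model trained this way would then label each new infection chain with an EK family in a single call, e.g. model.predict([chain_features(urls)]), which matches the paper's chain-level (rather than per-URL) classification setting.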

    Off-the-Hook: An Efficient and Usable Client-Side Phishing Prevention Application

    Phishing is a major problem on the Web. Despite the significant attention it has received over the years, there has been no definitive solution. While state-of-the-art solutions have reasonably good performance, they suffer from several drawbacks, including the potential to compromise user privacy, difficulty detecting phishing websites whose content changes dynamically, and reliance on features that are too dependent on the training data. To address these limitations we present a new approach for detecting phishing webpages in real time as they are visited by a browser. It relies on modeling inherent phisher limitations stemming from the constraints they face while building a webpage. Consequently, the implementation of our approach, Off-the-Hook, exhibits several notable properties, including high accuracy, brand independence, good language independence, speed of decision, resilience to dynamic phish and resilience to evolution in phishing techniques. Off-the-Hook is implemented as a fully client-side browser add-on, which preserves user privacy. In addition, Off-the-Hook identifies the target website that a phishing webpage is attempting to mimic and includes this target in its warning. We evaluated Off-the-Hook in two different user studies. Our results show that users prefer Off-the-Hook warnings to Firefox warnings.

    A survey on current malicious javascript behavior of infected web content in detection of malicious web pages

    In recent years, the rapid growth of cybercrime has become an urgent issue for security authorities. Improvements in web technologies enable attackers to launch web-based attacks and other malicious code easily, without prior expert knowledge. JavaScript has recently become the most common attack construction language, as it is the primary browser scripting language and allows developers to build sophisticated client-side interfaces for web applications. This has led to the growth of malicious websites, which serve as the main platform for distributing malware or malicious scripts to users' computers when they access these webpages. Acting on and detecting such threats early and in a timely manner is vital to reducing the damage, which amounts to billions of dollars lost every year. A number of approaches have been proposed to detect malicious web pages; however, previous detection efforts have generated many false alarms because of the sophisticated obfuscation techniques applied to benign JavaScript code in web pages. Therefore, this paper provides a thorough survey and detailed understanding of malicious JavaScript code features collected from web content. We conduct a systematic analysis of the usage of different JavaScript features and of JavaScript detection techniques, and present the most important features of malicious threats in web pages. The analysis is then presented along different dimensions (feature representation, detection-technique analysis, and samples of malicious scripts).
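
    As a hedged example of the kind of static features such surveys cover, the snippet below counts a few widely cited obfuscation indicators in JavaScript source; the feature list and regular expressions are illustrative, not the survey's own taxonomy.

```python
# Hypothetical static extractor for a handful of JavaScript obfuscation
# indicators of the kind discussed above; features and regexes are
# illustrative assumptions.
import math
import re
from collections import Counter

def shannon_entropy(s):
    """Character-level entropy; obfuscated code tends to score high."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def js_features(source):
    strings = re.findall(r'"[^"]*"|\'[^\']*\'', source)
    return {
        "eval_calls":     len(re.findall(r"\beval\s*\(", source)),
        "unescape_calls": len(re.findall(r"\bunescape\s*\(", source)),
        "fromcharcode":   len(re.findall(r"fromCharCode", source)),
        "document_write": len(re.findall(r"document\.write", source)),
        "max_string_len": max((len(s) for s in strings), default=0),
        "source_entropy": shannon_entropy(source),
    }

print(js_features('eval(unescape("%61%6c%65%72%74"));'))
```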

    Unsupervised Anomaly-based Malware Detection using Hardware Features

    Recent works have shown promise in using microarchitectural execution patterns to detect malware programs. These detectors belong to the class known as signature-based detectors, as they catch malware by comparing a program's execution pattern (signature) to the execution patterns of known malware programs. In this work, we propose a new class of detectors, anomaly-based hardware malware detectors, that do not require signatures for malware detection and thus can catch a wider range of malware, including potentially novel ones. We use unsupervised machine learning to build profiles of normal program execution based on data from performance counters, and use these profiles to detect significant deviations in program behavior that occur as a result of malware exploitation. We show that real-world exploitation of popular programs such as IE and Adobe PDF Reader on a Windows/x86 platform can be detected with nearly perfect certainty. We also examine the limits of and challenges in implementing this approach in the face of a sophisticated adversary attempting to evade anomaly-based detection. The proposed detector is complementary to previously proposed signature-based detectors, and the two can be used together to improve security.
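
    A minimal sketch of the anomaly-detection idea follows, assuming performance-counter samples arrive as fixed-length vectors and using a one-class SVM as the unsupervised profile of normal execution; the paper's exact models and counter set may differ.

```python
# Minimal sketch: model normal execution from performance-counter samples,
# then flag large deviations. The counter sampling itself is assumed to
# happen elsewhere (e.g. via perf or PAPI); random data stands in for it.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Each row: one sampling epoch of counters (branches, cache misses, ...).
normal = np.random.rand(500, 8)          # stand-in for clean-run measurements
scaler = StandardScaler().fit(normal)

detector = OneClassSVM(nu=0.05, kernel="rbf").fit(scaler.transform(normal))

def is_anomalous(sample):
    """True if a counter vector deviates from the normal-execution profile."""
    return detector.predict(scaler.transform([sample]))[0] == -1

print(is_anomalous(np.random.rand(8) * 10))   # exaggerated deviation
```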

    Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors

    Although machine learning based algorithms have been extensively used for detecting phishing websites, there has been relatively little work on how adversaries may attack such "phishing detectors" (PDs for short). In this paper, we propose a set of gray-box attacks on PDs that an adversary may use, varying with the knowledge the adversary has about the PD. We show that these attacks severely degrade the effectiveness of several existing PDs. We then propose the concept of operation chains, which iteratively map an original set of features to a new set of features, and develop the "Protective Operation Chain" (POC for short) algorithm. POC leverages the combination of random feature selection and feature mappings to increase the attacker's uncertainty about the target PD. Using 3 existing publicly available datasets plus a fourth that we have created and will release upon publication of this paper, we show that POC is more robust to these attacks than past competing work, while preserving predictive performance when no adversarial attacks are present. Moreover, POC is robust to attacks on 13 different classifiers, not just one. These results are statistically significant at the p < 0.001 level.
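
    The following sketch illustrates the operation-chain concept only: a randomly selected feature subset pushed through a chain of randomized nonlinear maps, so an attacker cannot easily reconstruct the detector's effective feature space. The specific operations are assumptions, not the POC algorithm itself.

```python
# Conceptual sketch of an "operation chain": keep a random feature subset and
# pass it through randomized mappings before training, increasing attacker
# uncertainty about which features the detector really uses. Illustrative only.
import numpy as np

rng = np.random.default_rng(7)

def make_chain(n_features, n_keep, n_ops=3):
    keep = rng.choice(n_features, size=n_keep, replace=False)   # random subset
    ops = [rng.normal(size=(n_keep, n_keep)) for _ in range(n_ops)]
    def apply(X):
        Z = X[:, keep]
        for W in ops:
            Z = np.tanh(Z @ W)        # nonlinearity keeps the maps opaque
        return Z
    return apply

chain = make_chain(n_features=30, n_keep=12)
X = rng.random((100, 30))
X_mapped = chain(X)                   # train any downstream classifier on this
print(X_mapped.shape)                 # (100, 12)
```

    The tanh composition is an arbitrary choice here; the point is simply that both the selected subset and the mappings are secrets the attacker would have to guess.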

    A malicious URLs detection system using optimization and machine learning classifiers

    The openness of the World Wide Web (Web) has left it increasingly exposed to cyber-attacks. Attackers carry out cyber-attacks on the Web using malicious Uniform Resource Locators (URLs), since URLs are widely used by internet users. A robust approach is therefore required to detect malicious URLs and identify the nature of the attack. This study assesses the efficiency of a machine learning approach to detecting and identifying malicious URLs. We apply a feature optimization approach based on a bio-inspired algorithm, particle swarm optimization (PSO), to select the significant URL features capable of revealing malicious URLs, and combine it with machine learning classifiers and static analysis for detection. Based on this combination of optimized features and classifiers, the paper shows promising results with high detection accuracy: naïve Bayes and support vector machine (SVM) both achieve a detection accuracy of 99% using URL features.
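
    To illustrate the PSO-plus-classifier pipeline, here is a hedged sketch of binary PSO feature selection with a naïve Bayes cross-validation fitness; the swarm parameters, sigmoid position update and synthetic data are assumptions rather than the study's setup.

```python
# Hedged sketch of binary-PSO feature selection scored by naive Bayes
# cross-validation, in the spirit of the pipeline above. All parameters
# and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=25, n_informative=8, random_state=1)
rng = np.random.default_rng(1)

def fitness(mask):
    """Classifier accuracy using only the features selected by the mask."""
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

n_particles, n_iter, dim = 12, 20, X.shape[1]
pos = rng.random((n_particles, dim)) > 0.5          # binary positions
vel = rng.normal(0, 0.1, (n_particles, dim))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = (0.7 * vel
           + 1.5 * r1 * (pbest.astype(float) - pos.astype(float))
           + 1.5 * r2 * (gbest.astype(float) - pos.astype(float)))
    pos = rng.random((n_particles, dim)) < 1.0 / (1.0 + np.exp(-vel))  # sigmoid
    for i, p in enumerate(pos):
        f = fitness(p)
        if f > pbest_fit[i]:
            pbest[i], pbest_fit[i] = p.copy(), f
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest))
print("CV accuracy      :", pbest_fit.max())
```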