624 research outputs found

    An Evasion and Counter-Evasion Study in Malicious Websites Detection

    Full text link
    Malicious websites are a major cyber attack vector, and effective detection of them is an important cyber defense task. The main defense paradigm in this regard is that the defender uses some kind of machine learning algorithms to train a detection model, which is then used to classify websites in question. Unlike other settings, the following issue is inherent to the problem of malicious websites detection: the attacker essentially has access to the same data that the defender uses to train its detection models. This 'symmetry' can be exploited by the attacker, at least in principle, to evade the defender's detection models. In this paper, we present a framework for characterizing the evasion and counter-evasion interactions between the attacker and the defender, where the attacker attempts to evade the defender's detection models by taking advantage of this symmetry. Within this framework, we show that an adaptive attacker can make malicious websites evade powerful detection models, but proactive training can be an effective counter-evasion defense mechanism. The framework is geared toward the popular detection model of decision tree, but can be adapted to accommodate other classifiers

    An Evasion Attack against ML-based Phishing URL Detectors

    Full text link
    Background: Over the year, Machine Learning Phishing URL classification (MLPU) systems have gained tremendous popularity to detect phishing URLs proactively. Despite this vogue, the security vulnerabilities of MLPUs remain mostly unknown. Aim: To address this concern, we conduct a study to understand the test time security vulnerabilities of the state-of-the-art MLPU systems, aiming at providing guidelines for the future development of these systems. Method: In this paper, we propose an evasion attack framework against MLPU systems. To achieve this, we first develop an algorithm to generate adversarial phishing URLs. We then reproduce 41 MLPU systems and record their baseline performance. Finally, we simulate an evasion attack to evaluate these MLPU systems against our generated adversarial URLs. Results: In comparison to previous works, our attack is: (i) effective as it evades all the models with an average success rate of 66% and 85% for famous (such as Netflix, Google) and less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively; (ii) realistic as it requires only 23ms to produce a new adversarial URL variant that is available for registration with a median cost of only $11.99/year. We also found that popular online services such as Google SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that Adversarial training (successful defence against evasion attack) does not significantly improve the robustness of these systems as it decreases the success rate of our attack by only 6% on average for all the models. (iv) Further, we identify the security vulnerabilities of the considered MLPU systems. Our findings lead to promising directions for future research. Conclusion: Our study not only illustrate vulnerabilities in MLPU systems but also highlights implications for future study towards assessing and improving these systems.Comment: Draft for ACM TOP

    Prediction of drive-by download attacks on Twitter

    Get PDF
    The popularity of Twitter for information discovery, coupled with the automatic shortening of URLs to save space, given the 140 character limit, provides cybercriminals with an opportunity to obfuscate the URL of a malicious Web page within a tweet. Once the URL is obfuscated, the cybercriminal can lure a user to click on it with enticing text and images before carrying out a cyber attack using a malicious Web server. This is known as a drive-by download. In a drive-by download a user’s computer system is infected while interacting with the malicious endpoint, often without them being made aware the attack has taken place. An attacker can gain control of the system by exploiting unpatched system vulnerabilities and this form of attack currently represents one of the most common methods employed. In this paper we build a machine learning model using machine activity data and tweet metadata to move beyond post-execution classification of such URLs as malicious, to predict a URL will be malicious with 0.99 F-measure (using 10-fold cross-validation) and 0.833 (using an unseen test set) at 1 second into the interaction with the URL. Thus providing a basis from which to kill the connection to the server before an attack has completed and proactively blocking and preventing an attack, rather than reacting and repairing at a later date

    ANALYSIS OF CLIENT-SIDE ATTACKS THROUGH DRIVE-BY HONEYPOTS

    Get PDF
    Client-side cyberattacks on Web browsers are becoming more common relative to server-side cyberattacks. This work tested the ability of the honeypot (decoy) client software Thug to detect malicious or compromised servers that secretly download malicious files to clients, and to classify what it downloaded. Prior to using Thug we did TCP/IP fingerprinting to assess Thug’s ability to impersonate different Web browsers, and we created our own malicious Web server with some drive-by exploits to verify Thug’s functions; Thug correctly identified 85 out of 86 exploits from this server. We then tested Thug’s analysis of delivered exploits from two sets of real Web servers; one set was obtained from random Internet addresses of Web servers, and the other came from a commercial blacklist. The rates of malicious activity on 37,415 random websites and 83,667 blacklisted websites were 5.6% and 1.15%, respectively. Thug’s interaction with the blacklisted Web servers found 163 unique malware files. We demonstrated the usefulness and efficiency of client-side honeypots in analyzing harmful data presented by malicious websites. These honeypots can help government and industry defenders to proactively identify suspicious Web servers and protect users.OUSD(R&E)Outstanding ThesisLieutenant, United States NavyApproved for public release. Distribution is unlimited

    Machine learning approach for identifying suspicious uniform resource locators (URLs) on Reddit social network

    Get PDF
    The applications and advantages of the Internet for real-time information sharing can never be over-emphasized. These great benefits are too numerous to mention but they are being seriously hampered and made vulnerable due to phishing that is ravaging cyberspace. This development is, undoubtedly, frustrating the efforts of the Global Cyber Alliance – an agency with a singular purpose of reducing cyber risk. Consequently, various researchers have attempted to proffer solutions to phishing. These solutions are considered inefficient and unreliable as evident in the conflicting claims by the authors. Against this backdrop, this work has attempted to find the best approach to solving the challenge of identifying suspicious uniform resource locators (URLs) on Reddit social networks. In an effort to handle this challenge, attempts have been made to address two major problems. The first is how can the suspicious URLs be identified on Reddit social networks with machine learning techniques? And the second is how can internet users be safeguarded from unreliable and fake URLs on the Reddit social network? This work adopted six machine learning algorithms – AdaBoost, Gradient Boost, Random Forest, Linear SVM, Decision Tree, and Naïve Bayes Classifier – for training using features obtained from Reddit social network and for additional processing. A total sum of 532,403 posts were analyzed. At the end of the analysis, only 87,083 posts were considered suitable for training the models. After the experimentation, the best performing algorithm was AdaBoost with an accuracy level of 95.5% and a precision of 97.57%.publishedVersio

    Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques

    Full text link
    Cybercrime is a growing threat to organizations and individuals worldwide, with criminals using increasingly sophisticated techniques to breach security systems and steal sensitive data. In recent years, machine learning, deep learning, and transfer learning techniques have emerged as promising tools for predicting cybercrime and preventing it before it occurs. This paper aims to provide a comprehensive survey of the latest advancements in cybercrime prediction using above mentioned techniques, highlighting the latest research related to each approach. For this purpose, we reviewed more than 150 research articles and discussed around 50 most recent and relevant research articles. We start the review by discussing some common methods used by cyber criminals and then focus on the latest machine learning techniques and deep learning techniques, such as recurrent and convolutional neural networks, which were effective in detecting anomalous behavior and identifying potential threats. We also discuss transfer learning, which allows models trained on one dataset to be adapted for use on another dataset, and then focus on active and reinforcement Learning as part of early-stage algorithmic research in cybercrime prediction. Finally, we discuss critical innovations, research gaps, and future research opportunities in Cybercrime prediction. Overall, this paper presents a holistic view of cutting-edge developments in cybercrime prediction, shedding light on the strengths and limitations of each method and equipping researchers and practitioners with essential insights, publicly available datasets, and resources necessary to develop efficient cybercrime prediction systems.Comment: 27 Pages, 6 Figures, 4 Table

    A First Look at the Crypto-Mining Malware Ecosystem: A Decade of Unrestricted Wealth

    Get PDF
    Illicit crypto-mining leverages resources stolen from victims to mine cryptocurrencies on behalf of criminals. While recent works have analyzed one side of this threat, i.e.: web-browser cryptojacking, only commercial reports have partially covered binary-based crypto-mining malware. In this paper, we conduct the largest measurement of crypto-mining malware to date, analyzing approximately 4.5 million malware samples (1.2 million malicious miners), over a period of twelve years from 2007 to 2019. Our analysis pipeline applies both static and dynamic analysis to extract information from the samples, such as wallet identifiers and mining pools. Together with OSINT data, this information is used to group samples into campaigns. We then analyze publicly-available payments sent to the wallets from mining-pools as a reward for mining, and estimate profits for the different campaigns. All this together is is done in a fully automated fashion, which enables us to leverage measurement-based findings of illicit crypto-mining at scale. Our profit analysis reveals campaigns with multi-million earnings, associating over 4.4% of Monero with illicit mining. We analyze the infrastructure related with the different campaigns, showing that a high proportion of this ecosystem is supported by underground economies such as Pay-Per-Install services. We also uncover novel techniques that allow criminals to run successful campaigns.Comment: A shorter version of this paper appears in the Proceedings of 19th ACM Internet Measurement Conference (IMC 2019). This is the full versio

    Reeling in Big Phish with a Deep MD5 Net

    Get PDF
    Phishing continues to grow as phishers discover new exploits and attack vectors for hosting malicious content; the traditional response using takedowns and blacklists does not appear to impede phishers significantly. A handful of law enforcement projects — for example the FBI\u27s Digital PhishNet and the Internet Crime and Complaint Center (ic3.gov) — have demonstrated that they can collect phishing data in substantial volumes, but these collections have not yet resulted in a significant decline in criminal phishing activity. In this paper, a new system is demonstrated for prioritizing investigative resources to help reduce the time and effort expended examining this particular form of online criminal activity. This research presents a means to correlate phishing websites by showing that certain websites are created by the same phishing kit. Such kits contain the content files needed to create the counterfeit website and often contain additional clues to the identity of the creators. A clustering algorithm is presented that uses collected phishing kits to establish clusters of related phishing websites. The ability to correlate websites provides law enforcement or other potential stakeholders with a means for prioritizing the allocation of limited investigative resources by identifying frequently repeating phishing offenders
    corecore