
    PDF-Malware Detection: A Survey and Taxonomy of Current Techniques

    Portable Document Format, more commonly known as PDF, has become over the last 20 years a standard for document exchange and dissemination due to its portable nature and widespread adoption. The flexibility and power of this format are leveraged not only by benign users but by attackers as well, who have worked to exploit various types of vulnerabilities, overcome security restrictions, and turn PDF into one of the leading vectors for spreading malicious code. Analyzing the content of malicious PDF files to extract the main features that characterize the malware's identity and behavior is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys the state of the art in systems for the detection of malicious PDF files and organizes them in a taxonomy that separately considers the approaches used and the data analyzed to detect the presence of malicious code. © Springer International Publishing AG, part of Springer Nature 2018

    JavaScript Metamorphic Malware Detection Using Machine Learning Techniques

    Various factors, such as defects in the operating system, email attachments from unknown sources, and downloading and installing software from untrusted sites, make computers vulnerable to malware attacks. Current antivirus techniques lack the ability to detect metamorphic viruses, which vary the internal structure of the original malware code across versions while keeping exactly the same behavior throughout. Antivirus software typically relies on signature detection to identify a virus, but code morphing evades signature detection quite effectively. JavaScript can be used to generate metamorphic malware by changing the code's Abstract Syntax Tree without changing the actual functionality, making it very difficult for antivirus software to detect. As JavaScript is prevalent almost everywhere, it is an ideal candidate language for spreading malware. This research aims to detect metamorphic malware using machine learning models such as K-Nearest Neighbors, Random Forest, Support Vector Machine, and Naïve Bayes. It also aims to test the effectiveness of various morphing techniques that can be used to reduce the accuracy of the classification model. The work thus improves both fronts, generation and detection of the malware, helping antivirus software detect morphed code with better accuracy. In this research, a JavaScript-based metamorphic engine reduces the accuracy of a trained malware detector. While n-gram frequency based feature vectors give good accuracy for classifying metamorphic malware, HMM feature vectors provide the best results.
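
    As a rough illustration of the n-gram approach described above, the sketch below builds character-level n-gram frequency vectors from JavaScript source. The sample scripts, the choice of n, and the vocabulary size are placeholder assumptions, not the study's actual configuration.

        # Minimal sketch: character n-gram frequency features for JavaScript samples.
        # n, top_k, and the sample scripts are illustrative assumptions.
        from collections import Counter

        def ngram_frequencies(source, n=3, top_k=500):
            """Return relative frequencies of the top_k most common n-grams."""
            grams = Counter(source[i:i + n] for i in range(len(source) - n + 1))
            total = sum(grams.values()) or 1
            return {g: c / total for g, c in grams.most_common(top_k)}

        def vectorize(freq_maps, vocabulary):
            """Project each sample's frequencies onto a fixed vocabulary order."""
            return [[freqs.get(g, 0.0) for g in vocabulary] for freqs in freq_maps]

        train_scripts = ["var a = eval(unescape(p));", "function f(x){ return x + 1; }"]
        train_freqs = [ngram_frequencies(s) for s in train_scripts]
        vocabulary = sorted({g for f in train_freqs for g in f})
        X = vectorize(train_freqs, vocabulary)  # feed X to KNN, RF, SVM, or Naive Bayes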

    Detection and Analysis of Drive-by Downloads and Malicious Websites

    A drive-by download is a download that occurs without a user's action or knowledge. It usually exploits a vulnerability in a browser to download an unknown file. The malicious program in the downloaded file installs itself on the victim's machine. Moreover, the downloaded file can be camouflaged as an installer that goes on to install further malicious software. Drive-by downloads are a telling example of the exponential increase in malicious activity over the Internet and of how it affects everyday use of the web. In this paper, we address the problem caused by drive-by downloads from different standpoints. We provide an in-depth understanding of the difficulties in dealing with drive-by downloads and suggest appropriate solutions. We propose machine learning and feature selection solutions to remedy the drive-by download problem. Experimental results reported 98.2% precision, 98.2% F-measure and 97.2% ROC area.
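
    The abstract quotes precision, F-measure, and ROC area but does not name the selector or classifier used; the sketch below shows the general recipe (feature selection, then classification and evaluation) on synthetic stand-in data, with scikit-learn components chosen only for illustration.

        # Sketch: feature selection + classification, evaluated with the metrics
        # quoted above. The data, selector, and classifier are stand-ins.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.metrics import f1_score, precision_score, roc_auc_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=1000, n_features=40, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        selector = SelectKBest(mutual_info_classif, k=15).fit(X_tr, y_tr)
        clf = RandomForestClassifier(random_state=0).fit(selector.transform(X_tr), y_tr)

        pred = clf.predict(selector.transform(X_te))
        proba = clf.predict_proba(selector.transform(X_te))[:, 1]
        print("precision:", precision_score(y_te, pred))
        print("F-measure:", f1_score(y_te, pred))
        print("ROC area: ", roc_auc_score(y_te, proba))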

    XSS attack detection based on machine learning

    As the popularity of web-based applications grows, so does the number of individuals who use them. The vulnerabilities of those applications, however, remain a concern. Cross-site scripting is a very prevalent attack that is simple to launch but difficult to defend against, which is why it is worth studying. The current study focuses on artificial systems, such as machine learning, which can function without human interaction. As technology advances, the need for maintenance increases, while the systems being maintained become more complex. This is why machine learning techniques are becoming increasingly important in our daily lives. This study uses supervised machine learning to protect against cross-site scripting, allowing the computer to find an algorithm that can identify vulnerabilities. A large collection of datasets serves as the foundation for this technique. The model is equipped with features extracted from the datasets that allow it to learn the pattern of such an attack, filtering input using common JavaScript symbols and possible Document Object Model (DOM) syntax. As the research continues, the best combinations of algorithms will be identified that can successfully defend against cross-site scripting, with multiple comparisons between different classification methods, alone or in combination, to determine which performs best.
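
    To make the symbol-filtering idea concrete, the following sketch counts common JavaScript symbols and DOM-related tokens in a payload; the token list and features are illustrative assumptions rather than the study's actual feature set.

        # Sketch: lexical features for XSS detection based on JavaScript symbols
        # and DOM syntax. The token list below is a hypothetical example.
        import re

        DOM_TOKENS = ["document.cookie", "document.write", "window.location",
                      "innerhtml", "eval(", "<script", "onerror=", "onload="]

        def xss_features(payload):
            """Count suspicious tokens and symbols in a candidate payload."""
            lower = payload.lower()
            counts = [lower.count(tok) for tok in DOM_TOKENS]
            counts.append(len(re.findall("[<>'\"();]", payload)))  # common JS/HTML symbols
            counts.append(len(payload))                            # raw length
            return counts

        print(xss_features('<script>document.write(document.cookie)</script>'))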

    DeMalFier: Detection of Malicious web pages using an effective classifier

    The web has become an indispensable global platform that glues together daily communication, sharing, trading, collaboration and service delivery. Web users often store and manage critical information that attracts cybercriminals who misuse the web and the internet to exploit vulnerabilities for illegitimate benefit. Malicious web pages are an emerging threat on the internet because…

    JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection

    Machine learning is often used for malicious website detection, but, to the best of our knowledge, an approach incorporating WebAssembly as a feature has not been explored, due to the limited number of samples. In this paper, we propose JABBERWOCK (JAvascript-Based Binary EncodeR by WebAssembly Optimization paCKer), a tool to generate WebAssembly datasets in a pseudo fashion via JavaScript. Loosely speaking, JABBERWOCK automatically gathers JavaScript code in the real world, converts it into WebAssembly, and then outputs vectors of the WebAssembly as samples for malicious website detection. We also conduct experimental evaluations of JABBERWOCK in terms of the processing time for dataset generation, comparison of the generated samples with actual WebAssembly samples gathered from the Internet, and an application to malicious website detection. Regarding processing time, we show that JABBERWOCK can construct a dataset at 4.5 seconds per sample for any number of samples. Next, comparing 10,000 samples output by JABBERWOCK with 168 WebAssembly samples gathered from the wild, we find that the samples generated by JABBERWOCK are similar to those in the real world. We then show that JABBERWOCK enables malicious website detection with a 99% F1-score; the gap it creates between benign and malicious samples accounts for this high score. We also confirm that JABBERWOCK can be combined with an existing malicious website detection tool to improve F1-scores. JABBERWOCK is publicly available via GitHub (https://github.com/c-chocolate/Jabberwock). (Accepted at DCDS 2023, co-located with DSN 2023.)
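
    One plausible reading of "outputs vectors of the WebAssembly" is an opcode-count vectorization over the WebAssembly text format; the sketch below illustrates that idea with a hypothetical opcode vocabulary. The tool's real pipeline is documented in its repository and may differ.

        # Sketch: counting opcodes in WebAssembly text format (WAT) to form a
        # feature vector. The opcode vocabulary here is an illustrative subset.
        from collections import Counter

        OPCODES = ["i32.add", "i32.mul", "i32.load", "i32.store",
                   "call", "br_if", "loop", "block"]

        def wat_vector(wat_source):
            """Count occurrences of a fixed opcode vocabulary in a WAT module."""
            tokens = Counter(tok.strip("()") for tok in wat_source.split())
            return [tokens[op] for op in OPCODES]

        sample = "(module (func (param i32) (result i32) local.get 0 i32.const 1 i32.add))"
        print(wat_vector(sample))  # -> [1, 0, 0, 0, 0, 0, 0, 0]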

    I see EK: A lightweight technique to reveal exploit kit family by overall URL patterns of infection chains

    The prevalence and constantly evolving technical sophistication of exploit kits (EKs) is one of the most challenging shifts in the modern cybercrime landscape. Over the last few years, malware infections via drive-by download attacks have been orchestrated with EK infrastructures. Malicious advertisements and compromised websites redirect victim browsers to web-based EK families that are assembled to exploit client-side vulnerabilities and finally deliver malicious payloads. A key observation is that while webpage contents differ drastically between distinct intrusions executed through the same EK, the patterns in URL addresses stay similar, because URLs autogenerated by EK platforms follow specific templates. This practice enables the development of an efficient system capable of classifying the responsible EK instances. This paper proposes novel URL features and a new technique to quickly categorize EK families with high accuracy using machine learning algorithms. Rather than analyzing each URL individually, the proposed overall-URL-patterns approach automatically examines all URLs associated with an EK infection. The method has been evaluated on a popular and publicly available dataset that contains 240 different real-world infection cases involving over 2,250 URLs, the incidents being linked to the four major EK families active throughout 2016. The system achieves up to 100% classification accuracy with the tested estimators.
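
    A small sketch of the overall-URL-patterns idea: rather than scoring each URL in isolation, features are aggregated over every URL in an infection chain. The specific features below (chain length, mean URL length, path entropy, path depth) are illustrative assumptions, not the paper's published feature set.

        # Sketch: aggregate features over all URLs of one infection chain.
        import math
        from collections import Counter
        from urllib.parse import urlparse

        def shannon_entropy(s):
            """Character entropy; autogenerated EK paths tend to look random."""
            if not s:
                return 0.0
            counts = Counter(s)
            return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

        def chain_features(urls):
            """One feature vector for the whole chain, not per URL."""
            lengths = [len(u) for u in urls]
            entropies = [shannon_entropy(urlparse(u).path) for u in urls]
            return [
                len(urls),                                     # chain length
                sum(lengths) / len(lengths),                   # mean URL length
                max(entropies),                                # most random-looking path
                sum(u.count("/") for u in urls) / len(urls),   # mean slash count
            ]

        chain = ["http://ad.example/gate.php?x=1", "http://lp.example/a9f3k2/landing"]
        print(chain_features(chain))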

    Performance Evaluation of Machine Learning Techniques for Identifying Forged and Phony Uniform Resource Locators (URLs)

    Since the invention of Information and Communication Technology (ICT), there has been a great shift from the erstwhile traditional approach to handling information across the globe to the use of this innovation, and its application cuts across almost all areas of human endeavour. ICT is widely utilized in the education and production sectors as well as in various financial institutions. Many people use it genuinely to carry out their day-to-day activities, while others use it to perform nefarious activities to the detriment of other cyber users. According to several reports discussed in the introductory part of this work, millions of people have become victims of fake Uniform Resource Locators (URLs) sent to their mailboxes by spammers, and financial institutions have also recorded monumental losses through this illicit act over the years. Despite several approaches currently in place, none can confidently be confirmed to provide the best and most reliable solution. Research findings reported in the literature have demonstrated how machine learning algorithms can be employed to verify and confirm compromised and fake URLs in cyberspace; inconsistencies have, however, been noticed in those findings, and the corresponding results are not dependable given the values obtained and the conclusions drawn from them. Against this backdrop, the authors carried out a comparative analysis of three learning algorithms (Naïve Bayes, Decision Tree and Logistic Regression) for verification of compromised, suspicious and fake URLs, to determine which is best on the metrics used for evaluation (F-measure, precision and recall). Based on the confusion matrix measurements, the results show that the Decision Tree (ID3) algorithm achieves the highest values for recall, precision and F-measure, and it unarguably provides an efficient and credible means of maximizing the detection of compromised and malicious URLs. For future work, the authors suggest that two or more supervised learning algorithms can be hybridized to form a single, more effective and efficient algorithm for fake-URL verification.
    Keywords: Learning-algorithms, Forged-URL, Phoney-URL, performance-comparison
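
    A compact sketch of the paper's three-way comparison, using scikit-learn on synthetic stand-in data; the study's actual URL features and dataset are not reproduced here.

        # Sketch: compare Naive Bayes, Decision Tree, and Logistic Regression on
        # precision, recall, and F-measure. Data is synthetic, for illustration.
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import precision_recall_fscore_support
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1500, n_features=20, random_state=1)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

        models = [("Naive Bayes", GaussianNB()),
                  ("Decision Tree", DecisionTreeClassifier(random_state=1)),
                  ("Logistic Regression", LogisticRegression(max_iter=1000))]

        for name, model in models:
            pred = model.fit(X_tr, y_tr).predict(X_te)
            p, r, f, _ = precision_recall_fscore_support(y_te, pred, average="binary")
            print(f"{name}: precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")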

    Identification of potential malicious web pages

    Malicious web pages are an emerging security concern on the Internet due to their popularity and their potentially serious impact. Detecting and analysing them is very costly because of their qualities and complexities. In this paper, we present a lightweight scoring mechanism that uses static features to identify potentially malicious pages. This mechanism is intended as a filter that allows us to reduce the number of suspicious web pages requiring more expensive analysis by mechanisms that must load and interpret the web pages to determine whether they are malicious or benign. Given its role as a filter, our main aim is to reduce false positives while minimising false negatives. The scoring mechanism has been developed by identifying candidate static features of malicious web pages, which are evaluated using a feature selection algorithm to identify the most appropriate set of features for efficiently distinguishing between benign and malicious web pages. These features are used to construct a scoring algorithm that calculates a score for a web page's potential maliciousness. The main advantage of this scoring mechanism over a binary classifier is the ability to trade off accuracy against performance, allowing us to adjust the number of web pages passed to the more expensive analysis mechanism in order to tune overall performance.
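
    As a toy illustration of such a threshold-tunable static filter, the sketch below scores pages against a weighted feature set; the features and weights are invented for illustration, whereas the paper derives its feature set via feature selection.

        # Sketch: static-feature scoring with a tunable threshold. Raising the
        # threshold passes fewer pages downstream (faster, more false negatives);
        # lowering it passes more (slower, fewer false negatives).
        SUSPICIOUS_WEIGHTS = {            # hypothetical features and weights
            "has_obfuscated_js": 3.0,
            "hidden_iframe": 2.5,
            "external_redirect": 1.5,
            "long_encoded_string": 1.0,
        }

        def score_page(features):
            """Sum the weights of static features observed in a page."""
            return sum(w for name, w in SUSPICIOUS_WEIGHTS.items() if features.get(name))

        def filter_pages(pages, threshold):
            """Forward only pages scoring at or above the threshold."""
            return [p for p in pages if score_page(p) >= threshold]

        pages = [{"has_obfuscated_js": True, "hidden_iframe": True},
                 {"long_encoded_string": True}]
        print(len(filter_pages(pages, threshold=2.0)))  # -> 1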