1,319 research outputs found
PDF-Malware Detection: A Survey and Taxonomy of Current Techniques
Portable Document Format, more commonly known as PDF, has become, in the last 20 years, a standard for document exchange and dissemination due its portable nature and widespread adoption. The flexibility and power of this format are not only leveraged by benign users, but from hackers as well who have been working to exploit various types of vulnerabilities, overcome security restrictions, and then transform the PDF format in one among the leading malicious code spread vectors. Analyzing the content of malicious PDF files to extract the main features that characterize the malware identity and behavior, is a fundamental task for modern threat intelligence platforms that need to learn how to automatically identify new attacks. This paper surveys existing state of the art about systems for the detection of malicious PDF files and organizes them in a taxonomy that separately considers the used approaches and the data analyzed to detect the presence of malicious code. © Springer International Publishing AG, part of Springer Nature 2018
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, also
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code to the victim users,
facilitating the use of social engineering techniques to infect their machines.
Research showed that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings
Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
Most of the JavaScript code deployed in the wild has been minified, a process
in which identifier names are replaced with short, arbitrary and meaningless
names. Minified code occupies less space, but also makes the code extremely
difficult to manually inspect and understand. This paper presents Context2Name,
a deep learningbased technique that partially reverses the effect of
minification by predicting natural identifier names for minified names. The
core idea is to predict from the usage context of a variable a name that
captures the meaning of the variable. The approach combines a lightweight,
token-based static analysis with an auto-encoder neural network that summarizes
usage contexts and a recurrent neural network that predict natural names for a
given usage context. We evaluate Context2Name with a large corpus of real-world
JavaScript code and show that it successfully predicts 47.5% of all minified
identifiers while taking only 2.9 milliseconds on average to predict a name. A
comparison with the state-of-the-art tools JSNice and JSNaughty shows that our
approach performs comparably in terms of accuracy while improving in terms of
efficiency. Moreover, Context2Name complements the state-of-the-art by
predicting 5.3% additional identifiers that are missed by both existing tools
- …