Effectiveness of Opcode ngrams for Detection of Multi Family Android Malware
With the wide diffusion of smartphones and their use in a plethora of processes and activities, these devices handle an increasing variety of sensitive resources. Attackers are hence producing a large number of malware applications for Android (the most widespread mobile platform), often by slightly modifying existing applications, which results in malware being organized in families. Previous works have shown that opcodes are informative for detecting malware, on Android as well as on other platforms. In this paper, we investigate whether the frequencies of opcode n-grams are effective in detecting Android malware, and whether they are more or less effective for particular malware families. To this end, we designed a method based on state-of-the-art classifiers applied to the frequencies of opcode n-grams. We then evaluated it experimentally on a recent dataset of 11,120 applications, 5,560 of which are malware belonging to several different families. Results show that an average accuracy of 97% can be obtained, and that a perfect detection rate is achieved for more than one malware family.
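The feature extraction step described above (frequencies of opcode n-grams used as classifier input) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the opcodes of an application have already been extracted as a list of mnemonic strings (e.g. from disassembled Dalvik bytecode), and the function names and the normalization by total n-gram count are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of an opcode sequence, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range() is empty when the sequence is shorter than n.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_frequencies(opcodes: Sequence[str], n: int) -> Dict[NGram, float]:
    """Relative frequency of each n-gram (its count divided by the total n-gram count)."""
    grams = opcode_ngrams(opcodes, n)
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


def to_vector(freqs: Dict[NGram, float], vocabulary: List[NGram]) -> List[float]:
    """Project a frequency map onto a fixed vocabulary; unseen n-grams get 0.0."""
    return [freqs.get(g, 0.0) for g in vocabulary]


if __name__ == "__main__":
    # Hypothetical Dalvik-like opcode trace of a single method.
    trace = ["const/4", "invoke-virtual", "move-result",
             "invoke-virtual", "move-result", "return"]
    freqs = ngram_frequencies(trace, 2)
    vocab = sorted(freqs)
    print(to_vector(freqs, vocab))  # prints [0.2, 0.4, 0.2, 0.2]
```

In practice the vocabulary would be built over the whole training set, so every application maps to a vector of the same length that can be passed to any off-the-shelf classifier; the abstract does not state which n values or classifiers were used.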
On the Effectiveness of System API-Related Information for Android Ransomware Detection
Ransomware constitutes a significant threat to the Android operating system. It can either lock or encrypt the target devices, and victims are forced to pay ransoms to restore their data. Hence, the prompt detection of such attacks takes priority over the detection of other malicious threats. Previous works on Android malware detection mainly focused on Machine Learning-oriented approaches that were tailored to identifying malware families, without a clear focus on ransomware. More specifically, such approaches resorted to complex information types such as permissions, user-implemented API calls, and native calls. However, this led to significant drawbacks concerning complexity, resilience against obfuscation, and explainability. To overcome these issues, in this paper, we propose and discuss learning-based detection strategies that rely on System API information. These techniques leverage the fact that ransomware attacks rely heavily on System APIs to perform their actions, and they allow distinguishing among generic malware, ransomware, and goodware. We tested three different ways of employing System API information, i.e., through packages, classes, and methods, and we compared their performance to that of other, more complex state-of-the-art approaches. The attained results showed that systems based on System API information could detect ransomware and generic malware with very good accuracy, comparable to that of systems employing more complex information. Moreover, the proposed systems could accurately detect novel samples in the wild and showed resilience against static obfuscation attempts. Finally, to guarantee early on-device detection, we developed and released on the Android platform a complete ransomware and malware detector (R-PackDroid) that employs one of the methodologies proposed in this paper.
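To make the three granularities (packages, classes, methods) concrete, here is a minimal sketch that counts System API references in a list of invoked methods. It assumes the calls have already been extracted as fully qualified dotted names of the form `package.Class.method`, and that "System API" means names under the `android.`, `java.` and `javax.` prefixes. Both assumptions, like the function names, are illustrative and not taken from R-PackDroid.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: calls under these prefixes count as System API; user code and
# third-party libraries (e.g. com.example.*) are ignored.
SYSTEM_PREFIXES = ("android.", "java.", "javax.")

GRANULARITIES = ("package", "class", "method")


def split_call(call: str) -> Dict[str, str]:
    """Map 'pkg.sub.Class.method' to its package, class and method names."""
    parts = call.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected package.Class.method, got {call!r}")
    return {
        "package": ".".join(parts[:-2]),  # drop Class and method
        "class": ".".join(parts[:-1]),    # drop method only
        "method": call,                   # full signature name
    }


def system_api_features(calls: Iterable[str], granularity: str) -> Counter:
    """Count System API references at the requested granularity."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity {granularity!r}")
    counts: Counter = Counter()
    for call in calls:
        # str.startswith accepts a tuple of prefixes.
        if call.startswith(SYSTEM_PREFIXES):
            counts[split_call(call)[granularity]] += 1
    return counts
```

For example, the calls `javax.crypto.Cipher.init` and `javax.crypto.Cipher.doFinal` collapse into a single `javax.crypto` feature at package level but remain two features at method level. Because each method belongs to exactly one class and each class to one package, the package-level vocabulary is never larger than the class- or method-level one, which is one reason coarser granularities can be cheaper to handle.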
Exploiting natural language structures in software informal documentation
© 2019 IEEE.
Communication channels, such as issue trackers, mailing lists, Q&A forums, and app reviews, are premier means of collaboration among developers, and between developers and end-users. Analyzing such sources of information is crucial to build recommenders for developers, for example to suggest experts, re-document source code, or transform user feedback into maintenance and evolution strategies. To ease this analysis, in previous work we proposed DECA (Development Emails Content Analyzer), a tool based on Natural Language Parsing that classifies fragments of development emails according to their purpose with high precision. However, DECA has to be trained through manual tagging of relevant patterns, which is often effort-intensive, error-prone, and requires specific expertise in natural language parsing. In this paper, we first show, with a study involving Master's and Ph.D. students, the extent to which producing rules for identifying such patterns requires effort, depending on the nature and complexity of the patterns. Then, we propose an approach, named NEON (Nlp-based softwarE dOcumentation aNalyzer), that automatically mines such rules, minimizing the manual effort. We assess the performance of NEON in the analysis and classification of mobile app reviews, developers' discussions, and issues. NEON simplifies the processes of identifying patterns and defining rules, saving more than 70% of the time otherwise spent performing these activities manually. Results also show that NEON-generated rules are close to the manually identified ones, achieving comparable recall.
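As a rough illustration of rule-based classification of text fragments by purpose, the sketch below applies an ordered list of rules and returns the category of the first rule that fires. It deliberately replaces the natural-language-parsing patterns used by DECA and NEON with plain regular expressions, which is a simplification; the category names and rules are hypothetical examples, not the tools' actual rule sets.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules as (category, pattern) pairs. Real DECA/NEON rules are
# defined on the grammatical structure of a sentence, not on surface regexes.
RULES: List[Tuple[str, re.Pattern]] = [
    ("feature request",
     re.compile(r"\b(it would be (nice|great)|please add|could you add)\b", re.IGNORECASE)),
    ("problem discovery",
     re.compile(r"\b(crash(es|ed)?|does not work|doesn't work|error)\b", re.IGNORECASE)),
    ("information seeking",
     re.compile(r"\b(how (do|can) i|does anyone know)\b", re.IGNORECASE)),
]


def classify(fragment: str) -> Optional[str]:
    """Return the category of the first matching rule, or None if no rule fires."""
    for category, pattern in RULES:
        if pattern.search(fragment):
            return category
    return None
```

Because the first matching rule wins, rule order matters: "How can I fix this error?" is labeled problem discovery rather than information seeking. Writing and ordering such rules by hand is exactly the kind of effort the abstract describes NEON as reducing.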
Malicious JavaScript Detection by Features Extraction
In recent years, JavaScript-based attacks have become one of the most common and successful types of attack. Existing techniques for detecting malicious JavaScript may fail for different reasons. Some techniques are tailored to specific kinds of attacks and are ineffective against others. Other techniques are computationally expensive. Still others can be circumvented with evasion methods. This paper proposes a method for detecting malicious JavaScript code based on five features that capture characteristics of a script related to its execution time, the external domains it references, and its calls to JavaScript functions. Combining different types of features may yield a more effective detection technique and overcome the limitations of existing tools for identifying malicious JavaScript. The experiments suggest that a combination of these features can successfully detect malicious JavaScript code (in the best cases, we obtained a precision of 0.979 and a recall of 0.978).
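The static part of such a feature set can be approximated as below. This is a hedged sketch: the watch-list of functions, the regex-based URL and call detection, and the feature names are assumptions rather than the paper's definitions. The execution-time feature requires actually running the script (e.g. in an instrumented browser) and is left out.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical watch-list of functions often associated with obfuscated or
# malicious scripts; not the paper's list.
SUSPICIOUS_CALLS = ("eval", "unescape", "escape", "document.write", "setTimeout")

# Absolute http(s) URLs appearing anywhere in the source text.
URL_RE = re.compile(r"""https?://[^\s'"<>)]+""", re.IGNORECASE)


def external_domains(source: str, own_domain: str) -> List[str]:
    """Distinct hosts referenced by absolute URLs, excluding the page's own domain."""
    hosts = set()
    for url in URL_RE.findall(source):
        try:
            host = urlparse(url).hostname or ""  # hostname is lower-cased
        except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
            continue
        if host and host != own_domain.lower():
            hosts.add(host)
    return sorted(hosts)


def call_counts(source: str) -> Dict[str, int]:
    """Count occurrences of each watched name immediately followed by '('.

    The look-behind rejects matches preceded by a word character, '.', or '$',
    so 'unescape(' is not also counted as 'escape(' and 'obj.eval(' is skipped.
    """
    counts: Dict[str, int] = {}
    for name in SUSPICIOUS_CALLS:
        pattern = r"(?<![\w.$])" + re.escape(name) + r"\s*\("
        counts[name] = len(re.findall(pattern, source))
    return counts


def extract_features(source: str, own_domain: str) -> Dict[str, int]:
    """Combine the static features into one flat dictionary."""
    features = {"n_external_domains": len(external_domains(source, own_domain))}
    features.update({f"calls_{name}": n for name, n in call_counts(source).items()})
    return features
```

Purely textual matching like this is easy to evade (e.g. `window["ev" + "al"](...)`), which is one reason to mix static counts with dynamic measurements such as execution time, as the abstract suggests.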