200 research outputs found
Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction
Testing android malware detectors against code obfuscation: a systematization of knowledge and unified methodology
The authors of mobile-malware have started to leverage program protection techniques to circumvent anti-viruses, or simply hinder reverse engineering. In response to the diffusion of anti-virus applications, several researches have proposed a plethora of analyses and approaches to highlight their limitations when malware authors employ program-protection techniques. An important contribution of this work is a systematization of the state of the art of anti-virus apps, comparing the existing approaches and providing a detailed analysis of their pros and cons. As a result of our systematization, we notice the lack of openness and reproducibility that, in our opinion, are crucial for any analysis methodology. Following this observation, the second contribution of this work is an open, reproducible, rigorous methodology to assess the effectiveness of mobile anti-virus tools against code-transformation attacks. Our unified workflow, released in the form of an open-source prototype, comprises a comprehensive set of obfuscation operators. It is intended to be used by anti-virus developers and vendors to test the resilience of their products against a large dataset of malware samples and obfuscations, and to obtain insights on how to improve their products with respect to particular classes of code-transformation attacks
Deobfuscation of VBScript-based Malware
VBScript je desktopový a webový skriptovací jazyk, který je často využíván škodlivým softwarem. Autoři tohoto software se často snaží skrýt jeho pravou funkcionalitu a zabránit ostatním ve čtení jeho kódu pomocí obfuskací. Tato práce se zaměřuje na analýzu těchto obfuskací, prozkoumání způsobů, jak je překonat, a implementaci nástroje schopného zlepšit čitelnost obfuskovaných programů pomocí statických a dynamických deobfuskačních metod.VBScript is a desktop and web based scripting language that is often used by malicious software. Authors of such sofware often attempt to conceal its true functionality and prevent others from reading the source code by using obfuscations. This thesis focuses on analyzing these obfuscations, exploring ways of reverting them and implementing a tool to improve readability of obfuscated programs using both static and dynamic deobfuscation methods
Techniques for advanced android malware triage
Mención Internacional en el título de doctorAndroid is the leading operating system in smartphones with a big difference.
Statistics show that 88% of all smartphones sold to end users in
the second quarter of 2018 were phones with the Android OS. Regardless
of the operating systems which are running on smartphones, most of
the functionalities of these devices are offered through applications. There
are currently over 2 million apps only on the official Google store, known
as Google Play. This huge market with billions of users is tempting for
attackers to develop and distribute their malicious apps (or malware).
Mobile malware has raised explosively since 2009. Symantec reported
an increase of 54% in the new mobile malware variants in 2017 as compared
to the previous year. Additionally, more incentive has been provided
for profit-driven malware by the growth of black markets. This rise has
happened for Android malware as well since only 20% of devices are running
the newest major version of Android OS based on Symantec report in
2018. Android continued to be the most targeted platform with the biggest
number of attacks in 2015. After that year, attacks against the Android
platform slowed for the first time as attackers were faced with improved
security architectures though Android is still the main appealing target OS
for attackers. Moreover, advanced types of Android malware are found
which make use of extensive anit-analysis techniques to evade static or
dynamic analysis.
To address the security and privacy concerns of complex Android malware,
this dissertation focuses on three main objectives. First of all, we
propose a light-weight yet efficient method to identify risky Android applications.
Next, we present a precise approach to characterize Android
malware based on their malicious behavior. Finally, we propose an adaptive learning system to address the security concerns of obfuscation in Android
malware.
Identifying potentially dangerous and risky applications is an important
step in Android malware analysis. To this end, we develop a triage system
to rank applications based on their potential risk. Our approach, called TriFlow, relies on static features which are quick to obtain. TriFlow combines
a probabilistic model to predict the existence of information flows with a
metric of how significant a flow is in benign and malicious apps. Based
on this, TriFlow provides a score for each application that can be used to
prioritize analysis. It also provides the analysts with an explanatory report
of the associated risk. Our tool can also be used as a complement with
computationally expensive static and dynamic analysis tools.
Another important step towards Android malware analysis lies in their
accurate characterization. Labeling Android malware is challenging yet
crucially important, as it helps to identify upcoming malware samples and
threats. A key challenge is that different researchers and anti-virus vendors
assign labels using their own criteria, and it is not known to what
extent these labels are aligned with the apps’ real behavior. Based on this,
we propose a new behavioral characterization method for Android apps
based on their extracted information flows. As information flows can be
used to track why and how apps use specific pieces of information, a flowbased
characterization provides a relatively easy-to-interpret summary of
the malware sample’s behavior.
Not all Android malware are easy to analyze due to advanced and easyto-apply anti-analysis techniques that are available nowadays. Obfuscation
is the most common anti-analysis technique that Android malware use to
evade detection. Obfuscation techniques modify an app’s source (or machine)
code in order to make it more difficult to analyze. This is typically
applied to protect intellectual property in benign apps, or to hinder the process
of extracting actionable information in the case of malware. Since
malware analysis often requires considerable resource investment, detecting
the particular obfuscation technique used may contribute to apply the
right analysis tools, thus leading to some savings.
Therefore, we propose AndrODet, a mechanism to detect three popular
types of obfuscation in Android applications, namely identifier renaming, string encryption, and control flow obfuscation. AndrODet leverages online
learning techniques, thus being suitable for resource-limited environments
that need to operate in a continuous manner. We compare our results
with a batch learning algorithm using a dataset of 34,962 apps from both
malware and benign apps. Experimental results show that online learning
approaches are not only able to compete with batch learning methods in
terms of accuracy, but they also save significant amount of time and computational
resources.
Finally, we present a number of open research directions based on the
outcome of this thesis.Android es el sistema operativo líder en teléfonos inteligentes (también
denominados con la palabra inglesa smartphones), con una gran diferencia
con respecto al resto de competidores. Las estadísticas muestran que el
88% de todos los smartphones vendidos a usuarios finales en el segundo
trimestre de 2018 fueron teléfonos con sistema operativo Android. Independientemente
de su sistema operativo, la mayoría de las funcionalidades
de estos dispositivos se ofrecen a través de aplicaciones. Actualmente hay
más de 2 millones de aplicaciones solo en la tienda oficial de Google, conocida
como Google Play. Este enorme mercado con miles de millones de
usuarios es tentador para los atacantes, que buscan distribuir sus aplicaciones
malintencionadas (o malware).
El malware para dispositivos móviles ha aumentado de forma exponencial
desde 2009. Symantec ha detectado un aumento del 54% en las nuevas
variantes de malware para dispositivos móviles en 2017 en comparación
con el año anterior. Además, el crecimiento del mercado negro (es decir,
plataformas no oficiales de descargas de aplicaciones) supone un incentivo
para los programas maliciosos con fines lucrativos. Este aumento también
ha ocurrido en el malware de Android, aprovechando la circunstancia de
que solo el 20% de los dispositivos ejecutan la versión mas reciente del sistema
operativo Android, de acuerdo con el informe de Symantec en 2018.
De hecho, Android ha sido la plataforma que ha centrado los esfuerzos de
los atacantes desde 2015, aunque los ataques decayeron ligeramente tras
ese año debido a las mejoras de seguridad incorporadas en el sistema operativo.
En todo caso, existen formas avanzadas de malware para Android
que hacen uso de técnicas sofisticadas para evadir el análisis estático o
dinámico.
Para abordar los problemas de seguridad y privacidad que causa el malware
en Android, esta Tesis se centra en tres objetivos principales. En
primer lugar, se propone un método ligero y eficiente para identificar aplicaciones
de Android que pueden suponer un riesgo. Por otra parte, se presenta
un mecanismo para la caracterización del malware atendiendo a su
comportamiento. Finalmente, se propone un mecanismo basado en aprendizaje
adaptativo para la detección de algunos tipos de ofuscación que son
empleados habitualmente en las aplicaciones maliciosas.
Identificar aplicaciones potencialmente peligrosas y riesgosas es un
paso importante en el análisis de malware de Android. Con este fin, en
esta Tesis se desarrolla un mecanismo de clasificación (llamado TriFlow)
que ordena las aplicaciones según su riesgo potencial. La aproximación
se basa en características estáticas que se obtienen rápidamente, siendo de
especial interés los flujos de información. Un flujo de información existe
cuando un cierto dato es recibido o producido mediante una cierta función
o llamada al sistema, y atraviesa la lógica de la aplicación hasta que
llega a otra función. Así, TriFlow combina un modelo probabilístico para
predecir la existencia de un flujo con una métrica de lo habitual que es
encontrarlo en aplicaciones benignas y maliciosas. Con ello, TriFlow proporciona
una puntuación para cada aplicación que puede utilizarse para
priorizar su análisis. Al mismo tiempo, proporciona a los analistas un informe
explicativo de las causas que motivan dicha valoración. Así, esta
herramienta se puede utilizar como complemento a otras técnicas de análisis
estático y dinámico que son mucho más costosas desde el punto de vista
computacional.
Otro paso importante hacia el análisis de malware de Android radica
en caracterizar su comportamiento. Etiquetar el malware de Android es
un desafío de crucial importancia, ya que ayuda a identificar las próximas
muestras y amenazas de malware. Una cuestión relevante es que los
diferentes investigadores y proveedores de antivirus asignan etiquetas utilizando
sus propios criterios, de modo no se sabe en qué medida estas etiquetas
están en línea con el comportamiento real de las aplicaciones. Sobre
esta base, en esta Tesis se propone un nuevo método de caracterización de
comportamiento para las aplicaciones de Android en función de sus flujos
de información. Como dichos flujos se pueden usar para estudiar el uso de
cada dato por parte de una aplicación, permiten proporcionar un resumen relativamente sencillo del comportamiento de una determinada muestra de
malware.
A pesar de la utilidad de las técnicas de análisis descritas, no todos los
programas maliciosos de Android son fáciles de analizar debido al uso de
técnicas anti-análisis que están disponibles en la actualidad. Entre ellas, la
ofuscación es la técnica más común que se utiliza en el malware de Android
para evadir la detección. Dicha técnica modifica el código de una
aplicación para que sea más difícil de entender y analizar. Esto se suele
aplicar para proteger la propiedad intelectual en aplicaciones benignas o
para dificultar la obtención de pistas sobre su funcionamiento en el caso
del malware. Dado que el análisis de malware a menudo requiere una inversión
considerable de recursos, detectar la técnica de ofuscación que se
ha utilizado en un caso particular puede contribuir a utilizar herramientas
de análisis adecuadas, contribuyendo así a un cierto ahorro de recursos.
Así, en esta Tesis se propone AndrODet, un mecanismo para detectar tres
tipos populares de ofuscación, a saber, el renombrado de identificadores,
cifrado de cadenas de texto y la modificación del flujo de control de la aplicación.
AndrODet se basa en técnicas de aprendizaje automático en línea
(online machine learning), por lo que es adecuado para entornos con recursos
limitados que necesitan operar de forma continua, sin interrupción.
Para medir su eficacia respecto de las técnicas de aprendizaje automático
tradicionales, se comparan los resultados con un algoritmo de aprendizaje
por lotes (batch learning) utilizando un dataset de 34.962 aplicaciones de
malware y benignas. Los resultados experimentales muestran que el enfoque
de aprendizaje en línea no solo es capaz de competir con el basado
en lotes en términos de precisión, sino que también ahorra una gran cantidad
de tiempo y recursos computacionales.
Tras la exposición de las contribuciones anteriormente mencionadas,
esta Tesis concluye con la identificación de una serie de líneas abiertas de
investigación con el fin de alentar el desarrollo de trabajos futuros en esta
dirección.Omid Mirzaei is a Ph.D. candidate in the Computer Security Lab (COSEC)
at the Department of Computer Science and Engineering of Universidad
Carlos III de Madrid (UC3M). His Ph.D. is funded by the Community
of Madrid and the European Union through the research project CIBERDINE
(Ref. S2013/ICE-3095).Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: Gregorio Martínez Pérez.- Secretario: Pedro Peris López.- Vocal: Pablo Picazo Sánche
LibiD: Reliable identification of obfuscated third-party android libraries
Third-party libraries are vital components of Android apps, yet they can also introduce serious security threats and impede the accuracy and reliability of app analysis tasks, such as app clone detection. Several library detection approaches have been proposed to address these problems. However, we show these techniques are not robust against popular code obfuscators, such as ProGuard, which is now used in nearly half of all apps. We then present LibID, a library detection tool that is more resilient to code shrinking and package modification than state-of-the-art tools. We show that the library identification problem can be formulated using binary integer programming models. LibID is able to identify specific versions of third-party libraries in candidate apps through static analysis of app binaries coupled with a database of third-party libraries. We propose a novel approach to generate synthetic apps to tune the detection thresholds. Then, we use F-Droid apps as the ground truth to evaluate LibID under different obfuscation settings, which shows that LibID is more robust to code obfuscators than state-of-the-art tools. Finally, we demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of 3,958 most popular apps on the Google Play Store.The Boeing Company, China Scholarship Council, Microsoft Researc
Evaluation of Android anti-malware resistance against transformation attacks
Android being most popular and user-friendly is targeted by most of the malware authors. The malware authors use various transformation techniques to create different variants of malwares. Different transformation techniques such as obfuscation, repackaging, renaming are used mostly. Many anti-malwares are developed to secure the Android devices. Android does not offer file access permissions to all the applications installed. Thus anti-malwares may not provide complete security to the Android devices. In this paper, many such different techniques are presented that can be used to evaluate different anti-malwares
A Survey and Evaluation of Android-Based Malware Evasion Techniques and Detection Frameworks
Android platform security is an active area of research where malware detection techniques continuously evolve to identify novel malware and improve the timely and accurate detection of existing malware. Adversaries are constantly in charge of employing innovative techniques to avoid or prolong malware detection effectively. Past studies have shown that malware detection systems are susceptible to evasion attacks where adversaries can successfully bypass the existing security defenses and deliver the malware to the target system without being detected. The evolution of escape-resistant systems is an open research problem. This paper presents a detailed taxonomy and evaluation of Android-based malware evasion techniques deployed to circumvent malware detection. The study characterizes such evasion techniques into two broad categories, polymorphism and metamorphism, and analyses techniques used for stealth malware detection based on the malware’s unique characteristics. Furthermore, the article also presents a qualitative and systematic comparison of evasion detection frameworks and their detection methodologies for Android-based malware. Finally, the survey discusses open-ended questions and potential future directions for continued research in mobile malware detection
Exploiting loop transformations for the protection of software
Il software conserva la maggior parte del know-how che occorre per svilupparlo. Poich\ue9 oggigiorno il software pu\uf2 essere facilmente duplicato e ridistribuito ovunque, il rischio che la propriet\ue0 intellettuale venga violata su scala globale \ue8 elevato. Una delle pi\uf9 interessanti soluzioni a questo problema \ue8 dotare il software di un watermark. Ai watermark si richiede non solo di certificare in modo univoco il proprietario del software, ma anche di essere resistenti e pervasivi. In questa tesi riformuliamo i concetti di robustezza e pervasivit\ue0 a partire dalla semantica delle tracce. Evidenziamo i cicli quali costrutti di programmazione pervasivi e introduciamo le trasformazioni di ciclo come mattone di costruzione per schemi di watermarking pervasivo. Passiamo in rassegna alcune fra tali trasformazioni, studiando i loro principi di base. Infine, sfruttiamo tali principi per costruire una tecnica di watermarking pervasivo. La robustezza rimane una difficile, quanto affascinante, questione ancora da risolvere.Software retains most of the know-how required fot its development. Because nowadays software can be easily cloned and spread worldwide, the risk of intellectual property infringement on a global scale is high. One of the most viable solutions to this problem is to endow software with a watermark. Good watermarks are required not only to state unambiguously the owner of software, but also to be resilient and pervasive. In this thesis we base resiliency and pervasiveness on trace semantics. We point out loops as pervasive programming constructs and we introduce loop transformations as the basic block of pervasive watermarking schemes. We survey several loop transformations, outlining their underlying principles. Then we exploit these principles to build some pervasive watermarking techniques. Resiliency still remains a big and challenging open issue
A comparison of code similarity analysers
Copying and pasting of source code is a common activity in software engineering. Often, the code is not copied as it is and it may be modified for various purposes; e.g. refactoring, bug fixing, or even software plagiarism. These code modifications could affect the performance of code similarity analysers including code clone and plagiarism detectors to some certain degree. We are interested in two types of code modification in this study: pervasive modifications, i.e. transformations that may have a global effect, and local modifications, i.e. code changes that are contained in a single method or code block. We evaluate 30 code similarity detection techniques and tools using five experimental scenarios for Java source code. These are (1) pervasively modified code, created with tools for source code and bytecode obfuscation, and boiler-plate code, (2) source code normalisation through compilation and decompilation using different decompilers, (3) reuse of optimal configurations over different data sets, (4) tool evaluation using ranked-based measures, and (5) local + global code modifications. Our experimental results show that in the presence of pervasive modifications, some of the general textual similarity measures can offer similar performance to specialised code similarity tools, whilst in the presence of boiler-plate code, highly specialised source code similarity detection techniques and tools outperform textual similarity measures. Our study strongly validates the use of compilation/decompilation as a normalisation technique. Its use reduced false classifications to zero for three of the tools. Moreover, we demonstrate that optimal configurations are very sensitive to a specific data set. After directly applying optimal configurations derived from one data set to another, the tools perform poorly on the new data set. The code similarity analysers are thoroughly evaluated not only based on several well-known pair-based and query-based error measures but also on each specific type of pervasive code modification. This broad, thorough study is the largest in existence and potentially an invaluable guide for future users of similarity detection in source code
- …