7 research outputs found
An in-Depth Study of the Jisut Family of Android Ransomware
Android malware is increasing in spread and complexity. Advanced obfuscation, emulation detection, delayed payload activation or dynamic code loading are some of the techniques employed by the current malware to hinder the use of reverse engineering techniques and anti-malware tools. This growing complexity is particularly noticeable in the evolution of different strands of the same malware family. Over the years, these families mature to become more effective by incorporating new and enhanced techniques. In this paper, we focus on a particular Android ransomware family named Jisut, and perform a thorough technical analysis. We also provide a detailed overall perspective, which will hopefully help to create new tools and techniques to tackle more effectively the threat posed by ransomware
Machine learning techniques for android malware detection and classification
Tesis doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Fecha de Lectura: 15-03-2019la realización de esta tesis no habría sido posible sin la financiación aportada
por el proyecto CIBERDINE: Cybersecurity, Data and Risks (S2013/ICE3095)
concedido por la Comunidad de Madrid
GroupDroid: Automatically Grouping Mobile Malware by Extracting Code Similarities
As shown in previous work, malware authors often reuse portions of code in the development of their samples. Especially in the mobile scenario, there exists a phenomena, called piggybacking, that describes the act of embedding malicious code inside benign apps. In this paper, we leverage such observations to analyze mobile malware by looking at its similarities. In practice, we propose a novel approach that identifies and extracts code similarities in mobile apps. Our approach is based on static analysis and works by computing the Control Flow Graph of each method and encoding it in a feature vector used to measure similarities. We implemented our approach in a tool, GroupDroid, able to group mobile apps together according to their code similarities. Armed with GroupDroid, we then analyzed modern mobile malware samples. Our experiments show that GroupDroid is able to correctly and accurately distinguish different malware variants, and to provide useful and detailed information about the similar portions of malicious code
Techniques for advanced android malware triage
Mención Internacional en el título de doctorAndroid is the leading operating system in smartphones with a big difference.
Statistics show that 88% of all smartphones sold to end users in
the second quarter of 2018 were phones with the Android OS. Regardless
of the operating systems which are running on smartphones, most of
the functionalities of these devices are offered through applications. There
are currently over 2 million apps only on the official Google store, known
as Google Play. This huge market with billions of users is tempting for
attackers to develop and distribute their malicious apps (or malware).
Mobile malware has raised explosively since 2009. Symantec reported
an increase of 54% in the new mobile malware variants in 2017 as compared
to the previous year. Additionally, more incentive has been provided
for profit-driven malware by the growth of black markets. This rise has
happened for Android malware as well since only 20% of devices are running
the newest major version of Android OS based on Symantec report in
2018. Android continued to be the most targeted platform with the biggest
number of attacks in 2015. After that year, attacks against the Android
platform slowed for the first time as attackers were faced with improved
security architectures though Android is still the main appealing target OS
for attackers. Moreover, advanced types of Android malware are found
which make use of extensive anit-analysis techniques to evade static or
dynamic analysis.
To address the security and privacy concerns of complex Android malware,
this dissertation focuses on three main objectives. First of all, we
propose a light-weight yet efficient method to identify risky Android applications.
Next, we present a precise approach to characterize Android
malware based on their malicious behavior. Finally, we propose an adaptive learning system to address the security concerns of obfuscation in Android
malware.
Identifying potentially dangerous and risky applications is an important
step in Android malware analysis. To this end, we develop a triage system
to rank applications based on their potential risk. Our approach, called TriFlow, relies on static features which are quick to obtain. TriFlow combines
a probabilistic model to predict the existence of information flows with a
metric of how significant a flow is in benign and malicious apps. Based
on this, TriFlow provides a score for each application that can be used to
prioritize analysis. It also provides the analysts with an explanatory report
of the associated risk. Our tool can also be used as a complement with
computationally expensive static and dynamic analysis tools.
Another important step towards Android malware analysis lies in their
accurate characterization. Labeling Android malware is challenging yet
crucially important, as it helps to identify upcoming malware samples and
threats. A key challenge is that different researchers and anti-virus vendors
assign labels using their own criteria, and it is not known to what
extent these labels are aligned with the apps’ real behavior. Based on this,
we propose a new behavioral characterization method for Android apps
based on their extracted information flows. As information flows can be
used to track why and how apps use specific pieces of information, a flowbased
characterization provides a relatively easy-to-interpret summary of
the malware sample’s behavior.
Not all Android malware are easy to analyze due to advanced and easyto-apply anti-analysis techniques that are available nowadays. Obfuscation
is the most common anti-analysis technique that Android malware use to
evade detection. Obfuscation techniques modify an app’s source (or machine)
code in order to make it more difficult to analyze. This is typically
applied to protect intellectual property in benign apps, or to hinder the process
of extracting actionable information in the case of malware. Since
malware analysis often requires considerable resource investment, detecting
the particular obfuscation technique used may contribute to apply the
right analysis tools, thus leading to some savings.
Therefore, we propose AndrODet, a mechanism to detect three popular
types of obfuscation in Android applications, namely identifier renaming, string encryption, and control flow obfuscation. AndrODet leverages online
learning techniques, thus being suitable for resource-limited environments
that need to operate in a continuous manner. We compare our results
with a batch learning algorithm using a dataset of 34,962 apps from both
malware and benign apps. Experimental results show that online learning
approaches are not only able to compete with batch learning methods in
terms of accuracy, but they also save significant amount of time and computational
resources.
Finally, we present a number of open research directions based on the
outcome of this thesis.Android es el sistema operativo líder en teléfonos inteligentes (también
denominados con la palabra inglesa smartphones), con una gran diferencia
con respecto al resto de competidores. Las estadísticas muestran que el
88% de todos los smartphones vendidos a usuarios finales en el segundo
trimestre de 2018 fueron teléfonos con sistema operativo Android. Independientemente
de su sistema operativo, la mayoría de las funcionalidades
de estos dispositivos se ofrecen a través de aplicaciones. Actualmente hay
más de 2 millones de aplicaciones solo en la tienda oficial de Google, conocida
como Google Play. Este enorme mercado con miles de millones de
usuarios es tentador para los atacantes, que buscan distribuir sus aplicaciones
malintencionadas (o malware).
El malware para dispositivos móviles ha aumentado de forma exponencial
desde 2009. Symantec ha detectado un aumento del 54% en las nuevas
variantes de malware para dispositivos móviles en 2017 en comparación
con el año anterior. Además, el crecimiento del mercado negro (es decir,
plataformas no oficiales de descargas de aplicaciones) supone un incentivo
para los programas maliciosos con fines lucrativos. Este aumento también
ha ocurrido en el malware de Android, aprovechando la circunstancia de
que solo el 20% de los dispositivos ejecutan la versión mas reciente del sistema
operativo Android, de acuerdo con el informe de Symantec en 2018.
De hecho, Android ha sido la plataforma que ha centrado los esfuerzos de
los atacantes desde 2015, aunque los ataques decayeron ligeramente tras
ese año debido a las mejoras de seguridad incorporadas en el sistema operativo.
En todo caso, existen formas avanzadas de malware para Android
que hacen uso de técnicas sofisticadas para evadir el análisis estático o
dinámico.
Para abordar los problemas de seguridad y privacidad que causa el malware
en Android, esta Tesis se centra en tres objetivos principales. En
primer lugar, se propone un método ligero y eficiente para identificar aplicaciones
de Android que pueden suponer un riesgo. Por otra parte, se presenta
un mecanismo para la caracterización del malware atendiendo a su
comportamiento. Finalmente, se propone un mecanismo basado en aprendizaje
adaptativo para la detección de algunos tipos de ofuscación que son
empleados habitualmente en las aplicaciones maliciosas.
Identificar aplicaciones potencialmente peligrosas y riesgosas es un
paso importante en el análisis de malware de Android. Con este fin, en
esta Tesis se desarrolla un mecanismo de clasificación (llamado TriFlow)
que ordena las aplicaciones según su riesgo potencial. La aproximación
se basa en características estáticas que se obtienen rápidamente, siendo de
especial interés los flujos de información. Un flujo de información existe
cuando un cierto dato es recibido o producido mediante una cierta función
o llamada al sistema, y atraviesa la lógica de la aplicación hasta que
llega a otra función. Así, TriFlow combina un modelo probabilístico para
predecir la existencia de un flujo con una métrica de lo habitual que es
encontrarlo en aplicaciones benignas y maliciosas. Con ello, TriFlow proporciona
una puntuación para cada aplicación que puede utilizarse para
priorizar su análisis. Al mismo tiempo, proporciona a los analistas un informe
explicativo de las causas que motivan dicha valoración. Así, esta
herramienta se puede utilizar como complemento a otras técnicas de análisis
estático y dinámico que son mucho más costosas desde el punto de vista
computacional.
Otro paso importante hacia el análisis de malware de Android radica
en caracterizar su comportamiento. Etiquetar el malware de Android es
un desafío de crucial importancia, ya que ayuda a identificar las próximas
muestras y amenazas de malware. Una cuestión relevante es que los
diferentes investigadores y proveedores de antivirus asignan etiquetas utilizando
sus propios criterios, de modo no se sabe en qué medida estas etiquetas
están en línea con el comportamiento real de las aplicaciones. Sobre
esta base, en esta Tesis se propone un nuevo método de caracterización de
comportamiento para las aplicaciones de Android en función de sus flujos
de información. Como dichos flujos se pueden usar para estudiar el uso de
cada dato por parte de una aplicación, permiten proporcionar un resumen relativamente sencillo del comportamiento de una determinada muestra de
malware.
A pesar de la utilidad de las técnicas de análisis descritas, no todos los
programas maliciosos de Android son fáciles de analizar debido al uso de
técnicas anti-análisis que están disponibles en la actualidad. Entre ellas, la
ofuscación es la técnica más común que se utiliza en el malware de Android
para evadir la detección. Dicha técnica modifica el código de una
aplicación para que sea más difícil de entender y analizar. Esto se suele
aplicar para proteger la propiedad intelectual en aplicaciones benignas o
para dificultar la obtención de pistas sobre su funcionamiento en el caso
del malware. Dado que el análisis de malware a menudo requiere una inversión
considerable de recursos, detectar la técnica de ofuscación que se
ha utilizado en un caso particular puede contribuir a utilizar herramientas
de análisis adecuadas, contribuyendo así a un cierto ahorro de recursos.
Así, en esta Tesis se propone AndrODet, un mecanismo para detectar tres
tipos populares de ofuscación, a saber, el renombrado de identificadores,
cifrado de cadenas de texto y la modificación del flujo de control de la aplicación.
AndrODet se basa en técnicas de aprendizaje automático en línea
(online machine learning), por lo que es adecuado para entornos con recursos
limitados que necesitan operar de forma continua, sin interrupción.
Para medir su eficacia respecto de las técnicas de aprendizaje automático
tradicionales, se comparan los resultados con un algoritmo de aprendizaje
por lotes (batch learning) utilizando un dataset de 34.962 aplicaciones de
malware y benignas. Los resultados experimentales muestran que el enfoque
de aprendizaje en línea no solo es capaz de competir con el basado
en lotes en términos de precisión, sino que también ahorra una gran cantidad
de tiempo y recursos computacionales.
Tras la exposición de las contribuciones anteriormente mencionadas,
esta Tesis concluye con la identificación de una serie de líneas abiertas de
investigación con el fin de alentar el desarrollo de trabajos futuros en esta
dirección.Omid Mirzaei is a Ph.D. candidate in the Computer Security Lab (COSEC)
at the Department of Computer Science and Engineering of Universidad
Carlos III de Madrid (UC3M). His Ph.D. is funded by the Community
of Madrid and the European Union through the research project CIBERDINE
(Ref. S2013/ICE-3095).Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: Gregorio Martínez Pérez.- Secretario: Pedro Peris López.- Vocal: Pablo Picazo Sánche
Program Similarity Analysis for Malware Classification and its Pitfalls
Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital in order to understand the threat they pose and how to protect against them. Recognizing whether one program shares behaviors with another is a task that requires semantic reasoning, meaning that it needs to consider what a program actually does. This is a famously uncomputable problem, due to Rice\u2019s theorem. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least, easy to retrieve), most approaches employ semantic \u201cabstractions\u201d, which are computable approximations of the semantics of the program. We consider this the first scenario for this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a tool for malware analysis, R.E.H.A. which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first successfully evaluated against a state of the art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions used to produce more fine-grained features, we can improve the accuracy of the results as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is not often available or easily retrievable. For this reason, we introduce a second scenario in which the classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be done using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory, to classify the samples into their respective families. To overcome the usual obstacle of deep learning of lacking sufficiently large and balanced datasets, we utilize obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis to allow proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process
Effiziente und erklärbare Erkennung von mobiler Schadsoftware mittels maschineller Lernmethoden
In recent years, mobile devices shipped with Google’s Android operating system
have become ubiquitous. Due to their popularity and the high concentration of
sensitive user data on these devices, however, they have also become a
profitable target of malware authors. As a result, thousands of new malware
instances targeting Android are found almost every day. Unfortunately, common
signature-based methods often fail to detect these applications, as these
methods can- not keep pace with the rapid development of new malware.
Consequently, there is an urgent need for new malware detection methods to
tackle this growing threat.
In this thesis, we address the problem by combining concepts of static analysis
and machine learning, such that mobile malware can be detected directly on the
mobile device with low run-time overhead. To this end, we first discuss our
analysis results of a sophisticated malware that uses an ultrasonic side
channel to spy on unwitting smartphone users. Based on the insights we gain
throughout this thesis, we gradually develop a method that allows detecting
Android malware in general. The resulting method performs a broad static
analysis, gathering a large number of features associated with an application.
These features are embedded in a joint vector space, where typical patterns
indicative of malware can be automatically identified and used for explaining
the decisions of our method. In addition to an evaluation of its overall
detection and run-time performance, we also examine the interpretability of the
underlying detection model and strengthen the classifier against realistic
evasion attacks.
In a large set of experiments, we show that the method clearly outperforms
several related approaches, including popular anti-virus scanners. In most
experiments, our approach detects more than 90% of all malicious samples in the
dataset at a low false positive rate of only 1%. Furthermore, even on older
devices, it offers a good run-time performance, and can output a decision along
with a proper explanation within a few seconds, despite the use of machine
learning techniques directly on the mobile device.
Overall, we find that the application of machine learning techniques is a
promising research direction to improve the security of mobile devices. While
these techniques alone cannot defeat the threat of mobile malware, they at
least raise the bar for malicious actors significantly, especially if combined
with existing techniques.Die Verbreitung von Smartphones, insbesondere mit dem Android-Betriebssystem,
hat in den vergangenen Jahren stark zugenommen. Aufgrund ihrer hohen
Popularität haben sich diese Geräte jedoch zugleich auch zu einem lukrativen
Ziel für Entwickler von Schadsoftware entwickelt, weshalb mittlerweile täglich
neue Schadprogramme für Android gefunden werden.
Obwohl verschiedene Lösungen existieren, die Schadprogramme auch auf mobilen
Endgeräten identifizieren sollen, bieten diese in der Praxis häufig keinen
ausreichenden Schutz. Dies liegt vor allem daran, dass diese Verfahren zumeist
signaturbasiert arbeiten und somit schädliche Programme erst zuverlässig
identifizieren können, sobald entsprechende Erkennungssignaturen vorhanden
sind. Jedoch wird es für Antiviren-Hersteller immer schwieriger, die zur
Erkennung notwendigen Signaturen rechtzeitig bereitzustellen. Daher ist die
Entwicklung von neuen Verfahren nötig, um der wachsenden Bedrohung durch mobile
Schadsoftware besser begegnen zu können.
In dieser Dissertation wird ein Verfahren vorgestellt und eingehend untersucht,
das Techniken der statischen Code-Analyse mit Methoden des maschinellen Lernens
kombiniert, um so eine zuverlässige Erkennung von mobiler Schadsoftware direkt
auf dem Mobilgerät zu ermöglichen. Die Methode analysiert hierfür mobile
Anwendungen zunächst statisch und extrahiert dabei spezielle Merkmale, die eine
Abbildung einer Applikation in einen hochdimensionalen Vektorraum ermöglichen.
In diesem Vektorraum sind schließlich maschinelle Lernmethoden in der Lage,
automatisch Muster zur Erkennung von Schadprogrammen zu finden. Die gefundenen
Muster können dabei nicht nur zur Erkennung, sondern darüber hinaus auch zur
Erklärung einer getroffenenen Entscheidung dienen.
Im Rahmen einer ausführlichen Evaluation wird nicht nur die Erkennungsleistung
und die Laufzeit der vorgestellten Methode untersucht, sondern darüber hinaus
das gelernte Erkennungsmodell im Detail analysiert. Hierbei wird auch die
Robustheit des Modells gegenüber gezielten Angriffe untersucht und verbessert.
In einer Reihe von Experimenten kann gezeigt werden, dass mit dem
vorgeschlagenen Verfahren bessere Ergebnisse erzielt werden können als mit
vergleichbaren Methoden, sogar einschließlich einiger populärer
Antivirenprogramme. In den meisten Experimenten kann die Methode Schadprogramme
zuverlässig erkennen und erreicht Erkennungsraten von über 90% bei einer
geringen Falsch-Positiv-Rate von 1%