
    An in-Depth Study of the Jisut Family of Android Ransomware

    Android malware is increasing in spread and complexity. Advanced obfuscation, emulation detection, delayed payload activation and dynamic code loading are some of the techniques employed by current malware to hinder reverse engineering and anti-malware tools. This growing complexity is particularly noticeable in the evolution of different strains of the same malware family. Over the years, these families mature and become more effective by incorporating new and enhanced techniques. In this paper, we focus on a particular Android ransomware family named Jisut and perform a thorough technical analysis. We also provide a detailed overall perspective, which we hope will help in creating new tools and techniques to tackle the threat posed by ransomware more effectively.

    Machine learning techniques for android malware detection and classification

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 15-03-2019. This thesis would not have been possible without the funding provided by the project CIBERDINE: Cybersecurity, Data and Risks (S2013/ICE3095), awarded by the Comunidad de Madrid.

    GroupDroid: Automatically Grouping Mobile Malware by Extracting Code Similarities

    As shown in previous work, malware authors often reuse portions of code in the development of their samples. In the mobile scenario in particular, a phenomenon called piggybacking describes the act of embedding malicious code inside benign apps. In this paper, we leverage such observations to analyze mobile malware by looking at its similarities. Specifically, we propose a novel approach that identifies and extracts code similarities in mobile apps. Our approach is based on static analysis and works by computing the Control Flow Graph of each method and encoding it in a feature vector used to measure similarities. We implemented our approach in a tool, GroupDroid, which is able to group mobile apps together according to their code similarities. Armed with GroupDroid, we then analyzed modern mobile malware samples. Our experiments show that GroupDroid is able to correctly and accurately distinguish different malware variants, and to provide useful and detailed information about the similar portions of malicious code.
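The CFG-based similarity idea can be illustrated with a short Python sketch. It is not GroupDroid's actual encoding: the feature vector used here (block count, edge count, and a histogram of out-degrees per basic block), the cosine similarity, the helper names `encode_cfg`, `cosine` and `app_similarity`, and the 0.99 match threshold are all assumptions chosen for illustration. Each method's CFG is given as a mapping from basic-block id to successor ids.

```python
import math
from typing import Dict, List

# A method's control flow graph: basic-block id -> list of successor block ids.
CFG = Dict[int, List[int]]


def encode_cfg(cfg: CFG) -> List[float]:
    """Hypothetical encoding: [#blocks, #edges, #blocks with out-degree 0, 1, 2, >=3]."""
    edges = sum(len(succ) for succ in cfg.values())
    degree_hist = [0, 0, 0, 0]
    for succ in cfg.values():
        degree_hist[min(len(succ), 3)] += 1
    return [float(len(cfg)), float(edges)] + [float(c) for c in degree_hist]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def app_similarity(methods_a: List[CFG], methods_b: List[CFG], threshold: float = 0.99) -> float:
    """Fraction of methods in app A whose encoding closely matches some method in app B."""
    if not methods_a:
        return 0.0
    vectors_b = [encode_cfg(m) for m in methods_b]
    matched = sum(
        1
        for m in methods_a
        if any(cosine(encode_cfg(m), vb) >= threshold for vb in vectors_b)
    )
    return matched / len(methods_a)


# Usage: an if/else "diamond" CFG versus a straight-line CFG.
diamond = {0: [1, 2], 1: [3], 2: [3], 3: []}
linear = {0: [1], 1: [2], 2: []}
print(app_similarity([diamond, linear], [diamond]))  # 0.5: only the diamond method matches
```

Apps whose pairwise similarity exceeds a chosen threshold could then be clustered into groups; a real tool would also need a CFG extractor for Dalvik bytecode, which this sketch leaves out.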

    Techniques for advanced android malware triage

    International Mention in the doctoral degree.
    Android is the leading smartphone operating system by a wide margin: statistics show that 88% of all smartphones sold to end users in the second quarter of 2018 ran Android. Regardless of the operating system, most of the functionality of these devices is offered through applications, and there are currently over 2 million apps on the official Google store, Google Play, alone. This huge market, with billions of users, tempts attackers to develop and distribute malicious apps (malware). Mobile malware has grown explosively since 2009: Symantec reported a 54% increase in new mobile malware variants in 2017 compared with the previous year. The growth of black markets (i.e., unofficial app distribution platforms) has also created more incentives for profit-driven malware. Android malware has followed this trend, aided by the fact that, according to a 2018 Symantec report, only 20% of devices run the newest major version of Android. Android remained the most targeted platform in 2015, with the largest number of attacks. After that year, attacks against Android slowed for the first time as attackers faced improved security architectures, although Android remains the most attractive target for attackers. Moreover, advanced Android malware has been found that makes extensive use of anti-analysis techniques to evade static or dynamic analysis. To address the security and privacy concerns raised by complex Android malware, this dissertation focuses on three main objectives. First, we propose a lightweight yet efficient method to identify risky Android applications. Next, we present a precise approach to characterize Android malware based on its malicious behavior. Finally, we propose an adaptive learning system to address the security concerns of obfuscation in Android malware.
Identifying potentially dangerous and risky applications is an important step in Android malware analysis. To this end, we develop a triage system that ranks applications by their potential risk. Our approach, called TriFlow, relies on static features that are quick to obtain, in particular information flows: an information flow exists when a piece of data received or produced by a given function or system call travels through the app’s logic until it reaches another function. TriFlow combines a probabilistic model that predicts the existence of information flows with a metric of how significant each flow is in benign and malicious apps. Based on this, TriFlow provides a score for each application that can be used to prioritize analysis, together with an explanatory report of the associated risk for analysts. Our tool can also be used as a complement to computationally expensive static and dynamic analysis tools. Another important step in Android malware analysis is accurate characterization. Labeling Android malware is challenging yet crucially important, as it helps to identify upcoming malware samples and threats. A key challenge is that different researchers and anti-virus vendors assign labels using their own criteria, and it is not known to what extent these labels are aligned with the apps’ real behavior. We therefore propose a new behavioral characterization method for Android apps based on their extracted information flows. As information flows can be used to track why and how apps use specific pieces of information, a flow-based characterization provides a relatively easy-to-interpret summary of a malware sample’s behavior. Not all Android malware is easy to analyze, owing to the advanced and easy-to-apply anti-analysis techniques available today. Obfuscation is the most common anti-analysis technique Android malware uses to evade detection. Obfuscation techniques modify an app’s source (or machine) code to make it more difficult to analyze. This is typically applied to protect intellectual property in benign apps, or to hinder the extraction of actionable information in the case of malware.
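A minimal sketch of how a TriFlow-style triage score could combine the two ingredients named above: a predicted probability that each flow exists, and a per-flow significance weight. The smoothed log-ratio weight with constant `alpha`, the additive aggregation, and the names `flow_weight` and `triage_score` are assumptions for illustration, not TriFlow's published formulas; flows are modelled as hypothetical (source, sink) pairs and the frequencies are made up.

```python
import math
from typing import Dict, List, Tuple

# A flow is modelled as a hypothetical (source API, sink API) pair.
Flow = Tuple[str, str]


def flow_weight(flow: Flow, mal_freq: Dict[Flow, float], ben_freq: Dict[Flow, float],
                alpha: float = 0.01) -> float:
    """Assumed significance metric: smoothed log-ratio of the flow's frequency
    in malicious vs. benign apps (positive means more typical of malware)."""
    return math.log((mal_freq.get(flow, 0.0) + alpha) / (ben_freq.get(flow, 0.0) + alpha))


def triage_score(flow_probs: Dict[Flow, float], mal_freq: Dict[Flow, float],
                 ben_freq: Dict[Flow, float],
                 top_k: int = 3) -> Tuple[float, List[Tuple[Flow, float]]]:
    """Weight P(flow exists in this app) by each flow's significance.

    Returns the app's risk score and the top-k contributing flows as a short report.
    """
    contributions = {f: p * flow_weight(f, mal_freq, ben_freq) for f, p in flow_probs.items()}
    score = sum(contributions.values())
    report = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return score, report


# Usage with made-up frequencies (fraction of apps in each class containing the flow).
sms_leak = ("getDeviceId", "sendTextMessage")
loc_log = ("getLastKnownLocation", "Log.d")
mal_freq = {sms_leak: 0.30, loc_log: 0.10}
ben_freq = {sms_leak: 0.00, loc_log: 0.10}

score_a, report_a = triage_score({sms_leak: 0.9, loc_log: 0.5}, mal_freq, ben_freq)
score_b, _ = triage_score({loc_log: 0.8}, mal_freq, ben_freq)
print(score_a > score_b, report_a[0][0])
```

Sorting apps by this score gives an analysis queue, and the per-flow contributions double as the explanatory report; a flow that is equally common in both classes contributes nothing.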
Since malware analysis often requires considerable resources, detecting the particular obfuscation technique used can help analysts apply the right analysis tools, leading to some savings. We therefore propose AndrODet, a mechanism to detect three popular types of obfuscation in Android applications, namely identifier renaming, string encryption, and control flow obfuscation. AndrODet leverages online learning techniques, making it suitable for resource-limited environments that need to operate continuously. We compare our results with a batch learning algorithm using a dataset of 34,962 malicious and benign apps. Experimental results show that online learning approaches not only compete with batch learning methods in terms of accuracy, but also save a significant amount of time and computational resources. Finally, we present a number of open research directions based on the outcome of this thesis.
Omid Mirzaei is a Ph.D. candidate in the Computer Security Lab (COSEC) at the Department of Computer Science and Engineering of Universidad Carlos III de Madrid (UC3M). His Ph.D. is funded by the Community of Madrid and the European Union through the research project CIBERDINE (Ref. S2013/ICE-3095).
Official Doctoral Programme in Computer Science and Technology (Programa Oficial de Doctorado en Ciencia y Tecnología Informática). Committee: President: Gregorio Martínez Pérez; Secretary: Pedro Peris López; Member: Pablo Picazo Sánche
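AndrODet, described in the abstract above, relies on online learning. The idea can be sketched as a classifier updated one app at a time, so memory use stays constant however many apps have been seen. This is a hypothetical example: the logistic-regression learner, the learning rate, and the two identifier-renaming features computed by `renaming_features` (share of identifiers with at most two characters, and average identifier length divided by 10) are assumptions, not AndrODet's actual model or feature set.

```python
import math
from typing import List


class OnlineLogisticRegression:
    """Logistic regression updated one sample at a time with stochastic gradient descent."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: List[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def learn_one(self, x: List[float], y: int) -> None:
        # Gradient of the log-loss for a single example is (p - y) * x.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def renaming_features(identifiers: List[str]) -> List[float]:
    """Hypothetical features: share of identifiers with <= 2 characters,
    and average identifier length divided by 10."""
    if not identifiers:
        return [0.0, 0.0]
    short = sum(1 for s in identifiers if len(s) <= 2) / len(identifiers)
    avg_len = sum(len(s) for s in identifiers) / len(identifiers)
    return [short, avg_len / 10.0]


# Usage: apps arrive as a stream; the model learns from each one and then discards it.
stream = [
    (["a", "b", "c", "aa"], 1),                       # renamed identifiers
    (["getUserName", "sendRequest", "onCreate"], 0),  # original identifiers
] * 50
model = OnlineLogisticRegression(n_features=2)
for identifiers, label in stream:
    model.learn_one(renaming_features(identifiers), label)

print(model.predict_proba(renaming_features(["a", "b", "ab"])) > 0.5)
```

A batch learner would instead store the whole dataset and refit from scratch when new apps arrive; the per-sample update above is what keeps the online variant cheap in continuously running, resource-limited settings.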

    Program Similarity Analysis for Malware Classification and its Pitfalls

    Malware classification, specifically the task of grouping malware samples into families according to their behaviour, is vital for understanding the threat they pose and how to protect against them. Recognizing whether one program shares behaviours with another requires semantic reasoning, meaning that it must consider what a program actually does. By Rice’s theorem, this problem is undecidable in general. As there is no one-size-fits-all solution, determining program similarity in the context of malware classification requires different tools and methods depending on what is available to the malware defender. When the malware source code is readily available (or at least easy to retrieve), most approaches employ semantic “abstractions”, which are computable approximations of the program’s semantics. We consider this the first scenario of this thesis: malware classification using semantic abstractions extracted from the source code in an open system. Structural features, such as the control flow graphs of programs, can be used to classify malware reasonably well. To demonstrate this, we build a malware analysis tool, R.E.H.A., which targets the Android system and leverages its openness to extract a structural feature from the source code of malware samples. This tool is first evaluated successfully against a state-of-the-art malware dataset and then on a newly collected dataset. We show that R.E.H.A. is able to classify the new samples into their respective families, often outperforming commercial antivirus software. However, abstractions have limitations by virtue of being approximations. We show that by increasing the granularity of the abstractions to produce more fine-grained features, we can improve the accuracy of the results, as in our second tool, StranDroid, which generates fewer false positives on the same datasets. The source code of malware samples is often not available or easily retrievable.
For this reason, we introduce a second scenario in which classification must be carried out with only the compiled binaries of malware samples on hand. Program similarity in this context cannot be assessed using semantic abstractions as before, since it is difficult to create meaningful abstractions from zeros and ones. Instead, by treating the compiled programs as raw data, we transform them into images and build upon common image classification algorithms using machine learning. This led us to develop novel deep learning models, a convolutional neural network and a long short-term memory (LSTM) network, to classify the samples into their respective families. To overcome a common obstacle in deep learning, the lack of sufficiently large and balanced datasets, we use obfuscations as a data augmentation tool to generate semantically equivalent variants of existing samples and expand the dataset as needed. Finally, to lower the computational cost of the training process, we use transfer learning and show that a model trained on one dataset can be used to successfully classify samples in different malware datasets. The third scenario explored in this thesis assumes that even the binary itself cannot be accessed for analysis, but it can be executed, and the execution traces can then be used to extract semantic properties. However, dynamic analysis lacks the formal tools and frameworks that exist in static analysis for proving the effectiveness of obfuscations. For this reason, the focus shifts to building a novel formal framework that is able to assess the potency of obfuscations against dynamic analysis. We validate the new framework by using it to encode known analyses and obfuscations, and show how these obfuscations actually hinder the dynamic analysis process.
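One common way to turn a compiled binary into an image is to reshape its byte stream into a 2D array, with each byte becoming one 8-bit grayscale pixel. The sketch below assumes fixed-width rows (256 here, a hypothetical choice) and zero-padding of the last row; the function name `bytes_to_image` is illustrative, and the thesis's exact conversion and image size may differ.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape a binary into a 2D grayscale image: one byte per pixel (0-255),
    rows of fixed `width`, with the last row zero-padded."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows: List[List[int]] = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # iterating bytes yields ints 0-255
        row.extend([0] * (width - len(row)))   # pad a short final row
        rows.append(row)
    return rows


# Usage: a 1,000-byte stand-in for a compiled sample becomes a 4 x 256 image.
image = bytes_to_image(bytes(range(256)) * 3 + bytes(232))
print(len(image), len(image[0]))
```

The resulting array could be resized and fed to a convolutional network, or consumed row by row as a sequence by an LSTM.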

    Effiziente und erklärbare Erkennung von mobiler Schadsoftware mittels maschineller Lernmethoden (Efficient and Explainable Detection of Mobile Malware Using Machine Learning Methods)

    In recent years, mobile devices shipped with Google’s Android operating system have become ubiquitous. Due to their popularity and the high concentration of sensitive user data on these devices, however, they have also become a profitable target for malware authors. As a result, thousands of new malware instances targeting Android are found almost every day. Unfortunately, common signature-based methods often fail to detect these applications, as they cannot keep pace with the rapid development of new malware. Consequently, there is an urgent need for new malware detection methods to tackle this growing threat. In this thesis, we address the problem by combining concepts of static analysis and machine learning, such that mobile malware can be detected directly on the mobile device with low run-time overhead. To this end, we first present our analysis of a sophisticated piece of malware that uses an ultrasonic side channel to spy on unwitting smartphone users. Based on the insights gained throughout this thesis, we gradually develop a method that allows detecting Android malware in general. The resulting method performs a broad static analysis, gathering a large number of features associated with an application. These features are embedded in a joint vector space, where typical patterns indicative of malware can be automatically identified and used to explain the decisions of our method. In addition to evaluating its overall detection and run-time performance, we also examine the interpretability of the underlying detection model and strengthen the classifier against realistic evasion attacks. In a large set of experiments, we show that the method clearly outperforms several related approaches, including popular anti-virus scanners. In most experiments, our approach detects more than 90% of all malicious samples in the dataset at a low false positive rate of only 1%.
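The embedding-and-explanation idea described above can be sketched with a linear model over a sparse binary vector space, where each distinct static feature string is one dimension. The feature names, weights, bias, and the helpers `embed` and `classify_and_explain` below are invented for illustration; they are not the thesis's learned model. Because the model is linear, each present feature's weight is exactly its contribution to the score, which yields a direct explanation.

```python
from typing import Dict, Iterable, List, Tuple


def embed(features: Iterable[str]) -> Dict[str, float]:
    """Sparse binary embedding: each distinct static feature string is one
    dimension of the vector space, set to 1.0 when present."""
    return {f: 1.0 for f in features}


def classify_and_explain(x: Dict[str, float], weights: Dict[str, float], bias: float,
                         top_k: int = 3) -> Tuple[bool, float, List[Tuple[str, float]]]:
    """Linear decision f(x) = <w, x> + b; each present feature's contribution
    w_j * x_j doubles as an explanation of the decision."""
    contributions = {f: v * weights.get(f, 0.0) for f, v in x.items()}
    score = bias + sum(contributions.values())
    explanation = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return score > 0.0, score, explanation


# Invented weights standing in for a trained linear model.
weights = {
    "permission::SEND_SMS": 1.2,
    "api_call::sendTextMessage": 0.9,
    "permission::INTERNET": 0.1,
    "activity::MainActivity": -0.2,
}
bias = -1.0

app = embed(["permission::SEND_SMS", "api_call::sendTextMessage", "permission::INTERNET"])
is_malicious, score, explanation = classify_and_explain(app, weights, bias)
print(is_malicious, explanation[0][0])
```

Scoring only touches the features an app actually has, which is one reason a sparse linear model of this kind can run on a phone within seconds.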
Furthermore, even on older devices, it offers good run-time performance and can output a decision along with a proper explanation within a few seconds, despite running machine learning directly on the mobile device. Overall, we find that applying machine learning techniques is a promising research direction for improving the security of mobile devices. While these techniques alone cannot defeat the threat of mobile malware, they at least raise the bar for malicious actors significantly, especially if combined with existing techniques.