18 research outputs found

    Android Malware Family Classification Based on Resource Consumption over Time

    Full text link
    The vast majority of today's mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in the last years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification purely based on dynamic analysis is DroidScribe. With respect to DroidScribe, our approach is easier to reproduce. Our methodology only employs publicly available tools, does not require any modification to the emulated environment or Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide their malicious behavior. Our approach relies on resource consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation. Finally, a SVM is employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, proving that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available, to ease the reproducibility of our results.Comment: Extended Versio

    Machine learning techniques for identification using mobile and social media data

    Get PDF
    Networked access and mobile devices provide near constant data generation and collection. Users, environments, applications, each generate different types of data; from the voluntarily provided data posted in social networks to data collected by sensors on mobile devices, it is becoming trivial to access big data caches. Processing sufficiently large amounts of data results in inferences that can be characterized as privacy invasive. In order to address privacy risks we must understand the limits of the data exploring relationships between variables and how the user is reflected in them. In this dissertation we look at data collected from social networks and sensors to identify some aspect of the user or their surroundings. In particular, we find that from social media metadata we identify individual user accounts and from the magnetic field readings we identify both the (unique) cellphone device owned by the user and their course-grained location. In each project we collect real-world datasets and apply supervised learning techniques, particularly multi-class classification algorithms to test our hypotheses. We use both leave-one-out cross validation as well as k-fold cross validation to reduce any bias in the results. Throughout the dissertation we find that unprotected data reveals sensitive information about users. Each chapter also contains a discussion about possible obfuscation techniques or countermeasures and their effectiveness with regards to the conclusions we present. Overall our results show that deriving information about users is attainable and, with each of these results, users would have limited if any indication that any type of analysis was taking place

    Android malware detection using machine learning to mitigate adversarial evasion attacks

    Get PDF
    In the current digital era, smartphones have become indispensable. Over the past few years, the exponential growth of Android users has made this operating system (OS) a prime target for smartphone malware. Consequently, the arms race between Android security personnel and malware developers seems enduring. Considering Machine Learning (ML) as the core component, various techniques are proposed in the literature to counter Android malware, however, the problem of adversarial evasion attacks on ML-based malware classifiers is understated. MLbased techniques are vulnerable to adversarial evasion attacks. The malware authors constantly try to craft adversarial examples to elude existing malware detection systems. This research presents the fragility of ML-based Android malware classifiers in adversarial environments and proposes novel techniques to counter adversarial evasion attacks on ML based Android malware classifiers. First, we start our analysis by introducing the problem of Android malware detection in adversarial environments and provide a comprehensive overview of the domain. Second, we highlight the problem of malware clones in popular Android malware repositories. The malware clones in the datasets can potentially lead to biased results and computational overhead. Although many strategies are proposed in the literature to detect repackaged Android malware, these techniques require burdensome code inspection. Consequently, we employ a lightweight and novel strategy based on package names reusing to identify repackaged Android malware and build a clones-free Android malware dataset. Furthermore, we investigate the impact of repacked Android malware on various ML-based classifiers by training them on a clones free training set and testing on a set of benign, non repacked malware and all the malware clones in the dataset. Although trained on a reduced train set, we achieved up to 98.7% F1 score. Third, we propose Cure-Droid, an Android malware classification model trained on hybrid features and optimized using a tree-based pipeline optimization technique (TPoT). Fourth, to present the fragility of Cure- Droid model in adversarial environments, we formulate multiple adversarial evasion attacks to elude the model. Fifth, to counter adversarial evasion attacks on ML-based Android malware detectors, we propose CureDroid*, a novel and adversarially aware Android malware classification model. CureDroid* is based on an ensemble of ML-based models trained on distinct set of features where each model has the individual capability to detect Android malware. The CureDroid* model employs an ensemble of five ML-based models where each model is selected and optimized using TPoT. Our experimental results demonstrate that CureDroid* achieves up to 99.2% accuracy in non-adversarial settings and can detect up to 30 fabricated input features in the best case. Finally, we propose TrickDroid, a novel cumulative adversarial training framework based on Oracle and GAN-based adversarial data. Our experimental results present the efficacy of TrickDroid with up to 99.46% evasion detection

    Creating better ground truth to further understand Android malware: A large scale mining approach based on antivirus labels and malicious artifacts

    Get PDF
    Mobile applications are essential for interacting with technology and other people. With more than 2 billion devices deployed all over the world, Android offers a thriving ecosystem by making accessible the work of thousands of developers on digital marketplaces such as Google Play. Nevertheless, the success of Android also exposes millions of users to malware authors who seek to siphon private information and hijack mobile devices for their benefits. To fight against the proliferation of Android malware, the security community embraced machine learning, a branch of artificial intelligence that powers a new generation of detection systems. Machine learning algorithms, however, require a substantial number of qualified samples to learn the classification rules enforced by security experts. Unfortunately, malware ground truths are notoriously hard to construct due to the inherent complexity of Android applications and the global lack of public information about malware. In a context where both information and human resources are limited, the security community is in demand for new approaches to aid practitioners to accurately define Android malware, automate classification decisions, and improve the comprehension of Android malware. This dissertation proposes three solutions to assist with the creation of malware ground truths. The first contribution is STASE, an analytical framework that qualifies the composition of malware ground truths. STASE reviews the information shared by antivirus products with nine metrics in order to support the reproducibility of research experiments and detect potential biases. This dissertation reports the results of STASE against three typical settings and suggests additional recommendations for designing experiments based on Android malware. The second contribution is EUPHONY, a heuristic system built to unify family clusters belonging to malware ground truths. EUPHONY exploits the co-occurrence of malware labels obtained from antivirus reports to study the relationship between Android applications and proposes a single family name per sample for the sake of facilitating malware experiments. This dissertation evaluates EUPHONY on well-known malware ground truths to assess the precision of our approach and produce a large dataset of malware tags for the research community. The third contribution is AP-GRAPH, a knowledge database for dissecting the characteristics of malware ground truths. AP-GRAPH leverages the results of EUPHONY and static analysis to index artifacts that are highly correlated with malware activities and recommend the inspection of the most suspicious components. This dissertation explores the set of artifacts retrieved by AP-GRAPH from popular malware families to track down their correlation and their evolution compared to other malware populations

    On the malware detection problem : challenges and novel approaches

    Get PDF
    Orientador: AndrĂ© Ricardo Abed GrĂ©gioCoorientador: Paulo LĂ­cio de GeusTese (doutorado) - Universidade Federal do ParanĂĄ, Setor de CiĂȘncias Exatas, Programa de PĂłs-Graduação em InformĂĄtica. Defesa : Curitiba,Inclui referĂȘnciasÁrea de concentração: CiĂȘncia da ComputaçãoResumo: Software Malicioso (malware) Ă© uma das maiores ameaças aos sistemas computacionais atuais, causando danos Ă  imagem de indivĂ­duos e corporaçÔes, portanto requerendo o desenvolvimento de soluçÔes de detecção para prevenir que exemplares de malware causem danos e para permitir o uso seguro dos sistemas. Diversas iniciativas e soluçÔes foram propostas ao longo do tempo para detectar exemplares de malware, de Anti-VĂ­rus (AVs) a sandboxes, mas a detecção de malware de forma efetiva e eficiente ainda se mantĂ©m como um problema em aberto. Portanto, neste trabalho, me proponho a investigar alguns desafios, falĂĄcias e consequĂȘncias das pesquisas em detecção de malware de modo a contribuir para o aumento da capacidade de detecção das soluçÔes de segurança. Mais especificamente, proponho uma nova abordagem para o desenvolvimento de experimentos com malware de modo prĂĄtico mas ainda cientĂ­fico e utilizo-me desta abordagem para investigar quatro questĂ”es relacionadas a pesquisa em detecção de malware: (i) a necessidade de se entender o contexto das infecçÔes para permitir a detecção de ameaças em diferentes cenĂĄrios; (ii) a necessidade de se desenvolver melhores mĂ©tricas para a avaliação de soluçÔes antivĂ­rus; (iii) a viabilidade de soluçÔes com colaboração entre hardware e software para a detecção de malware de forma mais eficiente; (iv) a necessidade de predizer a ocorrĂȘncia de novas ameaças de modo a permitir a resposta Ă  incidentes de segurança de forma mais rĂĄpida.Abstract: Malware is a major threat to most current computer systems, causing image damages and financial losses to individuals and corporations, thus requiring the development of detection solutions to prevent malware to cause harm and allow safe computers usage. Many initiatives and solutions to detect malware have been proposed over time, from AntiViruses (AVs) to sandboxes, but effective and efficient malware detection remains as a still open problem. Therefore, in this work, I propose taking a look on some malware detection challenges, pitfalls and consequences to contribute towards increasing malware detection system's capabilities. More specifically, I propose a new approach to tackle malware research experiments in a practical but still scientific manner and leverage this approach to investigate four issues: (i) the need for understanding context to allow proper detection of localized threats; (ii) the need for developing better metrics for AV solutions evaluation; (iii) the feasibility of leveraging hardware-software collaboration for efficient AV implementation; and (iv) the need for predicting future threats to allow faster incident responses

    Advances in Computer Recognition, Image Processing and Communications, Selected Papers from CORES 2021 and IP&C 2021

    Get PDF
    As almost all human activities have been moved online due to the pandemic, novel robust and efficient approaches and further research have been in higher demand in the field of computer science and telecommunication. Therefore, this (reprint) book contains 13 high-quality papers presenting advancements in theoretical and practical aspects of computer recognition, pattern recognition, image processing and machine learning (shallow and deep), including, in particular, novel implementations of these techniques in the areas of modern telecommunications and cybersecurity

    Symmetry-Adapted Machine Learning for Information Security

    Get PDF
    Symmetry-adapted machine learning has shown encouraging ability to mitigate the security risks in information and communication technology (ICT) systems. It is a subset of artificial intelligence (AI) that relies on the principles of processing future events by learning past events or historical data. The autonomous nature of symmetry-adapted machine learning supports effective data processing and analysis for security detection in ICT systems without the interference of human authorities. Many industries are developing machine-learning-adapted solutions to support security for smart hardware, distributed computing, and the cloud. In our Special Issue book, we focus on the deployment of symmetry-adapted machine learning for information security in various application areas. This security approach can support effective methods to handle the dynamic nature of security attacks by extraction and analysis of data to identify hidden patterns of data. The main topics of this Issue include malware classification, an intrusion detection system, image watermarking, color image watermarking, battlefield target aggregation behavior recognition model, IP camera, Internet of Things (IoT) security, service function chain, indoor positioning system, and crypto-analysis

    Graph-Based Machine Learning for Passive Network Reconnaissance within Encrypted Networks

    Get PDF
    Network reconnaissance identifies a network’s vulnerabilities to both prevent and mitigate the impact of cyber-attacks. The difficulty of performing adequate network reconnaissance has been exacerbated by the rising complexity of modern networks (e.g., encryption). We identify that the majority of network reconnaissance solutions proposed in literature are infeasible for widespread deployment in realistic modern networks. This thesis provides novel network reconnaissance solutions to address the limitations of the existing conventional approaches proposed in literature. The existing approaches are limited by their reliance on large, heterogeneous feature sets making them difficult to deploy under realistic network conditions. In contrast, we devise a bipartite graph-based representation to create network reconnaissance solutions that rely only on a single feature (e.g., the Internet protocol (IP) address field). We exploit a widely available feature set to provide network reconnaissance solutions that are scalable, independent of encryption, and deployable across diverse Internet (TCP/IP) networks. We design bipartite graph embeddings (BGE); a graph-based machine learning (ML) technique for extracting insight from the structural properties of the bipartite graph-based representation. BGE is the first known graph embedding technique designed explicitly for network reconnaissance. We validate the use of BGE through an evaluation of a university’s enterprise network. BGE is shown to provide insight into crucial areas of network reconnaissance (e.g., device characterisation, service prediction, and network visualisation). We design an extension of BGE to acquire insight within a private network. Private networks—such as a virtual private network (VPN)—have posed significant challenges for network reconnaissance as they deny direct visibility into their composition. Our extension of BGE provides the first known solution for inferring the composition of both the devices and applications acting behind diverse private networks. This thesis provides novel graph-based ML techniques for two crucial aims of network reconnaissance—device characterisation and intrusion detection. The techniques developed within this thesis provide unique cybersecurity solutions to both prevent and mitigate the impact of cyber-attacks.Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering , 202
    corecore