78 research outputs found

    Who you gonna call? Analyzing Web Requests in Android Applications

    Full text link
    Relying on ubiquitous Internet connectivity, applications on mobile devices frequently perform web requests during their execution. They fetch data for users to interact with, invoke remote functionalities, or send user-generated content or meta-data. These requests collectively reveal common practices of mobile application development, like what external services are used and how, and they point to possible negative effects like security and privacy violations, or impacts on battery life. In this paper, we assess different ways to analyze what web requests Android applications make. We start by presenting dynamic data collected from running 20 randomly selected Android applications and observing their network activity. Next, we present a static analysis tool, Stringoid, that analyzes string concatenations in Android applications to estimate constructed URL strings. Using Stringoid, we extract URLs from 30, 000 Android applications, and compare the performance with a simpler constant extraction analysis. Finally, we present a discussion of the advantages and limitations of dynamic and static analyses when extracting URLs, as we compare the data extracted by Stringoid from the same 20 applications with the dynamically collected data

    Explainable Malware Detection System Using Transformers-Based Transfer Learning and Multi-Model Visual Representation

    Get PDF
    Android has become the leading mobile ecosystem because of its accessibility and adaptability. It has also become the primary target of widespread malicious apps. This situation needs the immediate implementation of an effective malware detection system. In this study, an explainable malware detection system was proposed using transfer learning and malware visual features. For effective malware detection, our technique leverages both textual and visual features. First, a pre-trained model called the Bidirectional Encoder Representations from Transformers (BERT) model was designed to extract the trained textual features. Second, the malware-to-image conversion algorithm was proposed to transform the network byte streams into a visual representation. In addition, the FAST (Features from Accelerated Segment Test) extractor and BRIEF (Binary Robust Independent Elementary Features) descriptor were used to efficiently extract and mark important features. Third, the trained and texture features were combined and balanced using the Synthetic Minority Over-Sampling (SMOTE) method; then, the CNN network was used to mine the deep features. The balanced features were then input into the ensemble model for efficient malware classification and detection. The proposed method was analyzed extensively using two public datasets, CICMalDroid 2020 and CIC-InvesAndMal2019. To explain and validate the proposed methodology, an interpretable artificial intelligence (AI) experiment was conducted

    Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques

    Full text link
    Cybercrime is a growing threat to organizations and individuals worldwide, with criminals using increasingly sophisticated techniques to breach security systems and steal sensitive data. In recent years, machine learning, deep learning, and transfer learning techniques have emerged as promising tools for predicting cybercrime and preventing it before it occurs. This paper aims to provide a comprehensive survey of the latest advancements in cybercrime prediction using above mentioned techniques, highlighting the latest research related to each approach. For this purpose, we reviewed more than 150 research articles and discussed around 50 most recent and relevant research articles. We start the review by discussing some common methods used by cyber criminals and then focus on the latest machine learning techniques and deep learning techniques, such as recurrent and convolutional neural networks, which were effective in detecting anomalous behavior and identifying potential threats. We also discuss transfer learning, which allows models trained on one dataset to be adapted for use on another dataset, and then focus on active and reinforcement Learning as part of early-stage algorithmic research in cybercrime prediction. Finally, we discuss critical innovations, research gaps, and future research opportunities in Cybercrime prediction. Overall, this paper presents a holistic view of cutting-edge developments in cybercrime prediction, shedding light on the strengths and limitations of each method and equipping researchers and practitioners with essential insights, publicly available datasets, and resources necessary to develop efficient cybercrime prediction systems.Comment: 27 Pages, 6 Figures, 4 Table

    Detecting Android Malware Leveraging Text Semantics of Network Flows

    Get PDF
    The emergence of malicious apps poses a serious threat to the Android platform. Most types of mobile malware rely on network interface to coordinate operations, steal users' private information, and launch attack activities. In this paper, we propose an effective and automatic malware detection method using the text semantics of network traffic. In particular, we consider each HTTP flow generated by mobile apps as a text document, which can be processed by natural language processing to extract text-level features. Then, we use the text semantic features of network traffic to develop an effective malware detection model. In an evaluation using 31 706 benign flows and 5258 malicious flows, our method outperforms the existing approaches, and gets an accuracy of 99.15%. We also conduct experiments to verify that the method is effective in detecting newly discovered malware, and requires only a few samples to achieve a good detection result. When the detection model is applied to the real environment to detect unknown applications in the wild, the experimental results show that our method performs significantly better than other popular anti-virus scanners with a detection rate of 54.81%. Our method also reveals certain malware types that can avoid the detection of anti-virus scanners. In addition, we design a detection system on encrypted traffic for bring-your-own-device enterprise network, home network, and 3G/4G mobile network. The detection model is integrated into the system to discover suspicious network behaviors

    Advanced Techniques to Detect Complex Android Malware

    Get PDF
    Android is currently the most popular operating system for mobile devices in the world. However, its openness is the main reason for the majority of malware to be targeting Android devices. Various approaches have been developed to detect malware. Unfortunately, new breeds of malware utilize sophisticated techniques to defeat malware detectors. For example, to defeat signature-based detectors, malware authors change the malware’s signatures to avoid detection. As such, a more effective approach to detect malware is by leveraging malware’s behavioral characteristics. However, if a behavior-based detector is based on static analysis, its reported results may contain a large number of false positives. In real-world usage, completing static analysis within a short time budget can also be challenging. Because of the time constraint, analysts adopt approaches based on dynamic analyses to detect malware. However, dynamic analysis is inherently unsound as it only reports analysis results of the executed paths. Besides, recently discovered malware also employs structure-changing obfuscation techniques to evade detection by state-of-the-art systems. Obfuscation allows malware authors to redistribute known malware samples by changing their structures. These factors motivate a need for malware detection systems that are efficient, effective, and resilient when faced with such evasive tactics. In this dissertation, we describe the developments of three malware detection systems to detect complex malware: DroidClassifier, GranDroid, and Obfusifier. DroidClassifier is a systematic framework for classifying network traffic generated by mobile malware. GranDroid is a graph-based malware detection system that combines dynamic analysis, incremental and partial static analysis, and machine learning to provide time-sensitive malicious network behavior detection with high accuracy. Obfusifier is a highly effective machine-learning-based malware detection system that can sustain its effectiveness even when malware authors obfuscate these malicious apps using complex and composite techniques. Our empirical evaluations reveal that DroidClassifier can successfully identify different families of malware with 94.33% accuracy on average. We have also shown GranDroid is quite effective in detecting network-related malware. It achieves 93.0% accuracy, which outperforms other related systems. Lastly, we demonstrate that Obfusifier can achieve 95% precision, recall, and F-measure, collaborating its resilience to complex obfuscation techniques. Adviser: Qiben Yan and Witawas Srisa-a

    iGen: Toward Automatic Generation and Analysis of Indicators of Compromise (IOCs) using Convolutional Neural Network

    Get PDF
    abstract: Field of cyber threats is evolving rapidly and every day multitude of new information about malware and Advanced Persistent Threats (APTs) is generated in the form of malware reports, blog articles, forum posts, etc. However, current Threat Intelligence (TI) systems have several limitations. First, most of the TI systems examine and interpret data manually with the help of analysts. Second, some of them generate Indicators of Compromise (IOCs) directly using regular expressions without understanding the contextual meaning of those IOCs from the data sources which allows the tools to include lot of false positives. Third, lot of TI systems consider either one or two data sources for the generation of IOCs, and misses some of the most valuable IOCs from other data sources. To overcome these limitations, we propose iGen, a novel approach to fully automate the process of IOC generation and analysis. Proposed approach is based on the idea that our model can understand English texts like human beings, and extract the IOCs from the different data sources intelligently. Identification of the IOCs is done on the basis of the syntax and semantics of the sentence as well as context words (e.g., ``attacked'', ``suspicious'') present in the sentence which helps the approach work on any kind of data source. Our proposed technique, first removes the words with no contextual meaning like stop words and punctuations etc. Then using the rest of the words in the sentence and output label (IOC or non-IOC sentence), our model intelligently learn to classify sentences into IOC and non-IOC sentences. Once IOC sentences are identified using this learned Convolutional Neural Network (CNN) based approach, next step is to identify the IOC tokens (like domains, IP, URL) in the sentences. This CNN based classification model helps in removing false positives (like IPs which are not malicious). Afterwards, IOCs extracted from different data sources are correlated to find the links between thousands of apparently unrelated attack instances, particularly infrastructures shared between them. Our approach fully automates the process of IOC generation from gathering data from different sources to creating rules (e.g. OpenIOC, snort rules, STIX rules) for deployment on the security infrastructure. iGen has collected around 400K IOCs till now with a precision of 95\%, better than any state-of-art method.Dissertation/ThesisMasters Thesis Computer Science 201

    Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets

    Get PDF
    Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores malicious author style and the adversarial techniques used by them to remain anonymous. We examine the adversarial impact on the state-of-the-art methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 15,660 malware labeled to 164 threat actor groups

    Studying JavaScript Security Through Static Analysis

    Get PDF
    Mit dem stetigen Wachstum des Internets wächst auch das Interesse von Angreifern. Ursprünglich sollte das Internet Menschen verbinden; gleichzeitig benutzen aber Angreifer diese Vernetzung, um Schadprogramme wirksam zu verbreiten. Insbesondere JavaScript ist zu einem beliebten Angriffsvektor geworden, da es Angreifer ermöglicht Bugs und weitere Sicherheitslücken auszunutzen, und somit die Sicherheit und Privatsphäre der Internetnutzern zu gefährden. In dieser Dissertation fokussieren wir uns auf die Erkennung solcher Bedrohungen, indem wir JavaScript Code statisch und effizient analysieren. Zunächst beschreiben wir unsere zwei Detektoren, welche Methoden des maschinellen Lernens mit statischen Features aus Syntax, Kontroll- und Datenflüssen kombinieren zur Erkennung bösartiger JavaScript Dateien. Wir evaluieren daraufhin die Verlässlichkeit solcher statischen Systeme, indem wir bösartige JavaScript Dokumente umschreiben, damit sie die syntaktische Struktur von bestehenden gutartigen Skripten reproduzieren. Zuletzt studieren wir die Sicherheit von Browser Extensions. Zu diesem Zweck modellieren wir Extensions mit einem Graph, welcher Kontroll-, Daten-, und Nachrichtenflüsse mit Pointer Analysen kombiniert, wodurch wir externe Flüsse aus und zu kritischen Extension-Funktionen erkennen können. Insgesamt wiesen wir 184 verwundbare Chrome Extensions nach, welche die Angreifer ausnutzen könnten, um beispielsweise beliebigen Code im Browser eines Opfers auszuführen.As the Internet keeps on growing, so does the interest of malicious actors. While the Internet has become widespread and popular to interconnect billions of people, this interconnectivity also simplifies the spread of malicious software. Specifically, JavaScript has become a popular attack vector, as it enables to stealthily exploit bugs and further vulnerabilities to compromise the security and privacy of Internet users. In this thesis, we approach these issues by proposing several systems to statically analyze real-world JavaScript code at scale. First, we focus on the detection of malicious JavaScript samples. To this end, we propose two learning-based pipelines, which leverage syntactic, control and data-flow based features to distinguish benign from malicious inputs. Subsequently, we evaluate the robustness of such static malicious JavaScript detectors in an adversarial setting. For this purpose, we introduce a generic camouflage attack, which consists in rewriting malicious samples to reproduce existing benign syntactic structures. Finally, we consider vulnerable browser extensions. In particular, we abstract an extension source code at a semantic level, including control, data, and message flows, and pointer analysis, to detect suspicious data flows from and toward an extension privileged context. Overall, we report on 184 Chrome extensions that attackers could exploit to, e.g., execute arbitrary code in a victim's browser
    • …
    corecore