980 research outputs found

    The Paradox of Choice: Investigating Selection Strategies for Android Malware Datasets Using a Machine-learning Approach

    Get PDF
    The increase in the number of mobile devices that use the Android operating system has attracted the attention of cybercriminals who want to disrupt or gain unauthorized access to them through malware infections. To prevent such malware, cybersecurity experts and researchers require datasets of malware samples that most available antivirus software programs cannot detect. However, researchers have infrequently discussed how to identify evolving Android malware characteristics from different sources. In this paper, we analyze a wide variety of Android malware datasets to determine more discriminative features such as permissions and intents. We then apply machine-learning techniques on collected samples of different datasets based on the acquired features’ similarity. We perform random sampling on each cluster of collected datasets to check the antivirus software’s capability to detect the sample. We also discuss some common pitfalls in selecting datasets. Our findings benefit firms by acting as an exhaustive source of information about leading Android malware datasets

    Behavioral complexity analysis of networked systems to identify malware attacks

    Get PDF
    2020 Fall.Includes bibliographical references.Internet of Things (IoT) environments are often composed of a diverse set of devices that span a broad range of functionality, making them a challenge to secure. This diversity of function leads to a commensurate diversity in network traffic, some devices have simple network footprints and some devices have complex network footprints. This network-complexity in a device's traffic provides a differentiator that can be used by the network to distinguish which devices are most effectively managed autonomously and which devices are not. This study proposes an informed autonomous learning method by quantifying the complexity of a device based on historic traffic and applies this complexity metric to build a probabilistic model of the device's normal behavior using a Gaussian Mixture Model (GMM). This method results in an anomaly detection classifier with inlier probability thresholds customized to the complexity of each device without requiring labeled data. The model efficacy is then evaluated using seven common types of real malware traffic and across four device datasets of network traffic: one residential-based, two from labs, and one consisting of commercial automation devices. The results of the analysis of over 100 devices and 800 experiments show that the model leads to highly accurate representations of the devices and a strong correlation between the measured complexity of a device and the accuracy to which its network behavior can be modeled

    DroidDetectMW: A Hybrid Intelligent Model for Android Malware Detection

    Get PDF
    Malicious apps specifically aimed at the Android platform have increased in tandem with the proliferation of mobile devices. Malware is now so carefully written that it is difficult to detect. Due to the exponential growth in malware, manual methods of malware are increasingly ineffective. Although prior writers have proposed numerous high-quality approaches, static and dynamic assessments inherently necessitate intricate procedures. The obfuscation methods used by modern malware are incredibly complex and clever. As a result, it cannot be detected using only static malware analysis. As a result, this work presents a hybrid analysis approach, partially tailored for multiple-feature data, for identifying Android malware and classifying malware families to improve Android malware detection and classification. This paper offers a hybrid method that combines static and dynamic malware analysis to give a full view of the threat. Three distinct phases make up the framework proposed in this research. Normalization and feature extraction procedures are used in the first phase of pre-processing. Both static and dynamic features undergo feature selection in the second phase. Two feature selection strategies are proposed to choose the best subset of features to use for both static and dynamic features. The third phase involves applying a newly proposed detection model to classify android apps; this model uses a neural network optimized with an improved version of HHO. Application of binary and multi-class classification is used, with binary classification for benign and malware apps and multi-class classification for detecting malware categories and families. By utilizing the features gleaned from static and dynamic malware analysis, several machine-learning methods are used for malware classification. According to the results of the experiments, the hybrid approach improves the accuracy of detection and classification of Android malware compared to the scenario when considering static and dynamic information separately

    SCANNER: Sequence Clustering of resource Access to find Nearest Neighbors

    Get PDF
    The Android operating system currently holds 83% of the smartphone market with more than three million applications available on the leading applications (apps) stores. These apps require a set of permissions on installation and the user has to trust that they behave as expected. This can represent a risk to the user\u27s sensitive information and hence a critical question is how can we track these apps and understand how they are behaving. The current methods to characterize the behavior of applications focus on two types: 1) static analysis, extracting information from the .dex files or the manifest.xml, and 2) dynamic analysis, logging the system calls, control flow or processing sand-boxed execution traces. However, there is a lack of work involving the use of sequential resource access to help reveal critical behavior patterns and find similarly behaving applications. This work presents SCANNER, a system to analyze the applications\u27 sequential accesses to the various resources on Android devices to cluster similarly behaving applications. We propose to use the Longest Common Subsequence (LCS) to describe the ordered sequences and contrast it with the use of statistical access rates for characterizing application behaviors. Using these features, the applications are clustered to find the nearest neighbors. A set of metrics is defined to quantify how well the neighbors resemble each application and the compactness of the clusters. Our results show that the use of LCS features helps identify similarly behaving applications with resource access patterns that are not necessarily identifiable by using the access rates alone

    Cyber-threat detection system using a hybrid approach of transfer learning and multi-model image representation

    Get PDF
    Currently, Android apps are easily targeted by malicious network traffic because of their constant network access. These threats have the potential to steal vital information and disrupt the commerce, social system, and banking markets. In this paper, we present a malware detection system based on word2vec-based transfer learning and multi-model image representation. The proposed method combines the textual and texture features of network traffic to leverage the advantages of both types. Initially, the transfer learning method is used to extract trained vocab from network traffic. Then, the malware-to-image algorithm visualizes network bytes for visual analysis of data traffic. Next, the texture features are extracted from malware images using a combination of scale-invariant feature transforms (SIFTs) and oriented fast and rotated brief transforms (ORBs). Moreover, a convolutional neural network (CNN) is designed to extract deep features from a set of trained vocab and texture features. Finally, an ensemble model is designed to classify and detect malware based on the combination of textual and texture features. The proposed method is tested using two standard datasets, CIC-AAGM2017 and CICMalDroid 2020, which comprise a total of 10.2K malware and 3.2K benign samples. Furthermore, an explainable AI experiment is performed to interpret the proposed approach

    The Development of Digital Forensics Workforce Competency on the Example of Estonian Defence League

    Get PDF
    03.07.2014 kehtestati Vabariigi Valitsuse määrus nr. 108, mis reguleerib Kaitseliidu kaasamise tingimusi ja korda küberjulgeoleku tagamisel. Seega võivad Kaitseliidu küberkaitse üksuse (KL KKÜ edaspidi KKÜ) kutsuda olukorda toetama erinevad asutused: näiteks Riigi Infosüsteemide amet (RIA), infosüsteemi järelevalveasutus või kaitseministeerium või selle valitsemisala ametiasutused oma ülesannete raames. KKÜ-d saab kaasata info- ja sidetehnoloogia infrastruktuuri järjepidevuse tagamisel, turvaintsidentide kontrollimisel ja lahendamisel, rakendades nii aktiivseid kui passiivseid meetmeid. KKÜ ülesannete kaardistamisel täheldati, et KKÜ partnerasutused / organisatsioonid ei ole kaardistanud oma spetsialistide olemasolevaid pädevusi ja sellele lisaks puudub ülevaade digitaalse ekspertiisi kogukonnas vajaolevatest pädevustest. Leitut arvesse võttes seati ülesandeks vajadustest ja piirangutest (võttes arvesse digitaalse ekspertiisi kogukonda kujundavaid standardeid) ülevaatliku pildi loomine, et töötada välja digitaalse ekspertiisi kompetentsipõhine raamistik, mis toetab KKÜ spetsialistide arendamist palkamisest pensionini. Selleks uurisime KKÜ ja nende olemasolevate koolitusprogrammide hetkeolukorda ning otsustasime milliseid omadusi peab edasise arengu tarbeks uurima ja kaaluma. Võrreldavate tulemuste saa-miseks ja eesmärgi täitmiseks pidi koostatav mudel olema suuteline lahendama 5-t järgnevat ülesannet: 1. Oskuste kaardistamine, 2. Eesmärkide seadmine ja ümberhindamine, 3. Koolituskava planeerimine, 4. Värbamisprotsessi kiirendamine ning 5. Spetsialistide kestva arengu soodustamine. Raamistiku väljatöötamiseks võeti aluseks National Initiative for Cybersecurity Education (NICE) Cybersecurity Workforce Framework (NICE Framework) pädevusraamistik mida parendati digitaalse ekspertiisi spetsialistide, ja käesoleval juhul ka KKÜ, vajadusi silmas pidades. Täiendusi lisati nii tasemete, spetsialiseerumise kui ka ülesannete kirjelduste kujul. Parenduste lisamisel võeti arvesse töös tutvustatud digitaalse ekspertiisi piiranguid ja standardeid, mille lõpptulemusena esitati KKÜ-le Digitaalse Ekspertiisi Pädevuse ontoloogia, KKÜ struktuuri muudatuse ettepanek, soovitatavad õpetamisstrateegiad digitaalse ekspertiisi kasutamiseks (muudetud Bloomi taksonoomia tasemetega), uus digitaalse ekspertiisi standardi alajaotus – Mehitamata Süsteemide ekspertiis ja Digitaalse Ekspertiisi Pädevuse Mudeli Raamistik. Ülesannete ja oskuste loetelu koostati rahvusvaheliselt tunnustatud sertifitseerimis-organisatsioonide ja erialast pädevust pakkuvate õppekavade abil. Kavandatava mudeli hindamiseks kasutati mini-Delphi ehk Estimate-Talk-Estimate (ETE) tehnikat. Esialgne prognoos vajaduste ja prioriteetidega anti KKÜ partnerasutustele saamaks tehtud töö kohta ekspertarvamusi. Kogu tagasisidet silmas pidades tehti mudelisse korrektuurid ja KKÜ-le sai vormistatud ettepanek ühes edasise tööplaaniga. Üldiselt kirjeldab väljapakutud pädevusraamistik KKÜ spetsialistilt ooda-tavat pädevuse ulatust KKÜ-s, et suurendada nende rolli kiirreageerimisrühmana. Raamistik aitab määratleda digitaalse ekspertiisi eeldatavaid pädevusi ja võimekusi praktikas ning juhendab eksperte spetsialiseerumise valikul. Kavandatud mudeli juures on arvestatud pikaajalise mõjuga (palkamisest pensionini). Tulenevalt mudeli komplekssusest, on raamistikul pikk rakendusfaas – organisatsiooni arengule maksimaalse mõju saavutamiseks on prognoositud ajakava maksimaalselt 5 aastat. Antud ettepanekud on käesolevaks hetkeks KKÜ poolt heaks kiidetud ning planeeritud kava rakendati esmakordselt 2019 aasta aprillikuus.In 03.07.2014 Regulation No. 108 was introduced which regulates the conditions and pro-cedure of the involvement of the Estonian Defence League (EDL) Cyber Defence Unit (CDU) in ensuring cyber security. This means that EDL can be brought in by the Information System Authority, Ministry of Defence or the authorities of its area of government within the scope of either of their tasks e.g. ensuring the continuity of information and communication technology infrastructure and in handling and solving cyber security incidents while applying both active and passive measures. In January 2018 EDL CDU’s Digi-tal Evidence Handling Group had to be re-organized and, thus, presented a proposal for internal curriculum in order to further instruct Digital Evidence specialists. While describing the CDU's tasks, it was noted that the CDU's partner institutions / organizations have not mapped out their specialists’ current competencies. With this in mind, we set out to create a comprehensive list of needs and constraints (taking into account the community standards of DF) to develop a DF-based competence framework that supports the devel-opment of CDU professionals. Hence, we studied the current situation of CDU, their existing training program, and contemplated which features we need to consider and ex-plore for further development. In order to assemble comparable results and to achieve the goal the model had to be able to solve the 5 following tasks: 1. Competency mapping, 2. Goal setting and reassessment, 3. Scheduling the training plan, 4. Accelerating the recruitment process, and 5. Promoting the continuous development of professionals. The frame-work was developed on the basis of the National Initiative for Cybersecurity Education (NICE) Cybersecurity Workforce Framework (NICE Framework), which was revised to meet the needs of DF specialists, including EDL CDU. Additions were supplemented in terms of levels, specialization, and job descriptions. The proposals included the DF limitations and standards introduced in the work, which ultimately resulted in a proposal for a Digital Forensics Competency ontology, EDL CDU structure change, Suggested Instruc-tional Strategies for Digital Forensics Use With Each Level of revised Bloom's Taxonomy, a new DF standard subdivision – Unmanned Systems Forensics, and Digital Forensic Competency Model Framework. The list of tasks and skills were compiled from international certification distribution organizations and curricula, and their focus on DF Special-ist Competencies. Mini-Delphi or Estimate-Talk-Estimate (ETE) techniques were applied to evaluate the proposed model. An initial estimation of competencies and priorities were given to the EDL CDU partner institutions for expert advice and evaluation. Considering the feedback, improvements were made to the model and a proposal was put forward to the CDU with a future work plan. In general, the proposed competence framework describes the expected scope of competence of an DF specialist in the EDL CDU to enhance their role as a rapid response team. The framework helps in defining the expected compe-tencies and capabilities of digital forensics in practice and offers guidance to the experts in the choice of specialization. The proposed model takes into account the long-term effect (hire-to-retire). Due to the complexity of the model, the framework has a long implementation phase — the maximum time frame for achieving the full effect for the organization is expected to be 5 years. These proposals were approved by EDL CDU and the proposed plan was first launched in April 2019

    Static malware detection Using Stacked BiLSTM and GPT-2

    Get PDF
    In recent years, cyber threats and malicious software attacks have been escalated on various platforms. Therefore, it has become essential to develop automated machine learning methods for defending against malware. In the present study, we propose stacked bidirectional long short-term memory (Stacked BiLSTM) and generative pre-trained transformer based (GPT-2) deep learning language models for detecting malicious code. We developed language models using assembly instructions extracted from .text sections of malicious and benign Portable Executable (PE) files. We treated each instruction as a sentence and each .text section as a document. We also labeled each sentence and document as benign or malicious, according to the file source. We created three datasets from those sentences and documents. The first dataset, composed of documents, was fed into a Document Level Analysis Model (DLAM) based on Stacked BiLSTM. The second dataset, composed of sentences, was used in Sentence Level Analysis Models (SLAMs) based on Stacked BiLSTM and DistilBERT, Domain Specific Language Model GPT-2 (DSLM-GPT2), and General Language Model GPT-2 (GLM-GPT2). Lastly, we merged all assembly instructions without labels for creating the third dataset; then we fed a custom pre-trained model with it. We then compared malware detection performances. The results showed that the pre-trained model improved the DSLM-GPT2 and GLM-GPT2 detection performance. The experiments showed that the DLAM, the SLAM based on DistilBERT, the DSLM-GPT2, and the GLM-GPT2 achieved 98.3%, 70.4%, 86.0%, and 76.2% F1 scores, respectively
    corecore