232 research outputs found

    KLASIFIKASI MALWARE ANDROID DENGAN MENGGUNAKAN METODE CATBOOST ALGORITMA (Android Malware Classification Using the CatBoost Algorithm)

    In 2008, Android was introduced as an open-source project and became popular thanks to its customizability and low hardware requirements. Mid-2021 statistics from GlobalStat Counter show that Android dominates the mobile operating system market with a 72.74% share. This popularity has also made Android a target for malware attacks in the context of cybercrime. This research therefore aims to identify and classify continuously evolving Android malware by applying machine learning, specifically the CatBoost method. The method was chosen because previous research reported high accuracy with it. The performance evaluation compares CatBoost with several methods used by previous researchers, including KNN (K-Nearest Neighbors), SVM (Support Vector Machine), LR (Logistic Regression), RF (Random Forest), ET (Extra Trees), XG (XGBoost), AB (AdaBoost), and BG (Bagging), using common metrics such as Validation Accuracy, Detection Accuracy, and F1-Score. The results show that CatBoost achieved a Validation Accuracy of 96.66%, a Detection Accuracy of 96.87%, and an F1-Score of 96.81%, which puts it in a competitive position with most other methods, except RF (Random Forest). CatBoost's consistently strong performance in this comparison shows its potential as an effective and consistent solution for Android malware detection and classification.
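
As a minimal illustration of the metrics named in this abstract, the sketch below computes accuracy and F1-score from binary predictions in plain Python. The labels are made-up toy data, not the study's results, and the function name `accuracy_and_f1` is an assumption introduced here for illustration.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the malware class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy labels: 1 = malware, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")
```

Accuracy counts all correct decisions, while F1 ignores true negatives, which is why the two can diverge on imbalanced malware datasets.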

    Malware detection methods for Android mobile applications

    Advancements in mobile computing are attracting traditional device users to transition toward mobile platforms to fulfil their data processing needs. Among these, the Android platform is the most popular, holding the majority of the market share due to its open-source policy and ability to install applications from different application stores. This fact, coupled with the amount of sensitive data these devices now store, makes it attractive for malware authors to attack the Android platform, causing a large influx of malicious applications in the ecosystem. Traditional malware detection methods cannot effectively control and prevent this influx, demanding an automatic and intelligent approach such as machine learning. In this thesis, three machine learning algorithms, XGBoost, SVM and K-NN, were trained with several features, with a focus on Android permissions and static application features, to measure the effectiveness of applying machine learning techniques to combat the proliferation of malware. Given the dataset's goodware-to-malware ratio of 99/1, four experiments were conducted on an under-sampled version of the dataset with a ratio of 70/30 to test different subsets of the feature space, as well as feature elimination and aggregation, before training the algorithms with the full set of features using feature normalization across two distinct scenarios. This approach showed promising results, with XGBoost, SVM and K-NN distinguishing between malware and goodware with a score of 90% (Area Under the Receiver Operating Characteristic Curve values).
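
The under-sampling step described in this abstract (going from a 99/1 to a 70/30 goodware-to-malware ratio) can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the thesis's actual procedure: the function name `undersample_majority`, the random selection strategy, and the synthetic dataset are all hypothetical.

```python
import random


def undersample_majority(samples, minority_label, minority_share, seed=0):
    """Keep every minority sample and randomly drop majority samples until
    the minority class makes up `minority_share` of the result."""
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]

    # Number of majority samples needed for the target ratio.
    keep = round(len(minority) * (1 - minority_share) / minority_share)
    keep = min(keep, len(majority))

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    reduced = minority + rng.sample(majority, keep)
    rng.shuffle(reduced)
    return reduced


# Synthetic dataset with a 99/1 goodware-to-malware ratio (30 of 3000 are malware).
dataset = [(f"app_{i}", "malware" if i < 30 else "goodware") for i in range(3000)]
balanced = undersample_majority(dataset, "malware", 0.30)

malware_count = sum(1 for _, label in balanced if label == "malware")
print(len(balanced), malware_count)
```

Keeping all malware samples and discarding only goodware preserves the scarce class, at the cost of throwing away most of the benign data.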

    Malware detection based on dynamic analysis features

    The widespread usage of mobile devices and their seamless adaptation to each user's needs by means of useful applications (apps) makes them a prime target for malware developers, who seek access to sensitive user data, such as banking details, or try to hold data hostage and block user access. These apps are distributed in marketplaces that host millions of apps and therefore have their own forms of automated malware detection in place, in order to deter malware developers and keep their app store (and reputation) trustworthy. Still, a number of apps are able to bypass these detectors and remain available in the marketplace for any user to download. Current malware detection strategies rely mostly on features extracted statically, dynamically, or through a combination of both, made suitable for machine learning applications in order to scale detection to the number of apps that are submitted to the marketplace. In this article, the main focus is the study of the effectiveness of these automated malware detection methods and their ability to keep up with the proliferation of new malware and its ever-shifting trends. By analysing the performance of ML algorithms trained with real-world data on different time periods and time scales, with features extracted statically, dynamically and from user feedback, we are able to identify the optimal setup to maximise malware detection.

    An Adaptive Feature Centric XG Boost Ensemble Classifier Model for Improved Malware Detection and Classification

    Machine learning (ML) is often used to solve the problem of malware detection and classification, and various machine learning approaches have been adapted to it, yet performance often remains poor because of weaknesses in feature selection and classification. To address this issue, this paper presents a novel algorithm, the Adaptive Feature Centric XG Boost Ensemble Learner Classifier “AFC-XG Boost”. The proposed model is designed to handle varying malware detection data sets obtained from Kaggle. The model splits the XG Boost classification process into several stages to optimize performance. At the preprocessing stage, the given data set is denoised, normalized, and cleaned of tampered entries using the Feature Base Optimizer “FBO” algorithm. FBO normalizes the data points and performs noise removal according to the feature values and their base information. The performance of standard XG Boost is further optimized by feature selection using the Class Based Principal Component Analysis “CBPCA” algorithm, which selects features according to the fitness of each feature for the different classes. Based on the selected features, the method generates a regression tree for each feature considered. Based on the generated trees, the method performs classification by computing the Tree Level Ensemble Similarity “TLES” and the Class Level Ensemble Similarity “CLES”. Using both, the method computes the Class Match Similarity “CMS”, on the basis of which the malware is classified. The proposed approach achieves 97% accuracy in malware detection and classification, with a processing time of 34 seconds for 75,000 samples.

    Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection

    In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology and framework for efficient and effective real-time malware detection, leveraging the best of conventional machine learning (ML) and deep learning (DL) algorithms. In PROPEDEUTICA, all software processes in the system start execution subjected to a conventional ML detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis via more computationally expensive and more accurate DL methods, using our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays to the execution of software subjected to deep learning analysis as a way to "buy time" for DL analysis and to rate-limit the impact of possible malware in the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 877 commonly used benign software samples from various categories for the Windows OS. Our results show that the false positive rate for conventional ML methods can reach 20%, and for modern DL methods it is usually below 6%. However, the classification time for DL can be 100X longer than for conventional ML methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the percentage of software subjected to DL analysis was approximately 40% on average. Further, the application of delays in software subjected to ML reduced the detection time by approximately 10%. Finally, we found and discussed a discrepancy between the detection accuracy offline (analysis after all traces are collected) and on-the-fly (analysis in tandem with trace collection). Our insights show that conventional ML and modern DL-based malware detectors in isolation cannot meet the needs of efficient and effective malware detection: high accuracy, low false positive rate, and short classification time. Comment: 17 pages, 7 figures
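
The two-stage idea in this abstract (a fast detector first, with borderline cases deferred to a slower, more accurate one) can be sketched as below. This is only an illustration of the general cascade pattern, not PROPEDEUTICA's implementation: the scoring callables, the `low`/`high` thresholds, and the toy samples are assumptions, and the execution delays mentioned in the abstract are not modelled.

```python
def cascade_classify(sample, fast_score, slow_score, low=0.3, high=0.7):
    """Two-stage detection: the fast model decides confident cases and
    borderline ones are deferred to the slow model.

    `fast_score` and `slow_score` are hypothetical callables returning a
    malware probability in [0, 1]; `low` and `high` bound the borderline
    band (assumed values, not taken from the paper).
    """
    p = fast_score(sample)
    if p < low:
        return "benign", "fast"
    if p > high:
        return "malware", "fast"
    # Borderline score: defer to the slower, more accurate model.
    verdict = "malware" if slow_score(sample) >= 0.5 else "benign"
    return verdict, "slow"


# Toy stand-ins for the two detectors (not real models).
def fast(sample):
    return sample["fast_p"]


def slow(sample):
    return sample["slow_p"]


samples = [
    {"fast_p": 0.05, "slow_p": 0.10},  # confidently benign
    {"fast_p": 0.95, "slow_p": 0.90},  # confidently malware
    {"fast_p": 0.55, "slow_p": 0.80},  # borderline, slow model says malware
    {"fast_p": 0.45, "slow_p": 0.20},  # borderline, slow model says benign
]
results = [cascade_classify(s, fast, slow) for s in samples]
deferred_share = sum(1 for _, stage in results if stage == "slow") / len(results)
print(results, deferred_share)
```

The width of the borderline band trades accuracy against cost: a wider band sends more samples to the expensive model.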

    Prescience: Probabilistic Guidance on the Retraining Conundrum for Malware Detection

    Malware evolves perpetually and relies on increasingly sophisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.
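
For readers unfamiliar with Venn-Abers predictors, the following is a minimal sketch of the inductive variant for a single test score: isotonic regression is fitted twice on a calibration set, once with the test point labelled 0 and once labelled 1, yielding a probability interval (p0, p1). It illustrates the general technique only and is not the paper's framework; the helper names and the calibration data are assumptions made for this example.

```python
from itertools import groupby


def isotonic_fit(points):
    """Pool-adjacent-violators on (score, label) pairs.

    Returns a dict mapping each distinct score to its fitted value, which is
    non-decreasing in the score.
    """
    blocks = []  # each block: [sum_of_labels, count, scores_in_block]
    for score, group in groupby(sorted(points), key=lambda p: p[0]):
        labels = [y for _, y in group]
        blocks.append([sum(labels), len(labels), [score]])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, scores = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2].extend(scores)
    return {x: total / count for total, count, scores in blocks for x in scores}


def venn_abers(calibration, test_score):
    """Return (p0, p1): calibrated probabilities that the test object is
    positive, assuming its label is 0 and 1 respectively."""
    p0 = isotonic_fit(calibration + [(test_score, 0)])[test_score]
    p1 = isotonic_fit(calibration + [(test_score, 1)])[test_score]
    return p0, p1


# Hypothetical calibration set: (classifier score, true label), 1 = malware.
calibration = [(0.1, 0), (0.2, 0), (0.35, 1), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
p0, p1 = venn_abers(calibration, 0.5)
print(p0, p1)
```

A wide gap between `p0` and `p1` signals that the calibration data say little about scores in that region, which is the kind of probabilistic signal a retraining policy could monitor.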

    Automated android malware detection using user feedback

    The widespread usage of mobile devices and their seamless adaptation to each user’s needs through useful applications (apps) makes them a prime target for malware developers. Malware is software built to harm the user, e.g., to access sensitive user data, such as banking details, or to hold data hostage and block user access. These apps are distributed in marketplaces that host millions of apps and therefore have their own forms of automated malware detection in place to deter malware developers and keep their app store (and reputation) trustworthy. Nevertheless, a non-negligible number of apps can bypass these detectors and remain available in the marketplace for any user to download and install on their device. Current malware detection strategies rely on statically or dynamically extracted app features (or a combination of both) to scale detection and cover the growing number of apps submitted to the marketplace. In this paper, the main focus is on the apps that bypass the malware detectors and stay in the marketplace long enough to receive user feedback. This paper uses real-world data provided by an app store. The quantitative ratings and potential alert flags assigned to the apps by users were used as features to train machine learning classifiers that successfully classify malware that evaded previous detection attempts. These results show reasonable accuracy and thus help maintain a user-safe environment.