5 research outputs found

    Feature Selection and Improving Classification Performance for Malware Detection

    Get PDF
    The ubiquitous advance of technology has been conducive to the proliferation of cyber threats, resulting in attacks that have grown exponentially. Consequently, researchers have developed models based on machine learning algorithms for detecting malware. However, these methods require significant amount of extracted features for correct malware classification, making that feature extraction, training, and testing take significant time; even more, it has been unexplored which are the most important features for accomplish the correct classification. In this Thesis, it is created and analyzed a dataset of malware and clean files (goodware) from the static and dynamic features provided by the online framework VirusTotal. The purpose was to select the smallest number of features that keep the classification accuracy as high as the state of the art researches. Selecting the most representative features for malware detection relies on the possibility reducing the training time, given that it increases in O(n2) with respect to the number of features, and creating an embedded program that monitors processes executed by the OS. Thus, feature selection was made taking the most important features. In addition, classification algorithms such as Random Forest, Support Vector Machine and Neural Networks were used in a novel combination that not only showed an increase in accuracy, but also in the training speed from hours to just minutes. Next, the model was tested on one additional dataset of unseen malware files. Results showed that “9” features were enough to distinguish malware from goodware files within an accuracy of 99.60%

    Performance of Malware Classification on Machine Learning using Feature Selection

    Get PDF
    The exponential growth of malware has created a significant threat in our daily lives, which heavily rely on computers running all kinds of software. Malware writers create malicious software by creating new variants, new innovations, new infections and more obfuscated malware by using techniques such as packing and encrypting techniques. Malicious software classification and detection play an important role and a big challenge for cyber security research. Due to the increasing rate of false alarm, the accurate classification and detection of malware is a big necessity issue to be solved. In this research, eight malware family have been classifying according to their family the research provides four feature selection algorithms to select best feature for multiclass classification problem. Comparing. Then find these algorithms top 100 features are selected to performance evaluations. Five machine learning algorithms is compared to find best models. Then frequency distribution of features are find by feature ranking of best model. At last it is said that frequency distribution of every character of API call sequence can be used to classify malware family

    Optimum parameter machine learning classification and prediction of Internet of Things (IoT) malwares using static malware analysis techniques

    Get PDF
    Application of machine learning in the field of malware analysis is not a new concept, there have been lots of researches done on the classification of malware in android and windows environments. However, when it comes to malware analysis in the internet of things (IoT), it still requires work to be done. IoT was not designed to keeping security/privacy under consideration. Therefore, this area is full of research challenges. This study seeks to evaluate important machine learning classifiers like Support Vector Machines, Neural Network, Random Forest, Decision Trees, Naive Bayes, Bayesian Network, etc. and proposes a framework to utilize static feature extraction and selection processes highlight issues like over-fitting and generalization of classifiers to get an optimized algorithm with better performance. For background study, we used systematic literature review to find out research gaps in IoT, presented malware as a big challenge for IoT and the reasons for applying malware analysis targeting IoT devices and finally perform classification on malware dataset. The classification process used was applied on three different datasets containing file header, program header and section headers as features. Preliminary results show the accuracy of over 90% on file header, program header, and section headers. The scope of this document just discusses these results as initial results and still require some issues to be addressed which may effect on the performance measures

    Avaliação de técnicas de análise de texturas para classificação de famílias de malware

    Get PDF
    Orientador: André Ricardo Abed GrégioDissertação (mestrado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 21/12/2018Inclui referências: p. 58-64Resumo: A quantidade de variantes de arquivos maliciosos lançadas diariamente já fez com que a análise manual de malware se tornasse inviável há algum tempo. Para isso, foram propostos diversos tipos de análise automatizadas, entre eles a análise estática e a dinâmica, os mais utilizados atualmente. Porém, desenvolvedores de malware já conseguiram identificar as falhas de cada um deles e conseguem criar novos arquivos maliciosos que não são nem mesmo detectados pelos antivírus atuais. Para resolver esse problema, pesquisadores têm proposto outros tipos de análise e investido em métodos de classificação mais rápidos e precisos. Neste trabalho de pesquisa, é feito um levantamento bibliográfico sobre o assunto e optou-se por avaliar a classificação utilizando a análise de texturas. Foram selecionadas diversas técnicas para classificação de malware usando análise de texturas através de uma revisão sistemática da literatura. Com as técnicas encontradas foram realizados experimentos em um dataset da literatura (Malimg) e reaplicados nas amostras de um dataset local, mais robusto e semelhante ao cenário do mundo real. Em ambos o algoritmo KNN apresenta os melhores resultados de classificação, mostrando-se a alternativa mais viável em direção à solução do problema de agrupamento de variantes de programas maliciosos em suas famílias corretas através da análise de texturas. As técnicas de classificação usando o descritor global GIST obtêm maior taxa de acerto quando comparadas com o descritor local LBP e o uso de uma escala maior das texturas também apresenta melhor resultado. O dataset local atinge resultados bons apenas após uma seleção de dados, apresentando uma discussão sobre o uso de datasets não apropriados pela literatura para construção de classificadores genéricos de malware. Quanto a resiliência às técnicas de ofuscação utilizadas por criadores de malware para descaracterizar um binário, os experimentos ainda apontam como outra falsa teoria sobre a análise de texturas, pois apresenta resultados de classificação bastante ruins mesmo quando utilizadas técnicas bastante simples. A análise de texturas apresenta bons resultados apenas para variantes muito similares, não podendo ser utilizadas num cenário real onde há uma grande variedade de famílias e uso de técnicas bastante sofisticadas de ofuscação. Palavras-chave: Segurança computacional. Malware. Análise de textura. Classificação de malware. Visualização de malware.Abstract: The number of malicious software variants released daily turned manual malware analysis into an impractical task a long time ago. Due to that, automated analysis techniques were proposed, such as static and dynamic code analysis, which are the most used nowadays for the malware problem. However, malware authors already identified the shortcomings of each one of these analysis types so as to create new malicious files that are not even detected by current antiviruses. To solve this problem, researchers have proposed other types of analysis and invested in faster and more accurate classification methods. In this research work, I did a bibliographic survey on the subject, which led to the decision of performing classification using texture analysis. Several techniques were filtered to classify malware using texture analysis through a literature systematic review. Experiments were carried out with these techniques applied in a literature dataset (Malimg) and then reapplied to the samples of our lab's malware dataset, more robust and similar to the real world scenario. In both datasets, KNN algorithm presented the best classification results, showing that it is the most viable approach towards solving the problem of grouping malware variants correctly into their families based on texture analysis. The classification techniques using the global descriptor GIST obtain a higher accuracy rate when compared to the local LBP descriptor and the use of a larger scale of the textures also presents better results. The local dataset achieves good results only after a data selection, presenting a discussion on the use of non-appropriate datasets in the literature for building generic malware classifiers. Related to the resilience to obfuscation techniques used by malware writers to deprive a binary, the experiments also point to another false theory about texture analysis, since it presents very bad results even when using fairly simple techniques. The texture analysis presents good results only for very similar variants, and can not be used in a real world scenario where there is a great variety of families and use of quite sophisticated techniques of obfuscation. Keywords: Computer security. Malware. Texture analysis. Malware classification. Malware visualization
    corecore