Machine-learning-based analysis and classification of Android malware signatures

Abstract

Multi-scanner antivirus (AV) systems are often used to detect Android malware, since the same piece of software can be checked against multiple AV engines. However, in many cases the same application is flagged as malware by only a few AV engines, and the signatures they provide often contradict each other, showing a clear lack of consensus between engines. This work analyzes more than 80 thousand Android applications flagged as malware by at least one AV engine, with a total of almost 260 thousand malware signatures. In the analysis, we identify 41 different malware families and study their relationships, as well as the relationships between the AV engines involved in these detections. We show that most malware cases belong either to Adware abuse or to truly dangerous Harmful applications, while others remain unspecified (Unknown). With the help of Machine Learning and Graph Community Algorithms, we further combine the different AV detections to classify such Unknown apps as either Adware or Harmful risks, reaching an F1-score above 0.84.

The authors would like to acknowledge the support of the national project TEXEO (TEC2016-80339-R), funded by the Ministerio de Economia y Competitividad of Spain through, and the EU-funded H2020 SMOOTH project, Spain (grant no. H2020-786741). Similarly, the authors would like to acknowledge the support provided by the Tacyt system (https://www.elevenpaths.com/es/tecnologia/tacyt/index.html) for the collection and labeling of AV information. Finally, Ignacio Martin would like to acknowledge the support granted by the Spanish Ministry of Education through the FPU scholarship he holds (FPU15/03518).
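To make the idea of "combining AV detections" to label Unknown apps concrete, here is a minimal sketch. It is not the authors' pipeline: the abstract mentions Machine Learning and Graph Community Algorithms, while this sketch swaps in a simple Bernoulli naive Bayes vote. Each app is represented only by the set of AV engines that flagged it, and the model is trained on apps already known to be Adware or Harmful. The engine names, toy data, function names (`train`, `predict`, `f1`) and the choice of Harmful as the positive class for the F1-score are all hypothetical assumptions for illustration.

```python
"""Hypothetical sketch (not the paper's method): classify 'Unknown' apps as
Adware or Harmful from which AV engines flagged them, using a Bernoulli
naive Bayes vote trained on apps whose category is already known."""
import math
from collections import Counter

CLASSES = ("Adware", "Harmful")  # assumed coarse categories from the abstract


def train(apps):
    """apps: list of (set_of_engine_names, label), label in CLASSES.
    Returns class counts, per-class engine detection counts, and all engines seen."""
    class_counts = Counter(label for _, label in apps)
    engine_counts = {c: Counter() for c in CLASSES}
    engines = set()
    for flagged, label in apps:
        engine_counts[label].update(flagged)  # count each flagging engine once per app
        engines |= flagged
    return class_counts, engine_counts, engines


def predict(model, flagged):
    """Return the class with the highest smoothed log-likelihood for this app."""
    class_counts, engine_counts, engines = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in CLASSES:
        n = class_counts[c]
        score = math.log((n + 1) / (total + len(CLASSES)))  # smoothed prior
        for e in engines:
            p = (engine_counts[c][e] + 1) / (n + 2)  # Laplace-smoothed P(e flags | c)
            score += math.log(p if e in flagged else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def f1(y_true, y_pred, positive="Harmful"):
    """F1-score for one positive class (assumed here to be Harmful)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with hypothetical engines EngineA..EngineD.
labeled = [
    ({"EngineA", "EngineB"}, "Adware"),
    ({"EngineA"}, "Adware"),
    ({"EngineA", "EngineB"}, "Adware"),
    ({"EngineC", "EngineD"}, "Harmful"),
    ({"EngineC"}, "Harmful"),
    ({"EngineA", "EngineC", "EngineD"}, "Harmful"),
]
model = train(labeled)
print(predict(model, {"EngineA", "EngineB"}))  # Adware
print(predict(model, {"EngineC"}))  # Harmful
```

The naive Bayes vote is chosen only because it is short and dependency-free; it treats engines as independent, which the abstract's observation about inter-engine relationships suggests is a strong simplification that graph-based grouping of engines would aim to address.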
