1,460 research outputs found

Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection

Author: Beaver Justin M.
Bridges Robert A.
Daniell Mark
Huffer Kelly M. T.
Iannacone Michael D.
Jewell Brian
Miles Craig
Nichols Jeff A.
Oesch Sean
Plummer Thomas
Scofield Daniel
Smith Jared M.
Tall Anne M.
Verma Miki E.
Weber Brian
Publication date: 15/03/2021

There is a lack of scientific testing of commercially available malware detectors, especially those that boast accurate classification of never-before-seen (i.e., zero-day) files using machine learning (ML). As a result, the efficacy of the available approaches and the gaps among them are opaque, which prevents end users from making informed network security decisions and researchers from targeting gaps in current detectors. In this paper, we present a scientific evaluation of four market-leading malware detection tools to assist an organization with two primary questions: (Q1) To what extent do ML-based tools accurately classify never-before-seen files without sacrificing detection ability on known files? (Q2) Is it worth purchasing a network-level malware detector to complement host-based detection? We tested each tool against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign), including over 400 zero-day malware samples, and tested with a variety of file types and delivery protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of a recent cost-benefit evaluation procedure by Iannacone & Bridges that incorporates all the above metrics into a single quantifiable cost. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files; 37% of the malware tested, including all polyglot files, went undetected.
Comment: Includes Actionable Takeaways for SOC
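To make the contrast between "near-perfect precision" and "low recall" concrete, the following minimal Python sketch computes both metrics from confusion counts. It uses the file counts from the abstract, but it treats the 37% undetected figure as if it described a single combined detector, and the false-positive count is entirely hypothetical; neither assumption comes from the paper.

```python
# Illustrative sketch only: the counts below are partly assumed, not results from the paper.

def precision(tp: int, fp: int) -> float:
    """Fraction of alerts that were truly malicious."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of malicious files that raised an alert."""
    return tp / (tp + fn)

MALICIOUS = 2554          # from the abstract
BENIGN = 982              # from the abstract
UNDETECTED_RATE = 0.37    # from the abstract, applied here to one hypothetical detector

false_negatives = round(UNDETECTED_RATE * MALICIOUS)
true_positives = MALICIOUS - false_negatives
false_positives = 5       # hypothetical value chosen to mimic near-perfect precision

p = precision(true_positives, false_positives)
r = recall(true_positives, false_negatives)
print(f"precision={p:.3f} recall={r:.3f}")
```

A detector can therefore look trustworthy on every alert it raises while still missing more than a third of the malicious files.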

arXiv.org e-Print Archive

Comprehensive Security Framework for Global Threats Analysis

Author: Benali Fatiha
Saraydaryan Jacques
Ubeda Stéphane
Publication venue: International Journal of Computer Science Issues, IJCSI
Publication date: 01/08/2009

Cybercriminal activities are changing and becoming increasingly professional. With the growth of financial flows through the Internet and the Information System (IS), new kinds of threats arise, involving complex scenarios spread across multiple IS components. IS information modeling and behavioral analysis are emerging as solutions to normalize IS information and counter these new threats. This paper presents a framework that details the principal and necessary steps for monitoring an IS. We present the architecture of the framework, i.e., an ontology of activities carried out within an IS to model security information, and User Behavioral Analysis. Experiments on real data show that the modeling reduces the number of events by 91%. The User Behavioral Analysis on uniformly modeled data is also effective, detecting more than 80% of legitimate actions of attack scenarios.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

CogPrints Cognitive Sciences Eprint Archive

Hunting for new threats in a feed of malicious samples

Author: Van Liebergen Ávila Kevin Karel
Publication date: 01/01/2022

Nowadays, security companies collect massive amounts of malware and other possibly benign files. Finding interesting threats among the many millions of files collected is a very challenging task. One of the most popular security platforms, VirusTotal (VT), allows querying for reports of files that users have submitted. VT offers the VT File Feed (i.e., a stream of reports). In this project we take a deep dive into the VT File Feed: we analyze 328.3M reports of files scanned by VirusTotal during one year, belonging to 235.7M samples, and observe that 209.6M of those samples (89%) were new. We use the one-year reports to characterize the VT File Feed, and we compare it with the telemetry of a large antivirus vendor. With both datasets we want to answer the following questions: How diverse is the feed? What is the file type distribution over a year? Which of the two platforms detects malicious files earlier? Can we detect malicious files detected by VirusTotal but not by the large security vendor's telemetry? What is the malware distribution over one year? Then, we evaluate three clustering approaches over Windows and APK ground truth datasets: Hierarchical DBSCAN (HDBSCAN); HAC-T, an improved Hierarchical Agglomerative Clustering (HAC) over TLSH that reduces complexity from O(n^2) to O(n log n); and Feature Value Grouping (FVG). We conclude that only HAC-T and FVG produce high-precision clusterings. Our results show that FVG is the only approach that scales to the full one-year VT File Feed dataset. Finally, we develop a novel threat hunting approach to identify malicious samples that were supposedly benign, i.e., have zero detections by AV engines. When applied to the 235M samples in the VT feed, our approach identifies 190K possibly not-so-benign samples (not detected by any antivirus vendor) that belong to 29K malicious clusters, i.e., clusters in which most samples are malicious.
Máster Universitario en Ciberseguridad (M179
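The threat hunting idea in this abstract (flag zero-detection samples that sit in clusters where most members are malicious) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the `Sample` record, its field names, the `hunt_undetected` function, and the strict-majority threshold of 0.5 are all assumptions, and cluster assignments are taken as given from some earlier clustering step.

```python
from collections import defaultdict
from typing import NamedTuple

class Sample(NamedTuple):
    sha256: str       # hypothetical identifier field
    cluster_id: int   # assumed output of a prior clustering step (e.g., FVG)
    detections: int   # number of AV engines that flag the sample

def hunt_undetected(samples: list[Sample], majority: float = 0.5) -> list[str]:
    """Return IDs of zero-detection samples in clusters that are mostly malicious."""
    clusters: dict[int, list[Sample]] = defaultdict(list)
    for s in samples:
        clusters[s.cluster_id].append(s)

    flagged: list[str] = []
    for members in clusters.values():
        malicious = sum(1 for s in members if s.detections > 0)
        # A cluster counts as malicious when more than `majority` of it is detected.
        if malicious / len(members) > majority:
            flagged.extend(s.sha256 for s in members if s.detections == 0)
    return flagged

demo = [
    Sample("a", 1, 10), Sample("b", 1, 5), Sample("c", 1, 0),   # mostly malicious
    Sample("d", 2, 0), Sample("e", 2, 0), Sample("f", 2, 3),    # mostly undetected
]
print(hunt_undetected(demo))
```

In the toy data, only the undetected member of the mostly malicious cluster is flagged; the two undetected samples in the second cluster are left alone because that cluster has no malicious majority.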

e_Buah - Biblioteca Digital de la Universidad de Alcalá

Android Malware Clustering through Malicious Payload Mining

Author: I Santos
J Crussell
J Kim
J Leskovec
K Rieck
M Sebastián
S Hanna
U Bayer
Publication date: 15/07/2017

Clustering has been well studied for desktop malware analysis as an effective triage method. Conventional similarity-based clustering techniques, however, cannot be immediately applied to Android malware analysis due to the excessive use of third-party libraries in Android application development and the widespread use of repackaging in malware development. We design and implement an Android malware clustering system based on iteratively mining malicious payloads and checking whether malware samples share the same version of a malicious payload. Our system utilizes a hierarchical clustering technique and an efficient bit-vector format to represent Android apps. Experimental results demonstrate that our clustering approach achieves precision of 0.90 and recall of 0.75 on the Android Genome malware dataset, and average precision of 0.98 and recall of 0.96 with respect to manually verified ground truth.
Comment: Proceedings of the 20th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2017)

arXiv.org e-Print Archive

Crossref