Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection
There is a lack of scientific testing of commercially available malware
detectors, especially those that boast accurate classification of
never-before-seen (i.e., zero-day) files using machine learning (ML). The
result is that the efficacy and gaps among the available approaches are opaque,
inhibiting end users from making informed network security decisions and
researchers from targeting gaps in current detectors. In this paper, we present
a scientific evaluation of four market-leading malware detection tools to
assist an organization with two primary questions: (Q1) To what extent do
ML-based tools accurately classify never-before-seen files without sacrificing
detection ability on known files? (Q2) Is it worth purchasing a network-level
malware detector to complement host-based detection? We tested each tool
against 3,536 total files (2,554 or 72% malicious, 982 or 28% benign) including
over 400 zero-day malware files, and tested with a variety of file types and
protocols for delivery. We present statistical results on detection time and
accuracy, consider complementary analysis (using multiple tools together), and
provide two novel applications of a recent cost-benefit evaluation procedure by
Iannacone & Bridges that incorporates all the above metrics into a single
quantifiable cost. While the ML-based tools are more effective at detecting
zero-day files and executables, the signature-based tool may still be an
overall better option. Both network-based tools provide substantial (simulated)
savings when paired with either host tool, yet both show poor detection rates
on protocols other than HTTP or SMTP. Our results show that all four tools have
near-perfect precision but alarmingly low recall, especially on file types
other than executables and office files -- 37% of malware tested, including all
polyglot files, were undetected.
Comment: Includes Actionable Takeaways for SOC
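The contrast between near-perfect precision and low recall in the abstract above can be made concrete with a minimal sketch. The function name and the counts below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    tp: malicious files flagged as malicious
    fp: benign files flagged as malicious
    fn: malicious files that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): a detector that raises
# almost no false alarms but misses most of the malicious files.
tp, fp, fn = 1000, 5, 1554
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A tool like this rarely mislabels a benign file, so its precision is high, yet most malicious files pass through, so its recall is low.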
Comprehensive Security Framework for Global Threats Analysis
Cybercriminal activities are changing and becoming more and more professional. With the growth of financial flows through the Internet and the Information System (IS), new kinds of threats arise, involving complex scenarios spread across multiple IS components. IS information modeling and Behavioral Analysis are emerging as solutions to normalize IS information and counter these new threats. This paper presents a framework that details the principal and necessary steps for monitoring an IS. We present the architecture of the framework, i.e., an ontology of activities carried out within an IS to model security information, and User Behavioral Analysis. Experiments on real data show that the modeling reduces the number of events by 91%. User Behavioral Analysis on the uniformly modeled data is also effective, detecting more than 80% of legitimate actions of attack scenarios.
Hunting for new threats in a feed of malicious samples
Nowadays, security companies collect massive amounts of malware and other possibly benign files.
Finding interesting threats among many millions of files collected is a very challenging task. One of the
most popular security platforms, VirusTotal (VT), allows querying for reports of files that users have
submitted. VT offers the VT File Feed (i.e., a stream of reports). In this project we take a deep dive into the
VT File Feed: we analyze 328.3M reports of files scanned by VirusTotal during one year, which belong to 235.7M
samples, and observe that 209.6M samples were new (89%). We use the one-year reports to characterize the
VirusTotal Feed, and we compare it with the telemetry of a large antivirus vendor. With both datasets
we want to answer the following questions: How diverse is the feed? What is the filetype distribution over
a year? Which of the two platforms detects malicious files earlier? Can we detect malicious files detected
by VirusTotal but not by the large security vendor telemetry? What is the malware distribution over
one year?
Then, we evaluate three clustering approaches over Windows and APK ground-truth datasets: Hierarchical
DBSCAN (HDBSCAN); HAC-T, an improved Hierarchical Agglomerative Clustering (HAC) over
TLSH that reduces complexity from O(n^2) to O(n log n); and Feature Value Grouping (FVG). We conclude
that only HAC-T and FVG produce high-precision clusterings. Our results show that FVG is
the only approach that scales to the full one-year VT File Feed dataset.
Then, we develop a novel threat hunting approach to identify malicious samples that were supposedly
benign, i.e., have zero detections by AV engines. When applied on 235M samples in the VT feed, our
approach identifies 190K possibly not-so-benign samples that belong to 29K malicious clusters, i.e., most
cluster samples are malicious.
Máster Universitario en Ciberseguridad (M179
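The threat hunting idea in the abstract above (zero-detection samples that sit in mostly malicious clusters) can be sketched as follows. This is only an illustration under stated assumptions: the input tuple layout, the function name `hunt_undetected`, the rule that a sample counts as malicious when it has at least one AV detection, and the 0.5 majority threshold are all hypothetical and not taken from the thesis.

```python
from collections import defaultdict


def hunt_undetected(samples, min_malicious_fraction=0.5):
    """Flag zero-detection samples that belong to mostly malicious clusters.

    samples: iterable of (sample_id, cluster_id, av_detections) tuples.
    Assumption: a sample is treated as malicious if av_detections > 0.
    """
    clusters = defaultdict(list)
    for sample_id, cluster_id, detections in samples:
        clusters[cluster_id].append((sample_id, detections))

    flagged = []
    for members in clusters.values():
        malicious = sum(1 for _, detections in members if detections > 0)
        # Keep only clusters where malicious samples are the majority.
        if malicious / len(members) > min_malicious_fraction:
            flagged.extend(sid for sid, detections in members if detections == 0)
    return sorted(flagged)


# Hypothetical toy feed: cluster c1 is mostly malicious, c2 is not.
feed = [
    ("a", "c1", 10), ("b", "c1", 5), ("c", "c1", 0),
    ("d", "c2", 0), ("e", "c2", 0), ("f", "c2", 3),
]
print(hunt_undetected(feed))
```

In this toy feed only sample "c" is flagged: it has no detections but shares a cluster with two detected samples, while the undetected samples in c2 are left alone because that cluster is mostly undetected.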
Android Malware Clustering through Malicious Payload Mining
Clustering has been well studied for desktop malware analysis as an effective
triage method. Conventional similarity-based clustering techniques, however,
cannot be immediately applied to Android malware analysis due to the excessive
use of third-party libraries in Android application development and the
widespread use of repackaging in malware development. We design and implement
an Android malware clustering system through iterative mining of malicious
payload and checking whether malware samples share the same version of
malicious payload. Our system utilizes a hierarchical clustering technique and
an efficient bit-vector format to represent Android apps. Experimental results
demonstrate that our clustering approach achieves precision of 0.90 and recall
of 0.75 for the Android Genome malware dataset, and average precision of 0.98 and
recall of 0.96 with respect to manually verified ground-truth.
Comment: Proceedings of the 20th International Symposium on Research in
Attacks, Intrusions and Defenses (RAID 2017
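The abstract above reports precision and recall for a clustering but does not say how they are computed. One definition commonly used in the malware clustering literature scores each predicted cluster by its largest overlap with a single reference cluster (precision) and each reference cluster by its largest overlap with a single predicted cluster (recall). The sketch below assumes that definition; the function name and the toy labels are hypothetical, and the authors may use a different formulation.

```python
from collections import Counter, defaultdict


def clustering_precision_recall(predicted, reference):
    """Compute overlap-based clustering precision and recall.

    predicted, reference: dicts mapping sample id -> cluster label,
    defined over the same set of samples.
    """
    n = len(predicted)
    pred_groups = defaultdict(list)
    ref_groups = defaultdict(list)
    for sample, label in predicted.items():
        pred_groups[label].append(sample)
    for sample, label in reference.items():
        ref_groups[label].append(sample)

    # Precision: how pure each predicted cluster is w.r.t. the reference.
    precision = sum(
        max(Counter(reference[s] for s in members).values())
        for members in pred_groups.values()
    ) / n
    # Recall: how well each reference cluster is kept in one predicted cluster.
    recall = sum(
        max(Counter(predicted[s] for s in members).values())
        for members in ref_groups.values()
    ) / n
    return precision, recall


# Hypothetical toy labels for four samples.
predicted = {"a": 0, "b": 0, "c": 1, "d": 1}
reference = {"a": "x", "b": "x", "c": "x", "d": "y"}
print(clustering_precision_recall(predicted, reference))
```

Under this definition, putting every sample in its own cluster gives perfect precision and poor recall, while one giant cluster gives the reverse, which is why the two numbers are reported together.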