1 research outputs found
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and
enhancing security for organizations. However, the process of extracting
relevant information from unstructured text sources can be expensive and
time-consuming. Our empirical experience shows that existing tools for
automated structured CTI extraction have performance limitations. Furthermore,
the community lacks a common benchmark to quantitatively assess their
performance. We fill these gaps providing a new large open benchmark dataset
and aCTIon, a structured CTI information extraction tool. The dataset includes
204 real-world publicly available reports and their corresponding structured
CTI information in STIX format. Our team curated the dataset involving three
independent groups of CTI analysts working over the course of several months.
To the best of our knowledge, this dataset is two orders of magnitude larger
than previously released open source datasets. We then design aCTIon,
leveraging recently introduced large language models (GPT3.5) in the context of
two custom information extraction pipelines. We compare our method with 10
solutions presented in previous work, for which we develop our own
implementations when open-source implementations were lacking. Our results show
that aCTIon outperforms previous work for structured CTI extraction with an
improvement of the F1-score from 10%points to 50%points across all tasks