Automated Dataset Generation System for Collaborative Research of Cyber Threat Analysis
The objectives of cyberattacks are becoming sophisticated, and attackers are
concealing their identity by masquerading as other attackers. Cyber threat
intelligence (CTI) is gaining attention as a way to collect meaningful
knowledge to better understand the intention of an attacker and eventually
predict future attacks. A systemic threat analysis based on data acquired from
actual cyber incidents is a useful approach to generating intelligence for such
an objective. Developing an analysis technique requires a large volume of
high-quality data. However, researchers can be discouraged by the
inaccessibility of data because organizations rarely release their data to the
research community. Owing to this data inaccessibility, academic research tends
to be biased toward techniques for steps of the CTI process other than
analysis and production. In this paper, we propose an automated dataset
generation system called CTIMiner. The system collects threat data from
publicly available security reports and malware repositories. The data are
stored in a structured format. We released the source code and the dataset to
the public; the dataset includes approximately 640,000 records from 612
security reports published from January 2008 to June 2019. In addition, we
present statistical features of the dataset and techniques that can be
developed using it. Moreover,
we demonstrate an application example of the dataset that analyzes the
correlation and characteristics of an incident. We believe our dataset will
promote collaborative research on threat analysis for the generation of CTI.
Comment: preprint version of paper published in Security and Communication
Networks special issue on Data-Driven Cybersecurity