Content generation that is both relevant and up to date with the current
threats faced by the target audience is critical to the success of any
Cyber Security Exercise (CSE). In this work, we explore the results of
applying machine learning techniques to unstructured information sources to
generate structured CSE content. The corpus of our work is a large dataset of
publicly available cyber security articles, which we used to predict future
threats and to form the skeleton of new exercise scenarios. Machine
learning techniques, such as named entity recognition (NER) and topic extraction,
have been utilised to structure the information based on a novel ontology we
developed, named the Cyber Exercise Scenario Ontology (CESO). Moreover, we used
clustering with outliers to classify the extracted data into objects of our
ontology. Graph comparison methodologies were used to match generated scenario
fragments to known threat actors' tactics and, with the help of synthetic text
generators, to enrich the proposed scenario accordingly. Our AI-assisted Cyber
Exercise Framework (AiCEF) also uses CESO as the primary means of expressing
both the scenario fragments and the final proposed scenario content.
Our methodology was put to the test by providing a set of generated scenarios
to a group of experts for evaluation, for use as part of a real-world awareness
tabletop exercise.
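To make the graph-comparison step more concrete, the following is a minimal sketch, not the AiCEF implementation. It assumes that a scenario fragment and each known threat actor's tactics can both be represented as sets of directed, labelled edges (source, relation, target), and it uses Jaccard similarity over those edge sets as the comparison measure. The edge format, node labels, actor names, and the choice of Jaccard similarity are all hypothetical and chosen only for illustration. Edges that the best-matching actor graph contains but the fragment lacks are returned as candidate material for enriching the scenario.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge format: (source node, relation, target node).
Edge = Tuple[str, str, str]


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two edge sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_actors(fragment: Set[Edge], actors: Dict[str, Set[Edge]]) -> List[Tuple[str, float]]:
    """Rank known actor graphs by edge-set similarity to a scenario fragment, best first."""
    scores = [(name, jaccard(fragment, graph)) for name, graph in actors.items()]
    # Sort by descending score, then by name so that ties have a deterministic order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


def enrichment_candidates(fragment: Set[Edge], actor_graph: Set[Edge]) -> Set[Edge]:
    """Edges present in the matched actor graph but missing from the fragment."""
    return actor_graph - fragment


if __name__ == "__main__":
    # Hypothetical scenario fragment extracted from articles.
    fragment = {
        ("phishing_email", "leads_to", "credential_theft"),
        ("credential_theft", "leads_to", "lateral_movement"),
    }
    # Hypothetical threat actor tactic graphs.
    actors = {
        "actor_a": {
            ("phishing_email", "leads_to", "credential_theft"),
            ("credential_theft", "leads_to", "lateral_movement"),
            ("lateral_movement", "leads_to", "ransomware_deployment"),
        },
        "actor_b": {
            ("exploit_public_app", "leads_to", "web_shell"),
            ("web_shell", "leads_to", "data_exfiltration"),
        },
    }
    ranking = rank_actors(fragment, actors)
    best_actor = ranking[0][0]
    print(ranking)
    print(enrichment_candidates(fragment, actors[best_actor]))
```

Here `actor_a` shares two of its three edges with the fragment and ranks first, so its remaining edge (lateral movement leading to ransomware deployment) is suggested as an extension of the scenario. A set-overlap measure was chosen only because it is simple and order-independent; richer graph comparison methods (for example, ones that account for graph structure beyond shared edges) could be substituted.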