8 research outputs found
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and
enhancing security for organizations. However, the process of extracting
relevant information from unstructured text sources can be expensive and
time-consuming. Our empirical experience shows that existing tools for
automated structured CTI extraction have performance limitations. Furthermore,
the community lacks a common benchmark to quantitatively assess their
performance. We fill these gaps providing a new large open benchmark dataset
and aCTIon, a structured CTI information extraction tool. The dataset includes
204 real-world publicly available reports and their corresponding structured
CTI information in STIX format. Our team curated the dataset involving three
independent groups of CTI analysts working over the course of several months.
To the best of our knowledge, this dataset is two orders of magnitude larger
than previously released open source datasets. We then design aCTIon,
leveraging recently introduced large language models (GPT3.5) in the context of
two custom information extraction pipelines. We compare our method with 10
solutions presented in previous work, for which we develop our own
implementations when open-source implementations were lacking. Our results show
that aCTIon outperforms previous work for structured CTI extraction with an
improvement of the F1-score from 10%points to 50%points across all tasks
Common Peak Approach Using Mass Spectrometry Data Sets for Predicting the Effects of Anticancer Drugs on Breast Cancer
We propose a method for biomarker discovery from mass spectrometry data, improving the common peak approach developed by Fushiki et al. (BMC Bioinformatics, 7:358, 2006). The common peak method is a simple way to select the sensible peaks that are shared with many subjects among all detected peaks by combining a standard spectrum alignment and kernel density estimates. The key idea of our proposed method is to apply the common peak approach to each class label separately. Hence, the proposed method gains more informative peaks for predicting class labels, while minor peaks associated with specific subjects are deleted correctly. We used a SELDI-TOF MS data set from laser microdissected cancer tissues for predicting the treatment effects of neoadjuvant therapy using an anticancer drug on breast cancer patients. The AdaBoost algorithm is adopted for pattern recognition, based on the set of candidate peaks selected by the proposed method. The analysis gives good performance in the sense of test errors for classifying the class labels for a given feature vector of selected peak values