As intrusion detection and prevention systems (IDPSs) grow in importance,
managing the signatures that are generated by malicious communication pattern
files incurs great cost. For an IDPS to work, network security experts need to
classify signatures by importance. We propose and evaluate
a machine learning signature classification model with a reject option (RO) to
reduce the cost of setting up an IDPS. To train the proposed model, it is
essential to design features that are effective for signature classification.
Experts classify signatures with predefined if-then rules. An if-then rule
returns a label of low, medium, high, or unknown importance based on keyword
matching of the elements in the signature. Therefore, we first design two types
of features, symbolic features (SFs) and keyword features (KFs), which are used
in keyword matching for the if-then rules. Next, we design web information and
message features (WMFs) to capture the properties of signatures that do not
match the if-then rules. The WMFs are extracted as term frequency-inverse
document frequency (TF-IDF) features of the message text in the signatures. The
features are obtained by web scraping from the referenced external attack
identification systems described in the signature. Because classification
failures must be minimized for IDPS signatures, as in the medical field, we
introduce an RO into our proposed model. The effectiveness of the
proposed classification model is evaluated in experiments with two real
datasets composed of signatures labeled by experts: a dataset that can be
classified with if-then rules and a dataset with elements that do not match an
if-then rule. In both
cases, the combined SFs and WMFs performed better than the combined SFs and
KFs. We also performed a feature analysis.

Comment: 9 pages, 5 figures, 3 tables
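
The two mechanisms described above, keyword-based if-then rules and a reject option, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the keyword lists in RULES, the function names, and the 0.8 confidence threshold are hypothetical, and the reject option is shown as a simple threshold on the top class probability, which may differ from the formulation used in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: (importance label, keywords that trigger it).
# The experts' actual if-then rules are not given in the abstract.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("high", ("trojan", "exploit")),
    ("medium", ("scan",)),
    ("low", ("policy",)),
]


def if_then_label(signature_text: str) -> str:
    """Return the label of the first rule with a matching keyword, else 'unknown'."""
    text = signature_text.lower()
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "unknown"


def predict_with_reject(probs: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the most probable label, or None (reject) if its probability is below threshold."""
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None
```

A rejected prediction (None) would be handed back to a human expert instead of being applied automatically, which trades coverage for fewer classification failures.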