In this paper we extend our previous work on radar signal identification
and classification based on a data set which comprises continuous, discrete and
categorical data that represent radar pulse train characteristics such as signal frequencies,
pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like
most real-world datasets, it also contains a high percentage of missing values. To deal
with this problem, we investigate three imputation techniques: Multiple Imputation
(MI); K-Nearest Neighbour Imputation (KNNI); and Bagged Tree Imputation (BTI).
We apply these methods to data samples with up to 60% missingness, thereby doubling
the number of instances with complete values in the resulting dataset. The performance
of the imputation models is assessed with Wilcoxon’s test for statistical significance and Cohen’s
effect size metrics. To solve the classification task, we employ three intelligent approaches:
Neural Networks (NN); Support Vector Machines (SVM); and Random Forests
(RF). Subsequently, we critically analyse which imputation method most influences the
classifiers’ performance, using a multiclass classification accuracy metric based on the
area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each
containing several ‘subclasses’, and propose two new metrics: inner class
accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy
(OCA) metric. We conclude that these metrics can be used to complement the OCA
when choosing the best classifier for the problem at hand.
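
To make the idea of K-Nearest Neighbour Imputation concrete, the following is a minimal sketch and not the implementation used in this work. It assumes purely numeric features, marks missing values with None, measures distance only over features observed in both rows, and fills each gap with the mean of the k nearest donors. The function name knn_impute, the parameter k, and the toy values are hypothetical and chosen only for illustration; categorical attributes would need a different distance and a mode-based fill.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows where each None is replaced by the mean of
    that feature over the k nearest rows that have it observed.
    Illustrative assumption: all features are numeric."""

    def distance(a, b):
        # Use only the features that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features, so not comparable
        # Root mean squared difference, so rows with different numbers
        # of shared features stay roughly comparable.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with feature j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # keep None if no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical toy data: two numeric pulse features per row.
toy = [
    [1.0, 10.0],
    [1.2, 12.0],
    [5.0, 50.0],
    [1.1, None],  # second feature is missing
]
filled = knn_impute(toy, k=2)
# filled[3] becomes [1.1, 11.0]: the mean of the two closest donors.
```

In this toy case the incomplete row is closest to the first two rows, so the distant third row does not influence the imputed value.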