Machine Learning for Detecting Data Exfiltration: A Review
Context: Research at the intersection of cybersecurity, Machine Learning
(ML), and Software Engineering (SE) has recently taken significant steps in
proposing countermeasures for detecting sophisticated data exfiltration
attacks. It is important to systematically review and synthesize the ML-based
data exfiltration countermeasures for building a body of knowledge on this
important topic. Objective: This paper aims at systematically reviewing
ML-based data exfiltration countermeasures to identify and classify ML
approaches, feature engineering techniques, evaluation datasets, and
performance metrics used for these countermeasures. This review also aims at
identifying gaps in research on ML-based data exfiltration countermeasures.
Method: We used a Systematic Literature Review (SLR) method to select and
review 92 papers. Results: The review has enabled us to (a) classify the ML
approaches used in the countermeasures into data-driven and behaviour-driven
approaches, (b) categorize features into six types: behavioural, content-based,
statistical, syntactical, spatial and temporal, (c) classify the evaluation
datasets into simulated, synthesized, and real datasets, and (d) identify 11
performance measures used by these studies. Conclusion: We conclude that: (i)
the integration of data-driven and behaviour-driven approaches should be
explored; (ii) there is a need to develop high-quality, large
evaluation datasets; (iii) incremental ML model training should be incorporated
in countermeasures; (iv) resilience to adversarial learning should be
considered and explored during the development of countermeasures to avoid
poisoning attacks; and (v) the use of automated feature engineering should be
encouraged for efficiently detecting data exfiltration attacks.