3 research outputs found
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
Automated hierarchical classification of scanned documents using convolutional neural network and regular expression
This research proposed automated hierarchical classification of scanned documents with characteristics content that have unstructured text and special patterns (specific and short strings) using convolutional neural network (CNN) and regular expression method (REM). The research data using digital correspondence documents with format PDF images from pusat data teknologi dan informasi (technology and information data center). The document hierarchy covers type of letter, type of manuscript letter, origin of letter and subject of letter. The research method consists of preprocessing, classification, and storage to database. Preprocessing covers extraction using Tesseract optical character recognition (OCR) and formation of word document vector with Word2Vec. Hierarchical classification uses CNN to classify 5 types of letters and regular expression to classify 4 types of manuscript letter, 15 origins of letter and 25 subjects of letter. The classified documents are stored in the Hive database in Hadoop big data architecture. The amount of data used is 5200 documents, consisting of 4000 for training, 1000 for testing and 200 for classification prediction documents. The trial result of 200 new documents is 188 documents correctly classified and 12 documents incorrectly classified. The accuracy of automated hierarchical classification is 94%. Next, the search of classified scanned documents based on content can be developed