Nested Multiple Instance Learning in Modelling of HTTP network traffic
In many interesting cases, the application of machine learning is hindered by
data with a complicated structure imposed by structured file formats such as
JSON, XML, or Protocol Buffers, which is non-trivial to convert to a vector or
matrix. Moreover, since the structure frequently carries semantic meaning,
reflecting it in the machine learning model should improve accuracy and, more
importantly, facilitates the explanation of the model and its decisions.
Using the identification of infected computers in a computer network from
their HTTP traffic as an example, this paper demonstrates how to achieve this
reflection using recent progress in multiple-instance learning. The proposed model is compared
to two complementary approaches from the prior art: the first relies on
human-designed features and the second on features learned automatically by
convolutional neural networks. In a challenging scenario measuring
accuracy only on unseen domains/malware families, the proposed model is
superior to the prior art while providing valuable feedback to security
researchers. We believe that the proposed framework will find applications
elsewhere, even beyond the field of security.
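
To make the idea of nested multiple-instance learning more concrete, the following is a minimal sketch, not the paper's actual model. It assumes a hypothetical hierarchy in which a computer is a bag of domains and each domain is a bag of HTTP requests, and it assumes mean pooling as the aggregation function. The function names, toy weights, and feature vectors are illustrative only; in practice the weights would be learned.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Aggregate a bag of equally sized vectors by element-wise mean."""
    if not vectors:
        raise ValueError("a bag must contain at least one instance")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relu_layer(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One dense layer followed by ReLU; `weights` holds one row per output unit."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def embed_domain(requests: Sequence[Vector], w: List[Vector], b: Vector) -> Vector:
    """Inner bag: embed every request, then pool the requests of one domain."""
    return mean_pool([relu_layer(r, w, b) for r in requests])


def embed_computer(
    domains: Sequence[Sequence[Vector]],
    w_req: List[Vector], b_req: Vector,
    w_dom: List[Vector], b_dom: Vector,
) -> Vector:
    """Outer bag: embed every domain (itself a bag), then pool over domains."""
    domain_vectors = [
        relu_layer(embed_domain(d, w_req, b_req), w_dom, b_dom) for d in domains
    ]
    return mean_pool(domain_vectors)


# Toy input (assumed): one computer with two domains; requests have 2 features.
computer = [
    [[1.0, 2.0], [3.0, 4.0]],  # domain with two requests
    [[5.0, 6.0]],              # domain with one request
]
w_req, b_req = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # identity layer, for clarity
w_dom, b_dom = [[1.0, 1.0]], [0.0]                   # sums the two features

embedding = embed_computer(computer, w_req, b_req, w_dom, b_dom)
print(embedding)  # → [8.0]
```

Because each level pools over a variable number of instances, the same function handles computers with any number of domains and requests, and the intermediate domain embeddings remain available for inspecting which part of the hierarchy drove a decision.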