Detecting and correctly classifying malicious executables has become one of
the major concerns in cyber security, especially because traditional detection
systems have become less effective as the number and severity of threats
have grown. One way to differentiate benign from malicious
executables is to leverage their hexadecimal representation by creating a
set of binary features that completely characterise each executable. In this
paper, we present a novel supervised Bayesian nonparametric approach for
binary matrices that provides an effective probabilistic method for
malware detection. Moreover, owing to the model's flexible assumptions, we
are able to use it in a multi-class framework where the interest lies in
classifying malware into known families. Finally, a generalisation of the model
which provides a deeper understanding of the behaviour across groups for each
feature is also developed.
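
As an illustration of how a hexadecimal representation might be turned into a
binary matrix, the following is a minimal sketch only. The feature definition
(presence or absence of each of the 256 byte values) and the function names
byte_presence_features and build_binary_matrix are assumptions made for this
example and are not taken from the paper, whose feature construction may
differ.

```python
def byte_presence_features(hex_dump: str) -> list[int]:
    """Return 256 binary features; feature i is 1 if byte value i occurs.

    Assumed (hypothetical) feature definition, used for illustration only.
    """
    features = [0] * 256
    for token in hex_dump.split():
        try:
            value = int(token, 16)
        except ValueError:
            # Skip tokens that are not valid hexadecimal, e.g. "??" placeholders.
            continue
        if 0 <= value <= 255:
            features[value] = 1
    return features


def build_binary_matrix(hex_dumps: list[str]) -> list[list[int]]:
    """Stack one binary feature row per executable."""
    return [byte_presence_features(dump) for dump in hex_dumps]


# Two toy hex dumps standing in for executables.
matrix = build_binary_matrix(["4D 5A 90 00", "7F 45 4C 46 ?? 00"])
```

Each row of the resulting matrix corresponds to one executable and each column
to one binary feature, which is the kind of input a supervised model for
binary matrices would take alongside the class labels.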