Email threat is a serious issue for enterprise security, which consists of
various malicious scenarios, such as phishing, fraud, blackmail and
malvertisement. Traditional anti-spam gateway commonly requires to maintain a
greylist to filter out unexpected emails based on suspicious vocabularies
existed in the mail subject and content. However, the signature-based approach
cannot effectively discover novel and unknown suspicious emails that utilize
various hot topics at present, such as COVID-19 and US election. To address the
problem, in this paper, we present Holmes, an efficient and lightweight
semantic based engine for anomalous email detection. Holmes can convert each
event log of email to a sentence through word embedding then extract
interesting items among them by novelty detection. Based on our observations,
we claim that, in an enterprise environment, there is a stable relation between
senders and receivers, but suspicious emails are commonly from unusual sources,
which can be detected through the rareness selection. We evaluate the
performance of Holmes in a real-world enterprise environment, in which it sends
and receives around 5,000 emails each day. As a result, Holmes can achieve a
high detection rate (output around 200 suspicious emails per day) and maintain
a low false alarm rate for anomaly detection