Domain squatting is a technique used by attackers to create domain names for
phishing sites. In recent phishing attempts, we have observed many domain names
that use multiple techniques to evade existing methods for domain squatting.
These domain names, which we call generated squatting domains (GSDs), are quite
different in appearance from legitimate domain names and do not contain brand
names, making them difficult to associate with phishing. In this paper, we
propose a system called PhishReplicant that detects GSDs by focusing on the
linguistic similarity of domain names. We analyzed newly registered and
observed domain names extracted from certificate transparency logs, passive
DNS, and DNS zone files. We detected 3,498 domain names acquired by attackers
in a four-week experiment, of which 2,821 were used for phishing sites within a
month of detection. We also confirmed that our proposed system outperformed
existing systems in both detection accuracy and number of domain names
detected. As an in-depth analysis, we examined 205k GSDs collected over 150
days and found that phishing using GSDs was distributed globally. However,
attackers intensively targeted brands in specific regions and industries. By
analyzing GSDs in real time, we can block phishing sites before or immediately
after they appear.Comment: Accepted at ACSAC 202