Cybercriminals use phishing as a simple and cost-effective means of
perpetrating cyber-attacks on today's Internet. Recent studies in phishing
detection increasingly favor automated feature selection over traditional
manually engineered features, because existing traditional methods fail to
generalize what they learn to new data. To this end, we propose WebPhish, a
deep learning technique that automatically extracts features from the raw URL
and HTML of a web page. This approach is the first of its kind to feed the
concatenation of URL and HTML embedding feature vectors into a Convolutional
Neural Network (CNN) model to detect phishing attacks on web pages.
Extensive experiments on a real-world dataset yielded an accuracy of 98
percent, outperforming other state-of-the-art techniques. In addition, WebPhish
is a client-side strategy that is completely language-independent: it can
perform lightweight phishing detection regardless of the language in which the
web page's text is written.
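The URL-plus-HTML embedding and CNN pipeline summarized above can be illustrated with a minimal pure-Python sketch. It is not WebPhish's implementation. The character vocabulary (printable ASCII, with every other character mapped to a padding id of 0), the sequence lengths, the embedding size, the filter shapes, and the helper names (`encode`, `make_embedding`, `features`, `phishing_score`) are all hypothetical. The weights are random rather than trained. We also assume that the URL and HTML embeddings are concatenated along the sequence axis before a single 1-D convolution with ReLU and global max pooling feeds a sigmoid output.

```python
import math
import random

# Hypothetical character vocabulary: printable ASCII mapped to ids 1..95.
# Id 0 is reserved for padding and for any character outside the vocabulary.
VOCAB = {chr(c): i + 1 for i, c in enumerate(range(32, 127))}


def encode(text, max_len):
    """Map characters to integer ids, truncating or zero-padding to max_len."""
    ids = [VOCAB.get(ch, 0) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


def make_embedding(vocab_size, dim, seed):
    """Random embedding table; a trained model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """Look up one embedding vector per character id."""
    return [table[i] for i in ids]


def conv1d_relu_maxpool(seq, kernels):
    """Valid 1-D convolution over positions, then ReLU and global max pooling.

    seq is a list of embedding vectors; each kernel is a list of `width`
    weight vectors of the same dimension. Returns one value per kernel.
    """
    pooled = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor of the max
        for start in range(len(seq) - width + 1):
            activation = sum(
                w * x
                for offset in range(width)
                for w, x in zip(kernel[offset], seq[start + offset])
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def features(url, html, url_table, html_table, kernels, url_len=80, html_len=200):
    """Embed URL and HTML separately, concatenate along the sequence axis, convolve."""
    url_vecs = embed(encode(url, url_len), url_table)
    html_vecs = embed(encode(html, html_len), html_table)
    return conv1d_relu_maxpool(url_vecs + html_vecs, kernels)


def phishing_score(pooled, weights, bias):
    """Dense layer with a sigmoid: estimated probability that the page is phishing."""
    z = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    dim = 8
    url_table = make_embedding(len(VOCAB) + 1, dim, seed=0)
    html_table = make_embedding(len(VOCAB) + 1, dim, seed=1)
    rng = random.Random(2)
    # Four hypothetical filters of width 3.
    kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
               for _ in range(4)]
    pooled = features("http://example.com/login",
                      "<html><body>Sign in</body></html>",
                      url_table, html_table, kernels)
    print(phishing_score(pooled, [0.5] * len(kernels), bias=-0.1))
```

Training would replace the random tables, filters, and output weights with learned values. Concatenating along the feature axis instead of the sequence axis is another plausible reading, but it would require the URL and HTML sequences to be padded to the same length.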