2 research outputs found

    Perceptual Hashing applied to Tor domains recognition

    Full text link
    The Tor darknet hosts different types of illegal content, which are monitored by cybersecurity agencies. However, manually classifying Tor content can be slow and error-prone. To support this task, we introduce Frequency-Dominant Neighborhood Structure (F-DNS), a new perceptual hashing method for automatically classifying domains by their screenshots. First, we evaluated F-DNS using images subject to various content preserving operations. We compared them with their original images, achieving better correlation coefficients than other state-of-the-art methods, especially in the case of rotation. Then, we applied F-DNS to categorize Tor domains using the Darknet Usage Service Images-2K (DUSI-2K), a dataset with screenshots of active Tor service domains. Finally, we measured the performance of F-DNS against an image classification approach and a state-of-the-art hashing method. Our proposal obtained 98.75% accuracy in Tor images, surpassing all other methods compared.Comment: To be published on the JNIC 2020 Conference. Already published research summar

    Classification of Spam Emails through Hierarchical Clustering and Supervised Learning

    Full text link
    Spammers take advantage of email popularity to send indiscriminately unsolicited emails. Although researchers and organizations continuously develop anti-spam filters based on binary classification, spammers bypass them through new strategies, like word obfuscation or image-based spam. For the first time in literature, we propose to classify spam email in categories to improve the handle of already detected spam emails, instead of just using a binary model. First, we applied a hierarchical clustering algorithm to create SPEMC-1111K (SPam EMail Classification), the first multi-class dataset, which contains three types of spam emails: Health and Technology, Personal Scams, and Sexual Content. Then, we used SPEMC-1111K to evaluate the combination of TF-IDF and BOW encodings with Na\"ive Bayes, Decision Trees and SVM classifiers. Finally, we recommend for the task of multi-class spam classification the use of (i) TF-IDF combined with SVM for the best micro F1 score performance, 95.39%95.39\%, and (ii) TD-IDF along with NB for the fastest spam classification, analyzing an email in 2.132.13ms.Comment: 4 pages, 2 figures, to be published in conference JNIC 202
    corecore