Multi-domain alias matching using machine learning

By Michael Ashcroft, Fredrik Johansson, Lisa Kaati and Amendra Shrestha


We describe a methodology for linking aliases that belong to the same individual, based on the user's writing style (stylometric features extracted from the user-generated content) and posting-time patterns (time-based features extracted from the publishing times of that content). While most previous research on social media identity linkage relies on matching usernames, our methodology can also be applied to users who deliberately choose dissimilar usernames when creating their aliases. In our experiments on a discussion forum dataset and a Twitter dataset, we evaluate the performance of three different classifiers. We then use the best-performing classifier (AdaBoost) to evaluate how well the approach works on different datasets using different features. Experiments show that combining stylometric and time-based features yields good results on our synthetic datasets, and a small-scale evaluation on real-world blog data confirms these results, yielding a precision of over 95%. The use of emotion-related and Twitter-related features has no significant impact on the results.
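As a rough illustration of the idea of combining stylometric and time-based features, the sketch below builds one feature vector per alias and then a comparison vector for a pair of aliases. It is not the authors' implementation: the function-word list, the 24-bin hour histogram, the absolute-difference pair representation, and all function names are assumptions made for this example, since the abstract does not specify the actual feature set.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, deliberately tiny function-word list (an assumption for
# this sketch; the abstract does not list the features actually used).
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "is", "it")


def stylometric_features(posts):
    """Relative frequency of each function word over all posts of one alias."""
    words = [w.strip(".,!?;:\"'()").lower() for p in posts for w in p.split()]
    words = [w for w in words if w]
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def time_features(timestamps):
    """Normalized 24-bin histogram of the hour of day at which posts appear."""
    hours = Counter(t.hour for t in timestamps)
    total = len(timestamps) or 1
    return [hours[h] / total for h in range(24)]


def alias_vector(posts, timestamps):
    """Concatenate stylometric and time-based features for one alias."""
    return stylometric_features(posts) + time_features(timestamps)


def pair_features(vec_a, vec_b):
    """Element-wise absolute difference between two alias vectors
    (one possible pair representation, assumed here)."""
    return [abs(x - y) for x, y in zip(vec_a, vec_b)]


if __name__ == "__main__":
    a = alias_vector(["The cat sat in the hall."], [datetime(2016, 1, 1, 9, 30)])
    b = alias_vector(["It is the end of the day."], [datetime(2016, 1, 2, 21, 5)])
    print(pair_features(a, b))
```

Pair vectors of this kind, labeled as "same author" or "different author", could then be passed to a binary classifier such as AdaBoost, the best performer in the experiments described above.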

Topics: Computer and Information Sciences, Data- och informationsvetenskap
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Year: 2016
DOI identifier: 10.1109/ENIC.2016.019