
    Towards a better similarity measure for keyword profiling via clustering

    Automatic profiling of users and postings can help law enforcement units cluster and classify users and postings effectively, so that potentially problematic users and postings can be identified easily. A core problem in this application is constructing effective profiles and a good measure of the similarity between two profiles. In this paper, we investigate an existing keyword-based user profiling scheme and identify its limitations. We then propose an improved version and demonstrate that, compared with the existing approach, it is more consistent with the observed rates at which a user replies to a posting, given the similarity of their profiles. © 2013 IEEE.
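    The abstract does not spell out the existing keyword-based scheme or the proposed improvement, so the following is only a minimal sketch of the general idea under stated assumptions: a profile is a term-frequency vector built from text, two profiles are compared with cosine similarity (a common choice, not necessarily the paper's measure), and "consistency" is checked by bucketing observed (similarity, replied) pairs and looking at the reply rate per bucket. The names `keyword_profile`, `cosine_similarity` and `reply_rate_by_bucket`, the tokenizer and the bucket count are all hypothetical.

```python
import math
import re
from collections import Counter


def keyword_profile(posts: list[str]) -> Counter:
    """Build a term-frequency profile from a user's (or a posting's) text.

    Hypothetical tokenizer: lowercase alphanumeric runs, no stop-word
    removal or stemming (so "link" and "links" stay distinct).
    """
    profile = Counter()
    for post in posts:
        profile.update(re.findall(r"[a-z0-9]+", post.lower()))
    return profile


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def reply_rate_by_bucket(observations, n_buckets: int = 4):
    """Group (similarity, replied) pairs into equal-width buckets on [0, 1]
    and return the fraction of replies per bucket (None for empty buckets).

    Assumption: a measure "consistent" with reply behaviour should show
    reply rates that rise with similarity.
    """
    hits = [0] * n_buckets
    totals = [0] * n_buckets
    for sim, replied in observations:
        i = min(int(sim * n_buckets), n_buckets - 1)  # clamp sim == 1.0
        totals[i] += 1
        hits[i] += int(replied)
    return [h / t if t else None for h, t in zip(hits, totals)]


user = keyword_profile(["cheap movie download links", "movie torrent here"])
posting = keyword_profile(["anyone have a movie download link?"])
print(round(cosine_similarity(user, posting), 3))  # prints 0.408
```

    Comparing an improved measure against a baseline would then amount to computing both measures on the same user-posting pairs and checking which one yields reply rates that increase more steadily across buckets.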

    Automatic online monitoring and data-mining internet forums

    With the advancement of Internet technology and changes in the mode of communication, it has been observed that much first-hand news is discussed in Internet forums well before it is reported in the traditional mass media. Forums also provide an effective channel for illegal activities such as the dissemination of copyrighted movies, threatening messages and online gambling. Law enforcement agencies are therefore looking for solutions that monitor these discussion forums for possible criminal activities and download suspected postings as evidence for investigation. The volume of postings is huge: across 10 popular forums in Hong Kong, we found 300,000 new messages every day. In this paper, we propose an automatic system that tackles this problem. The system continuously downloads postings from selected discussion forums and employs data-mining techniques to identify hot topics and to cluster authors into groups using word-based user profiles. Different techniques are applied to process the collected data, and several approaches are proposed to solve the problem. © 2011 IEEE.
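    The abstract names two mining tasks, identifying hot topics and clustering authors by word-based profiles, without naming the algorithms. The sketch below is an illustration under explicit assumptions, not the system's method: hot terms are ranked by document frequency (how many postings mention a term), and authors are grouped by single-pass "leader" clustering on cosine similarity of their word profiles, both swapped in for brevity. The stop-word list, the 0.5 threshold and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "for", "of", "to", "and"}


def profile(text: str) -> Counter:
    """Word-frequency profile: lowercase alphanumeric tokens minus stop words."""
    return Counter(t for t in re.findall(r"[a-z0-9]+", text.lower())
                   if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(c * b[t] for t, c in a.items())  # Counter returns 0 for missing terms
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hot_terms(postings: list[str], top_n: int = 3) -> list[str]:
    """Rank terms by how many postings mention them; ties broken alphabetically."""
    df = Counter()
    for text in postings:
        df.update(set(profile(text)))  # count each term at most once per posting
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


def cluster_authors(posts_by_author: dict[str, list[str]],
                    threshold: float = 0.5) -> list[list[str]]:
    """Leader clustering: an author joins the first cluster whose leader profile
    is at least `threshold` similar; otherwise the author starts a new cluster."""
    clusters: list[tuple[Counter, list[str]]] = []
    for author in sorted(posts_by_author):  # fixed order for reproducibility
        prof = profile(" ".join(posts_by_author[author]))
        for leader, members in clusters:
            if cosine(prof, leader) >= threshold:
                members.append(author)
                break
        else:
            clusters.append((prof, [author]))
    return [members for _, members in clusters]


posts_by_author = {
    "alice": ["free movie download", "new movie download link"],
    "bob": ["movie download mirror"],
    "carol": ["football odds tonight", "betting odds"],
    "dave": ["odds for tonight match betting"],
}
print(cluster_authors(posts_by_author))  # [['alice', 'bob'], ['carol', 'dave']]
all_posts = [p for posts in posts_by_author.values() for p in posts]
print(hot_terms(all_posts))  # ['download', 'movie', 'odds']
```

    Leader clustering is order-dependent and needs no preset number of clusters, which keeps the example short; a production system would more likely compare term frequencies against a baseline period to flag sudden spikes as "hot", and use a clustering method that is less sensitive to input order.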