1,920 research outputs found

    Digital evidence search kit

    With the rapid development of electronic commerce and Internet technology, cyber crimes have become increasingly common. There is a great need for automated software systems that can assist law enforcement agencies in collecting cyber crime evidence. This paper describes a cyber crime evidence collection tool called DESK (Digital Evidence Search Kit), the product of several years of cumulative effort by our Center together with the Hong Kong Police Force and several other law enforcement agencies of the Hong Kong Special Administrative Region. We use DESK to illustrate some of the desirable features of an effective cyber crime evidence collection tool. © 2005 IEEE.

    A Plagiarism Detection Algorithm based on Extended Winnowing

    Plagiarism is a common problem in academia and education. Mature commercial plagiarism detection systems offer comprehensive coverage and high accuracy, but their high detection costs make them unsuitable for real-time, lightweight settings such as plagiarism detection for student assignments. This paper introduces an extension of the classic Winnowing plagiarism detection algorithm that expands its functionality. While extracting a document's fingerprints, the extended algorithm retains the location and length of the corresponding text in the original document, which makes locating and marking plagiarized text fragments much easier. Experimental results and several years of running practice show that the extension has little effect on the algorithm's performance: a PC with an ordinary hardware configuration can meet the requirements of small and medium-sized applications. Building on Winnowing's lightweight design, high efficiency, reliability and flexibility, the extended algorithm further improves adaptability and broadens the range of applications.
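
The idea of keeping location and length alongside each fingerprint can be sketched as follows. This is a minimal illustration and not the paper's implementation: the normalization rule, the MD5-based k-gram hash, the parameter defaults `k=5` and `w=4`, and the function names `normalize`, `kgram_hash`, `winnow` and `shared_spans` are all assumptions made for the example. It applies classic Winnowing (rightmost minimum hash per window) and stores, for each selected fingerprint, the start offset and length of the matching span in the original, un-normalized text.

```python
import hashlib


def normalize(text):
    """Keep alphanumeric characters, lower-cased, and remember for each
    kept character the offset it came from in the original text."""
    chars, offsets = [], []
    for i, ch in enumerate(text):
        if ch.isalnum():
            for low in ch.lower():  # lower() may expand to several characters
                chars.append(low)
                offsets.append(i)
    return "".join(chars), offsets


def kgram_hash(s):
    """Deterministic 32-bit hash of a k-gram (assumed; any stable hash works)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")


def winnow(text, k=5, w=4):
    """Return fingerprints as (hash, start, length) tuples, where start and
    length refer to the original text rather than the normalized one."""
    norm, offsets = normalize(text)
    hashes = [kgram_hash(norm[i:i + k]) for i in range(len(norm) - k + 1)]
    n_windows = max(len(hashes) - w + 1, 1) if hashes else 0
    fingerprints, last = [], -1
    for start in range(n_windows):
        window = hashes[start:start + w]
        smallest = min(window)
        # Classic Winnowing picks the rightmost minimum in each window.
        pos = start + len(window) - 1 - window[::-1].index(smallest)
        if pos != last:  # record each selected position only once
            last = pos
            o_start = offsets[pos]
            o_end = offsets[pos + k - 1] + 1
            fingerprints.append((smallest, o_start, o_end - o_start))
    return fingerprints


def shared_spans(doc, source, k=5, w=4):
    """List (start, length, text) spans of doc whose fingerprints also occur in source."""
    source_hashes = {h for h, _, _ in winnow(source, k, w)}
    return [(s, n, doc[s:s + n]) for h, s, n in winnow(doc, k, w) if h in source_hashes]
```

Because each fingerprint carries an offset and a length, a caller can highlight suspicious fragments directly in the submitted document, for example by iterating over `shared_spans(submission, reference)` and marking `submission[start:start + length]`.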

    CEAI: CCM based Email Authorship Identification Model

    In this paper we present a model for email authorship identification (EAI) that employs a Cluster-based Classification (CCM) technique. Traditionally, stylometric features have been successfully employed in various authorship analysis tasks; we extend the traditional feature set with further features that are effective for email authorship identification (e.g. the last punctuation mark used in an email, the tendency of an author to use capitalization at the start of an email, or the punctuation after a greeting or farewell). We also include content features selected by information gain. We observe that these features have a positive impact on the accuracy of the authorship identification task. We performed experiments to support our arguments and compared the results with other baseline models. Experimental results reveal that the proposed CCM-based email authorship identification model, along with the proposed feature set, outperforms the state-of-the-art support vector machine (SVM)-based models, as well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains an accuracy of 94% for 10 authors, 89% for 25 authors, and 81% for 50 authors on the Enron dataset, while 89.5% accuracy has been achieved on a real email dataset constructed by the authors. The results on the Enron dataset were achieved with a considerably larger number of authors than in the models proposed by Iqbal et al. [1, 2].
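
The email-specific features named in the abstract could be extracted roughly as shown below. This is only a sketch under stated assumptions: the `GREETINGS` and `FAREWELLS` word lists, the helper `_closing_punct`, the function name `email_style_features`, and the exact definition of each feature are hypothetical and are not taken from the paper.

```python
import string

# Hypothetical word lists; the paper does not specify which openers it uses.
GREETINGS = {"hi", "hello", "dear", "hey"}
FAREWELLS = {"regards", "thanks", "cheers", "sincerely", "best"}


def _closing_punct(line, openers):
    """If the line starts with one of the opener words, return the punctuation
    mark that ends the line (or "" when there is none or no opener matches)."""
    words = line.split()
    if not words or words[0].strip(string.punctuation).lower() not in openers:
        return ""
    return line[-1] if line[-1] in string.punctuation else ""


def email_style_features(body):
    """Compute a few stylometric features for one email body."""
    text = body.strip()
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Last punctuation mark used anywhere in the email.
    last_punct = next((ch for ch in reversed(text) if ch in string.punctuation), "")

    # Punctuation after the farewell: scan from the bottom of the email upward.
    farewell_punct = ""
    for line in reversed(lines):
        farewell_punct = _closing_punct(line, FAREWELLS)
        if farewell_punct:
            break

    return {
        "last_punct": last_punct,
        "starts_capitalized": text[:1].isupper(),
        "greeting_punct": _closing_punct(lines[0], GREETINGS) if lines else "",
        "farewell_punct": farewell_punct,
    }
```

For instance, `email_style_features("hi John,\n\nPlease send the report today.\n\nThanks!\nMark")` reports a lower-case opening, a comma after the greeting and an exclamation mark after the farewell. Feature dictionaries of this kind would then be combined with the traditional stylometric and content features before classification.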

    Mining digital identity insights: patent analysis using NLP

    The field of digital identity innovation has grown significantly over the last 30 years, with over 6000 technology patents registered worldwide. However, many questions remain about who controls and owns our digital identity and intellectual property and, ultimately, where the future of digital identity is heading. To investigate this further, this research mines digital identity patents and explores core themes such as identity, systems, privacy, security, and emerging fields like blockchain, financial transactions, and biometric technologies, utilizing natural language processing (NLP) methods including part-of-speech (POS) tagging, clustering, topic classification, noise reduction, and lemmatisation techniques. Finally, the research employs graph modelling and statistical analysis to discern inherent trends and forecast future developments. The findings significantly contribute to the digital identity landscape, identifying key players, emerging trends, and technological progress. This research serves as a valuable resource for academia and industry stakeholders, aiding in strategic decision-making and investment in emerging technologies and facilitating navigation through the dynamic realm of digital identity technologies.