7 research outputs found

    An effective and scalable framework for authorship attribution query processing

    © 2018 The Authors. Published by IEEE. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://ieeexplore.ieee.org/document/8457490
    Authorship attribution aims to identify the original author of an anonymous text from a given set of candidate authors, and it has a wide range of applications. The main challenge in the authorship attribution problem is that real-world applications tend to have hundreds of authors, while each author may have only a small number of text samples, e.g., 5-10 texts per author. As a result, building a predictive model that can accurately identify the author of an anonymous text is a challenging task. In fact, existing authorship attribution solutions based on long text focus on application scenarios where the number of candidate authors is limited to 50. These solutions generally report a significant performance reduction as the number of authors increases. To overcome this challenge, we propose a novel data representation model that captures stylistic variations within each document, which transforms the problem of authorship attribution into a similarity search problem. Based on this data representation model, we also propose a similarity query processing technique that can effectively handle outliers. We assess the accuracy of our proposed method against state-of-the-art authorship attribution methods using real-world data sets extracted from Project Gutenberg. Our data set contains 3000 novels from 500 authors. Experimental results show that our method significantly outperforms all competitors. Specifically, for both the closed-set and open-set authorship attribution problems, our method achieves higher than 95% accuracy.
    This work was supported by the CityU Project under Grant 7200387 and Grant 6000511.
    Published versio

    L.: Towards a syllabus repository for computer science courses

    A syllabus defines the contents of a course, as well as other information such as resources and assignments. In this paper, we report on our work towards creating a syllabus repository of Computer Science courses across universities in the USA. We present some statistics from our initial collection of 8000+ syllabi. We show a syllabus creator, integrated with Moodle [5], an open-source course management system, that allows for the creation of a syllabus for a particular course. Among other information, a syllabus created this way includes knowledge units from the Computing Curricula 2001 body of knowledge. The goal of the syllabus repository is to provide added value to the Computer Science Education community, and we present some such offerings. We conclude by presenting our future plans for the syllabus repository. These include using automated techniques to collect and classify syllabi, providing recommendations to instructors when creating a syllabus, and allowing the community to share their syllabi automatically. The syllabus collection will be part of the Computing and Informatio