4,358 research outputs found

    Feature Type Analysis in Automated Genre Classification

    Get PDF
    In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. The majority of previous studies in genre classification have created models based on an amalgamated representation of a document using a multitude of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. By independently modeling and comparing classifiers based on features belonging to three types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.

    Experiments to investigate the utility of nearest neighbour metrics based on linguistically informed features for detecting textual plagiarism

    Get PDF
    Plagiarism detection is a challenge for linguistic models — most current implemented models use simple occurrence statistics for linguistic items. In this paper we report two experiments related to plagiarism detection where we use a model for distributional semantics and of sentence stylistics to compare sentence by sentence the likelihood of a text being partly plagiarised. The result of the comparison are displayed for visual inspection by a plagiarism assessor

    Foreground and background text in retrieval

    Get PDF
    Our hypothesis is that certain clauses have foreground functions in text, while other clauses have background functions and that these functions are expressed or reflected in the syntactic structure of the clause. Presumably these clauses will have differing utility for automatic approaches to text understanding; a summarization system might want to utilize background clauses to capture commonalities between numbers of documents while an indexing system might use foreground clauses in order to capture specific characteristics of a certain document
    corecore