
    An Evaluation of Text Classification Methods for Literary Study

    This article presents an empirical evaluation of text classification methods in the literary domain. The study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs), in two literary text classification tasks: the eroticism classification of Dickinson’s poems and the sentimentalism classification of chapters in early American novels. The algorithms were also combined with three text pre-processing tools, namely stemming, stopword removal, and statistical feature selection, to study the impact of these tools on the classifiers’ performance in the literary setting. Existing studies outside the literary domain indicated that SVMs are generally better than naïve Bayes classifiers. In this study, however, SVMs did not always win. Both algorithms achieved high accuracy in sentimental chapter classification, but the naïve Bayes classifier outperformed the SVM classifier in erotic poem classification. Self-feature selection helped both algorithms improve their performance in both tasks. However, the two algorithms selected relevant features in different frequency ranges, and therefore captured different characteristics of the target classes. The evaluation results also suggest that arbitrary feature-reduction steps such as stemming and stopword removal should be applied very carefully. Some stopwords were highly discriminative features for Dickinson’s erotic poem classification. In sentimental chapter classification, stemming undermined subsequent feature selection by aggressively conflating and neutralizing discriminative features.
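    The caution about stopword removal can be illustrated with a minimal sketch. It assumes information gain as the feature-scoring measure (the abstract says only "statistical feature selection"), and the toy documents, the labels, and the function names `entropy` and `information_gain` are hypothetical, not data or code from the study.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the presence/absence of `term` for the class label.

    docs   -- list of token sets, one per document
    labels -- list of class labels, aligned with docs
    """
    classes = sorted(set(labels))

    def class_counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    all_idx = range(len(docs))
    with_term = [i for i in all_idx if term in docs[i]]
    without_term = [i for i in all_idx if term not in docs[i]]

    # Start from the label entropy, then subtract the weighted entropy
    # of each non-empty partition induced by the term.
    gain = entropy(class_counts(all_idx))
    for part in (with_term, without_term):
        if part:
            gain -= len(part) / len(docs) * entropy(class_counts(part))
    return gain


# Hypothetical toy corpus: "you" would be dropped by most stopword lists,
# yet here it separates the two classes perfectly.
texts = [
    "you and i",
    "you my heart",
    "the bee and clover",
    "the sea my home",
]
labels = ["target", "target", "other", "other"]
docs = [set(t.split()) for t in texts]

for term in ("you", "and"):
    print(term, round(information_gain(docs, labels, term), 3))
```

    In this toy setting, a stopword filter applied before feature selection would discard the single most informative term, which is the kind of effect the abstract reports for the Dickinson task.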

    LINGUO-COGNITIVE ANALYSIS OF A LITERARY TEXT: LINGUISTIC MEANS OF EXPRESSING CONCEPTS AND IMAGES

    The work is devoted to the linguo-cognitive analysis of the literary text, with a focus on the linguistic means of expressing concepts and images. The relevance of this article is due to the growing attention to the analysis of the literary text using the methods and techniques of cognitive linguistics. The main aspects of cognitive linguistics, linguo-cognitive analysis, and text as an object of cognitive linguistics research are widely covered in the work. The main features of the literary text, its components, and the linguo-cognitive analysis of linguistic means of expressing concepts and images are defined. The purpose of the research is to reveal the linguo-cognitive analysis of the literary text in the aspect of linguistic means of expressing concepts and images. The object of research is a literary text as material for linguo-cognitive research. Research methods such as description, analysis and synthesis, comparison, generalization, cognitive analysis, linguistic analysis, and modelling were used in the work. The article reveals the linguo-cognitive analysis of the literary text in terms of linguistic means of expressing concepts and images. The essence of the concept of “cognitive linguistics” and the main aspects of this phenomenon are characterized. The interpretation of the term “linguo-cognitive analysis” is defined. The essence of the “text” concept and the classification features of the text are described. The work describes the features of the literary text. The components of the literary text that influence the linguo-cognitive analysis are characterized. Such elements of the literary text as image, concept, character, and evaluation are summarized.

    Literary machine translation under the magnifying glass : assessing the quality of an NMT-translated detective novel on document level

    Several studies (covering many language pairs and translation tasks) have demonstrated that translation quality has improved enormously since the emergence of neural machine translation (NMT) systems. This raises the question of whether such systems are able to produce high-quality translations for more creative text types such as literature and whether they are able to generate coherent translations at the document level. Our study aimed to investigate these two questions by carrying out a document-level evaluation of the raw NMT output of an entire novel. We translated Agatha Christie's novel The Mysterious Affair at Styles with Google's NMT system from English into Dutch and annotated it in two steps: first all fluency errors, then all accuracy errors. We report on the overall quality, determine the remaining issues, compare the most frequent error types to those in general-domain MT, and investigate whether any accuracy and fluency errors co-occur regularly. Additionally, we assess the inter-annotator agreement on the first chapter of the novel.

    English Bards and Unknown Reviewers: a Stylometric Analysis of Thomas Moore and the Christabel Review

    Fraught relations between authors and critics are a commonplace of literary history. The particular case that we discuss in this article, a negative review of Samuel Taylor Coleridge's Christabel (1816), has an additional point of interest beyond the usual mixture of amusement and resentment that surrounds a critical rebuke: the authorship of the review remains, to this day, uncertain. The purpose of this article is to investigate the possible candidacy of Thomas Moore as the author of the provocative review. It seeks to solve a puzzle that has stood for almost two hundred years, and in the process clear a valuable scholarly path in Irish Studies, Romanticism, and in our understanding of Moore's role in a prominent literary controversy of the age.

    CEAI: CCM based Email Authorship Identification Model

    In this paper we present a model for email authorship identification (EAI) that employs a Cluster-based Classification (CCM) technique. Traditionally, stylometric features have been successfully employed in various authorship analysis tasks; we extend the traditional feature set to include additional features that are effective for email authorship identification (e.g. the last punctuation mark used in an email, the tendency of an author to use capitalization at the start of an email, or the punctuation after a greeting or farewell). We also included content features selected by information gain. The use of such features is observed to have a positive impact on the accuracy of the authorship identification task. We performed experiments to support our arguments and compared the results with other baseline models. Experimental results reveal that the proposed CCM-based email authorship identification model, along with the proposed feature set, outperforms the state-of-the-art support vector machine (SVM)-based models, as well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains an accuracy of 94% for 10 authors, 89% for 25 authors, and 81% for 50 authors on the Enron dataset, while 89.5% accuracy has been achieved on a real email dataset constructed by the authors. The results on the Enron dataset have been achieved for a considerably larger number of authors than in the models proposed by Iqbal et al. [1, 2].
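    The three example features named in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `email_style_features`, the short list of greeting words, and the regular expressions are hypothetical choices, not the feature extraction used in the paper.

```python
import re

# Hypothetical greeting pattern: a greeting word at the start of the email,
# the rest of the salutation, then an optional punctuation mark.
GREETING = re.compile(
    r"^\s*(hi|hello|dear|hey)\b[^\n,:!.-]*([,:!.-]?)", re.IGNORECASE
)


def email_style_features(body):
    """Return three simple stylometric features for one email body."""
    text = body.strip()

    # Last punctuation mark used anywhere in the email ("" if none).
    marks = re.findall(r"[^\w\s]", text)
    last_punct = marks[-1] if marks else ""

    # Whether the first alphabetic character of the email is upper case.
    first_alpha = next((ch for ch in text if ch.isalpha()), "")
    starts_capitalized = first_alpha.isupper()

    # Punctuation directly after the greeting ("" if none or no greeting).
    match = GREETING.match(text)
    greeting_punct = match.group(2) if match else ""

    return {
        "last_punct": last_punct,
        "starts_capitalized": starts_capitalized,
        "greeting_punct": greeting_punct,
    }


sample = "hi Bob,\nthe report is attached.\nthanks!"
print(email_style_features(sample))
```

    Features of this kind would typically be combined with other stylometric and content features into one vector per email before any clustering or classification step.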