Computational Models (of Narrative) for Literary Studies
In recent decades a growing body of literature in Artificial Intelligence (AI) and Cognitive
Science (CS) has approached the problem of narrative understanding by means of computational
systems. Narrative is, in fact, a ubiquitous element of our everyday activity, and
the ability to generate and understand stories, and their structures, is a crucial marker of our intelligence.
However, although - from a historical standpoint - narrative (and narrative
structures) has been an important topic of investigation in both these areas, a more
comprehensive approach coupling them with narratology, digital humanities and literary
studies has been lacking.
With the aim of filling this gap, a multidisciplinary effort has been made in recent years
to create an international meeting open to computer scientists, psychologists,
digital humanists, linguists, narratologists and others. This event has been named CMN
(for Computational Models of Narrative) and was launched in 2009 by the MIT scholars
Mark A. Finlayson and Patrick H. Winston.
Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings
In this research, we manually create high-quality datasets in the digital
humanities domain for the evaluation of language models, specifically word
embedding models. The first step comprises the creation of unigram and n-gram
datasets for two fantasy novel book series for two task types each, analogy and
doesn't-match. This is followed by the training of models on the two book
series with various popular word embedding model types such as word2vec, GloVe,
fastText, or LexVec. Finally, we evaluate the suitability of word embedding
models for such specific relation extraction tasks in a situation of comparably
small corpus sizes. In the evaluations, we also investigate and analyze
particular aspects such as the impact of corpus term frequencies and task
difficulty on accuracy. The datasets, and the underlying system and word
embedding models, are available on GitHub and can be easily extended with new
datasets and tasks, used to reproduce the presented results, or
transferred to other domains.
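The two task types named above, analogy and doesn't-match, can be sketched with plain cosine similarity over an embedding table. The words and vectors below are hypothetical toy values for illustration, not taken from the paper's datasets or trained models.

```python
import numpy as np

# Toy embedding table standing in for a model trained on a novel corpus.
# All words and vector values here are hypothetical illustrations.
emb = {
    "frodo":   np.array([0.9, 0.1, 0.0]),
    "hobbit":  np.array([0.8, 0.2, 0.1]),
    "gandalf": np.array([0.1, 0.9, 0.0]),
    "wizard":  np.array([0.2, 0.8, 0.1]),
    "sword":   np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vocab):
    """Analogy task: find the word closest to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def doesnt_match(words):
    """Doesn't-match task: return the word least similar to the group mean."""
    mean = np.mean([emb[w] for w in words], axis=0)
    return min(words, key=lambda w: cosine(emb[w], mean))

print(analogy("frodo", "hobbit", "gandalf", emb))   # → wizard
print(doesnt_match(["frodo", "hobbit", "sword"]))   # → sword
```

Task accuracy is then simply the fraction of such queries a trained model answers correctly, which is how term frequency and task difficulty can be related to accuracy.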
Researchers' eye-view of sarcasm detection in social media textual content
The enormous use of sarcastic text in all forms of social media communication
can have a psychological effect on target users. Each user has a
different approach to using and recognising sarcasm. Sarcasm detection is
difficult even for humans, since it depends on many factors such as
perspective, context and special symbols, so differentiating sarcastic
sentences from non-sarcastic sentences is a challenging task for machines.
At present there are no exact rules by which a model can accurately detect
sarcasm in a text corpus, so one needs to focus on
promising and forthcoming approaches in the sarcasm detection domain. This
paper discusses various sarcasm detection techniques and concludes with some
approaches, related datasets with optimal features, and the challenges
faced by researchers.
Authorship Classification in a Resource Constraint Language Using Convolutional Neural Networks
Authorship classification is a method of automatically determining the appropriate author of an unknown linguistic text. Although research on authorship classification has progressed significantly in high-resource languages, it is at a primitive stage in resource-constraint languages like Bengali. This paper presents an authorship classification approach based on Convolutional Neural Networks (CNNs) comprising four modules: embedding model generation, feature representation, classifier training and classifier testing. For this purpose, this work develops a new embedding corpus (named WEC) and a Bengali authorship classification corpus (called BACC-18), which are more robust in terms of authors' classes and unique words. Using three text embedding techniques (Word2Vec, GloVe and FastText) and combinations of different hyperparameters, 90 embedding models are created in this study. All the embedding models are assessed by intrinsic evaluators, and the 9 best-performing models out of 90 are selected for authorship classification. In total, 36 classification models, combining four classifier types (CNN, LSTM, SVM, SGD) and three embedding techniques with 100, 200 and 250 embedding dimensions, are trained with optimized hyperparameters and tested on three benchmark datasets (BACC-18, BAAD16 and LD). Among the models, the optimized CNN with the GloVe model achieved the highest classification accuracies of 93.45%, 95.02%, and 98.67% for the datasets BACC-18, BAAD16, and LD, respectively.
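The feature-representation step of such a CNN can be sketched as a 1D convolution over a sequence of embedded tokens followed by max-over-time pooling. The sizes, random embeddings, and kernels below are hypothetical stand-ins; the paper's actual architecture and hyperparameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 tokens, 4-dim embeddings, 3 filters of width 2.
seq_len, emb_dim, n_filters, width = 8, 4, 3, 2

tokens = rng.normal(size=(seq_len, emb_dim))            # embedded sentence
filters = rng.normal(size=(n_filters, width, emb_dim))  # convolution kernels

def conv1d_maxpool(x, kernels):
    """Slide each kernel over the token sequence, apply ReLU,
    then take the maximum activation over time (max pooling)."""
    n, w, d = kernels.shape
    feats = np.empty(n)
    for i, k in enumerate(kernels):
        acts = [np.sum(x[t:t + w] * k) for t in range(len(x) - w + 1)]
        feats[i] = max(0.0, max(acts))
    return feats

features = conv1d_maxpool(tokens, filters)
print(features.shape)  # one pooled feature per filter: (3,)
```

The resulting fixed-length feature vector, one value per filter regardless of text length, is what a downstream dense layer would classify into author classes.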
- …