Computational Models (of Narrative) for Literary Studies
In recent decades a growing body of literature in Artificial Intelligence (AI) and Cognitive
Science (CS) has approached the problem of narrative understanding by means of computational
systems. Narrative is, in fact, a ubiquitous element of our everyday activity, and
the ability to generate and understand stories, and their structures, is a crucial marker of our intelligence.
However, although - from a historical standpoint - narrative (and narrative
structures) has been an important topic of investigation in both these areas, a more
comprehensive approach coupling them with narratology, digital humanities and literary
studies has been lacking.
With the aim of filling this gap, a multidisciplinary effort has been made in recent years
to create an international meeting open to computer scientists, psychologists,
digital humanists, linguists, narratologists and others. This event has been named CMN
(for Computational Models of Narrative) and was launched in 2009 by the MIT scholars
Mark A. Finlayson and Patrick H. Winston.
Relation Extraction Datasets in the Digital Humanities Domain and their Evaluation with Word Embeddings
In this research, we manually create high-quality datasets in the digital
humanities domain for the evaluation of language models, specifically word
embedding models. The first step comprises the creation of unigram and n-gram
datasets for two fantasy novel book series for two task types each, analogy and
doesn't-match. This is followed by the training of models on the two book
series with various popular word embedding model types such as word2vec, GloVe,
fastText, or LexVec. Finally, we evaluate the suitability of word embedding
models for such specific relation extraction tasks in a situation of comparably
small corpus sizes. In the evaluations, we also investigate and analyze
particular aspects such as the impact of corpus term frequencies and task
difficulty on accuracy. The datasets, and the underlying system and word
embedding models, are available on GitHub and can be easily extended with new
datasets and tasks, used to reproduce the presented results, or
transferred to other domains.
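The two task types named above, analogy and doesn't-match, can be sketched with plain cosine similarity over an embedding table. The words and vectors below are hypothetical toy values for illustration, not taken from the paper's datasets or trained models.

```python
import numpy as np

# Toy embedding table standing in for a model trained on a novel corpus.
# All words and vector values here are hypothetical illustrations.
emb = {
    "frodo":   np.array([0.9, 0.1, 0.0]),
    "hobbit":  np.array([0.8, 0.2, 0.1]),
    "gandalf": np.array([0.1, 0.9, 0.0]),
    "wizard":  np.array([0.2, 0.8, 0.1]),
    "sword":   np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vocab):
    """Analogy task: find the word closest to vec(b) - vec(a) + vec(c),
    excluding the three query words themselves."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

def doesnt_match(words):
    """Doesn't-match task: return the word least similar to the group mean."""
    mean = np.mean([emb[w] for w in words], axis=0)
    return min(words, key=lambda w: cosine(emb[w], mean))

print(analogy("frodo", "hobbit", "gandalf", emb))   # → wizard
print(doesnt_match(["frodo", "hobbit", "sword"]))   # → sword
```

Task accuracy is then simply the fraction of such queries a trained model answers correctly, which is how term frequency and task difficulty can be related to accuracy.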
Researchers' eye-view of sarcasm detection in social media textual content
The enormous use of sarcastic text in all forms of social media communication
can have a psychological effect on target users. Each user has a
different approach to using and recognising sarcasm. Sarcasm detection is
difficult even for humans, since it depends on many factors such as
perspective, context and special symbols, so differentiating sarcastic
sentences from non-sarcastic sentences is a challenging task for machines.
At present there are no exact rules by which a model can accurately detect
sarcasm in a text corpus, so one needs to focus on
promising and forthcoming approaches in the sarcasm detection domain. This
paper discusses various sarcasm detection techniques and concludes with some
approaches, related datasets with optimal features, and the challenges
faced by researchers.
Authorship Classification in a Resource Constraint Language Using Convolutional Neural Networks
Authorship classification is a method of automatically determining the appropriate author of an unknown linguistic text. Although research on authorship classification has progressed significantly in high-resource languages, it is at a primitive stage in resource-constraint languages like Bengali. This paper presents an authorship classification approach based on Convolutional Neural Networks (CNNs) comprising four modules: embedding model generation, feature representation, classifier training and classifier testing. For this purpose, this work develops a new embedding corpus (named WEC) and a Bengali authorship classification corpus (called BACC-18), which are more robust in terms of authors' classes and unique words. Using three text embedding techniques (Word2Vec, GloVe and FastText) and combinations of different hyperparameters, 90 embedding models are created in this study. All the embedding models are assessed by intrinsic evaluators, and the 9 best-performing models out of 90 are selected for authorship classification. In total, 36 classification models, combining four classifier types (CNN, LSTM, SVM, SGD) and three embedding techniques with 100, 200 and 250 embedding dimensions, are trained with optimized hyperparameters and tested on three benchmark datasets (BACC-18, BAAD16 and LD). Among the models, the optimized CNN with the GloVe model achieved the highest classification accuracies of 93.45%, 95.02%, and 98.67% for the datasets BACC-18, BAAD16, and LD, respectively.
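The feature-representation step of such a CNN can be sketched as a 1D convolution over a sequence of embedded tokens followed by max-over-time pooling. The sizes, random embeddings, and kernels below are hypothetical stand-ins; the paper's actual architecture and hyperparameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 tokens, 4-dim embeddings, 3 filters of width 2.
seq_len, emb_dim, n_filters, width = 8, 4, 3, 2

tokens = rng.normal(size=(seq_len, emb_dim))            # embedded sentence
filters = rng.normal(size=(n_filters, width, emb_dim))  # convolution kernels

def conv1d_maxpool(x, kernels):
    """Slide each kernel over the token sequence, apply ReLU,
    then take the maximum activation over time (max pooling)."""
    n, w, d = kernels.shape
    feats = np.empty(n)
    for i, k in enumerate(kernels):
        acts = [np.sum(x[t:t + w] * k) for t in range(len(x) - w + 1)]
        feats[i] = max(0.0, max(acts))
    return feats

features = conv1d_maxpool(tokens, filters)
print(features.shape)  # one pooled feature per filter: (3,)
```

The resulting fixed-length feature vector, one value per filter regardless of text length, is what a downstream dense layer would classify into author classes.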
- …