Extracting Narrative Patterns in Different Textual Genres: A Multilevel Feature Discourse Analysis

Abstract

We present a data-driven approach to discovering and extracting patterns in textual genres, with the aim of identifying whether linguistic features vary in meaningful ways among narrative genres depending on their respective communicative purposes. To this end, we perform a multilevel discourse analysis according to (1) the type of feature studied (shallow, syntactic, semantic, and discourse-related); (2) the texts at a document level; and (3) the textual genres of news, reviews, and children’s tales. Several corpora for the three textual genres were gathered from different sources to ensure a heterogeneous representation, and analyzed for the presence and frequency of a series of features extracted with computational tools. This in-depth analysis aims to obtain more detailed knowledge of the linguistic phenomena that directly shape each of the genres included in the study, thereby showing the particularities that distinguish them as individual genres while still placing them within the narrative typology. The findings suggest that this type of multilevel linguistic analysis could be of great help for areas of research within natural language processing such as computational narratology, as it allows a better understanding of the fundamental features that define each genre and its communicative purpose. Likewise, this approach could also boost the creation of more consistent automatic story generation tools in areas of language generation.

This research work is part of the R&D project “PID2021-123956OB-I00”, funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”. It was also partially funded by the project “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying natural language processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), by the Generalitat Valenciana through the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation” with grant reference CIPROM/2021/21, and by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231).
