11 research outputs found
Evaluation of Measures of Distinctiveness. Classification of Literary Texts on the Basis of Distinctive Words
This paper concerns an empirical evaluation of nine different measures
of distinctiveness or “keyness” in the context of Computational Literary
Studies. We use nine different sets of literary texts (specifically, novels) written
in seven different languages as a basis for this evaluation. The evaluation is
performed as a downstream classification task, where segments of the novels
need to be classified by subgenre or period of first publication. The classifier
receives different numbers of features identified using different measures of
distinctiveness. The main contribution of our paper is that we can show that
across a wide variety of parameters, but especially when only a small number
of features is used, (more recent) dispersion-based measures very often outperform
other (more established) frequency-based measures by significant margins.
Our findings support an emerging trend to consider dispersion as an important
property of words in addition to frequency.
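The contrast between frequency-based and dispersion-based measures can be illustrated with a minimal sketch (the toy corpus, and the choice of Dunning's log-likelihood ratio and Gries' deviation of proportions as representatives of the two families, are our own illustrative assumptions, not the paper's exact setup):

```python
import math
from collections import Counter

def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Frequency-based keyness: Dunning's log-likelihood ratio (G2)."""
    expected_a = total_a * (freq_a + freq_b) / (total_a + total_b)
    expected_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    g2 = 0.0
    if freq_a > 0:
        g2 += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        g2 += freq_b * math.log(freq_b / expected_b)
    return 2 * g2

def dp(counts_per_segment, segment_sizes):
    """Dispersion-based view: Gries' 'deviation of proportions'
    (0 = perfectly even spread, values near 1 = highly clumped)."""
    total = sum(counts_per_segment)
    corpus_size = sum(segment_sizes)
    return 0.5 * sum(
        abs(count / total - size / corpus_size)
        for count, size in zip(counts_per_segment, segment_sizes)
    )

# Toy target corpus: three 10-token segments; "whale" is frequent
# overall but unevenly dispersed across the segments.
segments = [
    ["whale"] * 5 + ["ship"] * 5,
    ["ship"] * 10,
    ["whale"] * 2 + ["ship"] * 8,
]
counts = [Counter(seg) for seg in segments]
sizes = [len(seg) for seg in segments]
whale_per_segment = [c["whale"] for c in counts]  # [5, 0, 2]

# Compare against a 30-token reference corpus containing one "whale".
keyness = log_likelihood(sum(whale_per_segment), sum(sizes), 1, 30)
dispersion = dp(whale_per_segment, sizes)
print(f"log-likelihood: {keyness:.2f}, DP: {dispersion:.2f}")
```

A word can thus score as highly key by frequency while a dispersion measure reveals that its occurrences cluster in a few segments, which is exactly the property the dispersion-based measures exploit.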
From Keyness to Distinctiveness – Triangulation and Evaluation in Computational Literary Studies
There is a set of statistical measures, developed mostly in corpus and computational linguistics and in information retrieval and known as keyness measures, which are generally expected to detect textual features that account for differences between two texts or groups of texts. These measures are based on the frequency, distribution, or dispersion of words (or other features). Searching for relevant differences or similarities between two groups of texts is also an activity characteristic of traditional literary studies, whenever two authors, two periods in the work of one author, two historical periods, or two literary genres are to be compared. Applying quantitative procedures to search for such differences therefore seems promising in the field of computational literary studies, as it makes it possible to analyze large corpora and to base historical hypotheses about differences between authors, genres, and periods on broader empirical evidence. However, applying quantitative procedures to answer questions relevant to literary studies in many cases raises methodological problems, which have been discussed on a more general level in the context of integrating or triangulating quantitative and qualitative methods in the mixed-methods research of the social sciences. This paper aims to solve these methodological issues concretely for the concept of distinctiveness and thus to lay the methodological foundation for operationalizing quantitative procedures so that they can be used not only as rough exploratory tools but in a hermeneutically meaningful way for research in literary studies.
Based on a structural definition of potential candidate measures for analyzing distinctiveness in the first section, we offer a systematic description of the issue of integrating quantitative procedures into a hermeneutically meaningful understanding of distinctiveness by distinguishing its epistemological from its methodological perspective. The second section develops a systematic strategy to solve the methodological side of this issue, based on a critical reconstruction of the widespread non-integrative strategy in research on keyness measures that can be traced back to Rudolf Carnap's model of explication. We demonstrate that it is, in the first instance, mandatory to gain a comprehensive qualitative understanding of the actual task. We show that Carnap's model of explication suffers from a shortcoming: it ignores the need for a systematic comparison of what he calls the explicatum and the explicandum. Only if there is a method of systematic comparison can the next task, namely evaluation, be addressed, which verifies whether the output of a quantitative procedure corresponds to the qualitative expectation that must be clarified in advance. We claim that evaluation is necessary for integrating quantitative procedures into a qualitative understanding of distinctiveness. Our reconstruction shows that both steps are usually skipped in empirical research on keyness measures, which is the most important point of reference for the development of a measure of distinctiveness. Evaluation, which in turn requires thorough explication and conceptual clarification, needs to be employed to verify this relation.
In the third section we offer a qualitative clarification of the concept of distinctiveness by spanning a three-dimensional conceptual space. This flexible framework takes into account that there is no single proper concept of distinctiveness but rather a field of possible meanings depending on research interest, theoretical framework, and access to the perceptibility or salience of textual features. Therefore, instead of stipulating any narrow and strict definition, we shall take into account that each of these aspects (interest, theoretical framework, and access to perceptibility) represents one dimension of the heuristic space of possible uses of the concept of distinctiveness.
The fourth section discusses two possible strategies of operationalization and evaluation that we consider complementary to the previously provided clarification, and that complete the task of establishing a candidate measure as a measure of distinctiveness in a qualitatively ambitious sense. We demonstrate that two different general strategies are worth considering, depending on the respective notion of distinctiveness and the interest as elaborated in the third section. If the interest is merely taxonomic, classification tasks based on multi-class supervised machine learning are sufficient. If the interest is aesthetic, more complex and intricate evaluation strategies are required, which have to rely on a thorough conceptual clarification of the concept of distinctiveness, in particular on the idea of salience or perceptibility. The challenge here is to correlate perceivable complex features of texts, such as plot, theme (aboutness), style, form, or the roles and constellations of fictional characters, with the unperceived frequency and distribution of word features that are calculated by candidate measures of distinctiveness. Existing research has so far not clarified how to correlate such complex features with individual word features.
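For the taxonomic case, such a downstream classification evaluation can be sketched roughly as follows (the toy data, the crude relative-frequency-difference stand-in for a keyness measure, and the nearest-centroid classifier are all illustrative assumptions, not the strategies proposed in the paper):

```python
from collections import Counter

def relfreq(tokens):
    """Relative frequency of each word in a token list."""
    counts = Counter(tokens)
    n = len(tokens)
    return {word: counts[word] / n for word in counts}

def select_features(class_a, class_b, k):
    """Keep the k words whose relative-frequency difference between the
    two classes is largest (a crude stand-in for a keyness measure)."""
    fa, fb = relfreq(class_a), relfreq(class_b)
    vocabulary = set(fa) | set(fb)
    return sorted(vocabulary,
                  key=lambda w: abs(fa.get(w, 0) - fb.get(w, 0)),
                  reverse=True)[:k]

def classify(segment, features, class_a, class_b):
    """Nearest-centroid classification on the selected feature dimensions."""
    fs, fa, fb = relfreq(segment), relfreq(class_a), relfreq(class_b)
    dist_a = sum((fs.get(w, 0) - fa.get(w, 0)) ** 2 for w in features)
    dist_b = sum((fs.get(w, 0) - fb.get(w, 0)) ** 2 for w in features)
    return "A" if dist_a < dist_b else "B"

# Toy "subgenres": gothic vs. adventure vocabulary.
gothic = "castle ghost night castle fear night ghost".split()
adventure = "ship sea voyage ship island sea sea".split()
features = select_features(gothic, adventure, 4)
print(classify("ghost castle night".split(), features, gothic, adventure))
```

The evaluation logic is then to vary the measure that ranks the features and the number of features retained, and to compare the resulting classification accuracies, which is the shape of the downstream task used in the companion evaluation paper above.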
The paper concludes with a general reflection on the possibility of mixed-methods research for computational literary studies in terms of explanatory power and exploratory use. As our strategy of combining explication and evaluation shows, integration should be understood as a strategy of combining two different perspectives on the object area: in our evaluation scenarios, that of empirical reader response and that of a specific quantitative procedure. This does not imply that measures of distinctiveness which have proved to attain explanatory power in one qualitative respect should be assumed to be successful in all fields of research. As long as evaluation is omitted, candidate measures of distinctiveness lack explanatory power and are limited to exploratory use. In contrast to a skepticism that literary scholars have sometimes expressed regarding the relevance of computational literary studies to proper issues of the humanities, we believe that integrating computational methods into hermeneutic literary studies can be achieved in a way that reaches higher explanatory power than the usual exploratory use of keyness measures, but it can only be achieved individually for concrete tasks and not once and for all on the basis of a general theoretical demonstration.
See also the publisher version here, accessible via personal request: https://zenodo.org/record/570737
Computational Literary Studies Infrastructure (CLS INFRA): an H2020 Research Infrastructure Project that helps connect researchers, data, and methods
The aim of this poster is to provide an overview of the principal objectives of the newly started H2020
Computational Literary Studies (CLS) project (https://www.clsinfra.io). CLS is an infrastructure project
that works to develop and bring together resources of high-quality data, tools, and knowledge to aid new
approaches to studying literature in the digital age. Conducting computational literary studies involves a
number of challenges and opportunities, from multilingualism to bringing together distributed
information. At present, the landscape of literary data is diverse and fragmented. Even though many
resources are currently available in digital libraries, archives, repositories, websites, or catalogues, a lack
of standardisation in how they are constructed and accessed hinders the extent to which they are reusable
(Ciotti 2014). The CLS project aims to federate these resources, with the tools needed to interrogate them,
and with a widened base of users, in the spirit of the FAIR and CARE principles (Wilkinson et al. 2016).
The resulting improvements will benefit researchers by bridging gaps between greater- and lesser-
resourced communities in computational literary studies and beyond, ultimately offering opportunities
to create new research and insight into our shared and varied European cultural heritage.
Rather than building entirely new resources for literary studies, the project is committed to exploiting
and connecting the already-existing efforts and initiatives, in order to acknowledge and utilize the
immense human labour that has already been undertaken. Therefore, the project builds on recently-
compiled high-quality literary corpora, such as DraCor and ELTeC (Fischer et al. 2019, Burnard et al. 2021,
Schöch et al. in press), integrates existing tools for text analysis, e.g. TXM, stylo, multilingual NLP
pipelines (Heiden 2010, Eder et al. 2016), and takes advantage of deep integration with two other
infrastructural projects, namely the CLARIN and DARIAH ERICs. Consequently, the project aims at
building a coherent ecosystem to foster the technical and intellectual findability and accessibility of
relevant data. The ecosystem consists of (1) resources, i.e. text collections for drama, poetry and prose in
several languages, (2) tools, (3) methodological and theoretical considerations, (4) a network of CLS
scholars based at different European institutions, (5) a system of short-term research stays for both early
career researchers and seasoned scholars, (6) a repository for training materials, as well as (7) an
efficient dissemination strategy. This is achieved through a collaboration between participating
institutions: Institute of Polish Language at the Polish Academy of Sciences, Poland; University of
Potsdam, Germany; Austrian Academy of Sciences, Austria; National University of Distance Education,
Spain; École Normale Supérieure de Lyon, France; Humboldt University of Berlin, Germany; Charles
University, Czech Republic; Digital Research Infrastructure for the Arts and Humanities, France; Ghent
Centre for Digital Humanities, Ghent University, Belgium; Belgrade Centre for Digital Humanities, Serbia;
Huygens Institute for the History of the Netherlands (Royal Netherlands Academy of Arts and Sciences),
Netherlands; Trier Center for Digital Humanities, Trier University, Germany; Moore Institute, National
University of Ireland Galway, Ireland
CLS Infra Computational Literary Studies Infrastructure
Computational Literary Studies Infrastructure, funded under the Horizon 2020 grant scheme, is a four-year, pan-European project that aims to unify the diverse landscape of computational text analysis, in terms of available texts, tools, methods, practices, and so forth, within its growing international user community. The project started in February 2021, meaning that it has been underway for just over a year. In our poster we discuss the various deliverables and activities that have come out of the CLS INFRA project in its first quarter to give an idea of its impact in practice.
The International Linear Collider: Report to Snowmass 2021
The International Linear Collider (ILC) is on the table now as a new global energy-frontier accelerator laboratory taking data in the 2030s. The ILC addresses key questions for our current understanding of particle physics. It is based on a proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community