11 research outputs found

    Evaluation of Measures of Distinctiveness. Classification of Literary Texts on the Basis of Distinctive Words

    Get PDF
    This paper concerns an empirical evaluation of nine different measures of distinctiveness or ‘keyness’ in the context of Computational Literary Studies. We use nine different sets of literary texts (specifically, novels) written in seven different languages as a basis for this evaluation. The evaluation is performed as a downstream classification task, where segments of the novels need to be classified by subgenre or period of first publication. The classifier receives different numbers of features identified using different measures of distinctiveness. The main contribution of our paper is that we can show that across a wide variety of parameters, but especially when only a small number of features is used, (more recent) dispersion-based measures very often outperform other (more established) frequency-based measures by significant margins. Our findings support an emerging trend to consider dispersion as an important property of words in addition to frequency

    From Keyness to Distinctiveness – Triangulation and Evaluation in Computational Literary Studies

    Get PDF
    There is a set of statistical measures developed mostly in corpus and computational linguistics and information retrieval, known as keyness measures, which are generally expected to detect textual features that account for differences between two texts or groups of texts. These measures are based on the frequency, distribution, or dispersion of words (or other features). Searching for relevant differences or similarities between two text groups is also an activity that is characteristic of traditional literary studies, whenever two authors, two periods in the work of one author, two historical periods or two literary genres are to be compared. Therefore, applying quantitative procedures in order to search for differences seems to be promising in the field of computational literary studies as it allows to analyze large corpora and to base historical hypotheses on differences between authors, genres and periods on larger empirical evidence. However, applying quantitative procedures in order to answer questions relevant to literary studies in many cases raises methodological problems, which have been discussed on a more general level in the context of integrating or triangulating quantitative and qualitative methods in mixed methods research of the social sciences. This paper aims to solve these methodological issues concretely for the concept of distinctiveness and thus to lay the methodological foundation permitting to operationalize quantitative procedures in order to use them not only as rough exploratory tools, but in a hermeneutically meaningful way for research in literary studies. Based on a structural definition of potential candidate measures for analyzing distinctiveness in the first section, we offer a systematic description of the issue of integrating quantitative procedures into a hermeneutically meaningful understanding of distinctiveness by distinguishing its epistemological from the methodological perspective. The second section develops a systematic strategy to solve the methodological side of this issue based on a critical reconstruction of the widespread non-integrative strategy in research on keyness measures that can be traced back to Rudolf Carnap’s model of explication. We demonstrate that it is, in the first instance, mandatory to gain a comprehensive qualitative understanding of the actual task. We show that Carnap’s model of explication suffers from a shortcoming that consists in ignoring the need for a systematic comparison of what he calls the explicatum and the explicandum. Only if there is a method of systematic comparison, the next task, namely that of evaluation can be addressed, which verifies whether the output of a quantitative procedure corresponds to the qualitative expectation that must be clarified in advance. We claim that evaluation is necessary for integrating quantitative procedures to a qualitative understanding of distinctiveness. Our reconstruction shows that both steps are usually skipped in empirical research on keyness measures that are the most important point of reference for the development of a measure of distinctiveness. Evaluation, which in turn requires thorough explication and conceptual clarification, needs to be employed to verify this relation. In the third section we offer a qualitative clarification of the concept of distinctiveness by spanning a three-dimensional conceptual space. This flexible framework takes into account that there is no single and proper concept of distinctiveness but rather a field of possible meanings depending on research interest, theoretical framework, and access to the perceptibility or salience of textual features. Therefore, we shall, instead of stipulating any narrow and strict definition, take into account that each of these aspects – interest, theoretical framework, and access to perceptibility – represents one dimension of the heuristic space of possible uses of the concept of distinctiveness. The fourth section discusses two possible strategies of operationalization and evaluation that we consider to be complementary to the previously provided clarification, and that complete the task of establishing a candidate measure successfully as a measure of distinctiveness in a qualitatively ambitious sense. We demonstrate that two different general strategies are worth considering, depending on the respective notion of distinctiveness and the interest as elaborated in the third section. If the interest is merely taxonomic, classification tasks based on multi-class supervised machine learning are sufficient. If the interest is aesthetic, more complex and intricate evaluation strategies are required, which have to rely on a thorough conceptual clarification of the concept of distinctiveness, in particular on the idea of salience or perceptibility. The challenge here is to correlate perceivable complex features of texts such as plot, theme (aboutness), style, form, or roles and constellation of fictional characters with the unperceived frequency and distribution of word features that are calculated by candidate measures of distinctiveness. Existing research did not clarify, so far, how to correlate such complex features with individual word features. The paper concludes with a general reflection on the possibility of mixed methods research for computational literary studies in terms of explanatory power and exploratory use. As our strategy of combining explication and evaluation shows, integration should be understood as a strategy of combining two different perspectives on the object area: in our evaluation scenarios, that of empirical reader response and that of a specific quantitative procedure. This does not imply that measures of distinctiveness, which proved to reach explanatory power in one qualitative aspect, should be supposed to be successful in all fields of research. As long as evaluation is omitted, candidate measures of distinctiveness lack explanatory power and are limited to exploratory use. In contrast with a skepticism that has sometimes been expressed from literary scholars with regard to the relevance of computational literary studies on proper issues of the humanities, we believe that integrating computational methods into hermeneutic literary studies can be achieved in a way that reaches higher explanatory power than the usual exploratory use of keyness measures, but it can only be achieved individually for concrete tasks and not once and for all based on a general theoretical demonstration.See also the publisher version here, accessible via personal request: https://zenodo.org/record/570737

    Computational Literary Studies Infrastructure (CLSINFRA): a H2020 Research Infrastructure Project that aids to connect researchers, data, and methods

    No full text
    The aim of this poster is to provide an overview of the principal objectives of the newly started H2020 Computational Literary Studies (CLS) project- https://www.clsinfra.io. CLS is a infrastructure project works to develop and bring together resources of high-quality data, tools and knowledge to aid new approaches to studying literature in the digital age. Conducting computational literary studies has a number of challenges and opportunities from multilingual and bringing together distributing information. At present, the landscape of literary data is diverse and fragmented. Even though many resources are currently available in digital libraries, archives, repositories, websites or catalogues, a lack of standardisation hinders how they are constructed, accessed and the extent to which they are reusable (Ciotti 2014). CLS project aims to federate these resources, with the tools needed to interrogate them, and with a widened base of users, in the spirit of the FAIR and CARE principles (Wilkinson et al. 2016). The resulting improvements will benefit researchers by bridging gaps between greater- and lesser- resourced communities in computational literary studies and beyond, ultimately offering opportunities to create new research and insight into our shared and varied European cultural heritage. Rather than building entirely new resources for literary studies, the project is committed to exploiting and connecting the already-existing efforts and initiatives, in order to acknowledge and utilize the immense human labour that has already been undertaken. Therefore, the project builds on recently- compiled high-quality literary corpora, such as DraCor and ELTeC (Fischer et al. 2019, Burnard et al. 2021, Schöch et al. in press), integrates existing tools for text analysis, e.g. TXM, stylo, multilingual NLP pipelines (Heiden 2010, Eder et al. 2016), and takes advantage of deep integration with two other infrastructural projects, namely the CLARIN and DARIAH ERICs. Consequently, the project aims at building a coherent ecosystem to foster the technical and intellectual findability and accessibility of relevant data. The ecosystem consists of (1) resources, i.e. text collections for drama, poetry and prose in several languages, (2) tools, (3) methodological and theoretical considerations, (4) a network of CLS scholars based at different European institutions, (5) a system of short-term research stays for both early career researchers and seasoned scholars, (6) a repository for training materials, as well as (7) an efficient dissemination strategy. This is achieved through a collaboration between participating institutions: Institute of Polish Language at the Polish Academy of Sciences, Poland; University of Potsdam, Germany; Austrian Academy of Sciences, Austria; National University of Distance Education, Spain; École Normale SupĂ©rieure de Lyon, France; Humboldt University of Berlin, German; Charles University, Czech Republic; Digital Research Infrastructure for the Arts and Humanities, France; Ghent Centre for Digital Humanities, Ghent University, Belgium; Belgrade Centre for Digital Humanities, Serbia; Huygens Institute for the History of the Netherlands (Royal Netherlands Academy of Arts and Sciences), Netherlands; Trier Center for Digital Humanities, Trier University, Germany; Moore Institute, National University of Ireland Galway, Ireland

    CLS Infra Computational Literary Studies Infrastructure

    No full text
    Computational Literary Studies Infrastructure, funded by the Horizon2020 grant scheme, is a four-year, pan-European project that aims to unify the diverse landscape of computational text analysis, in terms of available texts, tools, methods, practices and so forth, within its growing international user community. The project started out in February 2021, meaning that it has been underway for just over a year. In our poster we discuss the various deliverables and activities that have come out of the CLS INFRA project in its first quarter to give an idea of its impact in practice

    International Corporate Tax Avoidance: A Review of the Channels, Magnitudes, and Blind Spots

    No full text

    The International Linear Collider: Report to Snowmass 2021

    No full text
    The International Linear Collider (ILC) is on the table now as a new global energy-frontier accelerator laboratory taking data in the 2030s. The ILC addresses key questions for our current understanding of particle physics. It is based on a proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community

    The International Linear Collider:Report to Snowmass 2021

    No full text

    The International Linear Collider: Report to Snowmass 2021

    No full text
    International audienceThe International Linear Collider (ILC) is on the table now as a new global energy-frontier accelerator laboratory taking data in the 2030s. The ILC addresses key questions for our current understanding of particle physics. It is based on a proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community

    The International Linear Collider: Report to Snowmass 2021

    No full text
    The International Linear Collider (ILC) is on the table now as a new global energy-frontier accelerator laboratory taking data in the 2030s. The ILC addresses key questions for our current understanding of particle physics. It is based on a proven accelerator technology. Its experiments will challenge the Standard Model of particle physics and will provide a new window to look beyond it. This document brings the story of the ILC up to date, emphasizing its strong physics motivation, its readiness for construction, and the opportunity it presents to the US and the global particle physics community

    The International Linear Collider:Report to Snowmass 2021

    No full text
    corecore