    Numerals in authorial Turkish-language texts and the stylometric analysis

    Two approaches to the statistical analysis of texts are suggested, both based on the occurrence of numerals in coherent texts. The first approach studies the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author's style, manifested in all sufficiently long texts by that author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions differ sufficiently. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced discourse analysis. This paper applies the second approach to literary texts in Turkish. We have analysed almost the whole corpus of works by O. Pamuk and Y. Kemal, two of Turkey's most prominent novelists. Hierarchical cluster analysis based on the occurrence of numerals in the texts by Pamuk and Kemal shows author, genre, and chronological differences in the use of numerals in these authors' literary texts. We believe that the methodology we are developing can be a useful addition to traditional stylometric practices, which take into account the length of sentences and words, the frequency of function words and certain significant parts of speech, etc. © The Authors, published by EDP Sciences, 2021.
This work was supported by a grant from the Russian Foundation for Basic Research, project No. 19-012-00199A, “A New Method of Text Attribution Based on Statistics of Numerals”. This work was partially supported by a scholarship from the Slovak Academic Information Agency.
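The first approach described in the abstract above (comparing leading-digit frequencies across texts) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it treats runs of digits as numerals and ignores number words, and the function names `leading_digit_frequencies` and `total_variation` are hypothetical. The abstract does not say how "sufficiently different" distributions are measured, so the total variation distance used here is an assumed stand-in.

```python
import re
from collections import Counter

# Assumption: a "numeral" is a run of digits, optionally grouped with '.' or ','
# (e.g. "1,000" or "3.14"). Number words such as "three" are not handled.
NUMERAL = re.compile(r"\d+(?:[.,]\d+)*")


def leading_digit_frequencies(text):
    """Return the relative frequency of each leading digit 1-9 in text."""
    counts = Counter()
    for numeral in NUMERAL.findall(text):
        # Drop leading zeros and separators so that "0.5" counts under digit 5.
        stripped = numeral.lstrip("0.,")
        if stripped:
            counts[stripped[0]] += 1
    total = sum(counts.values())
    return {str(d): (counts[str(d)] / total if total else 0.0) for d in range(1, 10)}


def total_variation(p, q):
    """Total variation distance between two leading-digit distributions.

    Assumed measure: 0 means identical distributions, 1 means disjoint ones.
    """
    return 0.5 * sum(abs(p[d] - q[d]) for d in p)


if __name__ == "__main__":
    sample = "In 1923 he had 12 sheep, 150 goats and 2 dogs; by 1950 only 3 remained."
    other = "9 lives, 9 tails"
    p = leading_digit_frequencies(sample)
    q = leading_digit_frequencies(other)
    print(p)
    print(total_variation(p, q))
```

A large distance between two long texts would, in the spirit of the abstract, cast doubt on common authorship. Any threshold would have to be calibrated on texts of known authorship.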

    Spatial and verbal routes to number comparison in young children

    The ability to compare the numerical magnitude of symbolic numbers represents a milestone in the development of numerical skills. However, it remains unclear how basic numerical abilities contribute to the understanding of symbolic magnitude and whether the impact of these abilities may vary when symbolic numbers are presented as number words (e.g., “six vs. eight”) vs. Arabic numbers (e.g., 6 vs. 8). In the present study on preschool children, we show that comparison of number words is related to cardinality knowledge whereas the comparison of Arabic digits is related to both cardinality knowledge and the ability to spatially map numbers. We conclude that comparison of symbolic numbers in preschool children relies on multiple numerical skills and representations, which can be differentially weighted depending on the presentation format. In particular, the spatial arrangement of digits on the number line seems to scaffold the development of a “spatial route” to understanding the exact magnitude of numerals.

    Existential witness extraction in classical realizability and via a negative translation

    We show how to extract existential witnesses from classical proofs using Krivine's classical realizability, where classical proofs are interpreted as lambda-terms with the call/cc control operator. We first recall the basic framework of classical realizability (in classical second-order arithmetic) and show how to extend it with primitive numerals for faster computations. Then we show how to perform witness extraction in this framework, by discussing several techniques depending on the shape of the existential formula. In particular, we show that in the Σ⁰₁ case, Krivine's witness extraction method reduces to Friedman's through a well-suited negative translation to intuitionistic second-order arithmetic. Finally we discuss the advantages of using call/cc rather than a negative translation, especially from the point of view of an implementation. Comment: 52 pages. Accepted in Logical Methods for Computer Science (LMCS), 201

    Journal of translational internal medicine : TJIM

    Data Analysis on the Basis of Numerals Statistics

    Two approaches to the content analysis of text data are suggested, both based on the statistical study of the occurrence of numerals in texts. The first approach counts the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author's style, manifested in all sufficiently long literary texts by that author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions differ sufficiently. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of literary texts in Lithuanian by S. Daukantas, A. Baranauskas, Maironis, and J. Tumas-Vaižgantas.
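The second approach mentioned above (looking at the numerals themselves rather than their leading digits) might be sketched along the following lines. This is a minimal illustration under assumptions, not the method used in the paper: numerals are taken to be plain runs of digits, each text is reduced to a relative-frequency profile of numeral values, and profiles are compared with cosine distance, which is an assumed choice of measure. The names `numeral_profile` and `cosine_distance` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a numeral is a plain run of digits; number words are ignored.
NUMERAL = re.compile(r"\d+")


def numeral_profile(text):
    """Map each numeral value found in text to its relative frequency."""
    counts = Counter(int(match) for match in NUMERAL.findall(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


def cosine_distance(p, q):
    """Cosine distance between two numeral profiles (assumed measure).

    Returns 0 for profiles pointing in the same direction and 1 for
    profiles with no numeral in common (or when either profile is empty).
    """
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    dot = sum(p[k] * q[k] for k in set(p) & set(q))
    return 1.0 - dot / (norm_p * norm_q)


if __name__ == "__main__":
    a = numeral_profile("2 and 2 and 7")
    b = numeral_profile("40 days, 40 nights")
    print(a)
    print(cosine_distance(a, b))
```

Pairwise distances of this kind could in principle feed a clustering step, but the choice of distance and linkage is not specified in the abstract and would need to be taken from the paper itself.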

    Deep Dialog Act Recognition using Multiple Token, Segment, and Context Information Representations

    Dialog act (DA) recognition is a task that has been widely explored over the years. Recently, most approaches to the task have explored different DNN architectures to combine the representations of the words in a segment and generate a segment representation that provides cues for intention. In this study, we explore means to generate more informative segment representations, not only by exploring different network architectures, but also by considering different token representations at the word, character, and functional levels. At the word level, in addition to the commonly used uncontextualized embeddings, we explore the use of contextualized representations, which provide information concerning word sense and segment structure. Character-level tokenization is important to capture intention-related morphological aspects that cannot be captured at the word level. Finally, the functional level provides an abstraction from words, which shifts the focus to the structure of the segment. We also explore approaches to enrich the segment representation with context information from the history of the dialog, both in terms of the classifications of the surrounding segments and the turn-taking history. This kind of information has already been shown to be important for the disambiguation of DAs in previous studies. Nevertheless, we are able to capture additional information by considering a summary of the dialog history and a wider turn-taking context. By combining the best approaches at each step, we achieve results that surpass the previous state-of-the-art on generic DA recognition on both SwDA and MRDA, two of the most widely explored corpora for the task. Furthermore, by considering both past and future context, simulating an annotation scenario, our approach achieves a performance similar to that of a human annotator on SwDA and surpasses it on MRDA. Comment: 38 pages, 7 figures, 9 tables, submitted to JAI