Numerals in authorial Turkish-language texts and the stylometric analysis
Two approaches to the statistical analysis of texts are suggested, both based on the study of the occurrence of numerals in coherent texts. The first approach is related to the study of the frequency distribution of various leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 is strongly dominating; usually, the incidence of subsequent digits is monotonically decreasing. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic author's style feature, manifested in all (sufficiently long) texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for the advanced discourse analysis. This paper deals with the application of the second approach to the literary texts in Turkish. We have analysed almost the whole corpus of works by O. Pamuk and Y. Kemal, two of Turkey's most prominent novelists. The hierarchical cluster analysis based on the occurrence of numerals in the texts by Pamuk and Kemal shows the author, genre, and chronology differences of numerals usage in the literary texts of these authors. © The Authors, published by EDP Sciences, 2021. We believe that the methodology we are developing can be a useful addition to the traditional stylometric practices of taking into account the length of sentences and words, the frequency of use of service words and certain significant parts of speech, etc.
This work was supported by a grant from the Russian Foundation for Basic Research, project No. 19-012-00199A, “A New Method of Text Attribution Based on Statistics of Numerals”. This work was partially supported by a scholarship from the Slovak Academic Information Agency
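The second approach described in the abstract above (comparing the frequency distribution of the numerals themselves) can be illustrated with a minimal sketch. It is not the authors' method: the function names `numeral_profile`, `profile_distance` and `pairwise_distances` are hypothetical, only numerals written in digits are counted (number words, which matter in literary Turkish, are ignored), and total variation distance is an assumed stand-in for whatever dissimilarity measure the paper feeds into its cluster analysis.

```python
import re
from collections import Counter
from itertools import combinations


def numeral_profile(text):
    """Relative frequency of each digit-string numeral in a text.

    Assumption: a numeral is a maximal run of digits; number words are ignored.
    """
    counts = Counter(int(n) for n in re.findall(r"\d+", text))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def profile_distance(p, q):
    """Total variation distance between two numeral profiles (0 = same, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def pairwise_distances(texts):
    """Distance for every unordered pair of named texts, keyed by sorted name pair."""
    profiles = {name: numeral_profile(t) for name, t in texts.items()}
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    sample = {
        "a": "2 men and 2 dogs and 40 thieves",
        "b": "2 men and 40 thieves",
        "c": "7 days",
    }
    for pair, dist in pairwise_distances(sample).items():
        print(pair, round(dist, 3))
```

A table of pairwise distances like this one is the usual input to a hierarchical clustering routine, for example SciPy's `scipy.cluster.hierarchy.linkage` once it is converted to condensed form.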
Spatial and verbal routes to number comparison in young children
The ability to compare the numerical magnitude of symbolic numbers represents a milestone in the development of numerical skills. However, it remains unclear how basic numerical abilities contribute to the understanding of symbolic magnitude and whether the impact of these abilities may vary when symbolic numbers are presented as number words (e.g., “six vs. eight”) vs. Arabic numbers (e.g., 6 vs. 8). In the present study on preschool children, we show that comparison of number words is related to cardinality knowledge whereas the comparison of Arabic digits is related to both cardinality knowledge and the ability to spatially map numbers. We conclude that comparison of symbolic numbers in preschool children relies on multiple numerical skills and representations, which can be differentially weighted depending on the presentation format. In particular, the spatial arrangement of digits on the number line seems to scaffold the development of a “spatial route” to understanding the exact magnitude of numerals
Existential witness extraction in classical realizability and via a negative translation
We show how to extract existential witnesses from classical proofs using
Krivine's classical realizability---where classical proofs are interpreted as
lambda-terms with the call/cc control operator. We first recall the basic
framework of classical realizability (in classical second-order arithmetic) and
show how to extend it with primitive numerals for faster computations. Then we
show how to perform witness extraction in this framework, by discussing several
techniques depending on the shape of the existential formula. In particular, we
show that in the $\Sigma^0_1$-case, Krivine's witness extraction method reduces to
Friedman's through a well-suited negative translation to intuitionistic
second-order arithmetic. Finally we discuss the advantages of using call/cc
rather than a negative translation, especially from the point of view of an
implementation.
Comment: 52 pages. Accepted in Logical Methods for Computer Science (LMCS),
201
Data Analysis on the Basis of Numerals Statistics
Two approaches to content analysis of text data are suggested, both based on the statistical study of numerals occurrence in texts. The first approach is related to counting the frequency distribution of various leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 is strongly dominating; usually, the incidence of subsequent digits is monotonically decreasing. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic author's style feature, manifested in all (sufficiently long) literary texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for the advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of the literary texts in Lithuanian – by S. Daukantas, A. Baranauskas, Maironis, and J. Tumas-Vaižgantas
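The first approach in the abstract above (the frequency distribution of leading digits) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the names `leading_digit_frequencies` and `total_variation` are hypothetical, numerals are assumed to be digit strings (optionally with `.` or `,` separators), and total variation distance is an assumed way of judging whether two distributions are "sufficiently different".

```python
import re
from collections import Counter

# Assumption: a numeral is a run of digits, optionally continued by
# "." or "," followed by more digits (e.g. "3.14", "1,000").
NUMERAL_RE = re.compile(r"\d+(?:[.,]\d+)*")


def leading_digit_frequencies(text):
    """Relative frequency of the leading digits 1-9 among numerals in a text."""
    leads = []
    for numeral in NUMERAL_RE.findall(text):
        # The leading digit is the first non-zero digit; all-zero numerals are skipped.
        lead = next((c for c in numeral if c in "123456789"), None)
        if lead is not None:
            leads.append(int(lead))
    counts = Counter(leads)
    total = len(leads)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


def total_variation(p, q):
    """Total variation distance between two leading-digit distributions."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in range(1, 10))


if __name__ == "__main__":
    text = "In 1923 he bought 12 horses, 3 carts and 150 sheep."
    freqs = leading_digit_frequencies(text)
    print({d: f for d, f in freqs.items() if f > 0})
```

On such a short sample the frequencies are not meaningful; the abstract stresses that the feature only becomes characteristic in sufficiently long texts.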
Deep Dialog Act Recognition using Multiple Token, Segment, and Context Information Representations
Dialog act (DA) recognition is a task that has been widely explored over the
years. Recently, most approaches to the task explored different DNN
architectures to combine the representations of the words in a segment and
generate a segment representation that provides cues for intention. In this
study, we explore means to generate more informative segment representations,
not only by exploring different network architectures, but also by considering
different token representations, not only at the word level, but also at the
character and functional levels. At the word level, in addition to the commonly
used uncontextualized embeddings, we explore the use of contextualized
representations, which provide information concerning word sense and segment
structure. Character-level tokenization is important to capture
intention-related morphological aspects that cannot be captured at the word
level. Finally, the functional level provides an abstraction from words, which
shifts the focus to the structure of the segment. We also explore approaches to
enrich the segment representation with context information from the history of
the dialog, both in terms of the classifications of the surrounding segments
and the turn-taking history. This kind of information has already been proved
important for the disambiguation of DAs in previous studies. Nevertheless, we
are able to capture additional information by considering a summary of the
dialog history and a wider turn-taking context. By combining the best
approaches at each step, we achieve results that surpass the previous
state-of-the-art on generic DA recognition on both SwDA and MRDA, two of the
most widely explored corpora for the task. Furthermore, by considering both
past and future context, simulating an annotation scenario, our approach achieves
a performance similar to that of a human annotator on SwDA and surpasses it on
MRDA.
Comment: 38 pages, 7 figures, 9 tables, submitted to JAI