898 research outputs found

Data Analysis on the Basis of Numerals Statistics

Author: Zenkov A.
Zenkov E.
Zenkov M.
Publication venue: УрФУ
Publication date: 01/01/2022

Two approaches to content analysis of text data are suggested, both based on the statistical study of the occurrence of numerals in texts. The first approach counts the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author's style, manifested in all (sufficiently long) literary texts of that author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions are sufficiently different. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of literary texts in Lithuanian by S. Daukantas, A. Baranauskas, Maironis, and J. Tumas-Vaižgantas.
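
The first approach in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the regular expression, the function names `leading_digit_frequencies` and `total_variation`, and the use of total variation distance to compare two texts are all assumptions made for illustration. Numerals written as words, which the paper may also count, are ignored here.

```python
import re
from collections import Counter

DIGITS = "123456789"

# Assumed definition of a numeral: a run of digits starting with 1-9 that is
# not preceded by a digit, a dot, or a comma (so "0.5" and the "200" in
# "1,200" are skipped). Real corpora would need a more careful tokenizer.
NUMERAL = re.compile(r"(?<![\d.,])[1-9]\d*")

def leading_digit_frequencies(text):
    """Return the relative frequency of each leading digit 1..9 in text."""
    counts = Counter(m.group()[0] for m in NUMERAL.finditer(text))
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in DIGITS}
    return {d: counts[d] / total for d in DIGITS}

def total_variation(p, q):
    """Total variation distance between two leading-digit distributions."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in DIGITS)
```

A large distance between the distributions of two sufficiently long texts would, under the abstract's reasoning, cast doubt on common authorship. What threshold counts as "sufficiently different" is not stated in the abstract and is left open here.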

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

The similarity metric

Author: Chen Xin
Li Ming
Li Xin
Ma Bin
Vitanyi Paul
Publication date: 01/01/2003

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To evidence generality and robustness we give two distinctive applications in widely divergent areas using standard compression programs like gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in a first completely automatic computed whole mitochondrial phylogeny tree. Secondly, we fully automatically compute the language tree of 52 different languages.

Comment: 13 pages, LaTeX, 5 figures. Part of this work appeared in Proc. 14th ACM-SIAM Symp. Discrete Algorithms, 2003. This is the final, corrected, version to appear in IEEE Trans Inform. T
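
Because Kolmogorov complexity is noncomputable, the abstract above describes approximating it with standard compressors. The Python sketch below shows one commonly used compression-based form, the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. Using `zlib` as a stand-in for gzip, and the function names `compressed_size` and `ncd`, are assumptions; the sketch is not claimed to reproduce the paper's exact procedure.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little. Real compressors only
    approximate the ideal, so results can fall slightly outside [0, 1].
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A matrix of pairwise `ncd` values over a set of sequences could then be passed to any tree-building or clustering method. The choice of method is left open here.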

arXiv.org e-Print Archive

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Making and shaping things in creative economies

Author: Jerlei Triin
Publication venue: 'Vilnius University Press'
Publication date: 15/11/2019

Abstract book for symposium “Making and shaping things in creative economies. From history to present day” organised by Vilnius University, 28-30 November 2019. This symposium studies the ways design is organised and managed with different political processes and policies, both in past and present. Instead of focusing solely on the content of policies, politics and management, it attempts to create a wider debate within the framework of culture, creativity and economy. The event looks at the impact that the specific policies and individuals, organisations or institutions behind them have on existing design culture. In addition to the act of designing, the possible subjects include policies shaping all stages in the life cycle of an object, for example promotion, consumption, collecting objects or recycling them, as well as positioning design in a wider political context. Within the international symposium “Making and shaping things in creative economies” an event is dedicated to the study of art and culture within local creative economies and industries in Lithuania and nearby. The aim is to research the ways art is managed and organised with different strategies, processes and policies. Instead of focusing solely on the content of policies, politics and management, it creates a wider debate within the framework of culture, creativity and economy

Vilnius University Proceedings

Preface

Author: Press Vilnius University
Publication venue: 'Vilnius University Press'
Publication date: 01/01/2018

DAMSS-2018 is the jubilee 10th international workshop on data analysis methods for software systems, organized in Druskininkai, Lithuania, at the end of the year, in the same place and at the same time every year. Ten years have passed since the first workshop. The history of the workshop began in 2009 with 16 presentations. The idea of such a workshop came up at the Institute of Mathematics and Informatics. The Lithuanian Academy of Sciences and the Lithuanian Computer Society supported this idea, and it won approval both in the Lithuanian research community and abroad. This year there are 81 presentations and 113 registered participants from 13 countries. In 2010, the Institute of Mathematics and Informatics became a member of Vilnius University, the largest university in Lithuania. In 2017, the institute changed its name to the Institute of Data Science and Digital Technologies. This name reflects the recent activities of the institute. The renewed institute has eight research groups: Cognitive Computing, Image and Signal Analysis, Cyber-Social Systems Engineering, Statistics and Probability, Global Optimization, Intelligent Technologies, Education Systems, Blockchain Technologies. The main goal of the workshop is to introduce the research undertaken at Lithuanian and foreign universities in the fields of data science and software engineering. Annual organization of the workshop allows the fast interchange of new ideas among the research community. As many as 11 companies supported the workshop this year, which means that the topics of the workshop are relevant to business, too. Topics of the workshop cover big data, bioinformatics, data science, blockchain technologies, deep learning, digital technologies, high-performance computing, visualization methods for multidimensional data, machine learning, medical informatics, ontological engineering, optimization in data science, business rules, and software engineering. Seeking to facilitate relations between science and business, a special session and panel discussion are organized this year on topical business problems that may be solved together with the research community. This book gives an overview of all presentations of DAMSS-2018.

Crossref

Vilnius University Proceedings

Archivio istituzionale della ricerca - Università di Ferrara

Numerals in authorial Turkish-language texts and the stylometric analysis

Author: Sazanova L.
Zenkov A.
Zenkov E.
Zenkov M.
Publication venue: 'EDP Sciences'
Publication date: 01/01/2021

Two approaches to the statistical analysis of texts are suggested, both based on the study of the occurrence of numerals in coherent texts. The first approach studies the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author's style, manifested in all (sufficiently long) texts of that author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions are sufficiently different. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced discourse analysis. This paper deals with the application of the second approach to literary texts in Turkish. We have analysed almost the whole corpus of works by O. Pamuk and Y. Kemal, two of Turkey's most prominent novelists. The hierarchical cluster analysis based on the occurrence of numerals in the texts by Pamuk and Kemal shows differences in numeral usage by author, genre, and chronology. We believe that the methodology we are developing can be a useful addition to the traditional stylometric practices of taking into account the length of sentences and words, the frequency of use of service words and certain significant parts of speech, etc. © The Authors, published by EDP Sciences, 2021.
This work was supported by a grant from the Russian Foundation for Basic Research, project No. 19-012-00199A, "A New Method of Text Attribution Based on Statistics of Numerals". This work was partially supported by a scholarship from the Slovak Academic Information Agency.
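
The second approach in this abstract, hierarchical clustering of texts by how often each numeral occurs, can be sketched in a few lines of Python. Everything specific below is an assumption rather than the authors' method: numerals are taken to be digit strings only, texts are compared by the L1 distance between their relative-frequency profiles, and clusters are merged by average linkage. The names `numeral_profile`, `profile_distance`, and `average_linkage` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

NUMERAL = re.compile(r"\d+")  # assumed: numerals written with digits only

def numeral_profile(text):
    """Relative frequency of each distinct numeral (as a string) in text."""
    counts = Counter(NUMERAL.findall(text))
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()} if total else {}

def profile_distance(p, q):
    """L1 distance between two numeral profiles (0 = identical, 2 = disjoint)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def average_linkage(profiles):
    """Agglomerative clustering of named profiles.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of text names.
    """
    def linkage(a, b):
        # Mean distance over all pairs of texts drawn from the two clusters.
        dists = [profile_distance(profiles[x], profiles[y]) for x in a for y in b]
        return sum(dists) / len(dists)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

Reading the merge list from first to last gives the dendrogram from the bottom up: texts with similar numeral usage are joined early, and dissimilar groups are joined last.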

EDP Sciences OAI-PMH repository (1.2.0)

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

CLARIN

Publication venue: 'Walter de Gruyter GmbH'
Publication date: 30/01/2023

The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

Directory of Open Access Books (DOAB)

Gender influences in Digital Humanities co-authorship networks

Author: Duke-Williams Oliver
Gao Jin
Mahony Simon
Nyhan Julianne
Publication venue: 'Emerald'
Publication date: 19/04/2022

PURPOSE: This paper presents a co-authorship study of authors who published in Digital Humanities journals and examines the apparent influence of gender, or more specifically, the quantitatively detectable influence of gender in the networks they form. DESIGN/METHODOLOGY/APPROACH: This study applied co-authorship network analysis. Data has been collected from three canonical Digital Humanities journals over 52 years (1966–2017) and analysed. FINDINGS: The results are presented as visualised networks and suggest that female scholars in Digital Humanities play more central roles and act as the main bridges of collaborative networks even though overall female authors are fewer in number than male authors in the network. ORIGINALITY/VALUE: This is the first co-authorship network study in Digital Humanities to examine the role that gender appears to play in these co-authorship networks using statistical analysis and visualisation
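
As a rough illustration of the co-authorship network analysis named in this abstract, the Python sketch below builds an undirected co-authorship graph from per-paper author lists and computes degree centrality. The abstract does not say which centrality or bridging measures the study used, so degree centrality is an assumed, deliberately simple stand-in, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_graph(papers):
    """Map each author to the set of authors they have published with.

    papers is an iterable of author-name lists, one list per paper.
    """
    graph = defaultdict(set)
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph[a]  # touch the key so solo authors appear as isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other authors each author has co-published with."""
    n = len(graph)
    if n <= 1:
        return {a: 0.0 for a in graph}
    return {a: len(neighbours) / (n - 1) for a, neighbours in graph.items()}
```

Comparing centrality scores between groups of authors would additionally require gender labels for each author, which the sketch leaves out. Identifying "bridges" would call for a path-based measure such as betweenness centrality rather than degree.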

UCL Discovery

CLARIN. The infrastructure for language resources

Author: Fišer Darja
Witt Andreas
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 17/10/2022

CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

Publikationsserver des Instituts für Deutsche Sprache

How to design for persistence and retention in MOOCs?

Author: Brasher Andrew
McAndrew Patrick
Weller Martin
Publication venue: EADTU
Publication date: 10/06/2016

Design of educational interventions is typically carried out following a design cycle involving phases of investigation, conceptualization, prototyping, implementation, execution and evaluation. This cycle can be applied at different levels of granularity, e.g. learning activity, module, course or programme. In this paper we consider an aspect of learner behavior that can be critical to the success of many MOOCs, namely learners' persistence in study, and the related theme of learner retention. We reflect on the impact that consideration of these can have on design decisions at different stages in the design cycle, with the aim of enhancing MOOC design in relation to learner persistence and retention, with particular attention to the European context.

Open Research Online (The Open University)

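Theory and Practice of Parish Preaching in the Polish-Lithuanian Commonwealth at the End of the 18th Century (Теория и практика приходской проповеди в Речи Посполитой в конце XVIII столетия)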

Author: Witecki Stanisław
Publication venue: Slověne = Словѣне. International Journal of Slavic Studies
Publication date: 29/12/2017

In the last decades of the 18th century, a few Polish dioceses were governed by representatives of the Catholic Enlightenment. Their pastoral activities focused on the reform of the priesthood and, especially, on the duty of preaching. Despite being perceived as members of a single group, their ideas differed to the point of being mutually contradictory. Interpretation of the ideological differences among these bishops is the preliminary aim of the paper. I examined pastoral letters and preacher handbooks written by four of these bishops: Michał Poniatowski, Ignacy Massalski, Wojciech Skarszewski, and Porfiriusz Skarbek-Ważyński. However, my main concern is the social practice of parochial preachers in their dioceses. I was interested in the methodology of sermonizing, the frequency of preaching topics, and the style and content of homilies delivered by clergy. I based my research on pastoral visitations, especially from the Diocese of Płock, providing information about the printed collections of sermons used by parochial clergy as well as the texts they wrote. The main conclusions are as follows: the clergy adopted to some extent only those reforms which were adjusted to their parochial needs and were supported by administrative pressure. Regardless of theoretical programs, preaching in the Commonwealth was changing in the direction of "Enlightened Tridentine Catholicism." This means that the clergy accepted an enlightened style and language and a focus on morality, but not models of social and natural worlds. However, by rejecting the latter, they avoided enhancing the process of division between popular and elite culture.

Slověne = Словѣне. International Journal of Slavic Studies (Institute for Slavic Studies of the Russian Academy of Sciences)