
    Data Analysis on the Basis of Numerals Statistics

    Two approaches to content analysis of text data are suggested, both based on the statistical study of the occurrence of numerals in texts. The first approach counts the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author's style, manifested in all (sufficiently long) literary texts by that author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions differ sufficiently. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced stylometric analysis. The proposed approaches are illustrated by computer analysis of literary texts in Lithuanian by S. Daukantas, A. Baranauskas, Maironis, and J. Tumas-Vaižgantas.
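The leading-digit count described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes numerals are written as Arabic digits (spelled-out numerals, such as Lithuanian number words, would need a separate word-to-number step that is not shown), and it uses the total variation distance as one possible way to judge whether two distributions are "sufficiently different". The abstract does not say which measure or threshold the authors use, and the names `leading_digit_frequencies` and `distribution_distance` are hypothetical.

```python
import re
from collections import Counter

# Assumption: numerals are written with Arabic digits, e.g. "1831", "12", "2,5".
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def leading_digit_frequencies(text: str) -> dict[int, float]:
    """Relative frequency of each leading digit 1-9 among the numerals in `text`."""
    counts: Counter[int] = Counter()
    for match in NUMBER_RE.finditer(text):
        # Drop leading zeros and separators so "0.05" counts as 5 and "007" as 7.
        significant = match.group().lstrip("0.,")
        if significant:  # a bare "0" has no nonzero leading digit
            counts[int(significant[0])] += 1
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


def distribution_distance(p: dict[int, float], q: dict[int, float]) -> float:
    """Total variation distance between two leading-digit distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(p[d] - q[d]) for d in range(1, 10))
```

For instance, the numerals in "In 1831 he wrote 12 letters and 3 essays; 150 copies were printed." give a frequency of 0.75 for the digit 1 and 0.25 for the digit 3. As the abstract stresses, meaningful comparisons need sufficiently long texts; a handful of numerals says little about style.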

    The similarity metric

    A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it is in this class and minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To demonstrate generality and robustness, we give two distinct applications in widely divergent areas using standard compression programs such as gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history. This results in the first completely automatically computed whole mitochondrial phylogeny tree. Second, we fully automatically compute the language tree of 52 different languages. Comment: 13 pages, LaTeX, 5 figures. Part of this work appeared in Proc. 14th ACM-SIAM Symp. Discrete Algorithms, 2003. This is the final, corrected version to appear in IEEE Trans Inform. T
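The normalized information distance itself is noncomputable, so practical tools replace the Kolmogorov complexity K(x) with the length of a compressed file. Below is a minimal Python sketch of one widely used approximation, the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). It assumes `zlib` (the DEFLATE algorithm that gzip also uses) as the compressor C. The abstract names gzip and GenCompress but not the exact approximation formula, so treat this as an illustration rather than the paper's procedure; the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) compression of `data`; stands in for K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both sequences
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Pairwise distances computed this way for a set of genomes or texts can then be passed to any standard tree-building or clustering method. Because real compressors are imperfect, NCD values can slightly exceed 1, and very short inputs give noisy results.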

    Making and shaping things in creative economies

    Abstract book for the symposium “Making and shaping things in creative economies. From history to present day”, organised by Vilnius University, 28-30 November 2019. This symposium studies the ways design is organised and managed through different political processes and policies, both in the past and in the present. Instead of focusing solely on the content of policies, politics and management, it attempts to create a wider debate within the framework of culture, creativity and economy. The event looks at the impact that specific policies, and the individuals, organisations or institutions behind them, have on existing design culture. In addition to the act of designing, possible subjects include policies shaping all stages in the life cycle of an object, for example promotion, consumption, collecting or recycling, as well as positioning design in a wider political context. Within the international symposium “Making and shaping things in creative economies”, an event is dedicated to the study of art and culture within local creative economies and industries in Lithuania and the surrounding region. Its aim is to research the ways art is managed and organised through different strategies, processes and policies. Instead of focusing solely on the content of policies, politics and management, it creates a wider debate within the framework of culture, creativity and economy.

    Preface

    DAMSS-2018 is the jubilee 10th international workshop on data analysis methods for software systems, organized in Druskininkai, Lithuania, at the end of the year. It is held at the same place and at the same time every year. Ten years have passed since the first workshop, which took place in 2009 with 16 presentations. The idea for such a workshop came up at the Institute of Mathematics and Informatics and was supported by the Lithuanian Academy of Sciences and the Lithuanian Computer Society. It was well received both in the Lithuanian research community and abroad. This year there are 81 presentations and 113 registered participants from 13 countries. In 2010, the Institute of Mathematics and Informatics became part of Vilnius University, the largest university in Lithuania. In 2017, the institute changed its name to the Institute of Data Science and Digital Technologies, a name that reflects its recent activities. The renewed institute has eight research groups: Cognitive Computing, Image and Signal Analysis, Cyber-Social Systems Engineering, Statistics and Probability, Global Optimization, Intelligent Technologies, Education Systems, and Blockchain Technologies. The main goal of the workshop is to introduce the research undertaken at Lithuanian and foreign universities in the fields of data science and software engineering. Holding the workshop annually enables a fast exchange of new ideas within the research community. As many as 11 companies supported the workshop this year, which shows that its topics are relevant to business, too. Topics of the workshop cover big data, bioinformatics, data science, blockchain technologies, deep learning, digital technologies, high-performance computing, visualization methods for multidimensional data, machine learning, medical informatics, ontological engineering, optimization in data science, business rules, and software engineering.
Seeking to facilitate relations between science and business, this year a special session and a panel discussion are organized on topical business problems that may be solved together with the research community. This book gives an overview of all presentations of DAMSS-2018.

    Numerals in authorial Turkish-language texts and the stylometric analysis

    Two approaches to the statistical analysis of texts are suggested, both based on the study of the occurrence of numerals in coherent texts. The first approach studies the frequency distribution of the leading digits of numerals occurring in the text. These frequencies are unequal: the digit 1 strongly dominates, and the incidence of subsequent digits usually decreases monotonically. The frequency of the digit 1 and, to a lesser extent, of the digits 2 and 3 is usually a characteristic feature of an author's style, manifested in all (sufficiently long) texts by that author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions differ sufficiently. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced discourse analysis. This paper applies the second approach to literary texts in Turkish. We have analysed almost the whole corpus of works by O. Pamuk and Y. Kemal, two of Turkey's most prominent novelists. Hierarchical cluster analysis based on the occurrence of numerals in the texts by Pamuk and Kemal reveals differences in numeral usage related to author, genre, and chronology. © The Authors, published by EDP Sciences, 2021. We believe that the methodology we are developing can be a useful addition to the traditional stylometric practices that take into account the length of sentences and words, the frequency of function words and of certain significant parts of speech, etc.
This work was supported by a grant from the Russian Foundation for Basic Research, project No. 19-012-00199A, “A New Method of Text Attribution Based on Statistics of Numerals”. This work was partially supported by a scholarship from the Slovak Academic Information Agency.
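The second approach, followed by hierarchical clustering, can be sketched in Python. This sketch assumes the numerals have already been extracted from each text as normalized tokens (extraction from Turkish, where numerals are usually written as words and take suffixes, is not shown), and it compares texts with the total variation distance between their numeral profiles, merged by average linkage. The abstract does not state which distance or linkage the authors used, so both are illustrative choices, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def numeral_profile(numerals: list[str]) -> dict[str, float]:
    """Relative frequency of each distinct numeral in one text (the numerals themselves,
    not their leading digits)."""
    counts = Counter(numerals)
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()} if total else {}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance; a numeral absent from one profile counts as frequency 0."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def average_linkage(profiles: dict[str, dict[str, float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Agglomerative clustering with average linkage; returns the sequence of merges."""
    pair_dist = {
        frozenset((a, b)): profile_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(c1: frozenset, c2: frozenset) -> float:
        # Mean distance over all pairs of texts drawn from the two clusters.
        ds = [pair_dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(ds) / len(ds)

    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the distance at which they joined.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges
```

The merge history corresponds to the dendrogram that a cluster analysis would draw. For real corpora, `scipy.cluster.hierarchy.linkage` with `method="average"` on a condensed distance matrix does the same job more efficiently.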

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and the challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

    Gender influences in Digital Humanities co-authorship networks

    PURPOSE: This paper presents a co-authorship study of authors who published in Digital Humanities journals and examines the apparent influence of gender, or more specifically, the quantitatively detectable influence of gender, on the networks they form. DESIGN/METHODOLOGY/APPROACH: This study applied co-authorship network analysis. Data were collected from three canonical Digital Humanities journals over 52 years (1966–2017) and analysed. FINDINGS: The results are presented as visualised networks and suggest that female scholars in Digital Humanities play more central roles and act as the main bridges of collaborative networks, even though female authors are fewer in number than male authors in the network overall. ORIGINALITY/VALUE: This is the first co-authorship network study in Digital Humanities to examine the role that gender appears to play in these co-authorship networks using statistical analysis and visualisation.
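Co-authorship network analysis of this kind can be sketched in Python: authors become nodes, co-authoring a paper creates an edge, and centrality scores are then averaged per gender. This is a minimal illustration under stated assumptions, not the study's method. The abstract does not name its centrality measures, so degree and betweenness centrality (the latter captures the "bridge" role mentioned above) are assumed here; gender labels are assumed to be given as input; and the function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def coauthorship_graph(papers: list[list[str]]) -> dict[str, set[str]]:
    """Undirected graph: one node per author, an edge between every pair of co-authors."""
    graph: dict[str, set[str]] = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())  # single-author papers still add a node
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph: dict[str, set[str]]) -> dict[str, float]:
    """Number of distinct co-authors of each author."""
    return {v: float(len(nbrs)) for v, nbrs in graph.items()}


def betweenness(graph: dict[str, set[str]]) -> dict[str, float]:
    """Unnormalized betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s: count shortest paths (sigma) and record predecessors.
        order: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency on s.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every shortest path is found from both of its endpoints.
    return {v: b / 2 for v, b in score.items()}


def mean_by_group(scores: dict[str, float], group: dict[str, str]) -> dict[str, float]:
    """Average a centrality score within each group label, e.g. a gender annotation."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for author, value in scores.items():
        totals[group[author]] += value
        counts[group[author]] += 1
    return {g: totals[g] / counts[g] for g in totals}
```

Betweenness is left unnormalized here; for an unweighted graph, `networkx.betweenness_centrality(G, normalized=False)` should give the same values and is the more practical choice for large networks.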

    CLARIN. The infrastructure for language resources

    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors, representing fields from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU).

    Theory and Practice of Parish Preaching in the Polish-Lithuanian Commonwealth at the End of the 18th Century

    In the last decades of the 18th century, a few Polish dioceses were governed by representatives of the Catholic Enlightenment. Their pastoral activities focused on the reform of the priesthood and, especially, on the duty of preaching. Although they were perceived as members of a single group, their ideas differed to the point of being mutually contradictory. The preliminary aim of the paper is to interpret the ideological differences among these bishops. I examined pastoral letters and preacher handbooks written by four of them: Michał Poniatowski, Ignacy Massalski, Wojciech Skarszewski, and Porfiriusz Skarbek-Ważyński. My main concern, however, is the social practice of parish preachers in their dioceses: the methodology of sermonizing, the frequency of preaching topics, and the style and content of the homilies delivered by the clergy. I based my research on records of pastoral visitations, especially from the Diocese of Płock, which provide information about the printed collections of sermons used by the parish clergy as well as about the texts they wrote. The main conclusion is that the clergy adopted, to some extent, only those reforms that were adapted to their parish needs and were supported by administrative pressure. Regardless of theoretical programs, preaching in the Commonwealth was changing in the direction of “Enlightened Tridentine Catholicism.” This means that the clergy accepted an enlightened style and language and a focus on morality, but not the Enlightenment's models of the social and natural worlds. By rejecting the latter, however, they avoided deepening the division between popular and elite culture.