7 research outputs found

    When humans and machines collaborate: Cross-lingual Label Editing in Wikidata

    The quality and maintainability of a knowledge graph are determined by the process by which it is created. There are different approaches to such processes: extraction or conversion of data available on the web (automated extraction of knowledge, such as DBpedia from Wikipedia); community-created knowledge graphs, often built by a group of experts; and hybrid approaches where humans maintain the knowledge graph alongside bots. In this work we focus on the hybrid approach of human-edited knowledge graphs supported by automated tools. In particular, we analyse the editing of natural language data, i.e. labels. Labels are the entry point for humans to understand the information, and therefore need to be carefully maintained. We take a step toward understanding the collaborative editing of humans and automated tools across languages in a knowledge graph. We use Wikidata because it has a large and active community of humans and bots working together, covering over 300 languages. We analyse the different editor groups and how they interact with the different language data to understand the provenance of the current label data.

    Visualization of the evolution of collaboration and communication networks in wikis

    Commons-based peer production communities can be analyzed with the help of social network analysis. However, since they are fluid organizations that change over time, the time dimension needs to be taken into account. In this work we present a web application, WikiChron networks, to facilitate the study of the evolution of wiki communities over time. The tool displays three different community networks depending on the pages considered for the interactions: articles, talk pages of articles, or talk pages of users. Considering these three networks offers complementary views of the same community, while the time dimension makes it possible to observe how the network structures change over time and how the network roles of some editors change. We illustrate the usefulness of our tool by analyzing the evolution of a wiki community at different moments and showing network structures that can be seen in other wiki communities. WikiChron networks is open source and publicly available. We hope that it will stimulate research on the evolution of collaboration and communication in wiki communities.

    Diversity and bias in DBpedia and Wikidata as a challenge for text-analysis tools

    Diversity Searcher is a tool originally developed to help analyse diversity in news media texts. It relies on automated content analysis and thus rests on prior assumptions and depends on certain design choices related to diversity. One such design choice is the external knowledge source(s) used. In this article, we discuss the implications that these sources can have for the results of content analysis. We compare two data sources that Diversity Searcher has worked with, DBpedia and Wikidata, with respect to their ontological coverage and diversity, and describe implications for the resulting analyses of text corpora. We describe a case study of the relative over- or underrepresentation of Belgian political parties between 1990 and 2020. In particular, we found a staggering overrepresentation of the political right in the English-language DBpedia.

    Predicting Open Source Forked Pattern Survivability

    University of Technology Sydney. Faculty of Engineering and Information Technology. The motivational behaviour of open source (OS) developers has always been an active focus of research. With the introduction of the forking technique, a related research area of developer forking motivational behaviour has gained significance, partly due to the problem of forking scarcity and low fork visibility performance. The objective of forking is to have voluntary developers improve and innovate on source code quality. Unfortunately, the forking technique is not very sustainable in improving fork efficiency and efficacy. Further, developers may spend time forking source code that becomes inactive, which proves to be a waste of time and effort. From the perspective of project owners, if their repositories do not receive a good fork response from developers, their repositories will not grow. This doctoral research study aimed to address these problems by avoiding forking scarcity, increasing fork visibility performance, and promoting positive developer forking motivation. We also needed to investigate OS environment compliance to determine whether it contributes to improved fork visibility, reduced fork deficiency and/or is viewed positively by developers. The research approach was to apply a model to predict high fork visibility. The model is based on the K Nearest Neighbour machine learning algorithm, using the Euclidean distance metric to predict high fork visibility performance. We piloted it using nine repository classifiers and then conducted a longitudinal study of five select repository classifiers to determine accuracy and distance approximation. Our work adds a new body of knowledge to OS forking theory and provides a deeper understanding of developer forking motivational behaviour. In the first phase of this study, we conducted a literature review of forking motivation and research methods used in OSS. We then developed and tested our model. In the last phase, we identified OSS patterns and detected fork longevity to determine whether environmental compliance was fully, partially or not at all satisfied. Most importantly, we showed that high fork visibility environmental compliance distance approximation can positively predict developer forking interest.
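The abstract above names K Nearest Neighbour classification with a Euclidean distance metric but does not give the model's features or parameters. As a rough illustration of that general technique only, the following sketch classifies a repository as "high" or "low" fork visibility by majority vote among its nearest neighbours. The feature columns (stars, watchers, open issues), the data values, the labels, and the choice of k = 3 are all assumptions made up for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical repositories: (stars, watchers, open issues) -> visibility label.
# These numbers are invented for illustration, not real measurements.
train = [
    ((120.0, 30.0, 5.0), "high"),
    ((95.0, 22.0, 8.0), "high"),
    ((110.0, 28.0, 4.0), "high"),
    ((3.0, 1.0, 0.0), "low"),
    ((7.0, 2.0, 1.0), "low"),
    ((5.0, 0.0, 2.0), "low"),
]

prediction = knn_predict(train, (100.0, 25.0, 6.0), k=3)
print(prediction)
```

In practice the features would usually be rescaled before computing distances, since Euclidean distance is dominated by whichever feature has the largest numeric range.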

    Opinion Mining for Software Development: A Systematic Literature Review

    Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies. SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in code comments and extracting users’ criticisms of mobile apps. Given the large number of relevant studies available, it can take considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils these approaches entail. We conducted a systematic literature review involving 185 papers. More specifically, we present 1) well-defined categories of opinion mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4) concerns or limitations SE researchers might need to take into account when applying or customizing these opinion mining techniques. The results of our study serve as a reference for choosing suitable opinion mining tools for software development activities, and provide critical insights for the further development of opinion mining techniques in the SE domain.

    Promoting data science in schools: Facilitating the use of open data and sensors in secondary education
