
    Building automated vandalism detection tools for Wikidata

    Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open collaboration model is powerful in that it reduces barriers to participation and allows a large number of people to contribute. However, it exposes the knowledge base to the risk of vandalism and low-quality contributions. In this work, we build on past work detecting vandalism in Wikipedia to detect vandalism in Wikidata. This work is novel in that identifying damaging changes in a structured knowledge base requires substantially different feature engineering work than in a text-based wiki like Wikipedia. We also discuss the utility of these classifiers for reducing the overall workload of vandalism patrollers in Wikidata. We describe a machine classification strategy that is able to catch 89% of vandalism while reducing patrollers' workload by 98%, by drawing lightly from contextual features of an edit and heavily from the characteristics of the user making the edit.
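
    The trade-off quoted in this abstract (share of vandalism caught versus share of edits patrollers no longer need to review) can be illustrated with a minimal sketch. The function name, the threshold and the toy scores below are assumptions made for illustration; they are not the authors' classifier or data.

```python
def review_tradeoff(scores, labels, threshold):
    """Return (recall, workload_reduction) when only edits whose
    score is >= threshold are sent to human patrollers.

    scores: hypothetical classifier scores, one per edit
    labels: 1 if the edit is vandalism, 0 otherwise
    """
    if not scores:
        return 0.0, 0.0
    flagged = [s >= threshold for s in scores]
    total_vandalism = sum(labels)
    caught = sum(1 for f, y in zip(flagged, labels) if f and y)
    # Recall: fraction of all vandalism that lands in the review queue.
    recall = caught / total_vandalism if total_vandalism else 0.0
    # Workload reduction: fraction of edits that no longer need review.
    workload_reduction = 1 - sum(flagged) / len(scores)
    return recall, workload_reduction


# Toy data (invented): ten edits, three of which are vandalism.
scores = [0.95, 0.80, 0.10, 0.05, 0.02, 0.60, 0.01, 0.03, 0.04, 0.07]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
recall, reduction = review_tradeoff(scores, labels, threshold=0.5)
```

    Lowering the threshold raises recall at the cost of a longer review queue; the figures reported in the abstract correspond to one such operating point.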

    Cartographic Vandalism in the Era of Location-Based Games—The Case of OpenStreetMap and Pokémon GO

    User-generated map data is increasingly used by the technology industry for background mapping, navigation and beyond. An example is the integration of OpenStreetMap (OSM) data in widely used smartphone and web applications, such as Pokémon GO (PGO), a popular augmented reality smartphone game. As a result of OSM’s increased popularity, the worldwide audience that uses OSM through external applications is directly exposed to malicious edits which represent cartographic vandalism. Multiple reports of obscene and anti-semitic vandalism in OSM have surfaced in popular media over the years. Such negative news about cartographic vandalism undermines the credibility of collaboratively generated maps. Similarly, commercial map providers (e.g., Google Maps and Waze) are also prone to carto-vandalism through the crowdsourcing mechanisms they may use to keep their map products up-to-date. Using PGO as an example, this research analyzes harmful edits in OSM that originate from PGO players. More specifically, this paper analyzes the spatial, temporal and semantic characteristics of PGO carto-vandalism and discusses how the mapping community handles it. Our findings indicate that most harmful edits are quickly discovered and that the community becomes faster at detecting and fixing these harmful edits over time. Gaming-related carto-vandalism in OSM was found to be a short-term, sporadic activity by individuals, whereas the task of fixing vandalism is persistently pursued by a dedicated user group within the OSM community. The characteristics of carto-vandalism identified in this research can be used to improve vandalism detection systems in the future.

    Proceedings of the Academic Track at State of the Map 2019 - Heidelberg (Germany), September 21-23, 2019

    State of the Map featured a full day of academic talks. Building upon the SotM 2019 motto, "Bridging the Map", the Academic Track session aimed to provide a bridge joining together the experience, understanding, ideas, concepts and skills of different groups of researchers, academics and scientists from around the world. In particular, the Academic Track session was meant to build a bridge that connects members of the OpenStreetMap community and the academic community by providing an open passage for exchange of ideas, communication and opportunities for increased collaboration. These proceedings include 14 abstracts accepted as oral presentations and 6 abstracts presented as posters. Contributions were received from different academic fields, for example geography, remote sensing, computer and information sciences, geomatics, GIScience, the humanities and social sciences, and even from industry actors. We are particularly delighted to have included abstracts from both experienced researchers and students. Overall, it is our hope that these proceedings accurately showcase the ongoing innovation and maturity of scientific investigations and research into OpenStreetMap, showing how, as a research object, it brings multiple research areas together. Our aim is to show how the sum total of investigations of issues like Volunteered Geographic Information, geo-information, and geo-digital processes and representation sheds light on the relations between crowds, real-world applications, technological developments, and scientific research.

    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

    Recent Developments and Future Trends in Volunteered Geographic Information Research: The Case of OpenStreetMap

    User-generated content (UGC) platforms on the Internet have experienced a steep increase in data contributions in recent years. The ubiquitous usage of location-enabled devices, such as smartphones, allows contributors to share their geographic information on a number of selected online portals. The collected information is oftentimes referred to as volunteered geographic information (VGI). One of the most utilized, analyzed and cited VGI platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM), whose main goal is to create a freely available geographic database of the world. This paper presents a comprehensive overview of the latest developments in VGI research, focusing on its collaboratively collected geodata and corresponding contributor patterns. Additionally, trends in the realm of OSM research are discussed, highlighting which aspects need to be investigated more closely in the near future.

    VEWS: A Wikipedia Vandal Early Warning System

    We study the problem of detecting vandals on Wikipedia before any human or known vandalism detection system flags them, so that such users can be presented early to Wikipedia administrators. We leverage multiple classical ML approaches, but develop three novel sets of features. Our Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing patterns as features to classify some users as vandals. Our Wikipedia Transition Probability Matrix (WTPM) approach uses a set of features derived from a transition probability matrix and then reduces it via a neural net auto-encoder to classify some users as vandals. The VEWS approach merges the previous two approaches. Without using any information (e.g. reverts) provided by other users, these algorithms each have over 85% classification accuracy. Moreover, when temporal recency is considered, accuracy goes to almost 90%. We carry out detailed experiments on a new data set we have created consisting of about 33K Wikipedia users (including both a black list and a white list of editors) and containing 770K edits. We describe specific behaviors that distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG and STiki, the best known algorithms today for vandalism detection. Moreover, VEWS detects far more vandals than ClueBot NG and, on average, detects them 2.39 edits before ClueBot NG when both detect the vandal. However, we show that the combination of VEWS and ClueBot NG can give a fully automated vandal early warning system with even higher accuracy. Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015).
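
    As a rough illustration of features derived from a transition probability matrix, the sketch below estimates such a matrix from one user's sequence of edit states and flattens it into a feature vector. The state names and the example sequence are hypothetical, and the auto-encoder reduction step mentioned in the abstract is not shown; this is not the authors' feature definition.

```python
from collections import Counter


def transition_matrix(sequence, states):
    """Estimate row-normalised transition probabilities between
    consecutive items of `sequence`, with rows and columns ordered
    as in `states`. Rows with no outgoing transitions stay all zero.
    """
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = []
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        matrix.append(
            [counts[(a, b)] / row_total if row_total else 0.0 for b in states]
        )
    return matrix


def flatten(matrix):
    """Turn the matrix into a flat feature vector for a classifier."""
    return [p for row in matrix for p in row]


# Hypothetical edit states for a single user (names are invented).
states = ["new_page", "same_page", "meta_page"]
edits = ["new_page", "same_page", "same_page", "new_page", "meta_page"]
features = flatten(transition_matrix(edits, states))
```

    Each user then contributes one fixed-length vector regardless of how many edits they made, which is what makes the representation convenient as classifier input.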

    Quality Assessment of the Canadian OpenStreetMap Road Networks

    Volunteered geographic information (VGI) has been applied in many fields such as participatory planning, humanitarian relief and crisis management because of its cost-effectiveness. However, coverage and accuracy of VGI cannot be guaranteed. OpenStreetMap (OSM) is a popular VGI platform that allows users to create or edit maps using GPS-enabled devices or aerial imagery. The issue of geospatial data quality in OSM has become a trending research topic because of the large size of the dataset and the multiple channels of data access. The objective of this study is to examine the overall reliability of the Canadian OSM data. A systematic review is first presented to provide details on the quality evaluation process of OSM. A case study of London, Ontario follows as an experimental analysis of completeness, positional accuracy and attribute accuracy of the OSM street networks. Next, a national study of the Canadian OSM data assesses the overall semantic accuracy and lineage in addition to the quality measures mentioned above. Results of the quality evaluation are compared with associated OSM provenance metadata to examine potential correlations. The Canadian OSM road networks were found to have comparable accuracy with the tested commercial database (DMTI). Although statistical analysis suggests that there are no significant relations between OSM accuracy and its editing history, the study presents the complex processes behind OSM contributions possibly influenced by data import and remote mapping. The findings of this thesis can potentially guide cartographic product selection for interested parties and offer a better understanding of future quality improvement in OSM.