Building automated vandalism detection tools for Wikidata
Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open
collaboration model is powerful in that it reduces barriers to participation
and allows a large number of people to contribute. However, it exposes the
knowledge base to the risk of vandalism and low-quality contributions. In this
work, we build on past work on detecting vandalism in Wikipedia to detect
vandalism in Wikidata. This work is novel in that identifying damaging changes
in a structured knowledge base requires substantially different feature
engineering work than in a text-based wiki like Wikipedia. We also discuss the
utility of these classifiers for reducing the overall workload of vandalism
patrollers in Wikidata. We describe a machine classification strategy that is
able to catch 89% of vandalism while reducing patrollers' workload by 98%, by
drawing lightly from contextual features of an edit and heavily from the
characteristics of the user making the edit.
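The 89% and 98% figures above are two ends of one trade-off set by a score threshold. A minimal sketch of how such figures can be computed, assuming a classifier that assigns every edit a vandalism score and a review policy in which patrollers inspect only edits scoring at or above the threshold; the function name, threshold and toy scores are hypothetical and not taken from the paper:

```python
from typing import Sequence, Tuple


def recall_and_workload_reduction(
    scores: Sequence[float], is_vandalism: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (recall, workload reduction) under a hypothetical policy where
    patrollers review only edits whose score is >= threshold."""
    if len(scores) != len(is_vandalism):
        raise ValueError("scores and labels must have the same length")
    flagged = [s >= threshold for s in scores]
    total_vandalism = sum(is_vandalism)
    # Vandalism is "caught" only if the edit was flagged for review.
    caught = sum(f and v for f, v in zip(flagged, is_vandalism))
    recall = caught / total_vandalism if total_vandalism else 0.0
    # Workload reduction: share of all edits that no longer need human review.
    reduction = 1 - sum(flagged) / len(scores)
    return recall, reduction


# Toy scores and labels (hypothetical, for illustration only).
scores = [0.95, 0.10, 0.80, 0.05, 0.02, 0.60, 0.01, 0.03, 0.04, 0.02]
labels = [True, False, True, False, False, False, False, False, False, True]
recall, reduction = recall_and_workload_reduction(scores, labels, threshold=0.5)
print(f"recall={recall:.2f}, workload reduction={reduction:.2f}")
# prints "recall=0.67, workload reduction=0.70"
```

Raising the threshold cuts workload further but lets more vandalism through unreviewed, which is why the two numbers are only meaningful when reported together.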
Cartographic Vandalism in the Era of Location-Based Games—The Case of OpenStreetMap and Pokémon GO
User-generated map data is increasingly used by the technology industry for background mapping, navigation and beyond. An example is the integration of OpenStreetMap (OSM) data in widely used smartphone and web applications, such as Pokémon GO (PGO), a popular augmented reality smartphone game. As a result of OSM’s increased popularity, the worldwide audience that uses OSM through external applications is directly exposed to malicious edits that represent cartographic vandalism. Multiple reports of obscene and anti-semitic vandalism in OSM have surfaced in popular media over the years. Such negative news coverage of cartographic vandalism undermines the credibility of collaboratively generated maps. Commercial map providers (e.g., Google Maps and Waze) are similarly prone to carto-vandalism through the crowdsourcing mechanisms they may use to keep their map products up to date. Using PGO as an example, this research analyzes harmful edits in OSM that originate from PGO players. More specifically, this paper analyzes the spatial, temporal and semantic characteristics of PGO carto-vandalism and discusses how the mapping community handles it. Our findings indicate that most harmful edits are quickly discovered and that the community becomes faster at detecting and fixing these harmful edits over time. Gaming-related carto-vandalism in OSM was found to be a short-term, sporadic activity by individuals, whereas the task of fixing vandalism is persistently pursued by a dedicated user group within the OSM community. The characteristics of carto-vandalism identified in this research can be used to improve vandalism detection systems in the future.
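The temporal finding that fixes arrive faster over time rests on measuring the delay between a harmful edit and its repair. A minimal sketch, assuming each harmful edit has already been paired with the edit that fixed it; the function name and timestamps are hypothetical illustration data, not results from the study:

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Tuple


def median_time_to_fix_by_year(
    events: List[Tuple[datetime, datetime]]
) -> Dict[int, timedelta]:
    """Group (vandal_edit_time, fix_time) pairs by the year of the vandal edit
    and return the median delay until the fix for each year, in year order."""
    delays: Dict[int, List[float]] = defaultdict(list)
    for edited, fixed in events:
        if fixed < edited:
            raise ValueError("fix cannot precede the vandal edit")
        delays[edited.year].append((fixed - edited).total_seconds())
    return {year: timedelta(seconds=median(secs)) for year, secs in sorted(delays.items())}


# Hypothetical (vandal edit, fix) timestamp pairs.
events = [
    (datetime(2016, 7, 10, 12), datetime(2016, 7, 12, 12)),  # 48 h
    (datetime(2016, 8, 1, 9), datetime(2016, 8, 2, 9)),  # 24 h
    (datetime(2016, 9, 5, 0), datetime(2016, 9, 8, 0)),  # 72 h
    (datetime(2017, 3, 3, 8), datetime(2017, 3, 3, 14)),  # 6 h
    (datetime(2017, 5, 20, 10), datetime(2017, 5, 20, 12)),  # 2 h
]
for year, delay in median_time_to_fix_by_year(events).items():
    print(year, delay)
# prints:
# 2016 2 days, 0:00:00
# 2017 4:00:00
```

The median is used rather than the mean so that a few long-lived vandal edits do not dominate the yearly summary.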
Proceedings of the Academic Track at State of the Map 2019 - Heidelberg (Germany), September 21-23, 2019
State of the Map featured a full day of academic talks. Building upon the SotM 2019 motto, "Bridging the Map", the Academic Track session aimed to provide a bridge joining together the experience, understanding, ideas, concepts and skills of different groups of researchers, academics and scientists from around the world. In particular, the Academic Track session was meant to build a bridge connecting members of the OpenStreetMap community and the academic community by providing an open passage for the exchange of ideas, communication and opportunities for increased collaboration. These proceedings include 14 abstracts accepted as oral presentations and 6 abstracts presented as posters. Contributions were received from different academic fields, for example geography, remote sensing, computer and information sciences, geomatics, GIScience, the humanities and social sciences, and even from industry actors. We are particularly delighted to have included abstracts from both experienced researchers and students. Overall, it is our hope that these proceedings accurately showcase the ongoing innovation and maturity of scientific investigations and research into OpenStreetMap, showing how OpenStreetMap, as a research object, brings multiple research areas together. Our aim is to show how the sum total of investigations into issues such as Volunteered Geographic Information, geo-information, and geo-digital processes and representation sheds light on the relations between crowds, real-world applications, technological developments, and scientific research.
A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web
Over the past decade, rapid advances in web technologies, coupled with
innovative models of spatial data collection and consumption, have generated a
robust growth in geo-referenced information, resulting in spatial information
overload. Increasing 'geographic intelligence' in traditional text-based
information retrieval has become a prominent approach to respond to this issue
and to fulfill users' spatial information needs. Numerous efforts in the
Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the
Linking Open Data initiative have converged in a constellation of open
knowledge bases, freely available online. In this article, we survey these open
knowledge bases, focusing on their geospatial dimension. Particular attention
is devoted to the crucial issue of the quality of geo-knowledge bases, as well
as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic
Network, is outlined as our contribution to this area. Research directions in
information integration and Geographic Information Retrieval (GIR) are then
reviewed, with a critical discussion of their current limitations and future
prospects.
Recent Developments and Future Trends in Volunteered Geographic Information Research: The Case of OpenStreetMap
User-generated content (UGC) platforms on the Internet have experienced a steep increase in data contributions in recent years. The ubiquitous usage of location-enabled devices, such as smartphones, allows contributors to share their geographic information on a number of selected online portals. The collected information is often referred to as volunteered geographic information (VGI). One of the most utilized, analyzed and cited VGI platforms, with increasing popularity over the past few years, is OpenStreetMap (OSM), whose main goal is to create a freely available geographic database of the world. This paper presents a comprehensive overview of the latest developments in VGI research, focusing on its collaboratively collected geodata and corresponding contributor patterns. Additionally, trends in the realm of OSM research are discussed, highlighting which aspects need to be investigated more closely in the near future.
VEWS: A Wikipedia Vandal Early Warning System
We study the problem of detecting vandals on Wikipedia before any human or
known vandalism detection system flags them as potential vandals, so that
such users can be presented early to Wikipedia administrators. We leverage
multiple classical ML approaches, but develop 3 novel sets of features. Our
Wikipedia Vandal Behavior (WVB) approach uses a novel set of user editing
patterns as features to classify some users as vandals. Our Wikipedia
Transition Probability Matrix (WTPM) approach uses a set of features derived
from a transition probability matrix and then reduces it via a neural net
auto-encoder to classify some users as vandals. The VEWS approach merges the
previous two approaches. Without using any information (e.g. reverts) provided
by other users, these algorithms each have over 85% classification accuracy.
Moreover, when temporal recency is considered, accuracy goes to almost 90%. We
carry out detailed experiments on a new data set we have created consisting of
about 33K Wikipedia users (including both a black list and a white list of
editors) and containing 770K edits. We describe specific behaviors that
distinguish between vandals and non-vandals. We show that VEWS beats ClueBot NG
and STiki, the best known algorithms today for vandalism detection. Moreover,
VEWS detects far more vandals than ClueBot NG and on average, detects them 2.39
edits before ClueBot NG when both detect the vandal. However, we show that the
combination of VEWS and ClueBot NG can give a fully automated vandal early
warning system with even higher accuracy.
Comment: To appear in Proceedings of the 21st ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2015)
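The WTPM idea of turning a user's editing history into transition-probability features can be sketched as follows, assuming each edit has already been mapped to one of a small set of discrete edit types. The state names below are hypothetical placeholders rather than the paper's actual categories, and the neural-net auto-encoder reduction that VEWS applies afterwards is not shown:

```python
from typing import Dict, List, Sequence


def transition_probability_matrix(
    sequence: Sequence[str], states: Sequence[str]
) -> List[List[float]]:
    """Row-normalized matrix P where P[i][j] estimates the probability that an
    edit of type states[i] is immediately followed by one of type states[j]."""
    index: Dict[str, int] = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    # Count each consecutive pair of edit types in the user's history.
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[index[prev]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def flatten(matrix: List[List[float]]) -> List[float]:
    """Flatten the matrix row by row into a fixed-length feature vector."""
    return [p for row in matrix for p in row]


states = ["fast_reedit", "slow_reedit", "new_page"]  # hypothetical edit types
user_edits = ["new_page", "fast_reedit", "fast_reedit", "slow_reedit", "fast_reedit"]
features = flatten(transition_probability_matrix(user_edits, states))
print(features)
# prints [0.5, 0.5, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The flattened matrix has a fixed length (the square of the number of states) regardless of how many edits a user made, which is what makes it usable as classifier input.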
Quality Assessment of the Canadian OpenStreetMap Road Networks
Volunteered geographic information (VGI) has been applied in many fields such as participatory planning, humanitarian relief and crisis management because of its cost-effectiveness. However, the coverage and accuracy of VGI cannot be guaranteed. OpenStreetMap (OSM) is a popular VGI platform that allows users to create or edit maps using GPS-enabled devices or aerial imagery. The issue of geospatial data quality in OSM has become a trending research topic because of the large size of the dataset and the multiple channels of data access. The objective of this study is to examine the overall reliability of the Canadian OSM data. A systematic review is first presented to provide details on the quality evaluation process of OSM. A case study of London, Ontario, follows as an experimental analysis of the completeness, positional accuracy and attribute accuracy of the OSM street networks. Next, a national study of the Canadian OSM data assesses overall semantic accuracy and lineage in addition to the quality measures mentioned above. Results of the quality evaluation are compared with associated OSM provenance metadata to examine potential correlations. The Canadian OSM road networks were found to have accuracy comparable to that of the tested commercial database (DMTI). Although statistical analysis suggests that there are no significant relations between OSM accuracy and its editing history, the study reveals the complex processes behind OSM contributions, possibly influenced by data imports and remote mapping. The findings of this thesis can potentially guide cartographic product selection for interested parties and offer a better understanding of future quality improvement in OSM.
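Completeness of a road network is often summarized as a length ratio against a reference dataset such as DMTI. A minimal sketch, assuming both networks are given as polylines in a projected coordinate system measured in metres; the function names and coordinates are hypothetical, and the study's actual evaluation procedure may differ:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected coordinates in metres (assumed)


def polyline_length(line: Sequence[Point]) -> float:
    """Sum of the Euclidean lengths of a polyline's segments."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def length_completeness(
    osm: List[Sequence[Point]], reference: List[Sequence[Point]]
) -> float:
    """Total OSM road length divided by total reference road length."""
    ref_total = sum(polyline_length(line) for line in reference)
    if ref_total == 0:
        raise ValueError("reference network has zero length")
    return sum(polyline_length(line) for line in osm) / ref_total


# Hypothetical networks: OSM is missing the last 300 m segment of one road.
osm_roads = [[(0, 0), (300, 400)], [(0, 0), (0, 200)]]
reference_roads = [[(0, 0), (300, 400), (300, 700)], [(0, 0), (0, 200)]]
print(f"{length_completeness(osm_roads, reference_roads):.2f}")
# prints 0.70
```

A plain length ratio ignores whether OSM segments actually match reference segments, so values above 1 are possible when OSM contains roads absent from the reference; matching-based measures such as buffer overlap address this.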
- …