The Impact of Biases in the Crowdsourced Trajectories on the Output of Data Mining Processes

Abstract

The emergence of the Geoweb has given professional and non-professional participants an unprecedented capacity to generate and share digital content through crowdsourcing projects such as OpenStreetMap (OSM) and Wikimapia. Despite the success of such projects, the impact of the biases inherent in the ‘crowd’ and/or the ‘crowdsourced’ data it produces is not well explored. In this paper we examine the impact of biased trajectory data on the output of a spatio-temporal data mining process. To do so, we conducted an experiment in which biases were intentionally added to the input data: the input trajectories were divided into training and control datasets not randomly, as is usual in data mining procedures, but by time of day and week, weather conditions, contributors’ gender, and the spatial and temporal density of trajectories in 1 km grids. The accuracy of the predictive models was then measured for both the training and control data, and the biases were gradually moderated to see how the accuracy of the very same model changes with respect to the biased input data. We show that the same data mining technique yields different results in terms of the nature of the clusters and the identified attributes.
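The non-random split and its gradual moderation can be illustrated with a minimal sketch. It is not the procedure used in the experiment: the function name `biased_split`, the `bias` parameter, and the `weather` field are hypothetical and chosen only for illustration. A `bias` of 1.0 fills the training set from one group first (a fully biased split), while 0.0 falls back to the usual random split.

```python
import random


def biased_split(records, in_group, bias, train_fraction=0.5, seed=0):
    """Split records into (training, control) with a tunable group bias.

    Hypothetical helper: `in_group` is a predicate that marks the group
    the training set should be skewed towards (e.g. rainy-day trips).
    """
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must lie in [0, 1]")
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)

    n_train = round(train_fraction * len(records))
    # Reserve a share of the training slots for members of the group.
    members = [i for i in order if in_group(records[i])]
    forced = members[: round(bias * n_train)]
    forced_set = set(forced)

    # Fill the remaining training slots at random from everything else.
    rest = [i for i in order if i not in forced_set]
    n_free = n_train - len(forced)
    training = [records[i] for i in forced + rest[:n_free]]
    control = [records[i] for i in rest[n_free:]]
    return training, control


def group_share(records, in_group):
    """Fraction of records that belong to the group (0.0 if empty)."""
    return sum(1 for r in records if in_group(r)) / len(records) if records else 0.0


# Illustrative data only: 50 rainy and 50 dry trips.
trips = [{"id": i, "weather": "rain" if i % 2 == 0 else "dry"} for i in range(100)]
is_rainy = lambda trip: trip["weather"] == "rain"

for bias in (1.0, 0.5, 0.0):
    training, control = biased_split(trips, is_rainy, bias)
    print(bias, group_share(training, is_rainy), group_share(control, is_rainy))
```

Fitting the same model on each training set and scoring it on the matching control set would then show how accuracy shifts as `bias` is lowered towards a random split.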
