How average is average? : Temporal patterns and variability in mobile phone data

Abstract

Mobile phone data – with file sizes scaling into terabytes – easily overwhelm the computational capacity available to some researchers. Moreover, for ethical reasons, data access is often granted only to particular subsets, restricting analyses to cover single days, weeks, or geographical areas. Consequently, it is frequently impossible to set a particular analysis or event in its context and know how typical it is, compared to other days, weeks or months. This is important for academic referees questioning research on mobile phone data and for the analysts in deciding how to sample, how much data to process, and which events are anomalous. All these issues require an understanding of variability in Big Data to answer the question of how average is average? This paper provides a method, using a large mobile phone dataset, to answer these basic but necessary questions. We show that file size is a robust proxy for the activity level of phone users by profiling the temporal variability of the data at an hourly, daily and monthly level. We then apply time-series analysis to isolate temporal periodicity. Finally, we discuss confidence limits to anomalous events in the data. We recommend an analytical approach to mobile phone data selection which suggests that ideally data should be sampled across days, across working weeks, and across the year, to obtain a representative average. However, where this is impossible, the temporal variability is such that specific weekdays’ data can provide a fair picture of other days in their general structure.

    Similar works

    Full text

    thumbnail-image