    Depicting urban boundaries from a mobility network of spatial interactions: A case study of Great Britain with geo-located Twitter data

    Existing urban boundaries are usually defined by government agencies for administrative, economic, and political purposes. Defining urban boundaries that consider socio-economic relationships and citizen commute patterns is important for many aspects of urban and regional planning. In this paper, we describe a method to delineate urban boundaries based upon human interactions with physical space inferred from social media. Specifically, we depict the urban boundaries of Great Britain using a mobility network of Twitter user spatial interactions, which was inferred from over 69 million geo-located tweets. We define the non-administrative anthropographic boundaries in a hierarchical fashion based on different physical movement ranges of users derived from the collective mobility patterns of Twitter users in Great Britain. Strongly connected urban regions, identified as communities in the network space, yield geographically cohesive, non-overlapping urban areas, which provide a clear delineation of the non-administrative anthropographic urban boundaries of Great Britain. The method was applied at both the national scale (Great Britain) and the municipal scale (the London metropolis). While our results corresponded well with the administrative boundaries, many unexpected and interesting boundaries were identified. Importantly, as the depicted urban boundaries exhibited strong spatial proximity, we employed a gravity model to understand the distance decay effects in shaping the delineated urban boundaries. The model explains how geographical distances found in the mobility patterns affect the interaction intensity among different non-administrative anthropographic urban areas, which provides new insights into human spatial interactions with urban space. Comment: 32 pages, 7 figures, International Journal of Geographic Information Science
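
The abstract above does not state the functional form of its gravity model. As a rough illustration only, the sketch below assumes the common textbook form T_ij = k * m_i * m_j / d_ij^beta and estimates the distance decay exponent beta by least squares on the log-transformed equation. The function names and the choice of "mass" (for example, tweet counts per area) are assumptions, not the authors' specification.

```python
import math

def fit_distance_decay(flows):
    """Estimate (k, beta) for T = k * m_i * m_j / d**beta.

    flows: iterable of (mass_i, mass_j, distance, observed_flow), all > 0.
    Fits log(T / (m_i * m_j)) = log(k) - beta * log(d) by ordinary
    least squares with a single predictor (closed-form solution).
    """
    flows = list(flows)
    xs = [math.log(d) for _, _, d, _ in flows]
    ys = [math.log(t / (mi * mj)) for mi, mj, _, t in flows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -beta; the intercept is log(k).
    return math.exp(intercept), -slope

def predicted_flow(k, beta, mass_i, mass_j, distance):
    """Interaction intensity predicted by the assumed gravity form."""
    return k * mass_i * mass_j / distance ** beta
```

A larger fitted beta means interaction intensity falls off faster with distance, which is the distance decay effect the abstract refers to.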

    Identifying Hidden Visits from Sparse Call Detail Record Data

    Despite a large body of literature on trip inference using call detail record (CDR) data, a fundamental understanding of their limitations is lacking. In particular, because of the sparse nature of CDR data, users may travel to a location without being revealed in the data, which we refer to as a "hidden visit". The existence of hidden visits hinders our ability to extract reliable information about human mobility and travel behavior from CDR data. In this study, we propose a data fusion approach to obtain labeled data for statistical inference of hidden visits. In the absence of complementary data, this can be accomplished by extracting labeled observations from more granular cellular data access records, and extracting features from voice call and text messaging records. The proposed approach is demonstrated using a real-world CDR dataset of 3 million users from a large Chinese city. Logistic regression, support vector machine, random forest, and gradient boosting are used to infer whether a hidden visit exists during a displacement observed from CDR data. The test results show significant improvement over the naive no-hidden-visit rule, which is an implicit assumption adopted by most existing studies. Based on the proposed model, we estimate that over 10% of the displacements extracted from CDR data involve hidden visits. The proposed data fusion method offers a systematic statistical approach to inferring individual mobility patterns based on telecommunication records

    Evidence and future potential of mobile phone data for disease disaster management

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record. Global health threats such as the recent Ebola and Zika virus outbreaks require rapid and robust responses to prevent, reduce and recover from disease dispersion. As part of broader big data and digital humanitarianism discourses, there is an emerging interest in data produced through mobile phone communications for enhancing the data environment in such circumstances. This paper assembles user perspectives and critically examines existing evidence and future potential of mobile phone data derived from call detail records (CDRs) and two-way short message service (SMS) platforms, for managing and responding to humanitarian disasters caused by communicable disease outbreaks. We undertake a scoping review of relevant literature and in-depth interviews with key informants to ascertain the: (i) information that can be gathered from CDRs or SMS data; (ii) phase(s) in the disease disaster management cycle when mobile data may be useful; (iii) value added over conventional approaches to data collection and transfer; (iv) barriers and enablers to use of mobile data in disaster contexts; and (v) the social and ethical challenges. Based on this evidence we develop a typology of mobile phone data sources, types, and end-uses, and a decision-tree for mobile data use, designed to enable effective use of mobile data for disease disaster management. 
We show that mobile data holds great potential for improving the quality, quantity and timing of selected information required for disaster management, but that testing and evaluation of the benefits, constraints and limitations of mobile data use in a wider range of mobile-user and disaster contexts is needed to fully understand its utility, validity, and limitations. A portion of this research was funded as part of the Science for Humanitarian Emergencies and Resilience (SHEAR) programme, by the UK Department for International Development (DFID), the Natural Environment Research Council (NERC) and the Economic and Social Research Council (ESRC).

    Cell Towers as Urban Sensors: Understanding the Strengths and Limitations of Mobile Phone Location Data

    Understanding urban dynamics and human mobility patterns not only benefits a wide range of real-world applications (e.g., business site selection, public transit planning), but also helps address many urgent issues caused by rapid urbanization (e.g., population explosion, congestion, pollution). In the past few years, given the pervasive usage of mobile devices, call detail records collected by mobile network operators have been widely used in urban dynamics and human mobility studies. However, the derived knowledge might be strongly biased due to the uneven distribution of people's phone communication activities in space and time. This dissertation research applies different analytical methods to better understand human activity and the urban environment, as well as their interactions, mainly based on a new type of data source: actively tracked mobile phone location data. In particular, this dissertation research achieves three main research objectives. First, this research develops visualization and analysis approaches to uncover hidden urban dynamics patterns from actively tracked mobile phone location data. Second, this research designs quantitative methods to evaluate the representativeness issue of call detail record data. Third, this research develops an appropriate approach to evaluate the performance of different types of tracking data in urban dynamics research. 
The major contributions of this dissertation research include: 1) uncovering the dynamics of stay/move activities and distance decay effects, and the changing human mobility patterns based on several mobility indicators derived from actively tracked mobile phone location data; 2) taking the first step to evaluate the representativeness and effectiveness of call detail record and revealing its bias in human mobility research; and 3) extracting and comparing urban-level population movement patterns derived from three different types of tracking data as well as their pros and cons in urban population movement analysis

    Using iPhone Significant Location Data to Improve Air Pollution Exposure Estimation

    An accurate estimation of human exposure to ambient air pollution is crucial for air pollution health studies. Time-activity patterns may introduce substantial uncertainties in exposure estimation. Smartphones are becoming increasingly popular, and their ownership is becoming ubiquitous in the US. Virtually all smartphones can collect location data, and such data is continuously stored somewhere. Such stored location data therefore has the potential to be used for characterizing an individual's time-activity patterns for air pollution health studies. However, studies on the accuracy and feasibility of using a smartphone's location data in air pollution exposure estimation are still limited. Here, a pilot study was conducted to evaluate the accuracy of the iPhone's Significant Location (iSL) data in capturing an individual's time-activity patterns. Specifically, iSL data collected from a single individual were compared with reference GPS data to evaluate the ability of iSL to capture: 1) all microenvironments the subject visited during the study period; 2) the duration and frequency of the subject's stays in each microenvironment, if the location is labelled as significant and captured by iSL; and 3) the impact of neglecting time-activity patterns on the subject's air pollution exposure estimates. The results showed a favorable performance of the iSL data, which accurately captured the time the subject spent in 16 microenvironments encompassing 93% of all time during the study period. To further understand the availability of iSL data, an online survey was conducted among 349 participants. Among the surveyed users, 72% had iSL data available, which highlights the potentially substantial coverage of iSL data. 
With the popularity of the iPhone, detailed significant location data could be available for a considerable portion of the population, and such iSL data may have great potential for improving retrospective air pollution exposure estimation.

    How much data is enough to track tourists? The tradeoff between data granularity and storage costs

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. In an increasingly technology-dependent world, data is one of the key strategic resources for organizations. Often, the challenge that many decision-makers face is to determine which data and how much to collect, and what needs to be kept in their data storage. The challenge is to preserve enough information to inform decisions without incurring overly high storage and data processing costs. In this thesis, this challenge is studied in the context of a collection of mobile signaling data for studying tourists' behavioral patterns. Given the number of mobile phones in use, and the frequency of their interaction with network infrastructure and location reporting, mobile data sets represent a rich source of information for mobility studies. The objective of this research is to analyze to what extent individual trajectories can be reconstructed if only a fraction of the original location data is preserved, providing insights about the tradeoff between the volume of data available and the accuracy of reconstructed paths. To achieve this, signaling data from 277,093 anonymized foreign travelers are sampled at different sampling rates, and the full trajectories are reconstructed using the last-seen, linear, and cubic interpolation completion methods. The results of the comparison are discussed from the perspective of data management and of their implications for research, especially research that relies on mobile phone data with lower temporal density.
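
The sampling-and-reconstruction experiment described above can be sketched on a toy scale. The code below is an illustration under stated assumptions, not the thesis's implementation: positions are planar (x, y) coordinates, timestamps are strictly increasing, every function name is hypothetical, and only the last-seen and linear completion methods are shown (cubic interpolation is omitted to keep the sketch dependency-free).

```python
import math
from bisect import bisect_right

def last_seen(samples, t):
    """Position of the latest kept sample at or before time t.

    samples: list of (time, x, y) tuples sorted by strictly increasing time.
    Before the first sample, falls back to the first known position.
    """
    times = [s[0] for s in samples]
    i = max(bisect_right(times, t) - 1, 0)
    return samples[i][1], samples[i][2]

def linear(samples, t):
    """Linearly interpolated position at time t, clamped at both ends."""
    times = [s[0] for s in samples]
    i = bisect_right(times, t) - 1
    if i < 0:
        return samples[0][1], samples[0][2]
    if i >= len(samples) - 1:
        return samples[-1][1], samples[-1][2]
    t0, x0, y0 = samples[i]
    t1, x1, y1 = samples[i + 1]
    w = (t - t0) / (t1 - t0)
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)

def subsample(samples, keep_every):
    """Keep every keep_every-th record, simulating a lower sampling rate."""
    return samples[::keep_every]

def mean_error(full, kept, method):
    """Mean Euclidean distance between true and reconstructed positions."""
    errors = []
    for t, x, y in full:
        rx, ry = method(kept, t)
        errors.append(math.hypot(x - rx, y - ry))
    return sum(errors) / len(errors)
```

Calling `mean_error(full, subsample(full, n), method)` for several values of `n` and for each method gives one point per combination on the kind of volume-versus-accuracy tradeoff curve the abstract discusses.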

    Comparing Regional Patterns of Individual Movement Using Corrected Mobility Entropy

    In this paper, we propose a correction of the Mobility Entropy indicator (ME) used to describe the diversity of individual movement patterns as can be captured by data from mobile phones. We argue that a correction is necessary because standard calculations of ME show a structural dependency on the geographical density of observation points, rendering results biased and comparisons between regions incorrect. As a solution, we propose the Corrected Mobility Entropy (CME). We apply our solution to a French mobile phone dataset with ∼18.5 million users. Results show CME to be less correlated to cell-tower density (r = –0.17 instead of –0.59 for ME). As a spatial pattern of mobility diversity, we find CME values to be higher in suburban regions compared to their related urban centers, while both decrease considerably with lowering urban center sizes. Based on regression models, we find mobility diversity to relate to factors like income and employment. Additionally, using CME reveals the role of car use in relation to land use, which was not recognized when using ME values. Our solution enables a better description of individual mobility at a large scale, which has applications in official statistics, urban planning and policy, and mobility research
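
The abstract does not give the formula for ME or for the CME correction. The sketch below assumes, for illustration only, that ME is the Shannon entropy of the share of a user's observations at each cell tower; it does not attempt to reproduce the CME. It shows why such a measure can depend on tower density: the same movement recorded by a finer tower grid produces a higher value.

```python
import math
from collections import Counter

def mobility_entropy(visits):
    """Shannon entropy (in bits) of a user's observed locations.

    visits: non-empty sequence of hashable location identifiers
    (for example cell-tower IDs), one entry per observation.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# The same person alternating between two places, seen through a sparse
# network (one tower per place) and a dense one (two towers per place).
sparse_network = ["A", "A", "B", "B"]
dense_network = ["A1", "A2", "B1", "B2"]

print(mobility_entropy(sparse_network))  # 1.0
print(mobility_entropy(dense_network))   # 2.0
```

Under this assumed definition the behavior is identical in both cases, yet the measured diversity doubles, which is the kind of structural bias the paper's correction is meant to remove.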

    Uncovering population dynamics using mobile phone data : the case of Helsinki Metropolitan Area

    Understanding the whereabouts of people in time and space is necessary for unraveling how our societies function. Nevertheless, our understanding of human presence is predominantly based on static residential population data, which is often outdated and excludes certain population groups, such as commuters or tourists. In light of the development towards 24-hour societies and the need to promote sustainable and equitable urban planning, reliable data on population dynamics are needed. To this end, ubiquitous mobile phones provide an attractive source for estimating the spatiotemporal digital footprints of people. In this study, I set out to investigate 1) the feasibility of three different types of aggregated network-based mobile phone data (the number of voice calls, data transmission and general network connection attempts) as a proxy for human presence, 2) how the population distribution varies in the Helsinki Metropolitan Area over the course of a regular weekday and 3) the role of temporally sensitive population data when analysing dynamic accessibility to grocery stores and transport hubs. To the best of my knowledge, this is the first time mobile phone data has been used to reveal population dynamics for scientific purposes in Finland. Mobile phone data collected by the mobile network operator Elisa in 2017–2018 and ancillary data about land cover, buildings and a time use survey were used to estimate the 24-hour population distribution of the Helsinki Metropolitan Area. The mobile phone data were allocated to statistical 250 m x 250 m grid cells using an advanced dasymetric interpolation method and validated against population register data from Statistics Finland. The resulting 24-hour population was used to map the pulse of the city and to introduce the first fully dynamic accessibility model in the study area. The results show that data use is a good proxy for people and outperforms voice calls or overall network connection attempts. 
During daytime, the static population overestimates the population in residential areas and underestimates the population in work and service areas. In general, the 24-hour population reveals the pulse of a city, which is highlighted especially in the inner city of Helsinki, where the relative share of the study area's population increases by 50% from the share at night-time to its peak at noon. The results of the case study suggest that integrating dynamic population data into location-based accessibility analysis provides more realistic results compared to static population data, but the significance of dynamic population data depends on the study context and research questions. In summary, aggregated network-driven mobile phone data is a feasible alternative for dynamic population modelling; however, different mobile phone data types vary in representativeness, which should be taken into account when using mobile phone data in research. To this end, critical evaluation of data and transparent data description are essential. Overall, understanding 24-hour societies and supporting sustainable urban planning necessitates dynamic population data, but advancements in data policy and availability are needed to realize these possibilities. The results of this study also provide new empirical insights into the population dynamics of the study area, which can be used to advance planning and decision making.
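
The Helsinki abstract mentions allocating mobile phone data to 250 m x 250 m grid cells with an "advanced dasymetric interpolation method" but does not describe the method. The sketch below shows only the basic dasymetric idea, under assumptions: a count observed for one source zone (for example a tower's coverage area) is split among the grid cells it overlaps in proportion to an ancillary weight per cell. The function name, the weights and the fallback rule are all hypothetical.

```python
def dasymetric_allocate(count, cell_weights):
    """Split a zone-level count among grid cells in proportion to weights.

    count: activity observed for the source zone (e.g. data transmissions).
    cell_weights: non-empty dict mapping grid-cell ID to a non-negative
        ancillary weight (e.g. floor area scaled by an hourly
        time-use factor).
    If every weight is zero, the count is spread evenly (an assumed
    fallback; other choices are possible).
    """
    total = sum(cell_weights.values())
    if total == 0:
        share = count / len(cell_weights)
        return {cell: share for cell in cell_weights}
    return {cell: count * w / total for cell, w in cell_weights.items()}
```

Because the shares sum to the original count, the allocation preserves the zone total while concentrating people where the ancillary data suggests they are likely to be at that hour.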