186 research outputs found

    SLIM : Scalable Linkage of Mobility Data

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup
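The LSH-based candidate reduction mentioned in this abstract can be illustrated with a generic MinHash-plus-banding sketch. This is not SLIM's actual scheme, which the abstract does not specify. It assumes each entity is represented as a non-empty set of hypothetical `(cell, time_window)` tokens, and the names `token_hash`, `minhash_signature`, and `candidate_pairs`, as well as the parameters `num_hashes` and `bands`, are illustrative choices.

```python
import hashlib
from collections import defaultdict


def token_hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature of a non-empty token set (one minimum per seed)."""
    return tuple(
        min(token_hash(t, seed) for t in tokens) for seed in range(num_hashes)
    )


def candidate_pairs(left, right, num_hashes=32, bands=8):
    """Return (left_id, right_id) pairs that share at least one LSH band.

    `left` and `right` map entity ids to sets of (cell, time_window) tokens.
    Assumes `num_hashes` is divisible by `bands`.
    """
    rows = num_hashes // bands
    # Each bucket keeps the left-side ids and right-side ids separately.
    buckets = defaultdict(lambda: (set(), set()))
    for side, dataset in enumerate((left, right)):
        for entity_id, tokens in dataset.items():
            sig = minhash_signature(tokens, num_hashes)
            for b in range(bands):
                key = (b, sig[b * rows:(b + 1) * rows])
                buckets[key][side].add(entity_id)
    pairs = set()
    for left_ids, right_ids in buckets.values():
        for l_id in left_ids:
            for r_id in right_ids:
                pairs.add((l_id, r_id))
    return pairs
```

Only pairs that collide in some band go on to the more expensive similarity computation, which is where the reduction in candidate pairs comes from. More bands with fewer rows each admit more dissimilar pairs, while fewer bands with more rows each are stricter.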

    The Use of Mobility Data for Responding to the COVID-19 Pandemic

    As the COVID-19 pandemic continues to upend the way people move, work, and gather, governments, businesses, and public health researchers have looked increasingly at mobility data to support pandemic response. This data, assets that describe human location and movement, generally has been collected for purposes directly related to a company's business model, including optimizing the delivery of consumer services, supply chain management or targeting advertisements. However, these call detail records, smartphone-mobility data, vehicle-derived GPS, and other mobility data assets can also be used to study patterns of movement. These patterns of movement have, in turn, been used by organizations to forecast disease spread and inform decisions on how to best manage activity in certain locations. Researchers at The GovLab and Cuebiq, supported by the Open Data Institute, identified 51 notable projects from around the globe launched by public sector and research organizations with companies that use mobility data for these purposes. They curated five projects among this listing that highlight the specific opportunities (and risks) presented by using this asset. Though few of these highlighted projects have provided public outputs, which makes assessing project success difficult, organizations interviewed considered mobility data to be a useful asset that enabled better public health surveillance, supported existing decision-making processes, or otherwise allowed groups to achieve their research goals. The report below summarizes some of the major points identified in those case studies. While acknowledging that location data can be a highly sensitive data type that can facilitate surveillance or expose data subjects if used carelessly, it finds mobility data can support research and inform decisions when applied toward narrowly defined research questions through frameworks that acknowledge and proactively mitigate risk.
These frameworks can vary based on the individual circumstances facing data users, suppliers, and subjects. However, there are a few conditions that can enable users and suppliers to promote publicly beneficial and responsible data use and overcome the serious obstacles facing them. For data users (governments and research institutions), functional access to real-time and contextually relevant data can support research goals, even though a lack of data science competencies and both short and long-term funding sources represent major obstacles for this goal. Data suppliers (largely companies), meanwhile, need governance structures and mechanisms that facilitate responsible re-use, including data re-use agreements that define who, what, where, and when, and under what conditions data can be shared. A lack of regulatory clarity and the absence of universal governance and privacy standards have impeded effective and responsible dissemination of mobility data for research and humanitarian purposes. Finally, for both data users and suppliers, we note that collaborative research networks that allow organizations to seek out and provide data can serve as enablers of project success by facilitating exchange of methods and resources, and closing the gap between research and practice. Based on these findings, we recommend the development of clear governance and privacy frameworks, increased capacity building around data use within the public sector, and more regular convenings of ecosystem stakeholders (including the public and data subjects) to broaden collaborative networks. We also propose solutions towards making the responsible use of mobility data more sustainable for long-term impact beyond the current pandemic. A failure to develop regulatory and governance frameworks that can responsibly manage mobility data could lead to a regression to the ad hoc and uncoordinated approaches that previously defined mobility data applications.
It could also lead to disparate standards about organizations' responsibilities to the public

    Developing Travel Behaviour Models Using Mobile Phone Data

    Improving the performance and efficiency of transport systems requires sound decision-making supported by data and models. However, conducting travel surveys to facilitate travel behaviour model estimation is an expensive venture. Hence, such surveys are typically infrequent in nature, and cover limited sample sizes. Furthermore, the quality of such data is often affected by reporting errors and changes in the respondents’ behaviour due to awareness of being observed. On the other hand, large and diverse quantities of time-stamped location data are nowadays passively generated as a by-product of technological growth. These passive data sources include Global Positioning System (GPS) traces, mobile phone network records, smart card data and social media data, to name but a few. Among these, mobile phone network records (i.e. call detail records (CDRs) and Global System for Mobile Communications (GSM) data) offer the biggest promise due to the increasing mobile phone penetration rates in both the developed and the developing worlds. Previous studies using mobile phone data have primarily focused on extracting travel patterns and trends rather than establishing mathematical relationships between the observed behaviour and the causal factors to predict the travel behaviour in alternative policy scenarios. This research aims to extend the application of mobile phone data to travel behaviour modelling and policy analysis by augmenting the data with information derived from other sources. This comes along with significant challenges stemming from the anonymous and noisy nature of the data. Consequently, novel data fusion and modelling frameworks have been developed and tested for different modelling scenarios to demonstrate the potential of this emerging low-cost data source. In the context of trip generation, a hybrid modelling framework has been developed to account for the anonymous nature of CDR data.
This involves fusing the CDR and demographic data of a sub-sample of the users to estimate a demographic prediction sub-model based on phone usage variables extracted from the data. The demographic group membership probabilities from this model are then used as class weights in a latent class model for trip generation based on trip rates extracted from the GSM data of the same users. Once estimated, the hybrid model can be applied to probabilistically infer the socio-demographics, and subsequently, the trip generation of a large proportion of the population where only large-scale anonymous CDR data is available as an input. The estimation and validation results using data from Switzerland show that the hybrid model competes well against a typical trip generation model estimated using data with known socio-demographics of the users. The hybrid framework can be applied to other travel behaviour modelling contexts using CDR data (in mode or route choice for instance). The potential of CDR data to capture rational route choice behaviour for long-distance inter-regional O-D pairs (joined by highly overlapping routes) is demonstrated through data fusion with information on the attributes of the alternatives extracted from multiple external sources. The effect of location discontinuities in CDR data (due to its event-driven nature), and how this impacts the ability to observe the users’ trajectories in a highly overlapping network, is discussed, prompting the development of a route identification algorithm that distinguishes between unique and broad sub-group route choices. The broad choice framework, which was developed in the context of vehicle type choice, is then adapted to accommodate this limitation where unique route choices cannot be observed for some users, and only the broad sub-groups of the possible overlapping routes are identifiable.
The estimation and validation results using data from Senegal show that CDR data can capture rational route choice behaviour, as well as reasonable value of travel time estimates. Still relying on data fusion, a novel method based on the mixed logit framework is developed to enable the analysis of departure time choice behaviour using passively collected data (GSM and GPS data) where the challenge is to deal with the lack of information on the desired times of travel. The proposed method relies on data fusion with travel time information extracted from Google Maps in the context of Switzerland. It is unique in the sense that it allows the modeller to understand the sensitivity attached to schedule delay, thus enabling its valuation, despite the passive nature of the data. The model results are in line with the expected travel behaviour, and the schedule delay valuation estimates are reasonable for the study area. Finally, a joint trip generation modelling framework fusing CDR, household travel survey, and census data is developed. The framework adjusts the scaling factors of a traditional trip generation model (based on household travel survey data only) to optimise model performance at both the disaggregate and aggregate levels. The framework is calibrated using data from Bangladesh and the adjusted models are found to have better spatial and temporal transferability. Thus, besides demonstrating the potential of mobile phone data, the thesis makes significant methodological and applied contributions. The use of different datasets provides rich insights that can inform policy measures related to the adoption of big data for transport studies. The research findings are particularly timely for transport agencies and practitioners working in contexts with severe data limitations (especially in developing countries), as well as academics generally interested in exploring the potential of emerging big data sources, both in transport and beyond

    Privacy in trajectory micro-data publishing : a survey

    We survey the literature on the privacy of trajectory micro-data, i.e., spatiotemporal information about the mobility of individuals, whose collection is becoming increasingly simple and frequent thanks to emerging information and communication technologies. The focus of our review is on privacy-preserving data publishing (PPDP), i.e., the publication of databases of trajectory micro-data that preserve the privacy of the monitored individuals. We classify and present the literature of attacks against trajectory micro-data, as well as solutions proposed to date for protecting databases from such attacks. This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research. Comment: Accepted for publication at Transactions for Data Privac

    Towards matching user mobility traces in large-scale datasets

    The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and with different approaches, with a focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. between two datasets that consist of several million people's mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success rate only after a one-week-long observation of their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals
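The co-occurrence idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm. It assumes each record is a hypothetical `(user, place, timestamp)` tuple with places already discretized to a shared vocabulary, counts two records as co-occurring when they share a place and fall within `window` seconds of each other, and matches each user in the first dataset to the user in the second with the highest count. The function names and the `window` default are assumptions.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(records_a, records_b, window=300):
    """Count co-occurring records for every (user_a, user_b) pair.

    Records are (user, place, timestamp_seconds) tuples. Two records
    co-occur if they share a place and differ by at most `window` seconds.
    """
    by_place = defaultdict(list)
    for user_b, place, t_b in records_b:
        by_place[place].append((t_b, user_b))

    counts = Counter()
    for user_a, place, t_a in records_a:
        for t_b, user_b in by_place.get(place, ()):
            if abs(t_a - t_b) <= window:
                counts[(user_a, user_b)] += 1
    return counts


def best_matches(counts):
    """Map each user_a to (user_b, count) with the largest count.

    Ties are resolved in favour of the pair seen first.
    """
    best = {}
    for (user_a, user_b), n in counts.items():
        if user_a not in best or n > best[user_a][1]:
            best[user_a] = (user_b, n)
    return best
```

Under this sketch, a longer observation period accumulates more co-occurring records per true pair, which fits the abstract's statement that the expected number of co-occurring records is the main determinant of matchability.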

    Evidence and future potential of mobile phone data for disease disaster management

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record. Global health threats such as the recent Ebola and Zika virus outbreaks require rapid and robust responses to prevent, reduce and recover from disease dispersion. As part of broader big data and digital humanitarianism discourses, there is an emerging interest in data produced through mobile phone communications for enhancing the data environment in such circumstances. This paper assembles user perspectives and critically examines existing evidence and future potential of mobile phone data derived from call detail records (CDRs) and two-way short message service (SMS) platforms, for managing and responding to humanitarian disasters caused by communicable disease outbreaks. We undertake a scoping review of relevant literature and in-depth interviews with key informants to ascertain the: (i) information that can be gathered from CDRs or SMS data; (ii) phase(s) in the disease disaster management cycle when mobile data may be useful; (iii) value added over conventional approaches to data collection and transfer; (iv) barriers and enablers to use of mobile data in disaster contexts; and (v) the social and ethical challenges. Based on this evidence we develop a typology of mobile phone data sources, types, and end-uses, and a decision-tree for mobile data use, designed to enable effective use of mobile data for disease disaster management.
We show that mobile data holds great potential for improving the quality, quantity and timing of selected information required for disaster management, but that testing and evaluation of the benefits, constraints and limitations of mobile data use in a wider range of mobile-user and disaster contexts is needed to fully understand its utility, validity, and limitations. A portion of this research was funded as part of the Science for Humanitarian Emergencies and Resilience (SHEAR) programme, by the UK Department for International Development (DFID), the Natural Environment Research Council (NERC) and the Economic and Social Research Council (ESRC)

    Where You Are Is What You Do: On Inferring Offline Activities From Location Data

    In this paper we investigate the ability of modern machine learning algorithms to infer basic offline activities, e.g., shopping and dining, from location data. Using anonymized data of thousands of users of a prominent location-based social network, we empirically demonstrate that not only does state-of-the-art machine learning excel at the task at hand (F1 score > 0.9), but tabular models are also among the best performers. The findings we report here not only fill an existing gap in the literature, but also highlight the potential risks of such capabilities given the ubiquity of location data and the high accessibility of tabular machine learning models. Comment: Accepted to IEEE ICDM Workshops 202