thesis

Developing Travel Behaviour Models Using Mobile Phone Data

Abstract

Improving the performance and efficiency of transport systems requires sound decision-making supported by data and models. However, conducting travel surveys to facilitate travel behaviour model estimation is an expensive venture. Hence, such surveys are typically infrequent and cover limited sample sizes. Furthermore, the quality of such data is often affected by reporting errors and by changes in the respondents’ behaviour due to awareness of being observed. On the other hand, large and diverse quantities of time-stamped location data are nowadays passively generated as a by-product of technological growth. These passive data sources include Global Positioning System (GPS) traces, mobile phone network records, smart card data and social media data, to name but a few. Among these, mobile phone network records (i.e. call detail records (CDRs) and Global System for Mobile Communications (GSM) data) offer the greatest promise due to the increasing mobile phone penetration rates in both the developed and the developing worlds.

Previous studies using mobile phone data have primarily focused on extracting travel patterns and trends rather than establishing mathematical relationships between the observed behaviour and the causal factors to predict travel behaviour in alternative policy scenarios. This research aims to extend the application of mobile phone data to travel behaviour modelling and policy analysis by augmenting the data with information derived from other sources. This comes with significant challenges stemming from the anonymous and noisy nature of the data. Consequently, novel data fusion and modelling frameworks have been developed and tested for different modelling scenarios to demonstrate the potential of this emerging low-cost data source.

In the context of trip generation, a hybrid modelling framework has been developed to account for the anonymous nature of CDR data. This involves fusing the CDR and demographic data of a sub-sample of the users to estimate a demographic prediction sub-model based on phone usage variables extracted from the data. The demographic group membership probabilities from this model are then used as class weights in a latent class model for trip generation based on trip rates extracted from the GSM data of the same users. Once estimated, the hybrid model can be applied to probabilistically infer the socio-demographics, and subsequently the trip generation, of a large proportion of the population where only large-scale anonymous CDR data is available as an input. The estimation and validation results using data from Switzerland show that the hybrid model competes well against a typical trip generation model estimated using data with known socio-demographics of the users. The hybrid framework can be applied to other travel behaviour modelling contexts using CDR data (mode or route choice, for instance).

The potential of CDR data to capture rational route choice behaviour for long-distance inter-regional O-D pairs (joined by highly overlapping routes) is demonstrated through data fusion with information on the attributes of the alternatives extracted from multiple external sources. The effect of location discontinuities in CDR data (due to its event-driven nature), and how this impacts the ability to observe the users’ trajectories in a highly overlapping network, is discussed, prompting the development of a route identification algorithm that distinguishes between unique and broad sub-group route choices. The broad choice framework, which was developed in the context of vehicle type choice, is then adapted to accommodate this limitation, where unique route choices cannot be observed for some users and only the broad sub-groups of the possible overlapping routes are identifiable. The estimation and validation results using data from Senegal show that CDR data can capture rational route choice behaviour, as well as reasonable value of travel time estimates.

Still relying on data fusion, a novel method based on the mixed logit framework is developed to enable the analysis of departure time choice behaviour using passively collected data (GSM and GPS data), where the challenge is to deal with the lack of information on the desired times of travel. The proposed method relies on data fusion with travel time information extracted from Google Maps in the context of Switzerland. It is unique in that it allows the modeller to understand the sensitivity attached to schedule delay, thus enabling its valuation, despite the passive nature of the data. The model results are in line with the expected travel behaviour, and the schedule delay valuation estimates are reasonable for the study area.

Finally, a joint trip generation modelling framework fusing CDR, household travel survey, and census data is developed. The framework adjusts the scaling factors of a traditional trip generation model (based on household travel survey data only) to optimise model performance at both the disaggregate and aggregate levels. The framework is calibrated using data from Bangladesh, and the adjusted models are found to have better spatial and temporal transferability.

Thus, besides demonstrating the potential of mobile phone data, the thesis makes significant methodological and applied contributions. The use of different datasets provides rich insights that can inform policy measures related to the adoption of big data for transport studies. The research findings are particularly timely for transport agencies and practitioners working in contexts with severe data limitations (especially in developing countries), as well as for academics generally interested in exploring the potential of emerging big data sources, both in transport and beyond.
