
    Scalable Population Synthesis with Deep Generative Modeling

    Population synthesis is concerned with the generation of synthetic yet realistic representations of populations. It is a fundamental problem in the modeling of transport where the synthetic populations of micro-agents represent a key input to most agent-based models. In this paper, a new methodological framework for how to 'grow' pools of micro-agents is presented. The model framework adopts a deep generative modeling approach from machine learning based on a Variational Autoencoder (VAE). Compared to the previous population synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs sampling and traditional generative models such as Bayesian Networks or Hidden Markov Models, the proposed method allows fitting the full joint distribution for high dimensions. The proposed methodology is compared with a conventional Gibbs sampler and a Bayesian Network by using a large-scale Danish trip diary. It is shown that, while these two methods outperform the VAE in the low-dimensional case, they both suffer from scalability issues when the number of modeled attributes increases. It is also shown that the Gibbs sampler essentially replicates the agents from the original sample when the required conditional distributions are estimated as frequency tables. In contrast, the VAE allows addressing the problem of sampling zeros by generating agents that are virtually different from those in the original data but have similar statistical properties. The presented approach can support agent-based modeling at all levels by enabling richer synthetic populations with smaller zones and more detailed individual characteristics.
    Comment: 27 pages, 15 figures, 4 tables
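The abstract's remark that a Gibbs sampler with frequency-table conditionals essentially replicates the original agents can be illustrated with a minimal sketch. The toy sample, the attribute values, and the function name `gibbs_from_frequencies` are assumptions made for illustration; this is not the paper's implementation or data.

```python
import random
from collections import Counter

def gibbs_from_frequencies(sample, n_draws, seed=0):
    """Gibbs sampling where each full conditional is a raw frequency table."""
    rng = random.Random(seed)
    counts = Counter(sample)
    state = list(rng.choice(sample))  # start from an observed agent
    draws = []
    for _ in range(n_draws):
        for i in range(len(state)):
            # Conditional of attribute i given all others: count the observed
            # records that agree with the current state everywhere except i.
            matches = [
                (rec[i], c)
                for rec, c in counts.items()
                if rec[:i] == tuple(state[:i]) and rec[i + 1:] == tuple(state[i + 1:])
            ]
            values, weights = zip(*matches)
            state[i] = rng.choices(values, weights=weights)[0]
        draws.append(tuple(state))
    return draws

# Hypothetical survey records: (age band, occupation, car ownership)
sample = [
    ("young", "student", "no_car"),
    ("young", "employed", "car"),
    ("old", "retired", "car"),
    ("old", "employed", "car"),
]
draws = gibbs_from_frequencies(sample, n_draws=200)
# Every synthetic agent is a copy of an observed one: no new combinations appear.
print(set(draws) <= set(sample))
```

Because a frequency table puts probability mass only on attribute values that co-occur with the current values of all other attributes, the chain can never leave the set of observed combinations. Smoothed or model-based conditionals would behave differently.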

    A Critical Review on Population Synthesis for Activity- and Agent-Based Transportation Models

    Traditional four-step transportation planning models fail to capture novel transportation modes such as car/ridesharing. Hence, agent-based models are replacing those traditional models for their scalability, robustness, and capability of simulating nontraditional transportation modes. A crucial step in developing agent-based models is the definition of agents, e.g., households and persons. While model developers wish to capture typical workday travel patterns of the entire study population of travelers, such detailed data are unavailable due to privacy concerns and technical and financial feasibility issues. Hence, modelers opt for population syntheses based on travel diary surveys, land use data, and census data. The most prominent techniques are iterative proportional fitting (IPF), iterative proportional updating (IPU), combinatorial optimization, Markov-based and fitness-based syntheses, and other emerging approaches. Yet, at present, there is no clear guideline on using any of the available techniques. To bridge this gap, this chapter presents a comprehensive synthesis of practice and documents available successful studies.
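As a rough illustration of the first technique listed above, the following is a minimal two-dimensional IPF sketch. The seed table, the target margins, and the function name `ipf` are hypothetical and chosen only for this example; production synthesizers work with more dimensions and handle infeasible margins explicitly.

```python
def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Rescale a 2-D seed table until its margins match the targets."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns match after the column step; stop once rows match too.
        gap = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if gap < tol:
            break
    return table

# Hypothetical survey cross-tab: 3 age bands x 2 car-ownership classes.
seed = [[10, 5], [3, 12], [6, 0]]
row_targets = [400.0, 350.0, 250.0]  # assumed census totals per age band
col_targets = [550.0, 450.0]         # assumed census totals per ownership class
fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(v, 1) for v in row])
```

Note that the zero cell in the seed stays zero after fitting, since IPF only rescales existing cells. Combinations missing from the survey can therefore never appear in the fitted table.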

    Developing Travel Behaviour Models Using Mobile Phone Data

    Improving the performance and efficiency of transport systems requires sound decision-making supported by data and models. However, conducting travel surveys to facilitate travel behaviour model estimation is an expensive venture. Hence, such surveys are typically infrequent in nature, and cover limited sample sizes. Furthermore, the quality of such data is often affected by reporting errors and changes in the respondents’ behaviour due to awareness of being observed. On the other hand, large and diverse quantities of time-stamped location data are nowadays passively generated as a by-product of technological growth. These passive data sources include Global Positioning System (GPS) traces, mobile phone network records, smart card data and social media data, to name but a few. Among these, mobile phone network records (i.e. call detail records (CDRs) and Global Systems for Mobile Communication (GSM) data) offer the biggest promise due to the increasing mobile phone penetration rates in both the developed and the developing worlds. Previous studies using mobile phone data have primarily focused on extracting travel patterns and trends rather than establishing mathematical relationships between the observed behaviour and the causal factors to predict the travel behaviour in alternative policy scenarios. This research aims to extend the application of mobile phone data to travel behaviour modelling and policy analysis by augmenting the data with information derived from other sources. This comes along with significant challenges stemming from the anonymous and noisy nature of the data. Consequently, novel data fusion and modelling frameworks have been developed and tested for different modelling scenarios to demonstrate the potential of this emerging low-cost data source. In the context of trip generation, a hybrid modelling framework has been developed to account for the anonymous nature of CDR data. 
This involves fusing the CDR and demographic data of a sub-sample of the users to estimate a demographic prediction sub-model based on phone usage variables extracted from the data. The demographic group membership probabilities from this model are then used as class weights in a latent class model for trip generation based on trip rates extracted from the GSM data of the same users. Once estimated, the hybrid model can be applied to probabilistically infer the socio-demographics, and subsequently, the trip generation of a large proportion of the population where only large-scale anonymous CDR data is available as an input. The estimation and validation results using data from Switzerland show that the hybrid model competes well against a typical trip generation model estimated using data with known socio-demographics of the users. The hybrid framework can be applied to other travel behaviour modelling contexts using CDR data (in mode or route choice for instance). The potential of CDR data to capture rational route choice behaviour for long-distance inter-regional O-D pairs (joined by highly overlapping routes) is demonstrated through data fusion with information on the attributes of the alternatives extracted from multiple external sources. The effect of location discontinuities in CDR data (due to its event-driven nature), and how this impacts the ability to observe the users’ trajectories in a highly overlapping network, is discussed, prompting the development of a route identification algorithm that distinguishes between unique and broad sub-group route choices. The broad choice framework, which was developed in the context of vehicle type choice, is then adapted to accommodate this limitation where unique route choices cannot be observed for some users, and only the broad sub-groups of the possible overlapping routes are identifiable. 
The estimation and validation results using data from Senegal show that CDR data can capture rational route choice behaviour, as well as reasonable value of travel time estimates. Still relying on data fusion, a novel method based on the mixed logit framework is developed to enable the analysis of departure time choice behaviour using passively collected data (GSM and GPS data) where the challenge is to deal with the lack of information on the desired times of travel. The proposed method relies on data fusion with travel time information extracted from Google Maps in the context of Switzerland. It is unique in the sense that it allows the modeller to understand the sensitivity attached to schedule delay, thus enabling its valuation, despite the passive nature of the data. The model results are in line with the expected travel behaviour, and the schedule delay valuation estimates are reasonable for the study area. Finally, a joint trip generation modelling framework fusing CDR, household travel survey, and census data is developed. The framework adjusts the scaling factors of a traditional trip generation model (based on household travel survey data only) to optimise model performance at both the disaggregate and aggregate levels. The framework is calibrated using data from Bangladesh and the adjusted models are found to have better spatial and temporal transferability. Thus, besides demonstrating the potential of mobile phone data, the thesis makes significant methodological and applied contributions. The use of different datasets provides rich insights that can inform policy measures related to the adoption of big data for transport studies. The research findings are particularly timely for transport agencies and practitioners working in contexts with severe data limitations (especially in developing countries), as well as academics generally interested in exploring the potential of emerging big data sources, both in transport and beyond.
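The hybrid trip generation idea in this abstract, where demographic membership probabilities act as class weights in a latent class model, can be sketched as a simple mixture. The abstract does not state the class-specific model form, so the Poisson trip-count model, the two classes, and all numbers below are assumptions for illustration only.

```python
import math

def poisson_pmf(k, rate):
    """Probability of observing k trips under a Poisson rate (assumed form)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def latent_class_likelihood(trips, class_probs, class_rates):
    """Likelihood of a trip count, weighting each class model by the
    membership probability predicted from phone usage."""
    return sum(p * poisson_pmf(trips, r) for p, r in zip(class_probs, class_rates))

def expected_trips(class_probs, class_rates):
    """Expected trip rate for an anonymous user with uncertain demographics."""
    return sum(p * r for p, r in zip(class_probs, class_rates))

# Hypothetical user: 70% likely "student", 30% likely "worker",
# with assumed daily trip rates of 2.0 and 4.0 for the two classes.
class_probs = [0.7, 0.3]
class_rates = [2.0, 4.0]
print(expected_trips(class_probs, class_rates))
print(latent_class_likelihood(3, class_probs, class_rates))
```

In estimation, the class rates would be fitted by maximising the sum of log-likelihoods over users. In application, only the membership probabilities are needed to predict trips for users whose demographics are unknown.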

    Methodological and empirical challenges in modelling residential location choices

    The modelling of residential locations is a key element in land use and transport planning. There are significant empirical and methodological challenges inherent in such modelling, however, despite recent advances both in the availability of spatial datasets and in computational and choice modelling techniques. One of the most important of these challenges concerns spatial aggregation. The housing market is characterised by the fact that it offers spatially and functionally heterogeneous products; as a result, if residential alternatives are represented as aggregated spatial units (as in conventional residential location models), the variability of dwelling attributes is lost, which may limit the predictive ability and policy sensitivity of the model. This thesis presents a modelling framework for residential location choice that addresses three key challenges: (i) the development of models at the dwelling-unit level, (ii) the treatment of spatial structure effects in such dwelling-unit level models, and (iii) problems associated with estimation in such modelling frameworks in the absence of disaggregated dwelling unit supply data. The proposed framework is applied to the residential location choice context in London. Another important challenge in the modelling of residential locations is the choice set formation problem. Most models of residential location choices have been developed based on the assumption that households consider all available alternatives when they are making location choices. Due to the high search costs associated with the housing market, however, and the limited capacity of households to process information, the validity of this assumption has been an on-going debate among researchers. 
There have been some attempts in the literature to incorporate the cognitive capacities of households within discrete choice models of residential location: for instance, by modelling households’ choice sets exogenously based on simplifying assumptions regarding their spatial search behaviour (e.g., an anchor-based search strategy) and their characteristics. By undertaking an empirical comparison of alternative models within the context of residential location choice in the Greater London area, this thesis investigates the feasibility and practicality of applying deterministic choice set formation approaches to capture the underlying search process of households. The thesis also investigates the uncertainty of choice sets in residential location choice modelling and proposes a simplified probabilistic choice set formation approach to model choice sets and choices simultaneously. The dwelling-level modelling framework proposed in this research is practice-ready and can be used to estimate residential location choice models at the level of dwelling units without requiring independent and disaggregated dwelling supply data. The empirical comparison of alternative exogenous choice set formation approaches provides a guideline for modellers and land use planners to avoid inappropriate choice set formation approaches in practice. Finally, the proposed simplified choice set formation model can be applied to model the behaviour of households in online real estate environments.

    Prediction of rare feature combinations in population synthesis: Application of deep generative modelling

    In population synthesis applications, when considering populations with many attributes, a fundamental problem is the estimation of rare combinations of feature attributes. Unsurprisingly, it is notably more difficult to reliably represent the sparser regions of such multivariate distributions and in particular combinations of attributes which are absent from the original sample. In the literature this is commonly known as the problem of sampling zeros, for which no systematic solution has been proposed so far. In this paper, two machine learning algorithms from the family of deep generative models are proposed for the problem of population synthesis, with particular attention to the problem of sampling zeros. Specifically, we introduce the Wasserstein Generative Adversarial Network (WGAN) and the Variational Autoencoder (VAE), and adapt these algorithms for a large-scale population synthesis application. The models are implemented on a Danish travel survey with a feature-space of more than 60 variables. The models are validated in a cross-validation scheme and a set of new metrics for the evaluation of the sampling-zero problem is proposed. Results show how these models are able to recover sampling zeros while keeping the estimation of truly impossible combinations, the structural zeros, at a comparatively low level. In particular, for a low-dimensional experiment, the VAE, the marginal sampler and the fully random sampler generate 5%, 21% and 26%, respectively, more structural zeros per sampling zero than the WGAN, while for a high-dimensional case, these figures escalate to 44%, 2217% and 170440%, respectively. This research directly supports the development of agent-based systems and in particular cases where detailed socio-economic or geographical representations are required.
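The distinction between sampling zeros and structural zeros drawn in this abstract can be approximated with simple set operations. The sketch below is not the paper's metric definition: it assumes a hold-out set stands in for the true population, so any generated combination absent from both the training and hold-out data is counted as a structural zero. The function name `zero_counts` and the toy records are hypothetical.

```python
def zero_counts(train, holdout, generated):
    """Count recovered sampling zeros and (approximate) structural zeros."""
    train_set, holdout_set = set(train), set(holdout)
    novel = set(generated) - train_set      # combinations never seen in training
    sampling_zeros = novel & holdout_set    # novel, and confirmed to exist in hold-out data
    structural_zeros = novel - holdout_set  # novel and unseen anywhere: treated as implausible
    return len(sampling_zeros), len(structural_zeros)

# Hypothetical records: (age band, driving licence)
train = [("adult", "licence"), ("child", "no_licence")]
holdout = [("adult", "no_licence"), ("adult", "licence")]
generated = [("adult", "licence"), ("adult", "no_licence"), ("child", "licence")]

recovered, implausible = zero_counts(train, holdout, generated)
print(recovered, implausible)     # 1 1
print(implausible / recovered)    # structural zeros per sampling zero: 1.0
```

A lower ratio of structural zeros per recovered sampling zero indicates a generator that finds plausible unseen agents without producing many impossible ones, which is the trade-off the abstract reports across models.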

    Migratory responses to agricultural risk in Northern Nigeria

    We investigate the extent to which northern Nigerian households engage in internal migration to insure against ex ante and ex post agricultural risk due to weather-related variability and shocks. We use data on the migration patterns of individuals over a 20-year period and temperature degree-days to identify agricultural risk. Controlling for ex ante and ex post risk, we find that households with higher ex ante risk are more likely to send migrants. Households facing hot shocks before the migrant’s move tend to keep their male migrants in closer proximity. These findings suggest that households use migration as a risk management strategy in response to both ex ante and ex post risk, but that migration responses are gender-specific. These findings have implications not only for understanding the insurance motives of households, but also for potential policy responses tied to climatic warming.
    Keywords: Migration, Risk, temperature degree days