6,801 research outputs found

    Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

    We present a novel Bayesian nonparametric regression model for covariates X and a continuous, real-valued response variable Y. The model is parametrized in terms of the marginal distributions of Y and X and a regression function that tunes the stochastic ordering of the conditional distributions F(y|x). By adopting an approximate composite likelihood approach, we show that posterior inference can be decoupled across the separate components of the model. This procedure scales to very large datasets and allows standard, existing software for Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we apply our approach to a US Census dataset with over 1,300,000 data points and more than 100 covariates.
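
    The abstract relies on Plackett-Luce ranking estimation. As a generic illustration (not the authors' composite-likelihood procedure), the Plackett-Luce probability of an ordering is a product of sequential choices: at each position, the next item is drawn from those not yet ranked with probability proportional to its positive worth. The sketch below evaluates that log-probability. The function name and the idea of deriving worths from covariates (for example w = exp(f(x))) are assumptions made for illustration.

```python
import math
from typing import Sequence


def plackett_luce_log_prob(order: Sequence[int], worths: Sequence[float]) -> float:
    """Log-probability of a complete ranking under a Plackett-Luce model.

    order:  item indices listed from first (top-ranked) to last
    worths: positive worth parameter for each item (index-aligned)
    """
    if sorted(order) != list(range(len(worths))):
        raise ValueError("order must be a permutation of all item indices")
    if any(w <= 0 for w in worths):
        raise ValueError("worths must be positive")
    log_p = 0.0
    for k, item in enumerate(order):
        # Items still unranked at position k compete in proportion to their worths.
        denom = sum(worths[j] for j in order[k:])
        log_p += math.log(worths[item]) - math.log(denom)
    return log_p


# Hypothetical example: worths derived from a linear score of one covariate per item.
scores = [0.2, -0.5, 1.1]
worths = [math.exp(s) for s in scores]
print(math.exp(plackett_luce_log_prob([2, 0, 1], worths)))
```

Working on the log scale keeps the product of many small sequential-choice probabilities numerically stable when rankings are long.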

    Nonparametric estimation of k-modal taste heterogeneity for group level agent-based mixed logit

    Estimating agent-specific taste heterogeneity with a large information and communication technology (ICT) dataset requires both model flexibility and computational efficiency. We propose a group-level agent-based mixed (GLAM) logit approach that is estimated with inverse optimization (IO) and group-level market shares. The model is theoretically consistent with the random utility maximization (RUM) framework, while the estimation method is a nonparametric approach that fits market-level datasets, overcoming the limitations of existing approaches. A case study of New York statewide travel mode choice is conducted with a synthetic population dataset provided by Replica Inc., which contains the mode choices of 19.53 million residents on two typical weekdays, one in Fall 2019 and another in Fall 2021. Individual mode choices are aggregated into market shares per census block-group OD pair and four population segments, resulting in 120,740 group-level agents. We calibrate the GLAM logit model with the 2019 dataset and compare it to several benchmark models: mixed logit (MXL), conditional mixed logit (CMXL), and individual parameter logit (IPL). The results show that the empirical taste distribution estimated by GLAM logit can be either unimodal or multimodal, which is infeasible for MXL/CMXL and hard to achieve with IPL. On the 2021 dataset, the GLAM logit model outperforms the benchmark models, improving overall accuracy from 82.35% to 89.04% and pseudo R-square from 0.4165 to 0.5788. Moreover, the value of time (VOT) and mode preferences retrieved from GLAM logit align with our empirical knowledge (e.g., the VOT of the NotLowIncome population in NYC is $28.05/hour; public transit and walking are preferred in NYC). The agent-specific taste parameters are essential for policymaking on statewide transportation projects.
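
    In a logit model, the market share of each mode for a group-level agent is the softmax of the systematic utilities, and with utility linear in time and cost the VOT is the ratio of the time coefficient to the cost coefficient. The sketch below illustrates only these two relationships, not the inverse-optimization estimator. The mode names, attribute values and coefficients are hypothetical and are not estimates from the paper.

```python
import math
from typing import Dict, Mapping


def logit_shares(utilities: Mapping[str, float]) -> Dict[str, float]:
    """Multinomial logit choice shares (a softmax over systematic utilities)."""
    m = max(utilities.values())  # subtract the maximum for numerical stability
    expu = {mode: math.exp(u - m) for mode, u in utilities.items()}
    total = sum(expu.values())
    return {mode: e / total for mode, e in expu.items()}


def value_of_time(beta_time_per_hour: float, beta_cost_per_dollar: float) -> float:
    """Value of time in $/hour: ratio of the time and cost coefficients."""
    return beta_time_per_hour / beta_cost_per_dollar


# Hypothetical taste parameters for one group-level agent (not estimates from the paper).
beta_time, beta_cost = -3.0, -0.12                   # utility per hour, utility per dollar
asc = {"drive": 0.0, "transit": 0.4, "walk": 0.2}    # alternative-specific constants
trips = {"drive": (0.40, 6.0), "transit": (0.60, 2.75), "walk": (0.90, 0.0)}  # (hours, $)

utilities = {m: asc[m] + beta_time * t + beta_cost * c for m, (t, c) in trips.items()}
print(logit_shares(utilities))
print(value_of_time(beta_time, beta_cost))  # $/hour
```

Because each agent carries its own coefficients, the same two functions yield agent-specific shares and an agent-specific VOT.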

    An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service

    In this paper, we present machine learning approaches for characterizing and forecasting short-term demand for on-demand ride-hailing services. We propose a spatio-temporal estimate of demand as a function of variables related to traffic, pricing and weather conditions. Methodologically, a single decision tree, bootstrap-aggregated (bagged) decision trees, random forest, boosted decision trees, and an artificial neural network for regression have been adapted and systematically compared using several statistics, e.g. R-square, root mean square error (RMSE), and slope. To better assess the quality of the models, they have been tested on a real case study using data from DiDi Chuxing, the main on-demand ride-hailing service provider in China. In the current study, 199,584 time slots describing spatio-temporal ride-hailing demand have been extracted with an aggregation interval of 10 min. All methods are trained and validated on two independent samples from this dataset. The results revealed that boosted decision trees provide the best prediction accuracy (RMSE = 16.41) while avoiding the risk of over-fitting, followed by the artificial neural network (RMSE = 20.09), random forest (23.50), bagged decision trees (24.29) and a single decision tree (33.55).
    Comment: Currently under review for journal publication
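
    The models above are ranked by R-square, RMSE and slope. A minimal sketch of these three statistics for observed versus predicted demand per time slot follows. The abstract does not define "slope"; here it is assumed to be the least-squares slope of predicted on observed values (1.0 indicates no systematic scaling bias). The demand numbers in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(observed, predicted)
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_slope(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Least-squares slope of predicted on observed (assumed meaning of 'slope')."""
    _check(observed, predicted)
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var = sum((o - mean_o) ** 2 for o in observed)
    return cov / var


# Hypothetical demand counts for a few 10-minute slots.
obs = [120.0, 95.0, 60.0, 140.0, 80.0]
pred = [110.0, 100.0, 70.0, 150.0, 75.0]
print(rmse(obs, pred), r_squared(obs, pred), fit_slope(obs, pred))
```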

    Using nonparametrics to specify a model to measure the value of travel time

    Using a range of nonparametric methods, the paper examines the specification of a model to evaluate the willingness-to-pay (WTP) for travel time changes from binomial choice data from a simple time-cost trading experiment. The analysis favours a model with random WTP as the only source of randomness over a model with fixed WTP that is linear in time and cost and has an additive random error term. Results further indicate that log WTP can be described as the sum of a linear index, which fixes the location of the log WTP distribution, and an independent random variable representing unobserved heterogeneity. This formulation is useful for parametric modelling. The index indicates that WTP varies systematically with income and other individual characteristics. WTP also varies with the time difference presented in the experiment, which contradicts standard utility theory.
    Keywords: Willingness-to-pay; WTP; value of time; nonparametric; semiparametric; local logit
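
    The favoured formulation can be turned into a binary choice probability. In a time-cost trade, a respondent accepts the faster, more expensive option exactly when WTP exceeds the offered price of time, delta_cost / delta_time. If log WTP = index + eta, the acceptance probability is P(eta > log(delta_cost / delta_time) - index). The sketch below assumes eta is logistic with a given scale, one possible parametric choice and not the paper's nonparametric estimate; the function names and the example coefficients are hypothetical.

```python
import math


def logistic_cdf(z: float) -> float:
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def prob_choose_faster(delta_cost: float, delta_time: float, index: float, scale: float) -> float:
    """P(choosing the faster, more expensive option) when
    log WTP = index + eta and eta ~ Logistic(0, scale) (an assumed distribution)."""
    if delta_cost <= 0 or delta_time <= 0 or scale <= 0:
        raise ValueError("delta_cost, delta_time and scale must be positive")
    boundary = math.log(delta_cost / delta_time)  # log of the price of time on offer
    # P(eta > boundary - index) = F((index - boundary) / scale) by symmetry of the logistic.
    return logistic_cdf((index - boundary) / scale)


# Hypothetical index: log WTP rises with log income (coefficients are made up).
income = 40000.0
index = -1.5 + 0.35 * math.log(income)
print(prob_choose_faster(delta_cost=12.0, delta_time=0.5, index=index, scale=0.6))
```

Because the index enters only through its difference from the log offered price, a WTP that also shifts with the presented time difference would show up as a dependence of the index on delta_time.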

    On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices

    A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent the probability density function of a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Aargau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences.
    The article is published at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.018668
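
    One standard way to extend a base density with orthonormal polynomials, assumed here for illustration (the paper's exact specification may differ), is f(e) = g(e) * [sum_k a_k L_k(G(e))]^2, where g and G are the standard Gumbel density and CDF, L_k(u) = sqrt(2k+1) P_k(2u - 1) are Legendre polynomials made orthonormal on [0, 1], and sum_k a_k^2 = 1. Substituting u = G(e) shows that f integrates to sum_k a_k^2 = 1, and the squared series lets f become multimodal. The function names and coefficients below are hypothetical.

```python
import math
from typing import List, Sequence


def legendre_values(n: int, x: float) -> List[float]:
    """P_0(x), ..., P_n(x) via Bonnet's recurrence (k+1)P_{k+1} = (2k+1)x P_k - k P_{k-1}."""
    vals = [1.0]
    if n >= 1:
        vals.append(x)
    for k in range(1, n):
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals


def snp_gumbel_pdf(eps: float, alphas: Sequence[float]) -> float:
    """Gumbel density reweighted by a squared series of orthonormal shifted Legendre polynomials."""
    norm = math.sqrt(sum(a * a for a in alphas))
    if norm == 0.0:
        raise ValueError("alphas must contain at least one non-zero coefficient")
    g = math.exp(-eps - math.exp(-eps))  # standard Gumbel density
    u = math.exp(-math.exp(-eps))        # standard Gumbel CDF, maps eps into (0, 1)
    p = legendre_values(len(alphas) - 1, 2.0 * u - 1.0)
    # Normalizing the coefficients makes the density integrate to 1.
    series = sum(a * math.sqrt(2 * k + 1) * p[k] for k, a in enumerate(alphas)) / norm
    return g * series ** 2


# With a single coefficient the model reduces to the plain Gumbel density.
print(snp_gumbel_pdf(0.3, [1.0]), snp_gumbel_pdf(0.3, [1.0, 0.5, -0.3]))
```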

    The multinomial logit model revisited: a semi-parametric approach in discrete choice analysis

    The multinomial logit model in discrete choice analysis is widely used in transport research. It has long been known that the Gumbel distribution forms the basis of the multinomial logit model. Although the Gumbel distribution is a good approximation in some applications, such as route choice problems, it is chosen mainly for mathematical convenience. This can be restrictive in many other scenarios in practice. In this paper we show that the assumption of the Gumbel distribution can be substantially relaxed to include a large class of distributions that is stable with respect to the minimum operation. The distributions in this class allow heteroscedastic variances. We then seek a transformation that stabilizes the heteroscedastic variances. We show that this leads to a semi-parametric choice model which links the linear combination of travel-related attributes to the choice probabilities via an unknown sensitivity function. This sensitivity function reflects the degree of travelers’ sensitivity to changes in the combined travel cost. The estimation of the semi-parametric choice model is also investigated, and empirical studies are used to illustrate the developed method.
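
    The baseline that this paper relaxes can be checked by simulation. If each traveller picks the alternative with the lowest perceived cost c_i + e_i, with e_i i.i.d. Gumbel (minimum) errors of scale 1/theta, the choice shares equal the logit probabilities exp(-theta c_i) / sum_j exp(-theta c_j). The sketch below illustrates only this Gumbel-to-logit link, not the semi-parametric estimator; the costs and theta are hypothetical.

```python
import math
import random
from typing import List, Sequence


def logit_probs(costs: Sequence[float], theta: float) -> List[float]:
    """Closed-form multinomial logit probabilities for cost-minimizing travellers."""
    c_min = min(costs)
    w = [math.exp(-theta * (c - c_min)) for c in costs]  # shift by c_min for stability
    total = sum(w)
    return [x / total for x in w]


def simulate_shares(costs: Sequence[float], theta: float, n_draws: int, seed: int = 0) -> List[float]:
    """Monte Carlo shares of the alternative with the lowest perceived cost."""
    rng = random.Random(seed)
    counts = [0] * len(costs)
    for _ in range(n_draws):
        perceived = []
        for c in costs:
            u = rng.random()
            while u == 0.0:  # random() lies in [0, 1); exclude 0 so the logs are defined
                u = rng.random()
            gumbel_max = -math.log(-math.log(u))     # standard Gumbel (maximum) draw
            perceived.append(c - gumbel_max / theta)  # negated and scaled: Gumbel (minimum) error
        counts[perceived.index(min(perceived))] += 1
    return [k / n_draws for k in counts]


costs = [10.0, 12.0, 15.0]
print(logit_probs(costs, 0.5))
print(simulate_shares(costs, 0.5, 20000))
```

Replacing the Gumbel draw with another error distribution breaks this closed form, which is the gap the semi-parametric sensitivity function is meant to fill.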

    Roadway System Assessment Using Bluetooth-Based Automatic Vehicle Identification Travel Time Data

    This monograph is an exposition of several practice-ready methodologies for automatic vehicle identification (AVI) data collection systems, covering both the physical setup of the collection system and the interpretation of the data. An extended discussion, with examples, demonstrates techniques for converting the raw data into more concise metrics and views. Examples of statistical before-after tests are also provided. A series of case studies is presented that focus on various real-world applications, including the impact of winter weather on freeway operations, the economic benefit of traffic signal retiming, and the estimation of origin-destination matrices from travel time data. The technology used in this report is Bluetooth MAC address matching, but the concepts extend to other AVI data sources.
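
    The core step in Bluetooth AVI processing is matching the same device at two readers to obtain a segment travel time. The sketch below is a simplified illustration under stated assumptions: each detection is a (device id, timestamp) pair, a device's earliest upstream detection is paired with its earliest later downstream detection, and matches longer than a cutoff are discarded as stops or detours. Real deployments need more careful handling of repeat trips and outliers. The device ids, timestamps and cutoff are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Tuple

Detection = Tuple[str, datetime]  # (device identifier, detection time)


def match_travel_times(upstream: List[Detection], downstream: List[Detection],
                       max_travel_s: float) -> List[float]:
    """Segment travel times in seconds for devices seen at both readers."""
    first_up: Dict[str, datetime] = {}
    for dev, t in upstream:
        if dev not in first_up or t < first_up[dev]:
            first_up[dev] = t
    best: Dict[str, float] = {}
    for dev, t in downstream:
        if dev not in first_up:
            continue  # device never passed the upstream reader
        dt = (t - first_up[dev]).total_seconds()
        # Keep the earliest downstream detection inside the plausible window.
        if 0 < dt <= max_travel_s and (dev not in best or dt < best[dev]):
            best[dev] = dt
    return sorted(best.values())


t0 = datetime(2024, 1, 15, 8, 0, 0)
up = [("a", t0), ("b", t0 + timedelta(minutes=1)), ("c", t0 + timedelta(minutes=2))]
down = [("a", t0 + timedelta(minutes=5)), ("b", t0 + timedelta(minutes=3, seconds=30)),
        ("b", t0 + timedelta(minutes=20)), ("d", t0 + timedelta(minutes=4))]
times = match_travel_times(up, down, max_travel_s=900)
print(times, median(times))
```

Summarizing matched times with the median rather than the mean keeps the segment estimate robust to the occasional slow match that slips under the cutoff.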