Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks
We present a novel Bayesian nonparametric regression model for covariates X
and continuous, real response variable Y. The model is parametrized in terms of
marginal distributions for Y and X and a regression function which tunes the
stochastic ordering of the conditional distributions F(y|x). By adopting an
approximate composite likelihood approach, we show that the resulting posterior
inference can be decoupled for the separate components of the model. This
procedure can scale to very large datasets and allows for the use of standard,
existing, software from Bayesian nonparametric density estimation and
Plackett-Luce ranking estimation to be applied. As an illustration, we show an
application of our approach to a US Census dataset, with over 1,300,000 data
points and more than 100 covariates.
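The abstract above relies on Plackett-Luce ranking estimation but does not give its parametrization. As a rough illustration only, the standard Plackett-Luce log-likelihood of a single ranking can be sketched as below. The function name and the `scores` argument (one real-valued log-worth per item) are assumptions made for this sketch; it is not the authors' conditional-rank model.

```python
import math

def plackett_luce_log_likelihood(ranking, scores):
    """Log-probability of `ranking` (item indices, best first) under the
    standard Plackett-Luce model.

    `scores[i]` is a hypothetical real-valued log-worth for item i.
    """
    remaining = list(ranking)
    log_lik = 0.0
    for item in ranking:
        # Log of the normalizing sum over items not yet placed,
        # computed with the log-sum-exp trick for numerical stability.
        m = max(scores[j] for j in remaining)
        log_norm = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        log_lik += scores[item] - log_norm
        remaining.remove(item)
    return log_lik
```

With equal scores every ordering is equally likely, so three items give a probability of 1/6 for any ranking.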
Nonparametric estimation of k-modal taste heterogeneity for group level agent-based mixed logit
Estimating agent-specific taste heterogeneity with a large information and
communication technology (ICT) dataset requires both model flexibility and
computational efficiency. We propose a group-level agent-based mixed (GLAM)
logit approach that is estimated with inverse optimization (IO) and group-level
market share. The model is theoretically consistent with the RUM model
framework, while the estimation method is a nonparametric approach that fits to
market-level datasets, which overcomes the limitations of existing approaches.
A case study of New York statewide travel mode choice is conducted with a
synthetic population dataset provided by Replica Inc., which contains mode
choices of 19.53 million residents on two typical weekdays, one in Fall 2019
and another in Fall 2021. Individual mode choices are grouped into market-level
market shares per census block-group OD pair and four population segments,
resulting in 120,740 group-level agents. We calibrate the GLAM logit model with
the 2019 dataset and compare to several benchmark models: mixed logit (MXL),
conditional mixed logit (CMXL), and individual parameter logit (IPL). The
results show that the empirical taste distribution estimated by GLAM logit can be
either unimodal or multimodal, which is infeasible for MXL/CMXL and hard to
fulfill in IPL. The GLAM logit model outperforms benchmark models on the 2021
dataset, improving the overall accuracy from 82.35% to 89.04% and improving the
pseudo R-square from 0.4165 to 0.5788. Moreover, the value-of-time (VOT) and
mode preferences retrieved from GLAM logit align with our empirical knowledge
(e.g., VOT of NotLowIncome population in NYC is $28.05/hour; public transit and
walking are preferred in NYC). The agent-specific taste parameters are essential
for the policymaking of statewide transportation projects.
An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service
In this paper, we present machine learning approaches for characterizing and
forecasting the short-term demand for on-demand ride-hailing services. We
propose the spatio-temporal estimation of the demand that is a function of
variable effects related to traffic, pricing and weather conditions. With
respect to the methodology, a single decision tree, bootstrap-aggregated
(bagged) decision trees, random forest, boosted decision trees, and artificial
neural network for regression have been adapted and systematically compared
using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and
slope. To better assess the quality of the models, they have been tested on a
real case study using the data of DiDi Chuxing, the main on-demand ride hailing
service provider in China. In the current study, 199,584 time-slots describing
the spatio-temporal ride-hailing demand have been extracted with an
aggregated-time interval of 10 mins. All the methods are trained and validated
on the basis of two independent samples from this dataset. The results revealed
that boosted decision trees provide the best prediction accuracy (RMSE=16.41),
while avoiding the risk of over-fitting, followed by artificial neural network
(20.09), random forest (23.50), bagged decision trees (24.29) and single
decision tree (33.55). Comment: Currently under review for journal publicatio
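The models above are compared on statistics such as RMSE and R-square. A minimal sketch of how these two statistics are conventionally computed is given below. The function names are assumptions for illustration, and the abstract does not say which exact definitions the authors used.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A lower RMSE indicates a closer fit. R-square equals 1 for a perfect fit and can be negative when predictions are worse than the sample mean.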
Using nonparametrics to specify a model to measure the value of travel time
Using a range of nonparametric methods, the paper examines the specification of a model to evaluate the willingness-to-pay (WTP) for travel time changes from binomial choice data from a simple time-cost trading experiment. The analysis favours a model with random WTP as the only source of randomness over a model with fixed WTP which is linear in time and cost and has an additive random error term. Results further indicate that the distribution of log WTP can be described as the sum of a linear index fixing the location of the log WTP distribution and an independent random variable representing unobserved heterogeneity. This formulation is useful for parametric modelling. The index indicates that the WTP varies systematically with income and other individual characteristics. The WTP also varies with the time difference presented in the experiment, which contradicts standard utility theory. Keywords: willingness-to-pay; WTP; value of time; nonparametric; semiparametric; local logit
On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices
A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences. The article is published at http://journals.plos.org/plosone/article?id=10.1371/journal.pone.018668
The multinomial logit model revisited: a semi-parametric approach in discrete choice analysis
The multinomial logit model in discrete choice analysis is widely used in transport research. It has long been known that the Gumbel distribution forms the basis of the multinomial logit model. Although the Gumbel distribution is a good approximation in some applications such as route choice problems, it is chosen mainly for mathematical convenience. This can be restrictive in many other scenarios in practice. In this paper we show that the assumption of the Gumbel distribution can be substantially relaxed to include a large class of distributions that is stable with respect to the minimum operation. The distributions in the class allow heteroscedastic variances. We then seek a transformation that stabilizes the heteroscedastic variances. We show that this leads to a semi-parametric choice model which links the linear combination of travel-related attributes to the choice probabilities via an unknown sensitivity function. This sensitivity function reflects the degree of travelers’ sensitivity to the changes in the combined travel cost. The estimation of the semi-parametric choice model is also investigated and empirical studies are used to illustrate the developed method.
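The link between the Gumbel distribution and the multinomial logit model mentioned above can be illustrated with a small simulation. The sketch below covers only the standard Gumbel case, not the paper's semi-parametric extension. It computes closed-form logit probabilities and compares them with choice shares obtained by adding independent standard Gumbel noise to each utility. The function names, the number of draws, and the seed are assumptions chosen for illustration.

```python
import math
import random

def mnl_probabilities(utilities):
    """Closed-form multinomial logit probabilities (a numerically stable softmax)."""
    m = max(utilities)
    weights = [math.exp(v - m) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def gumbel_draw(rng):
    """One standard Gumbel draw via the inverse CDF, -log(-log(U))."""
    u = rng.random()
    while u <= 0.0:  # guard against log(0); random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_choice_shares(utilities, n_draws=20000, seed=0):
    """Share of draws in which each alternative has the highest noisy utility."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(n_draws):
        noisy = [v + gumbel_draw(rng) for v in utilities]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_draws for c in counts]
```

For any vector of utilities, the simulated shares should approach the closed-form probabilities as the number of draws grows.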
Roadway System Assessment Using Bluetooth-Based Automatic Vehicle Identification Travel Time Data
This monograph is an exposition of several practice-ready methodologies for automatic vehicle identification (AVI) data collection systems. This includes considerations in the physical setup of the collection system as well as the interpretation of the data. An extended discussion is provided, with examples, demonstrating data techniques for converting the raw data into more concise metrics and views. Examples of statistical before-after tests are also provided. A series of case studies is presented that focuses on various real-world applications, including the impact of winter weather on freeway operations, the economic benefit of traffic signal retiming, and the estimation of origin-destination matrices from travel time data. The technology used in this report is Bluetooth MAC address matching, but the concepts are extendible to other AVI data sources.