Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data
It is an enduring question how to combine revealed preference (RP) and stated
preference (SP) data to analyze travel behavior. This study presents a
framework of multitask learning deep neural networks (MTLDNNs) for this
question, and demonstrates that MTLDNNs are more generic than the traditional
nested logit (NL) method, owing to their capacity for automatic feature learning
and soft constraints. About 1,500 MTLDNN models were designed and applied to
survey data collected in Singapore, covering RP choices among four existing
travel modes and SP choices that add autonomous vehicles (AVs) as a fifth, new
travel mode. We found that MTLDNNs consistently
outperform six benchmark models, and in particular the classical NL models by
about 5% in prediction accuracy on both the RP and SP datasets. This performance
improvement can be attributed mainly to the soft constraints specific to
MTLDNNs, including their innovative architectural design and regularization
methods, and much less to the generic capacity for automatic feature learning
endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs
are also interpretable. The empirical results show that AVs mainly substitute
for driving, and that AV alternative-specific variables are more important
than socio-economic variables in determining AV adoption. Overall, this
study introduces a new MTLDNN framework to combine RP and SP, and demonstrates
its theoretical flexibility and empirical power for prediction and
interpretation. Future studies can design new MTLDNN architectures to reflect
the specific characteristics of RP and SP data and extend this work to other
behavioral analyses.
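As a rough illustration of the multitask idea described above, the NumPy sketch below builds a network with one shared hidden layer and two task-specific softmax heads: four RP alternatives and five SP alternatives (the same four modes plus AV, assumed here to be the last column). The "soft constraint" is an assumed L2 penalty that pulls the RP and SP head weights together for the four shared modes; the layer sizes, the `task_weight` and `tie_strength` parameters, and all function names are hypothetical, and this is not necessarily the exact formulation used in the study. Only the forward pass and loss are shown; training would additionally need gradients, for example from an autodiff library.

```python
import numpy as np

# Hypothetical sizes: 4 RP alternatives (existing modes) and 5 SP alternatives
# (the same 4 modes plus AV, assumed to be the last column).
N_RP_ALTS, N_SP_ALTS = 4, 5


def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_features, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    return {
        # Shared layer: used by both the RP and the SP task.
        "W_shared": rng.normal(0.0, 0.1, (n_features, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        # Task-specific output heads.
        "W_rp": rng.normal(0.0, 0.1, (n_hidden, N_RP_ALTS)),
        "b_rp": np.zeros(N_RP_ALTS),
        "W_sp": rng.normal(0.0, 0.1, (n_hidden, N_SP_ALTS)),
        "b_sp": np.zeros(N_SP_ALTS),
    }


def forward(params, X, task):
    # Shared ReLU representation, then the head for the requested task.
    h = np.maximum(0.0, X @ params["W_shared"] + params["b_shared"])
    if task == "rp":
        logits = h @ params["W_rp"] + params["b_rp"]
    else:
        logits = h @ params["W_sp"] + params["b_sp"]
    return softmax(logits)  # choice probabilities, one row per respondent


def cross_entropy(probs, y):
    # Mean negative log-likelihood of the chosen alternatives.
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))


def mtl_loss(params, X_rp, y_rp, X_sp, y_sp, task_weight=0.5, tie_strength=0.1):
    loss_rp = cross_entropy(forward(params, X_rp, "rp"), y_rp)
    loss_sp = cross_entropy(forward(params, X_sp, "sp"), y_sp)
    # Assumed soft constraint: L2 penalty on the difference between RP and SP
    # head weights for the 4 modes both tasks share (AV has no RP counterpart).
    tie = np.sum((params["W_rp"] - params["W_sp"][:, :N_RP_ALTS]) ** 2)
    return task_weight * loss_rp + (1.0 - task_weight) * loss_sp + tie_strength * tie


# Usage on synthetic data (random features, made-up choices).
rng = np.random.default_rng(1)
X_rp, X_sp = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
y_rp = rng.integers(0, N_RP_ALTS, size=8)
y_sp = rng.integers(0, N_SP_ALTS, size=8)
params = init_params(n_features=3, n_hidden=6)
print(mtl_loss(params, X_rp, y_rp, X_sp, y_sp))
```

Setting `tie_strength` to 0 leaves two independent heads on a shared representation, while a very large value approaches forcing the RP and SP heads to be identical for the common modes, which would be a hard rather than a soft constraint.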
Tree Boosting Data Competitions with XGBoost
The objective of this Master's thesis is to explain how to approach a supervised learning predictive problem and to illustrate that approach with a statistical/machine learning algorithm, Tree Boosting. A review of tree methodology traces its evolution from Classification and Regression Trees, through Bagging and Random Forest, to present-day Tree Boosting. The methodology is explained following the XGBoost implementation, which has achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is presented together with its key concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross-validation and feature engineering. All these concepts are illustrated with a real video game churn dataset used in a datathon competition.
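The objective function and regularization term mentioned above can be made concrete with XGBoost's second-order split criterion. In that formulation each tree f is penalized by Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2), where T is the number of leaves and w_j are the leaf weights; with G and H denoting the sums of first and second derivatives of the loss in a leaf, the optimal leaf weight is w* = -G / (H + lambda), and a split is scored by the gain computed below. This is a minimal sketch for squared-error loss on a single feature: the function names are hypothetical helpers, `lam` and `gamma` play the roles of XGBoost's `lambda` and `gamma` parameters, and it is not the library's actual implementation (no shrinkage, histogram binning, or multi-feature search).

```python
import numpy as np


def grad_hess_squared_error(y_true, y_pred):
    # For l(y, yhat) = 0.5 * (y - yhat)^2: g = yhat - y and h = 1.
    g = y_pred - y_true
    h = np.ones_like(y_true, dtype=float)
    return g, h


def leaf_weight(G, H, lam):
    # Optimal weight of a leaf under the L2 penalty lam on leaf weights.
    return -G / (H + lam)


def leaf_score(G, H, lam):
    # Structure score term G^2 / (H + lam); larger means a better leaf.
    return G * G / (H + lam)


def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    # Gain = 0.5 * [score(L) + score(R) - score(L+R)] - gamma.
    G_L, H_L = g[left_mask].sum(), h[left_mask].sum()
    G_R, H_R = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam)
                  - leaf_score(G_L + G_R, H_L + H_R, lam)) - gamma


def best_split(x, g, h, lam=1.0, gamma=0.0):
    # Exhaustive search over thresholds "x <= thr"; returns (None, 0.0)
    # when no split has positive gain, i.e. the leaf is not split.
    best_thr, best_gain = None, 0.0
    for thr in np.unique(x)[:-1]:
        gain = split_gain(g, h, x <= thr, lam, gamma)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


# Usage: toy data, starting from a constant prediction of 0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.0, 5.0, 5.0])
g, h = grad_hess_squared_error(y, np.zeros_like(y))
thr, gain = best_split(x, g, h, lam=1.0)
print(thr, gain)  # splits at x <= 2.0 with a gain of about 2.93
print(leaf_weight(g[x <= thr].sum(), h[x <= thr].sum(), 1.0))
print(leaf_weight(g[x > thr].sum(), h[x > thr].sum(), 1.0))
```

With `lam=0` the two leaf weights would equal the mean residuals (1 and 5); with `lam=1` they shrink toward zero (2/3 and 10/3), and a large enough `gamma` makes every gain negative so the leaf is not split, which is how the regularization term guards against overfitting.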