Multi-headed self-attention mechanism-based Transformer model for predicting bus travel times across multiple bus routes using heterogeneous datasets

Abstract

Bus transit is a crucial component of transportation networks, especially in urban areas. Bus agencies must enhance the quality of their real-time bus travel information services to serve their passengers better and attract more travelers. Various models have recently been developed for estimating bus travel times to improve the quality of real-time information services. However, most of them are limited to smaller road networks because they generally perform poorly on large networks in densely populated urban regions and fail to capture long-range dependencies. This paper develops a deep learning-based architecture that uses a single-step, multi-station forecasting approach to predict average bus travel times for numerous routes, stops, and trips on a large-scale network, using heterogeneous bus transit data collected from the GTFS database and vehicle probe data. Data were gathered over one week from multiple bus routes in Saint Louis, Missouri. This study developed a multi-headed self-attention mechanism-based Univariate Transformer neural network to predict the mean vehicle travel times for different hours of the day for multiple stations across multiple routes. In addition, we developed Multivariate GRU and LSTM neural network models to compare prediction accuracy and assess the robustness of the Transformer model. To further validate the Transformer model's performance against the GRU and LSTM models, we employed a Historical Average model and an XGBoost model as benchmarks. The number of historical time steps and the prediction horizon were set to 5 and 1, respectively, meaning that five hours of historical average travel time data were used to predict the average travel time for the following hour. Only the historical average bus travel time was used as the input parameter for the Transformer model. Other features captured from our dataset, including spatial and temporal information, volatility measures (e.g., the standard deviation and variance of travel time), dwell time, expected travel time, jam factors, and hour of the day, were used to develop the Multivariate GRU and LSTM models. Model performance was evaluated using the Mean Absolute Percentage Error (MAPE). The results showed that the Transformer model outperformed the other models for one-hour-ahead prediction, with minimum and mean MAPE values of 4.32 percent and 8.29 percent, respectively. We also found that the Transformer model performed best under different traffic conditions (e.g., peak and off-peak hours). Furthermore, we report the models' computation times for prediction: XGBoost was the quickest, with a prediction time of 6.28 seconds, while the Transformer model had a prediction time of 7.42 seconds. The study's findings demonstrate that the Transformer model is applicable to real-time travel time prediction and produces high-quality predictions in the context of a complex, extensive transportation network in high-density urban areas while capturing long-range dependencies.
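
The following is a minimal illustrative sketch (not taken from the paper's code) of the sliding-window setup and MAPE metric described above, written in Python with NumPy; the make_windows and mape helper names, and the representation of hourly average travel times as a plain array, are assumptions made here for illustration only.

    import numpy as np

    def make_windows(series, n_lags=5, horizon=1):
        # Build (input, target) pairs: five historical hourly average travel
        # times are used to predict the average travel time one hour ahead.
        X, y = [], []
        for t in range(len(series) - n_lags - horizon + 1):
            X.append(series[t:t + n_lags])
            y.append(series[t + n_lags + horizon - 1])
        return np.array(X), np.array(y)

    def mape(y_true, y_pred):
        # Mean Absolute Percentage Error, reported in percent.
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

For example, given a series of hourly average travel times for one station, make_windows(series) yields five-hour input windows and one-hour-ahead targets, and mape(y_true, y_pred) computes the evaluation metric reported above.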
