Study of delay prediction in the US airport network

Abstract

In modern business, Artificial Intelligence (AI) and Machine Learning (ML) have improved strategy and decision-making through predictive modeling. This study uses ML and AI to predict arrival flight delays in the United States airport network. Flight delays have severe social, environmental, and economic impacts, and deploying ML models during strategic decision-making can help reduce them. To this end, a literature study and critical appraisal of previous research on flight delay prediction were carried out, analyzing the datasets, selected features, algorithms, and evaluation tools used in that research. The results of the literature study and critical appraisal informed the methodology of this study. Two public datasets were chosen: domestic flight data for 2017 and weather data for 2017. These datasets are processed in a custom-designed data pipeline built with Spark, and the processed data is split into training, validation, and testing data. The training and validation data are used to train several ML models and tune their hyperparameters using both Spark and H2O. The models are then evaluated and compared on performance metrics obtained from the testing data, and the best-performing model is presented as a suitable solution for arrival flight delay prediction. Among logistic regression, random forest, gradient boosting machine, and feed-forward neural network models, the gradient boosting machine performed best by a wide margin. This solution can be deployed as a supporting ML model during strategic decision-making.
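The evaluation protocol described above (split the data three ways, pick a model on the validation data, report its score on the testing data) can be illustrated with a minimal plain-Python sketch. This is not the study's Spark and H2O pipeline: the split fractions, the toy delay data, and the threshold "models" are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random


def three_way_split(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions are assumptions for illustration, not the study's values.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out
    return train, val, test


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def select_best(candidates, val):
    """Return the name of the candidate with the highest validation accuracy."""
    return max(candidates, key=lambda name: accuracy(candidates[name], val))


# Toy data: (departure delay in minutes, arrived late?). Purely illustrative.
rows = [(d, d > 15) for d in range(60)]
train, val, test = three_way_split(rows)

# Hypothetical stand-ins for trained models: simple threshold rules.
candidates = {
    "threshold_15": lambda d: d > 15,
    "threshold_5": lambda d: d > 5,
    "threshold_30": lambda d: d > 30,
}

# Model selection uses only the validation data; the test data is touched once,
# to report the final score of the selected model.
best = select_best(candidates, val)
test_accuracy = accuracy(candidates[best], test)
```

Keeping the testing data out of model selection is what makes the final comparison an honest estimate of performance on unseen flights.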
