2,240 research outputs found

    A multi-level predictive methodology for terminal area air traffic flow

    Over the past few decades, the air transportation system has grown significantly. In particular, the number of passengers using air transportation has greatly increased. As the demand for air travel expands, airport departure and arrival demand approaches capacity. In consequence, delays increase because the system capacity cannot absorb the increased demand. With this trend, the national airspace system (NAS) will be saturated, and congestion at airports will become even more severe. As a result of congestion, a considerable number of flights experience delays. According to the Bureau of Transportation Statistics (BTS), over 1 million flights are operated in a year, and about twenty percent of all scheduled commercial flights are delayed more than 15 minutes. These delays cost billions of dollars annually for airlines, passengers, and the US economy. Therefore, this study seeks to find out why the delays occur and to analyze the patterns in which they occur. Analysis of airport operations generally takes a macro or a micro perspective. From the macro perspective, very few details are considered, and delays are aggregated at the airport level. In this view, shortfalls in airport capacity and capacity-demand imbalances are the primary causes of delays. From the micro perspective, each aircraft is modeled individually, and the causes of delays are reproduced as precisely as possible. Micro-level reasons for air traffic delays include inclement weather, mechanical problems, and operational issues. In this regard, this research proposes a methodology that can efficiently and practically predict macro- and micro-level air traffic flow in the terminal area. For a macro-level analysis of delays, artificial neural network models are proposed to predict the hourly airport capacity.
Multi-layer perceptron (MLP), recurrent neural network (RNN), and long short-term memory (LSTM) models are trained with historical weather and airport capacity data for Hartsfield-Jackson Atlanta airport (ATL). In the performance evaluation, the models showed decent predictive performance on both the training data and the test data. For micro-level modeling of air traffic, Random Forests and AdaBoost are implemented. The micro-level models, trained with on-time flight performance data and corresponding weather data, focus on classifying individual flight delays. The models provide interpretability and handle imbalanced data, while their accuracy is as good as that of existing methods. Lastly, the predictive model for individual flight delays is refined using the cost-proportionate rejection sampling (costing) method. By integrating the costing method, general machine learning algorithms are converted into cost-sensitive classifiers. The cost-sensitive classifiers were able to account for asymmetric misclassification costs without losing their diagnostic functionality as binary classifiers. This study presents a data-driven approach to air traffic flow management that can effectively utilize air traffic data accumulated over decades. Through data analysis from the macro and micro perspectives, an integrated methodology for terminal air traffic flow prediction is provided. Accurate prediction of airport capacity and individual flight delays will assist stakeholders in making more informed decisions. Ph.D.
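The sampling step behind cost-proportionate rejection sampling can be illustrated with a minimal sketch. It assumes the common formulation in which each training example is kept with probability equal to its misclassification cost divided by the largest cost. The labels, the 5:1 cost ratio, and the function name are hypothetical and are not taken from the thesis.

```python
import random


def cost_proportionate_sample(examples, costs, rng):
    """Keep each example with probability cost / max(costs)."""
    z = max(costs)  # normalising constant, assumed to be the largest cost
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


rng = random.Random(0)

# Hypothetical labels: 1 = delayed flight, 0 = on-time flight.
labels = [1] * 200 + [0] * 800

# Hypothetical asymmetric costs: missing a delay is 5x worse than a false alarm.
costs = [5.0 if y == 1 else 1.0 for y in labels]

sample = cost_proportionate_sample(labels, costs, rng)

# Share of delayed flights in the resampled training set.
delayed_share = sum(sample) / len(sample)
print(len(sample), round(delayed_share, 3))
```

An ordinary classifier trained on such a sample behaves like a cost-sensitive one, because expensive examples are over-represented. The full costing procedure would typically repeat the sampling several times and average the resulting classifiers, which this sketch leaves out.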

    Predictive modelling: flight delays and associated factors, Hartsfield–Jackson Atlanta International Airport

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Nowadays, a downside of air travel is the delays that are constantly announced to passengers, which reduce customer satisfaction. These delays, together with other factors, impose both quantitative and qualitative costs on airlines. Consequently, there is a need to predict and mitigate flight delays, which can help airlines and airports improve their performance and take consumer-oriented measures that attenuate or even cancel the effect of these delays on passengers. The primary objective of this study is to predict the occurrence of arrival delays at Hartsfield-Jackson international airport. This was done by building a predictive model using several Data Mining techniques. Applying these techniques made it possible to identify the variables that contributed most to delays. The work followed the Knowledge Discovery in Databases (KDD) methodology. The phases applied were data collection; sampling techniques (SMOTE and undersampling); partitioning of the data into training and test sets; pre-processing (missing data and outliers); data transformation (normalization and attribute selection); definition of the models to be trained (Decision Trees, Random Forests, and Multilayer Perceptron); and evaluation of model performance through a variety of metrics.
After testing different approaches, it was concluded that, when departure-related variables are included, the best model is the Multilayer Perceptron with SMOTE applied to handle the imbalanced data, outliers removed, and ten variables selected using GainRatio. When variables carrying departure information are excluded, the best-performing algorithm is again the Multilayer Perceptron with SMOTE, but this time with outliers included and fifteen variables selected by GainRatio. Under both hypotheses, the explanatory variables that contributed most to arrival delays were related to the weather, the aircraft characteristics, and delay propagation. The Random Forests results showed better accuracy than those of other authors (Belcastro et al., 2016; Choi et al., 2016). In contrast, the Multilayer Perceptron showed lower accuracy than an equivalent study (Y. J. Kim et al., 2016).
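Attribute selection by GainRatio can be illustrated with a short sketch. It assumes the usual definition for categorical attributes: information gain divided by the entropy of the attribute itself. The toy attributes (`weather`, `weekday`) and values are hypothetical, not data from the study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, target):
    """Information gain of `feature` about `target`, divided by split information."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    # Entropy of the target that remains after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(target) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 = delayed arrival, 0 = on time.
delayed = [1, 1, 0, 0]
features = {
    "weather": ["storm", "storm", "clear", "clear"],
    "weekday": ["mon", "tue", "mon", "tue"],
}

# Rank attributes by gain ratio and keep the top k (k = 1 here).
ranking = sorted(features, key=lambda name: gain_ratio(features[name], delayed), reverse=True)
selected = ranking[:1]
print(selected)
```

In this toy case `weather` separates the classes perfectly and `weekday` carries no information, so only `weather` is selected. Selecting ten or fifteen variables, as in the abstract, would correspond to a larger `k`.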

    Predicting & Optimizing Airlines Customer Satisfaction Using Classification

    This research is a machine learning project that studies the factors that may shape customer satisfaction responses and tries to determine which attributes, or combinations of attributes, drive positive customer satisfaction. The research initially uses a dataset from Kaggle (explained in the data source section) to run machine learning algorithms and build a predictor that helps airlines predict which customers are satisfied and react proactively to negative feedback, so that the airline can make it up to dissatisfied customers. The research examines several classification algorithms and tunes them to get the best results, then runs experiments on the resulting models to find the best one.

    Evaluating Wi-Fi indoor positioning approaches in a real world environment

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. The Global Positioning System (GPS) generally does not provide good positioning performance indoors, for many reasons (Henniges, 2012). Alternatives such as Wi-Fi have recently become popular for indoor localization, owing to the wide spread of Wi-Fi infrastructure in indoor environments and the low cost of the technology. This study evaluates different Wi-Fi indoor positioning approaches in a real-world environment, in particular in retail stores and shopping malls, and points out the pros and cons of each approach. From the company's perspective, the main purpose of the study is to explore state-of-the-art methods and cutting-edge technologies for Wi-Fi indoor positioning systems (IPS) and to propose an improvement to its indoor localization system. This system forms the core of the company's retail-analytics product, which uses Wi-Fi positioning to provide indoor location-based services for customers and helps retailers better understand their businesses.

    A study on the prediction of flight delays of a private aviation airline

    Delay is a crucial performance indicator of any transportation system, and flight delays have financial and economic consequences for passengers and airlines. Hence, recognizing them through prediction may improve marketing decisions. The goal is to use machine learning techniques to address an aviation challenge: predicting departure delays of more than 15 minutes for a private airline. Business and data understanding of this particular segment of aviation are reviewed against the literature, and data preparation, modelling, and evaluation are carried out to arrive at a model that may support decision-making in a private aviation environment. The results show which algorithms performed better and which variables contribute most to the model and, consequently, to departure delay.

    In-Database Data Imputation

    Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making. Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates (e.g., mean), are computationally efficient but may introduce bias and disrupt variable relationships, leading to inaccurate analyses. Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time, limiting their applicability to small datasets. This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method. We adapt this method to exploit computation sharing and a ring abstraction for faster model training. To impute both continuous and categorical values, we develop techniques for in-database learning of stochastic linear regression and Gaussian discriminant analysis models. Our MICE implementations in PostgreSQL and DuckDB outperform alternative MICE implementations and model-based imputation techniques by up to two orders of magnitude in terms of computation time, while maintaining high imputation quality
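The chained-equations idea behind MICE can be sketched in a few lines. This is an in-memory illustration for two numeric columns only, assuming ordinary least squares plus Gaussian noise as the stochastic regression step. It does not reproduce the paper's in-database computation sharing or ring abstraction, it skips the parameter draws that a full MICE implementation performs, and all names and data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual std)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / max(n - 2, 1)) ** 0.5


def mice_two_columns(rows, iterations=5, seed=0):
    """Impute None cells in two-column rows by chained stochastic regressions."""
    rng = random.Random(seed)
    missing = [[v is None for v in row] for row in rows]
    # Start by filling every missing cell with its column mean.
    means = []
    for j in range(2):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    data = [[means[j] if row[j] is None else row[j] for j in range(2)] for row in rows]
    for _ in range(iterations):
        for j in range(2):
            k = 1 - j  # the other column acts as the predictor
            # Fit on rows where column j was actually observed.
            train = [(r[k], r[j]) for r, m in zip(data, missing) if not m[j]]
            a, b, sigma = fit_line([t[0] for t in train], [t[1] for t in train])
            # Redraw each missing cell: prediction plus noise keeps variability.
            for r, m in zip(data, missing):
                if m[j]:
                    r[j] = a + b * r[k] + rng.gauss(0.0, sigma)
    return data


# Hypothetical data with one missing value per column.
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, None), (4.0, 9.0), (None, 11.0), (6.0, 13.0)]
completed = mice_two_columns(rows)
print(completed)
```

Adding noise to each prediction, rather than imputing the fitted value directly, is what distinguishes stochastic regression from plain regression imputation and helps preserve the variance of the imputed column.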

    Machine learned daily life history classification using low frequency tracking data and automated modelling pipelines: application to North American waterfowl

    Background: Identifying animal behaviors, life history states, and movement patterns is a prerequisite for many animal behavior analyses and effective management of wildlife and habitats. Most approaches classify short-term movement patterns with high frequency location or accelerometry data. However, patterns reflecting life history across longer time scales can have greater relevance to species biology or management needs, especially when available in near real-time. Given limitations in collecting and using such data to accurately classify complex behaviors in the long-term, we used hourly GPS data from 5 waterfowl species to produce daily activity classifications with machine-learned models using “automated modelling pipelines”. Methods: Automated pipelines are computer-generated code that complete many tasks including feature engineering, multi-framework model development, training, validation, and hyperparameter tuning to produce daily classifications from eight activity patterns reflecting waterfowl life history or movement states. We developed several input features for modeling grouped into three broad categories, hereafter “feature sets”: GPS locations, habitat information, and movement history. Each feature set used different data sources or data collected across different time intervals to develop the “features” (independent variables) used in models. Results: Automated modelling pipelines rapidly developed easily reproducible data preprocessing and analysis steps, identification and optimization of the best performing model and provided outputs for interpreting feature importance. Unequal expression of life history states caused unbalanced classes, so we evaluated feature set importance using a weighted F1-score to balance model recall and precision among individual classes. 
Although the best model using the least restrictive feature set (only 24 hourly relocations in a day) produced effective classifications (weighted F1 = 0.887), models using all feature sets performed substantially better (weighted F1 = 0.95), particularly for rarer but demographically more impactful life history states (i.e., nesting). Conclusions: Automated pipelines generated models producing highly accurate classifications of complex daily activity patterns using relatively low frequency GPS and incorporating more classes than previous GPS studies. Near real-time classification is possible which is ideal for time-sensitive needs such as identifying reproduction. Including habitat and longer sequences of spatial information produced more accurate classifications but incurred slight delays in processing
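A weighted F1-score for unbalanced classes can be computed as in the sketch below. It assumes the common support-weighted definition, where each class's F1 is weighted by its number of true examples. The abstract does not state the exact weighting used, and the activity labels here are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical daily classifications: a common state and a rarer one.
y_true = ["forage", "forage", "forage", "forage", "nest", "nest"]
y_pred = ["forage", "forage", "forage", "nest", "nest", "forage"]

score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Here the per-class F1 is 0.75 for `forage` and 0.5 for `nest`, so the support-weighted score is (4 × 0.75 + 2 × 0.5) / 6 ≈ 0.667.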