5 research outputs found

    A Deep Feedforward Neural Network and Shallow Architectures Effectiveness Comparison: Flight Delays Classification Perspective

    Get PDF
    Flight delays have negatively impacted the socio-economics state of passengers, airlines and airports, resulting in huge economic losses. Hence, it has become necessary to correctly predict their occurrences in decision-making because it is important for the effective management of the aviation industry. Developing accurate flight delays classification models depends mostly on the air transportation system complexity and the infrastructure available in airports, which may be a region-specific issue. However, no specific prediction or classification model can handle the individual characteristics of all airlines and airports at the same time. Hence, the need to further develop and compare predictive models for the aviation decision system of the future cannot be over-emphasised. In this research, flight on-time data records from the United State Bureau of Transportation Statistics was employed to evaluate the performances of Deep Feedforward Neural Network, Neural Network, and Support Vector Machine models on a binary classification problem. The research revealed that the models achieved different accuracies of flight delay classifications. The Support Vector Machine had the worst average accuracy than Neural Network and Deep Feedforward Neural Network in the initial experiment. The Deep Feedforward Neural Network outperformed Support Vector Machines and Neural Network with the best average percentage accuracies. Going further to investigate the Deep Feedforward Neural Network architecture on different parameters against itself suggest that training a Deep Feedforward Neural Network algorithm, regardless of data training size, the classification accuracy peaks. We examine which number of epochs works best in our flight delay classification settings for the Deep Feedforward Neural Network. Our experiment results demonstrate that having many epochs affects the convergence rate of the model; unlike when hidden layers are increased, it does not ensure better or higher accuracy in a binary classification of flight delays. Finally, we recommended further studies on the applicability of the Deep Feedforward Neural Network in flight delays prediction with specific case studies of either airlines or airports to check the impact on the model's performance

    A deep feedforward neural network and shallow architectures effectiveness comparison: Flight delays classification perspective

    Get PDF
    Flight delays have negatively impacted the socio-economics state of passengers, airlines and airports, resulting in huge economic losses. Hence, it has become necessary to correctly predict their occurrences in decision-making because it is important for the effective management of the aviation industry. Developing accurate flight delays classification models depends mostly on the air transportation system complexity and the infrastructure available in airports, which may be a region-specific issue. However, no specific prediction or classification model can handle the individual characteristics of all airlines and airports at the same time. Hence, the need to further develop and compare predictive models for the aviation decision system of the future cannot be over-emphasised. In this research, flight on-time data records from the United State Bureau of Transportation Statistics was employed to evaluate the performances of Deep Feedforward Neural Network, Neural Network, and Support Vector Machine models on a binary classification problem. The research revealed that the models achieved different accuracies of flight delay classifications. The Support Vector Machine had the worst average accuracy than Neural Network and Deep Feedforward Neural Network in the initial experiment. The Deep Feedforward Neural Network outperformed Support Vector Machines and Neural Network with the best average percentage accuracies. Going further to investigate the Deep Feedforward Neural Network architecture on different parameters against itself suggest that training a Deep Feedforward Neural Network algorithm, regardless of data training size, the classification accuracy peaks. We examine which number of epochs works best in our flight delay classification settings for the Deep Feedforward Neural Network. Our experiment results demonstrate that having many epochs affects the convergence rate of the model; unlike when hidden layers are increased, it does not ensure better or higher accuracy in a binary classification of flight delays. Finally, we recommended further studies on the applicability of the Deep Feedforward Neural Network in flight delays prediction with specific case studies of either airlines or airports to check the impact on the model’s performance

    Modelo para identificar los vuelos afectados por retrasos o cancelaciones en el aeropuerto El Dorado de Bogotá, Colombia

    Get PDF
    Este trabajo está basado en el análisis de factores climáticos y operacionales de las aerolíneas con operación en Colombia. El factor operacional contiene el detalle de los vuelos que tienen lugar en los aeropuertos del país con variables como origen, destino, número de vuelo, aerolínea, fecha y hora programada, fecha y hora de remolque, estado del vuelo (adelantado, cumplido, retrasado y cancelado), cantidad de pasajeros, cantidad de carga, distancia y tiempo de vuelo entre otras. Por el gran peso e importancia que tiene el Aeropuerto El Dorado de Bogotá, el análisis y modelo resultado de este trabajo se centró en la operación y factores climáticos que tienen incidencia en este terminal aéreo. Por medio de técnicas como regresión logística, redes neuronales y XGboosting se logró predecir en la base de datos de pruebas cerca del 70% de los vuelos afectados por cancelaciones o retrasos en el aeropuerto de la capital colombiana.This work is based on the analysis of weather and operational factors of the airlines operating in Colombia. The operational factor contains the detail of the flights that take place in the country's airports with variables such as origin, destination, flight number, airline, scheduled date and time, towing date and time, flight status (early, on time, delayed and canceled), number of passengers, amount of cargo, distance and flight time among others. Due to the great weight and importance of the El Dorado Airport of Bogotá, the analysis and model resulting from this work focused on the operation and weather factors that have an impact on this Airport. Using techniques such as logistic regression, neural networks and XGboosting, it was possible to predict in the test database about 70% of flights affected by cancellations or delays at the Colombian capital airport.Magíster en Analítica para la Inteligencia de NegociosMaestrí

    Robust subgroup discovery

    Get PDF
    We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, as finding optimal subgroup lists is NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration, which is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. We empirically show on 54 datasets that SSD++ outperforms previous subgroup set discovery methods in terms of quality and subgroup list size.Comment: For associated code, see https://github.com/HMProenca/RuleList ; submitted to Data Mining and Knowledge Discovery Journa
    corecore