A Deep Feedforward Neural Network and Shallow Architectures Effectiveness Comparison: Flight Delays Classification Perspective
Flight delays have negatively affected the socio-economic state of passengers, airlines and airports, resulting in huge economic losses. Correctly predicting their occurrence has therefore become necessary for decision-making and for the effective management of the aviation industry. Developing accurate flight delay classification models depends mostly on the complexity of the air transportation system and the infrastructure available at airports, which may be region-specific. However, no single prediction or classification model can handle the individual characteristics of all airlines and airports at the same time. Hence, the need to further develop and compare predictive models for the aviation decision systems of the future cannot be over-emphasised. In this research, flight on-time data records from the United States Bureau of Transportation Statistics were used to evaluate the performance of Deep Feedforward Neural Network, Neural Network, and Support Vector Machine models on a binary classification problem. The models achieved different flight delay classification accuracies. In the initial experiment, the Support Vector Machine had a lower average accuracy than both the Neural Network and the Deep Feedforward Neural Network. The Deep Feedforward Neural Network outperformed the Support Vector Machine and the Neural Network, achieving the best average percentage accuracy. Further investigation of the Deep Feedforward Neural Network architecture under different parameter settings suggests that its classification accuracy peaks regardless of the training data size. We also examine which number of epochs works best for the Deep Feedforward Neural Network in our flight delay classification setting. Our experimental results demonstrate that using many epochs affects the convergence rate of the model; unlike increasing the number of hidden layers, it does not ensure higher accuracy in the binary classification of flight delays. Finally, we recommend further studies on the applicability of the Deep Feedforward Neural Network to flight delay prediction, with specific case studies of individual airlines or airports, to check the impact on the model's performance.
Model for identifying flights affected by delays or cancellations at El Dorado Airport in Bogotá, Colombia
This work is based on the analysis of weather and operational factors of the airlines operating in Colombia. The operational factor contains the details of the flights that take place at the country's airports, with variables such as origin, destination, flight number, airline, scheduled date and time, towing date and time, flight status (early, on time, delayed or cancelled), number of passengers, amount of cargo, distance and flight time, among others. Because of the great weight and importance of El Dorado Airport in Bogotá, the analysis and the resulting model focus on the operation of this airport and the weather factors that affect it.
Using techniques such as logistic regression, neural networks and XGboosting, it was possible to predict about 70% of the flights in the test dataset that were affected by cancellations or delays at the Colombian capital's airport.
Magíster en Analítica para la Inteligencia de Negocios (Master's in Analytics for Business Intelligence)
Robust subgroup discovery
We introduce the problem of robust subgroup discovery, i.e., finding a set of
interpretable descriptions of subsets that 1) stand out with respect to one or
more target attributes, 2) are statistically robust, and 3) are non-redundant. Many
attempts have been made to mine either locally robust subgroups or to tackle
the pattern explosion, but we are the first to address both challenges at the
same time from a global modelling perspective. First, we formulate the broad
model class of subgroup lists, i.e., ordered sets of subgroups, for univariate
and multivariate targets that can consist of nominal or numeric variables, and
that includes traditional top-1 subgroup discovery in its definition. This
novel model class allows us to formalise the problem of optimal robust subgroup
discovery using the Minimum Description Length (MDL) principle, where we resort
to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and
numeric targets, respectively. Second, as finding optimal subgroup lists is
NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists
and guarantees that the most significant subgroup found according to the MDL
criterion is added in each iteration, which is shown to be equivalent to a
Bayesian one-sample proportions, multinomial, or t-test between the subgroup
and dataset marginal target distributions plus a multiple hypothesis testing
penalty. We empirically show on 54 datasets that SSD++ outperforms previous
subgroup set discovery methods in terms of quality and subgroup list size.
Comment: For associated code, see https://github.com/HMProenca/RuleList ;
submitted to Data Mining and Knowledge Discovery Journal
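The abstract defines a subgroup list as an ordered set of subgroups. The following is a minimal sketch of that idea only, assuming that each instance is assigned to the first subgroup whose description matches it and that unmatched instances fall back to the dataset as a whole. The toy data, the descriptions and the names `rows`, `subgroup_list`, `assign` and `summarise` are hypothetical. It does not implement the MDL criterion or the SSD++ search and is not taken from the linked repository.

```python
from statistics import mean

# Hypothetical toy data: each row is (features, numeric target).
rows = [
    ({"age": 25, "smoker": True}, 9.0),
    ({"age": 62, "smoker": True}, 8.0),
    ({"age": 70, "smoker": False}, 6.0),
    ({"age": 30, "smoker": False}, 2.0),
    ({"age": 41, "smoker": False}, 3.0),
]

# Hypothetical subgroup list: an ordered sequence of (description, predicate).
subgroup_list = [
    ("smoker = True", lambda x: x["smoker"]),
    ("age >= 60", lambda x: x["age"] >= 60),
]


def assign(features):
    """Return the index of the first matching subgroup, or None if none match."""
    for index, (_, predicate) in enumerate(subgroup_list):
        if predicate(features):
            return index
    return None


def summarise(data):
    """Compare each subgroup's mean target with the overall dataset mean."""
    overall = mean(target for _, target in data)
    covered = {}
    for features, target in data:
        covered.setdefault(assign(features), []).append(target)
    report = []
    for index, (description, _) in enumerate(subgroup_list):
        targets = covered.get(index, [])
        if targets:
            sub_mean = mean(targets)
            report.append((description, len(targets), sub_mean, sub_mean - overall))
    return overall, report


if __name__ == "__main__":
    overall, report = summarise(rows)
    print(f"dataset mean: {overall:.2f}")
    for description, size, sub_mean, diff in report:
        print(f"{description}: n={size}, mean={sub_mean:.2f}, diff={diff:+.2f}")
```

Because the list is ordered, the 62-year-old smoker is counted only under the first subgroup even though the second description also matches, which is one simple way an ordered list avoids reporting overlapping, redundant subgroups.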