487 research outputs found

    Comparative analysis of neural networks techniques to forecast Airfare Prices

    With the growth of the tourism industry, airplanes have become an affordable choice for medium- and long-distance travel. Accurate forecasting of flight ticket prices helps the aviation industry match supply and demand flexibly and optimize aviation resources. Airline companies use dynamic pricing strategies to set ticket prices and maximize profits. Passengers want to purchase tickets at the lowest selling price for the flight of their choice. However, airline tickets are a special commodity that is time-sensitive and scarce, and their price is affected by various factors. Our research work provides a systematic comparison of various traditional machine learning methods (i.e., Ridge Regression, Lasso Regression, K-Nearest Neighbor, Decision Tree, XGBoost, Random Forest) and deep learning methods (e.g., Fully Connected Networks, Convolutional Neural Networks, Transformer) for airfare prediction, keeping consumers' needs in mind. Moreover, we propose innovative Bayesian neural networks which, to the best of our knowledge, represent the first attempt to exploit Bayesian inference for the airfare prediction task. We evaluate the performance of our implemented and optimized models on an open dataset. The experimental results show that deep learning-based methods achieve better results on average than traditional ones, while Bayesian neural networks achieve better performance than the other machine learning methods. However, taking into account both prediction performance and computational time, Random Forest turns out to be the best choice in this scenario.

    Improving Intent Classification By Automatic Data Augmentation Using Word Sense Disambiguation

    Virtual digital assistants are automated software systems which assist humans by understanding natural languages such as English, in either voice or textual form. In recent times, many digital applications have shifted towards providing a user experience through a natural language interface. The change is brought about by the ease with which virtual digital assistants such as Google Assistant and Amazon Alexa can be integrated into an application. These assistants make use of a Natural Language Understanding (NLU) system which acts as an interface to translate unstructured natural language data into a structured form. Such an NLU system uses an intent finding algorithm which gives a high-level idea or meaning of a user query, termed intent classification. The intent classification step identifies the action(s) that a user wants the assistant to perform. It is followed by an entity recognition step, which identifies the entities in the utterance on which the intended action is performed. This step can be viewed as a sequence labeling task which maps an input word sequence to a corresponding sequence of slot labels, and it is also termed slot filling. In this thesis, we improve intent classification and slot filling in virtual voice agents by automatic data augmentation. Spoken Language Understanding systems face the issue of data sparsity, because it is hard for a human-created training sample to represent all the patterns in the language. Due to the lack of relevant data, deep learning methods are unable to generalize the Spoken Language Understanding model. This thesis presents a way to overcome the issue of data sparsity in deep learning approaches to Spoken Language Understanding tasks. We describe the limitations of current intent classifiers and how the proposed algorithm uses existing knowledge bases to overcome those limitations. The method helps in creating a more robust intent classifier and slot filling system. Dissertation/Thesis, Masters Thesis, Computer Science 201

    Forecasting monthly airline passenger numbers with small datasets using feature engineering and a modified principal component analysis

    In this study, a machine learning approach based on time series models, different feature engineering, feature extraction, and feature derivation is proposed to improve air passenger forecasting. Different types of datasets were created to extract new features from the core data. An experiment was undertaken with artificial neural networks to test the performance of neurons in the hidden layer, to optimise the dimensions of all layers and to obtain an optimal choice of connection weights, so that the nonlinear optimisation problem could be solved directly. A method of tuning deep learning models using H2O (a feature-rich, open source machine learning platform known for its R and Spark integration and its ease of use) is also proposed, where the trained network model is built from samples of selected features from the dataset in order to ensure diversity of the samples and to improve training. A successful application of deep learning requires setting numerous parameters in order to achieve greater model accuracy. The number of hidden layers and the number of neurons in each layer are key parameters of such a network. Hyper-parameter grid search and random hyper-parameter search approaches aid in setting these important parameters. Moreover, a new ensemble strategy is suggested that shows potential to optimise parameter settings and hence save computational resources throughout the tuning process of the models. The main objective, besides improving the performance metric, is to obtain a distribution on some hold-out datasets that resembles the original distribution of the training data. Particular attention is focused on creating a modified version of Principal Component Analysis (PCA) using a different correlation matrix, obtained from a different correlation coefficient based on kinetic energy, to derive new features. The data were collected from several airline datasets to build a deep prediction model for forecasting airline passenger numbers. Preliminary experiments show that fine-tuning provides an efficient approach for tuning the ultimate number of hidden layers and the number of neurons in each layer when compared with the grid search method. Similarly, the results show that the modified version of PCA is more effective in data dimension reduction, class separability, and classification accuracy than traditional PCA.
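
The abstract above contrasts grid search with random search over the number of hidden layers and neurons per layer. The difference in how the two approaches enumerate candidates can be sketched as below. This is only an illustration under assumptions, not the study's H2O-based procedure: the function names, the `space` dictionary, and its values are hypothetical, and no model is trained here.

```python
import itertools
import random
from typing import Dict, List


def grid_candidates(space: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Enumerate every combination of the listed values (exhaustive grid)."""
    keys = sorted(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def random_candidates(
    space: Dict[str, List[int]], n_samples: int, seed: int = 0
) -> List[Dict[str, int]]:
    """Draw n_samples configurations, picking each value independently at random."""
    rng = random.Random(seed)
    return [
        {key: rng.choice(values) for key, values in sorted(space.items())}
        for _ in range(n_samples)
    ]


# Hypothetical search space; the values are illustrative only.
space = {"hidden_layers": [1, 2, 3], "neurons_per_layer": [16, 32, 64, 128]}

grid = grid_candidates(space)                    # 3 * 4 = 12 configurations
sampled = random_candidates(space, n_samples=5)  # fixed budget of 5 configurations
```

Each configuration would then be passed to whatever training routine is in use; the grid grows multiplicatively with every added parameter, whereas the random budget stays fixed.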

    Forecasting flight prices with machine learning models: a comparative analysis between low and high-cost airlines

    Forecasting flight prices is a challenging task due to the complex nature of the pricing algorithms that airlines use. Apart from the fact that these algorithms are not public, they have to take into account many different variables that affect ticket prices. Since the airlines' demand forecasts may not always hold true as a result of varying demand, prices need to be adjusted accordingly. This approach is called dynamic pricing. It is a technique of price discrimination based mainly on temporal differences, leading to the widespread assumption that the time of booking is a crucial determinant of the ticket price. This analysis shows that, apart from days to departure, flight distance and airline type in particular influence the price significantly. That is, longer flights as well as flights operated by full-service carriers, as opposed to low-cost carriers, are usually more expensive. This thesis uses a dataset including the fares and other flight-related characteristics of one-way flights in the US between April and October 2022, retrieved from the search engine Expedia.com. The data is used to train and compare the performance of several supervised learning models aiming to forecast flight prices. Each model is deployed three times: first with the entire dataset, then with data only from low-cost carriers, and finally with data only from full-service carriers. The most accurate models for all three datasets are the random forests, followed by k-nearest-neighbor. The results of this thesis suggest that a large part of the flight price can be predicted using flight-related details such as days to departure and flight duration, yet they also show that there remains a certain unexplained variability that could be due to external factors not included in the present analysis.

    Time Series Event Forecasting in Consumer Electronic Markets using Random Forests

    Consumers are price-sensitive and opportunistic about the place of purchase when buying electronic goods. However, services that advise customers on their purchase time decisions for those products are missing. Given the objective to provide a binary signal to customers to either wait or purchase immediately, classification algorithms are a direct methodological choice. Approaches like random forests allow for the derivation of a probability and class prediction but are usually not used in time series contexts. This is due to missing or time-invariant regressors and unclear prediction settings. We show how classification methods can be used to generate reliable predictions of price events and analyze whether they are subject to common market dependencies. Pooling univariate random forests and enhancing them with multivariate features shows that our approach generates stable and valuable recommendations. Because dependency structures between products are transferable, multivariate forecasting increases accuracy and issues recommendations where univariate approaches fail.
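
A binary wait-or-buy signal of the kind described above needs a target label for each time step before any classifier can be trained. The abstract does not say how price events are defined, so the rule below is purely an assumed example: the function name `label_wait_or_buy` and the parameters `horizon` and `min_drop` are hypothetical.

```python
from typing import List


def label_wait_or_buy(
    prices: List[float], horizon: int, min_drop: float = 0.0
) -> List[int]:
    """Return 1 ("wait") if the price falls by more than min_drop within the
    next `horizon` steps, else 0 ("buy now").

    The final `horizon` observations have an incomplete look-ahead window
    and are therefore left unlabelled.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    labels = []
    for t in range(len(prices) - horizon):
        future_min = min(prices[t + 1 : t + 1 + horizon])
        labels.append(1 if future_min < prices[t] - min_drop else 0)
    return labels


# Illustrative price series (made-up numbers, not data from the paper).
labels = label_wait_or_buy([100.0, 95.0, 97.0, 90.0, 92.0, 93.0], horizon=2)
print(labels)  # [1, 1, 1, 0]
```

Labels built this way could serve as the class target for a classifier such as a random forest, with lagged prices or cross-product features as inputs.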

    The Impact of COVID-19 on Airfares-A Machine Learning Counterfactual Analysis

    This paper studies the performance of machine learning predictions for the counterfactual analysis of air transport. It is motivated by the dynamic and universally regulated international air transport market, where ex post policy evaluations usually lack counterfactual control scenarios. As an empirical example, this paper studies the impact of the COVID-19 pandemic on airfares in 2020 as the difference between predicted and actual airfares. Airfares are important from a policymaker's perspective, as air transport is crucial for mobility. From a methodological point of view, airfares are also of particular interest given their dynamic character, which makes them challenging to predict. This paper adopts a novel multi-step prediction technique with walk-forward validation to increase the transparency of the model's predictive quality. For the analysis, the universe of worldwide airline bookings is combined with detailed airline information. The results show that machine learning with walk-forward validation is powerful for the counterfactual analysis of airfares.
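
Walk-forward validation, as named in the abstract above, trains on past observations only and tests on the block that immediately follows, then moves the cut-off forward. A minimal sketch, assuming an expanding training window; the function name `walk_forward_splits` and the parameters `initial_train`, `horizon`, and `step` are hypothetical and not taken from the paper.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_obs: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in time order.

    The training window always starts at index 0 and grows by `step`;
    the test window covers the next `horizon` observations.
    """
    if initial_train <= 0 or horizon <= 0 or step <= 0:
        raise ValueError("initial_train, horizon and step must be positive")
    train_end = initial_train
    while train_end + horizon <= n_obs:
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + horizon))
        yield train, test
        train_end += step


# Ten time-ordered observations, two-step forecasts, cut-off moved by two.
splits = list(walk_forward_splits(n_obs=10, initial_train=6, horizon=2, step=2))
for train, test in splits:
    print(f"train up to index {train[-1]}, test on {test}")
```

Because every test index lies strictly after every training index, no future information leaks into the fitted model, which is what makes the per-step errors interpretable as out-of-sample quality.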

    Multimedia Big Data Analytics and Fusion for Data Science

    Title from PDF of title page, viewed May 24, 2023. Dissertation advisor: Shu-Ching Chen. Vita. Includes bibliographical references (pages 178-212). Dissertation (Ph.D.)--Department of Computer Science and Electrical Engineering. University of Missouri--Kansas City, 2023.
    Big data is becoming increasingly prevalent in people's everyday lives due to the enormous quantity of data generated from social and economic activities worldwide. As a result, extensive research has been undertaken to support the big data revolution. However, as data grows in volume, traditional data analytic methods face various challenges, especially when raw data comes in multiple forms and formats. This dissertation proposes a multimodal big data analytics and fusion framework that addresses several challenges in data science for handling and learning from multimodal big data. The proposed framework addresses issues during a standard data science project workflow, including data fusion, spatio-temporal deep feature extraction, and model training optimization strategy. First, a hierarchical graph fusion network is presented to capture the inter-modality correlations among modalities. The network hierarchy models the modality-wise combinations with gradually increased complexity to explore all n-modality interactions. Next, an adaptive spatio-temporal graph network is proposed to capture the hidden patterns from spatio-temporal data. It exploits local and global node correlations by improving the pre-defined graph Laplacian and automatically generates the graph adjacency matrix based on a data-driven method. In addition, a dynamic multi-task learning method is introduced to optimize the model training progress by dynamically adjusting the loss weights assigned to each task. It systematically monitors the sample-level prediction errors, task-level weight parameter changing rate, and iteration-level total loss to adjust the weight balance among tasks. The proposed framework has been evaluated on various datasets, including disaster event videos, social media, traffic flow, and other public datasets.
    Contents: Introduction -- Related work -- Overview of the framework -- Dynamic multi-task learning -- Hierarchical graph fusion -- Spatio-temporal graph network -- Conclusions and future wor