486 research outputs found

    Machine learning techniques for predicting the stock market using daily market variables

    Dissertation presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    Predicting the stock market has never been seen as an easy task. The complexity of financial systems makes it extremely difficult for anything or anyone to predict what the future of prices holds, be it a day, a week, a month or even a year ahead. Many variables influence the market's volatility, and some of these may even be the gut feeling of an investor on a specific day. Several machine learning techniques have already been applied to forecast multiple stock market indexes, some achieving good accuracy when predicting whether prices will go up or down, and low error when dealing with regression data. This work aims to apply some state-of-the-art algorithms and compare their performance with Long Short-Term Memory (LSTM) as well as with each other. The variables used in this empirical work were the prices of the Dow Jones Industrial Average (DJIA) registered for every business day, from January 1st, 2006 to January 1st, 2018, for 29 companies. Some changes and adjustments were made to the original variables to present different data types to the algorithms. To ensure quality and certainty when evaluating the flexibility and stability of each model, the error measure used was the Root Mean Squared Error, and the Mann-Whitney U test was applied to assess the statistical significance of the results obtained.
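
The abstract above compares models by RMSE and checks the differences with the Mann-Whitney U test. As a rough illustration of that test only, the sketch below computes the U statistics for two independent samples of error values. The function names and the sample RMSE values are hypothetical, not taken from the dissertation, and no p-value is computed; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical RMSE values from repeated runs of two models (invented data).
rmse_model_1 = [0.10, 0.12, 0.11]
rmse_model_2 = [0.20, 0.22, 0.21]
u_1, u_2 = mann_whitney_u(rmse_model_1, rmse_model_2)
print(u_1, u_2)
```

The smaller of the two U values is the one usually compared against a critical value; a value of zero means every observation in one sample lies below every observation in the other.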

    Forecasting photovoltaic power generation with a stacking ensemble model

    Nowadays, photovoltaics (PV) has gained popularity among other renewable energy sources because of its excellent features. However, the instability of the system’s output has become a critical problem due to the high PV penetration into the existing distribution system. Hence, it is essential to have an accurate PV power output forecast to integrate more PV systems into the grid and to facilitate energy management further. In this regard, this paper proposes a stacked ensemble algorithm (Stack-ETR) to forecast PV output power one day ahead, utilizing three machine learning (ML) algorithms, namely, random forest regressor (RFR), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), as base models. In addition, an extra trees regressor (ETR) was used as a meta learner to integrate the predictions from the base models to improve the accuracy of the PV power output forecast. The proposed model was validated on three practical PV systems utilizing four years of meteorological data to provide a comprehensive evaluation. The performance of the proposed model was compared with other ensemble models, where RMSE and MAE are considered the performance metrics. The proposed Stack-ETR model surpassed the other models and reduced the RMSE by 24.49%, 40.2%, and 27.95% and MAE by 28.88%, 47.2%, and 40.88% compared to the base model ETR for thin-film (TF), monocrystalline (MC), and polycrystalline (PC) PV systems, respectively
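
The percentages quoted above are reductions in RMSE and MAE relative to a reference model. The sketch below shows how such figures could be computed, assuming the reduction is measured relative to the reference model's error (the abstract does not spell out the formula). The function names and the numbers are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_reduction(reference_error, new_error):
    """Error reduction of a new model, as a percentage of the reference error."""
    return 100.0 * (reference_error - new_error) / reference_error


# Invented PV output values and forecasts, for illustration only.
actual = [0.0, 0.0, 0.0, 0.0]
reference_forecast = [2.0, 2.0, 2.0, 2.0]
stacked_forecast = [1.0, 1.0, 1.0, 1.0]

reduction = percent_reduction(rmse(actual, reference_forecast),
                              rmse(actual, stacked_forecast))
print(reduction)  # 50.0
```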

    Structure Optimization of Ensemble Learning Methods and Seasonal Decomposition Approaches to Energy Price Forecasting in Latin America: A Case Study about Mexico

    The energy price influences the interest in investment, which leads to economic development. An estimate of the future energy price can support the planning of industrial expansions and provide information to avoid times of recession. This paper evaluates adaptive boosting (AdaBoost), bootstrap aggregation (bagging), gradient boosting, histogram-based gradient boosting, and random forest ensemble learning models for forecasting energy prices in Latin America, with a case study about Mexico. Seasonal decomposition of the time series is used to reduce unrepresentative variations. Optuna, using a tree-structured Parzen estimator, optimizes the structure of the ensembles through a voter that combines several ensemble frameworks; thus, an optimized hybrid ensemble learning method is proposed. The results show that the proposed method has a higher performance than the state-of-the-art ensemble learning methods, with a mean squared error of 3.37E−9 in the testing phase
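
To give a feel for what removing a seasonal component means, here is a deliberately simplified additive sketch: it estimates one seasonal offset per position in the cycle and subtracts it. This is an assumption-laden stand-in, not the decomposition used in the paper, which does not state its exact procedure here; the function names and the series are hypothetical.

```python
def seasonal_offsets(series, period):
    """Mean deviation from the overall mean at each position in the cycle."""
    overall = sum(series) / len(series)
    offsets = []
    for phase in range(period):
        values = series[phase::period]  # every observation at this phase
        offsets.append(sum(values) / len(values) - overall)
    return offsets


def deseasonalize(series, period):
    """Subtract the estimated seasonal offset from every observation."""
    offsets = seasonal_offsets(series, period)
    return [x - offsets[i % period] for i, x in enumerate(series)]


# Invented price series with a repeating two-step pattern.
prices = [10, 20, 10, 20, 10, 20]
print(deseasonalize(prices, period=2))
```

A classical decomposition would also estimate a trend (for example with a centered moving average) before averaging by phase; this sketch skips that step and so only suits series without a strong trend.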

    Comparing the Performance of Deep Learning Methods to Predict Companies' Financial Failure

    This work was supported in part by the Ministerio de Ciencia, Innovacion y Universidades under Project RTI2018-102002-A-I00, in part by the Ministerio de Economia y Competitividad under Project TIN2017-85727-C4-2-P and Project PID2020-115570GB-C22, in part by the Fondo Europeo de Desarrollo Regional (FEDER) and Junta de Andalucia under Project B-TIC-402-UGR18, and in part by the Junta de Andalucia under Project P18-RT-4830.
    One of the most crucial problems in the field of business is financial forecasting. Many companies are interested in forecasting their upcoming financial status in order to adapt to the current financial and business environment and avoid bankruptcy. In this work, due to the effectiveness of Deep Learning methods in classification tasks, we compare the performance of three well-known Deep Learning methods (Long Short-Term Memory, Deep Belief Network and a Multilayer Perceptron model of 6 layers) with three bagging ensemble classifiers (Random Forest, Support Vector Machine and K-Nearest Neighbor) and two boosting ensemble classifiers (Adaptive Boosting and Extreme Gradient Boosting) in predicting companies' financial failure. Because of the inherent nature of the problem addressed, three extremely imbalanced datasets of Spanish, Taiwanese and Polish companies' data have been considered in this study. Thus, five oversampling balancing techniques, two hybrid balancing techniques (oversampling-undersampling) and one clustering-based balancing technique have been applied to avoid the data inconsistency problem. Considering the complexity level and type of the real financial data, the results show that the Multilayer Perceptron model of 6 layers, in conjunction with the SMOTE-ENN balancing method, yielded the best performance according to the accuracy, recall and type II error metrics. In addition, Long Short-Term Memory and the ensemble methods also obtained very good results, outperforming several classifiers used in previous studies with the same datasets.
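
The three metrics named above can all be read off a confusion matrix. The sketch below assumes the positive class (label 1) marks a failed company and takes the type II error to be the false negative rate, so that it equals one minus recall; the abstract does not define these terms, and the labels below are invented.

```python
def failure_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall and type II error rate for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive and guess == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # failed company predicted as healthy
        elif guess == positive:
            fp += 1  # healthy company predicted as failed
        else:
            tn += 1
    if tp + fn == 0:
        raise ValueError("no positive examples: recall is undefined")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
        "type_ii_error": fn / (tp + fn),
    }


# Invented labels: 4 failed companies (1) and 6 healthy ones (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(failure_metrics(y_true, y_pred))
```

On heavily imbalanced data, accuracy alone can look good even when most failures are missed, which is one reason recall and the type II error are reported alongside it.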

    Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review

    The influence of machine learning technologies is rapidly increasing and penetrating almost every field, and air pollution prediction is no exception. This paper reviews the studies related to air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. Using the most popular databases and applying the corresponding filtering, the most relevant papers were selected. After thoroughly reviewing those papers, the main features were extracted, which served as a basis to link and compare them to each other. As a result, we can conclude that: (1) instead of using simple machine learning techniques, authors currently apply advanced and sophisticated techniques, (2) China was the leading country in terms of case studies, (3) particulate matter with a diameter equal to 2.5 micrometers was the main prediction target, (4) in 41% of the publications the authors carried out the prediction for the next day, (5) 66% of the studies used data with an hourly rate, (6) 49% of the papers used open data, a share that has tended to increase since 2016, and (7) for efficient air quality prediction it is important to consider external factors such as weather conditions, spatial characteristics, and temporal features

    Data-augmented sequential deep learning for wind power forecasting

    Accurate wind power forecasting plays a critical role in the operation of wind parks and the dispatch of wind energy into the power grid. With excellent automatic pattern recognition and nonlinear mapping ability for big data, deep learning is increasingly employed in wind power forecasting. However, in-situ measured wind data are relatively expensive and inaccessible, and the correlation between steps is omitted in most multistep wind power forecasts. This paper is the first to apply data augmentation to wind power forecasting: it systematically summarizes and proposes both physics-oriented and data-oriented time-series wind data augmentation approaches to considerably enlarge primary datasets, and it develops deep encoder-decoder long short-term memory networks that enable sequential input and sequential output for wind power forecasting. The proposed augmentation techniques and forecasting algorithm are deployed on five turbines with diverse topographies in an Arctic wind park, and the outcomes are evaluated against benchmark models and different augmentations. The main findings reveal that, on the one hand, the average improvement in RMSE of the proposed forecasting model over the benchmarks is 33.89%, 10.60%, 7.12%, and 4.27% before data augmentation, and increases to 40.63%, 17.67%, 11.74%, and 7.06%, respectively, after augmentation. On the other hand, the effect of data augmentation on prediction varies intricately, but for the proposed model with and without augmentation, all augmentation approaches boost the model's outperformance from 7.87% to 13.36% in RMSE, 5.24% to 8.97% in MAE, and similarly over 12% in QR90. Finally, data-oriented augmentations are, in general, slightly better than physics-driven ones
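
An encoder-decoder network with sequential input and sequential output is trained on pairs of consecutive windows: a run of past values and the run of values that follows it. The sketch below shows only that data preparation step. The function name, window lengths and series are hypothetical, and the paper's actual preprocessing is not described in the abstract.

```python
def make_seq2seq_windows(series, n_in, n_out):
    """Split a series into (input window, output window) training pairs."""
    pairs = []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        history = series[start:start + n_in]
        future = series[start + n_in:start + n_in + n_out]
        pairs.append((history, future))
    return pairs


# Invented wind power readings, for illustration only.
power = [1, 2, 3, 4, 5, 6]
for history, future in make_seq2seq_windows(power, n_in=3, n_out=2):
    print(history, "->", future)
```

Predicting the whole output window at once, rather than one step at a time, is what lets a sequence model account for the correlation between forecast steps.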

    GA-Optimized Multivariate CNN-LSTM Model for Predicting Multi-channel Mobility in the COVID-19 Pandemic

    The primary factor that contributes to the transmission of COVID-19 infection is human mobility. Positive cases added on a daily basis have a substantial positive association with the pace of human mobility, and the reverse is true. Thus, the ability to predict human mobility trends during a pandemic is critical for policymakers seeking to decrease the rate of transmission in the future. In this regard, one approach that is commonly used for time-series data prediction is to build an ensemble with the aim of getting the best performance. However, building an ensemble often causes the performance of the model to decrease, due to the increasing number of parameters that are not optimized properly. Consequently, the purpose of this study is to develop and evaluate a deep learning ensemble model, optimized using a genetic algorithm (GA), that incorporates a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The CNN is used to extract features from mobility time-series data, while the LSTM is used for mobility prediction. The parameters of both layers are adjusted using the GA. In experiments conducted with data from the Google Community Mobility Reports in Indonesia, ranging from the beginning of February 2020 to the end of December 2020, the GA-Optimized Multivariate CNN-LSTM ensemble outperforms stand-alone CNN and LSTM models, as well as the non-optimized CNN-LSTM model, in predicting future human movement. This may be useful in assisting policymakers in anticipating future human mobility trends. Doi: 10.28991/esj-2021-01300
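
The sketch below illustrates the general idea of tuning two integer hyperparameters with a genetic algorithm. It is a toy under stated assumptions: the fitness function is an invented stand-in for a validation score (no CNN or LSTM is trained), the parameter names and bounds are hypothetical, and the selection, crossover and mutation choices are generic ones rather than those of the paper.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Maximize fitness over integer vectors within the given (low, high) bounds."""
    rng = random.Random(seed)
    population = [[rng.randint(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]   # truncation selection
        children = [ranked[0][:]]          # elitism: carry over the best
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(genes) for genes in zip(mother, father)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = rng.randint(lo, hi)  # random-reset mutation
            children.append(child)
        population = children
    return max(population, key=fitness)


def toy_fitness(params):
    """Invented objective that peaks at 64 filters and 50 units."""
    filters, units = params
    return -((filters - 64) ** 2 + (units - 50) ** 2)


# Hypothetical search ranges: CNN filters in [8, 128], LSTM units in [10, 100].
best = genetic_search(toy_fitness, bounds=[(8, 128), (10, 100)])
print(best, toy_fitness(best))
```

In a real setting the fitness call would train and validate a model for each candidate, which makes the population size and number of generations the main cost drivers.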