
    An Experimental Review on Deep Learning Architectures for Time Series Forecasting

    In recent years, deep learning techniques have outperformed traditional models in many machine learning tasks. Deep neural networks have successfully been applied to address time series forecasting problems, which is an important topic in data mining. They have proved to be an effective solution given their capacity to automatically learn the temporal dependencies present in time series. However, selecting the most suitable type of deep neural network and its parametrization is a complex task that requires considerable expertise. Therefore, there is a need for deeper studies on the suitability of all existing architectures for different forecasting tasks. In this work, we face two main challenges: a comprehensive review of the latest works using deep learning for time series forecasting, and an experimental study comparing the performance of the most popular architectures. The comparison involves a thorough analysis of seven types of deep learning models in terms of accuracy and efficiency. We evaluate the rankings and distribution of results obtained with the proposed models under many different architecture configurations and training hyperparameters. The datasets used comprise more than 50,000 time series divided into 12 different forecasting problems. By training more than 38,000 models on these data, we provide the most extensive deep learning study for time series forecasting. Among all studied models, the results show that long short-term memory (LSTM) and convolutional neural networks (CNN) are the best alternatives, with LSTMs obtaining the most accurate forecasts. CNNs achieve comparable performance with less variability of results under different parameter configurations, while also being more efficient.
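All of the architectures compared above consume a time series through the same supervised framing: a fixed-length input window mapped to a forecast horizon. A minimal sketch of that common preprocessing step (the function name and window sizes are illustrative, not taken from the paper):

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Frame a univariate series as (input window, forecast target) pairs."""
    X, y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        y.append(series[t + lookback:t + lookback + horizon])
    return np.array(X), np.array(y)

series = np.arange(10.0)          # toy series 0..9
X, y = make_windows(series, lookback=4, horizon=2)
print(X.shape, y.shape)  # → (5, 4) (5, 2)
```

Each row of X is one training input and each row of y the values to be predicted; varying `lookback` and `horizon` is part of the hyperparameter search the study describes.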

    Optimizing forecast model complexity using multi-objective evolutionary algorithms

    Copyright © 2004 World Scientific. When inducing a time series forecasting model there has always been the problem of defining a model that is complex enough to describe the process, yet not so complex as to promote data ‘overfitting’ – the so-called bias/variance trade-off. In the sphere of neural network forecast models this is commonly confronted by weight decay regularization, or by combining a complexity penalty term in the optimizing function. The correct degree of regularization, or penalty value, to implement for any particular problem, however, is difficult, if not impossible, to know a priori. This chapter presents the use of multi-objective optimization techniques, specifically those of an evolutionary nature, as a potential solution to this problem. This is achieved by representing forecast model ‘complexity’ and ‘accuracy’ as two separate objectives to be optimized. In doing this one can obtain problem-specific information with regard to the accuracy/complexity trade-off of any particular problem and, given the shape of the front on a set of validation data, ascertain an appropriate operating point. Examples are provided on a forecasting problem with varying levels of noise.
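Treating ‘complexity’ and ‘accuracy’ as two objectives, as the chapter proposes, yields a Pareto front of non-dominated models from which an operating point can be chosen. A minimal sketch of extracting such a front (the model scores below are hypothetical):

```python
def pareto_front(models):
    """models: iterable of (complexity, error) pairs, both to be minimized.
    Returns the non-dominated set, i.e. the accuracy/complexity front."""
    front = []
    for c, e in models:
        dominated = any(c2 <= c and e2 <= e and (c2 < c or e2 < e)
                        for c2, e2 in models)
        if not dominated:
            front.append((c, e))
    return sorted(front)

# Hypothetical (parameter count, validation error) scores for six models.
models = [(2, 0.9), (4, 0.5), (4, 0.7), (8, 0.45), (16, 0.44), (16, 0.6)]
print(pareto_front(models))  # → [(2, 0.9), (4, 0.5), (8, 0.45), (16, 0.44)]
```

The shape of this front on validation data is what the chapter uses to judge whether extra complexity buys enough accuracy for a given problem.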

    Predicting U.S. business cycles with recurrent neural networks: An extensive multivariate time-series analysis for comparing LSTM and GRU networks

    This study examines how well 22 different long short-term memory (LSTM) and gated recurrent unit (GRU) network architectures are suited to predicting U.S. business cycles. The networks create 91-day forecasts for the dependent variable by using multivariate time-series data comprising 26 leading indicators’ values for the previous 400 days. The models are evaluated using a train-test split: they are trained with data from 1980 to 2005, and the out-of-sample set consists of data between 2005 and 2015. The performance is evaluated using mean squared error (MSE) and mean absolute error (MAE), and early warning signs are also considered beneficial. The training algorithm consists of typical deep learning methods: MSE and L1 regularization are used for determining the cost, and minibatches of 32 examples are applied together with the Nesterov accelerated gradient (NAG) learning algorithm. Early stopping is introduced to halt the training process when strong signs of overfitting are detected. Each proposed recurrent neural network (RNN) architecture is trained three times, and these three networks’ averaged predictions are examined when comparing the architectures. Performance-wise, a few LSTM networks stand out from the other proposed networks. Although the performance results favor the proposed LSTM networks slightly over their GRU equivalents, the difference is not substantial and, in turn, the proposed GRU networks offer less deviation in MSE and MAE between each architecture. However, these steadier performance results do not generate less volatile forecasts. Instead, the best-performing networks and architectures distinguish themselves by offering less volatile predictions that also deviate less from the real values. Most of the models generate a considerable number of early warning signs before the 2007 recession, which indicates their suitability for detecting turning points in business cycles.
    Moreover, a wide range of the proposed LSTM and GRU network architectures learn the general pattern, including the smaller architectures comprising only one hidden layer and fewer than 500 optimizable parameters. This suggests that these methods offer noteworthy solutions for business cycle forecasting and, more widely, supports applying nonlinear machine learning methods with multivariate data to macroeconomic forecasting tasks where prevalent methods have been found unable to deliver adequate accuracy.
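The training setup described above pairs minibatch gradient descent with Nesterov accelerated momentum; the defining detail of NAG is that the gradient is evaluated at a look-ahead point rather than at the current parameters. A minimal scalar sketch (the learning rate, momentum, and objective are illustrative, not the thesis's values):

```python
def nag_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov accelerated gradient step: the gradient is
    evaluated at the look-ahead point w + momentum * v."""
    g = grad_fn(w + momentum * v)
    v = momentum * v - lr * g
    return w + v, v

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(300):
    w, v = nag_step(w, v, lambda x: 2.0 * (x - 3.0))
print(round(w, 3))  # → 3.0
```

In the thesis's setting, `grad_fn` would be the gradient of the MSE-plus-L1 cost over a minibatch of 32 examples rather than of a scalar quadratic.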

    Neural forecasting: Introduction and literature overview

    Neural network based forecasting methods have become ubiquitous in large-scale industrial forecasting applications in recent years. As the prevalence of neural network based solutions among the best entries in the recent M4 competition shows, the recent popularity of neural forecasting methods is not limited to industry and has also reached academia. This article aims at providing an introduction and an overview of some of the advances that have permitted the resurgence of neural networks in machine learning. Building on these foundations, the article then gives an overview of the recent literature on neural networks for forecasting and applications. Comment: 66 pages, 5 figures.

    Probabilistic Short-Term Solar Driver Forecasting with Neural Network Ensembles

    Commonly utilized space weather indices and proxies drive predictive models for thermosphere density, directly impacting objects in low-Earth orbit (LEO) by influencing atmospheric drag forces. A set of solar proxies and indices (drivers), F10.7, S10.7, M10.7, and Y10.7, are created from a mixture of ground-based radio observations and satellite instrument data. These solar drivers represent heating in various levels of the thermosphere and are used as inputs by the JB2008 empirical thermosphere density model. The United States Air Force (USAF) operational High Accuracy Satellite Drag Model (HASDM) relies on JB2008, and forecasts of solar drivers made by a linear algorithm, to produce forecasts of density. Density forecasts are useful to the space traffic management community and can be used to determine orbital state and probability of collision for space objects. In this thesis, we aim to provide improved and probabilistic forecasting models for these solar drivers, with a focus on providing the first probabilistic models for S10.7, M10.7, and Y10.7. We introduce auto-regressive methods to forecast solar drivers using neural network ensembles with multi-layer perceptron (MLP) and long short-term memory (LSTM) models in order to improve on the current operational forecasting methods. We investigate input data manipulation methods such as backwards averaging, varied lookback, and PCA rotation for multivariate prediction. We also investigate the differences associated with multi-step and dynamic prediction methods. A novel method for splitting data, referred to as striped sampling, is introduced to produce statistically consistent machine learning data sets. We also investigate the effects of loss function on forecasting performance and uncertainty estimates, as well as novel ensemble weighting methods. We show the best models for univariate forecasting are ensemble approaches using multi-step or a combination of multi-step and dynamic predictions.
    Nearly all univariate approaches offer an improvement, with the best models improving between 48 and 59% on relative mean squared error (MSE) with respect to persistence, which is used as the baseline model in this work. We also show that a stacked neural network ensemble approach significantly outperforms the operational linear method. When using MV-MLE (multivariate multi-lookback ensemble), we see improvements in performance error metrics over the operational method on all drivers. The multivariate approach also yields an improvement in root mean squared error (RMSE) for F10.7, S10.7, M10.7, and Y10.7 of 17.7%, 12.3%, 13.8%, and 13.7%, respectively, over the current operational method. We additionally provide the first probabilistic forecasting models for S10.7, M10.7, and Y10.7. Ensemble approaches are leveraged to provide a distribution of predicted values, allowing an investigation into robustness and reliability (R&R) of uncertainty estimates, using the calibration error score (CES) metric and calibration curves. Univariate models provided similar uncertainty estimates as other works, while improving on performance metrics. We also produce probabilistic forecasts using MV-MLE, which are well calibrated for all drivers, providing an average CES of 5.63%.
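The headline numbers above are improvements in MSE relative to a persistence baseline, which simply carries the last observation forward. A minimal sketch of that metric (the series values are invented for illustration):

```python
import numpy as np

def rel_mse_improvement(y, y_hat):
    """Percent MSE improvement of forecasts y_hat over the persistence
    baseline, which predicts each value as the previous observation."""
    mse_model = np.mean((y[1:] - y_hat[1:]) ** 2)
    mse_persist = np.mean((y[1:] - y[:-1]) ** 2)
    return 100.0 * (1.0 - mse_model / mse_persist)

y = np.array([100.0, 103.0, 101.0, 106.0, 104.0])      # invented observations
y_hat = np.array([100.0, 102.0, 102.0, 105.0, 104.0])  # invented model forecasts
print(round(rel_mse_improvement(y, y_hat), 1))  # → 92.9
```

A positive value means the model beats persistence; the thesis reports 48–59% for its best univariate ensembles under this kind of measure.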

    Transformer-based deep learning model for stock return forecasting: Empirical evidence from US markets in 2012–2021

    A growing number of studies in recent years have deployed various machine learning methods for financial time series analysis. The ability of machine learning methods to deal with complex and nonlinear data sets, as well as the increasing amount of available data and computational capacity, has pushed research further in this direction. While machine learning methods are nowadays widely used for forecasting financial time series, the results have been mixed. The rapid increase in machine learning research has also meant that new and more advanced models are being developed all the time. In many areas where machine learning methods are employed, designs based on the Transformer deep learning model often represent the state of the art. However, the applications of the Transformer model for financial tasks are still in their infancy, as only a few studies have been published on the matter. This study aims to investigate the feasibility of a Transformer-based deep learning model for stock return prediction. The feasibility is tested by predicting the daily directional movements of four different US stock indices on an out-of-sample period from the start of 2012 until the end of 2021. Only historical price data is utilized to predict the directional returns with two sets of explanatory variables. The model performance is tested against benchmarks and evaluated using various performance criteria such as prediction accuracy. Moreover, a trading strategy is carried out to reveal possible profitable attributes of the Transformer-based model. The reported classification accuracy over the whole empirical sample for the better Transformer model is 52.52%, while LSTM, another deep learning model used as a benchmark, achieves an accuracy of 53.87%. However, the Transformer model manages to outperform all the benchmark models in every other performance metric.
    When the performances are tested using the trading strategy, the best Transformer model is able to generate an annualized return of 15.7% before transaction costs. The best-performing benchmark, a simple buy-and-hold strategy, yields a return of 14.2%. The two tested Transformer models also have the highest Sharpe ratios of the tested models, at 1.063 and 1.061. Nevertheless, after transaction costs are taken into account, none of the tested models beats a simple buy-and-hold strategy in terms of profitability. Although the Transformer model was not able to perform superiorly throughout the sample period, it nevertheless exhibited increased predictive performance over shorter periods. For example, the model seemed to exploit periods of higher volatility, as seen during the start of the COVID-19 pandemic. Overall, although the predictive performance of the Transformer model in this study might leave more to be desired, the model undoubtedly has predictive properties which should encourage further research.
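The classification accuracies quoted above measure how often the predicted direction of the daily return matches the realized sign. A minimal sketch of that metric (the returns and predictions below are invented for illustration):

```python
import numpy as np

def directional_accuracy(returns, predicted_up):
    """Fraction of periods where the predicted direction matches
    the sign of the realized return."""
    return float(np.mean((returns > 0) == predicted_up))

returns = np.array([0.004, -0.002, 0.001, -0.005, 0.003])  # invented daily returns
predicted_up = np.array([True, True, True, False, True])   # invented model output
print(directional_accuracy(returns, predicted_up))  # → 0.8
```

On this metric even a few percentage points above 50% can matter, which is why the thesis pairs it with a trading-strategy evaluation.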

    The Forecast of Exchange Rates using Artificial Neural Networks, and their comparison to classic models

    2014 dissertation for MSc in Financial Management. Selected by academic staff as a good example of a masters-level dissertation. Predicting foreign exchange rates has forever been a task of great importance to any individual, business or organization having to deal with a foreign currency. In the wake of a world where global transactions are an everyday activity, readiness and skill when dealing with the forecasting of international monetary movements is a key factor in the success of any operation; be it that of an individual investor, or that of a multi-national index-listed company. The motivation behind the desire of conquering the skill of forecasting may range from the simple desire to hedge one's investments and dealings in a foreign currency, to that of a speculative investor looking for arbitrage opportunities in trading foreign exchange markets. The motivation of this paper was to test and compare various models in their ability to forecast the return generated by price movements of three globally available and traded currency pairs; notably the Euro–US Dollar, the Euro–Swiss Franc and the Pound Sterling–US Dollar. Recent studies have shown great promise in the use of artificial neural networks in the field of forecasting exchange-traded assets and currencies, which is why this paper discusses the performance of four machine learning models in comparison to three base models and two linear models. The machine learning models studied are the Multi-Layer Perceptron, the Higher Order Neural Network, Gene Expression and Rolling Genetic-Support Vector Regression. These models were compared using various methods of statistical evaluation, in order to measure the discrepancy of the forecasted values from the actual values, as well as the annualized return and the risk-to-return ratio.
    It was concluded that modern forecasting techniques do outperform the classic base and linear models in terms of forecasting accuracy, as well as potential gain and risk-to-return ratio.

    Stock market prediction with long short-term memory neural networks: Empirical study on Finnish stock market 1999–2020

    In recent years, advanced machine learning techniques have outperformed previous benchmarks in multiple disciplines, and these methods have also been increasingly applied to stock market prediction tasks. This research aims to fill the research gap in advanced machine learning applications on the Finnish stock market by applying long short-term memory (LSTM) neural networks to a stock return movement prediction task for the period 1999–2020. The performance of the LSTM network is benchmarked against a conventional recurrent neural network and a logistic regression classifier. Using two alternative sets of input features, the models are trained to produce weekly out-of-sample predictions on stock return movements between 2006 and 2020. Furthermore, these predictions are utilized to derive prediction-based investment portfolios. The best-performing multivariate LSTM model yields an annual return of 12.7% and delivers a Sharpe ratio of 0.459 before transaction costs, while a simple buy-and-hold portfolio achieved an annual return of 8.6% and a Sharpe ratio of 0.338 during the same period. The relative edge of the LSTM-based portfolios holds after transaction costs are considered, but a subperiod analysis reveals that the outperformance is not as evident during the latter half of the sample. By unveiling some common characteristics among the stocks selected for trading, the LSTMs are found to independently extract patterns similar to the well-known capital market anomalies of short-term mean reversion and momentum. However, the high-level performance of LSTM models cannot be comprehensively explained by the abovementioned effects. The results indicate that the stock returns are partially driven by long-term signals, and that the LSTMs can independently extract this type of subtle information from noisy stock market data.
    Despite being relatively complex and having high computational costs, LSTM networks are shown to be suitable methods for stock return movement prediction tasks. Even though the theoretical performance might not fully materialize if the trading strategy is implemented in practice, LSTMs certainly have predictive properties that make them useful tools and complements for different investment purposes.
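The portfolio comparisons above rest on annualized Sharpe ratios computed before and after proportional transaction costs. A minimal sketch (a zero risk-free rate, the return figures, and the cost level are assumptions for illustration, not values from the thesis):

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a per-period return series,
    assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()

def apply_costs(returns, trades, cost=0.001):
    """Subtract a proportional transaction cost on each period a trade occurs."""
    return returns - cost * trades

daily = np.array([0.01, -0.005, 0.01, -0.005])   # invented daily strategy returns
trades = np.array([1, 0, 1, 0])                  # 1 = position changed that day
print(round(sharpe_ratio(daily), 2))                      # → 5.29
print(round(sharpe_ratio(apply_costs(daily, trades)), 2)) # → 4.54
```

For the weekly predictions used in the thesis, `periods_per_year` would be 52 rather than 252; the key point is that costs reduce both the mean return and the resulting Sharpe ratio, which is why the relative edge over buy-and-hold must be re-checked after costs.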