4 research outputs found
Rolling Window Time Series Prediction Using MapReduce
Prediction of time series data is an important application in many domains. Despite their inherent advantages, traditional databases and MapReduce methodology are not ideally suited for this type of processing due to dependencies introduced by the sequential nature of time series. In this thesis a novel framework is presented to facilitate retrieval and rolling window prediction of irregularly sampled large-scale time series data. By introducing a new index pool data structure, processing of time series can be efficiently parallelised. The proposed framework is implemented in R programming environment and utilises Hadoop to support parallelisation and fault tolerance. A systematic multi-predictor selection model is designed and applied, in order to choose the best-fit algorithm for different circumstances. Additionally, the boosting method is deployed as a post-processing to further optimise the predictive results. Experimental results on a cloud-based platform indicate that the proposed framework scales linearly up to 32-nodes, and performs efficiently with a relatively optimised prediction