574 research outputs found

    Concept and benchmark results for Big Data energy forecasting based on Apache Spark

    The present article describes a concept for the creation and application of energy forecasting models in a distributed environment. Additionally, a benchmark comparing the time required for the training and application of data-driven forecasting models on a single computer and a computing cluster is presented. This comparison is based on a simulated dataset and both R and Apache Spark are used. Furthermore, the obtained results show certain points in which the utilization of distributed computing based on Spark may be advantageous.

    Knowledge management system for big data in a smart electricity grid context

    We have been witnessing a real explosion of information, due in large part to the development in Information and Knowledge Technologies (ICTs). As information is the raw material for the discovery of knowledge, there has been a rapid growth, both in the scientific community and in ICT itself, in the study of the Big Data phenomenon (Kaisler et al., 2014). The concept of Smart Grids (SG) has emerged as a way of rethinking how to produce and consume energy imposed by economic, political and ecological issues (Lund, 2014). To become a reality, SGs must be supported by intelligent and autonomous IT systems to make the right decisions in real time. Knowledge needed for real-time decision-making can only be achieved if SGs are equipped with systems capable of efficiently managing all the surrounding information. Thus, this paper proposes a system for the management of information in the context of SG to enable the monitoring, in real time, of the events that occur in the ecosystem and to predict following events. This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 641794 (project DREAM-GO) and from FEDER Funds through COMPETE program and from National Funds through FCT under the project UID/EEA/00760/2013.

    Facilitating and Enhancing the Performance of Model Selection for Energy Time Series Forecasting in Cluster Computing Environments

    Applying Machine Learning (ML) manually to a given problem setting is a tedious and time-consuming process which brings many challenges with it, especially in the context of Big Data. In such a context, gaining insightful information, finding patterns, and extracting knowledge from large datasets are quite complex tasks. Additionally, the configurations of the underlying Big Data infrastructure introduce more complexity for configuring and running ML tasks. With the growing interest in ML over the last few years, people without extensive ML expertise in particular have a high demand for frameworks that assist them in applying the right ML algorithm to their problem setting. This is especially true in the field of smart energy system applications, where more and more ML algorithms are used, e.g., for time series forecasting. Generally, two groups of non-expert users who perform energy time series forecasting are distinguished. The first one includes the users who are familiar with statistics and ML but are not able to write the necessary programming code for training and evaluating ML models using the well-known trial-and-error approach. Such an approach is time consuming and wastes resources for constructing multiple models. The second group is even more inexperienced in programming and not knowledgeable in statistics and ML but wants to apply given ML solutions to their problem settings. The goal of this thesis is to scientifically explore, in the context of more concrete use cases in the energy domain, how such non-expert users can be optimally supported in creating and performing ML tasks in practice on cluster computing environments. To support the first group of non-expert users, an easy-to-use, modular, extendable, microservice-based ML solution for instrumenting and evaluating ML algorithms on top of a Big Data technology stack is conceptualized and evaluated.
    Our proposed solution facilitates applying the trial-and-error approach by hiding the low-level complexities from the users and introduces the best conditions to efficiently perform ML tasks in cluster computing environments. To support the second group of non-expert users, the first solution is extended to realize meta learning approaches for automated model selection. We evaluate how meta learning technology can be efficiently applied to the problem space of data analytics for smart energy systems to assist energy system experts who are not data analytics experts in applying the right ML algorithms to their data analytics problems. To enhance the predictive performance of meta learning, an efficient characterization of energy time series datasets is required. To this end, Descriptive Statistics Time based Meta Features (DSTMF), a new kind of meta features, is designed to accurately capture the deep characteristics of energy time series datasets. We find that DSTMF outperforms the other state-of-the-art meta feature sets introduced in the literature to characterize energy time series datasets in terms of the accuracy of meta learning models and the time needed to extract them. Further enhancement in the predictive performance of the meta learning classification model is achieved by training the meta learner on new efficient meta examples. To this end, we propose two new approaches to generate new energy time series datasets to be used as training meta examples by the meta learner depending on the type of time series dataset (i.e., generation or energy consumption time series). We find that extending the original training sets with new meta examples generated by our approaches outperforms the case in which the original set is extended by new simulated energy time series datasets.

    Scalability Benchmarking of Cloud-Native Applications Applied to Event-Driven Microservices

    Cloud-native applications constitute a recent trend for designing large-scale software systems. This thesis introduces the Theodolite benchmarking method, allowing researchers and practitioners to conduct empirical scalability evaluations of cloud-native applications, their frameworks, configurations, and deployments. The benchmarking method is applied to event-driven microservices, a specific type of cloud-native application that employs distributed stream processing frameworks to scale with massive data volumes. Extensive experimental evaluations benchmark and compare the scalability of various stream processing frameworks under different configurations and deployments, including different public and private cloud environments. These experiments show that the presented benchmarking method provides statistically sound results in an adequate amount of time. In addition, three case studies demonstrate that the Theodolite benchmarking method can be applied to a wide range of applications beyond stream processing.

    Benchmarking Big Data Technologies for Energy Procurement Efficiency

    The electrical power industry is undergoing radical change due to the push for renewable energy that makes energy supply less predictable. Smart meters along with analytics software can grant insights into customer-specific consumption and thereby enable a better match between the demand and supply side for an electric utility. However, the vast amount of allocatable smart metering data and complexity of analytics pose challenges to database systems. We address the implementation of an analytics approach to optimize customer portfolios, eventually preventing excess energy procurement. Using real-world and simulated data, we test the suitability of big data approaches as well as traditional relational database technology. Furthermore, we present solutions based on big data platforms and demonstrate their cost effectiveness and performance. Our findings suggest economic feasibility of big data solutions for large utilities. Small and medium-sized utilities are advised to invest in more cost-effective solutions such as cluster-based systems.

    Nearest Neighbors-Based Forecasting for Electricity Demand Time Series in Streaming

    This paper presents a new forecasting algorithm for time series in streaming named StreamWNN. The methodology has two well-differentiated stages: the algorithm searches for the nearest neighbors to generate an initial prediction model in the batch phase. Then, an online phase is carried out when the time series arrives in streaming. In particular, the nearest neighbor of the streaming data from the training set is computed and the nearest neighbors, previously computed in the batch phase, of this nearest neighbor are used to obtain the predictions. Results using the electricity consumption time series are reported, showing a remarkable performance of the proposed algorithm in terms of forecasting errors when compared to a nearest neighbors-based benchmark algorithm. The running times for the predictions are also remarkable. Ministerio de Ciencia, Innovación y Universidades TIN2017-88209-C
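
The two-stage idea in this abstract (neighbors precomputed in a batch phase, then reused in an online phase) can be illustrated with a minimal sketch. This is not the StreamWNN implementation: the function names, the Euclidean distance, the window length `w`, the horizon `h`, and the inverse-distance weighting are all assumptions made for illustration only.

```python
import math

def make_windows(series, w, h):
    """Pair each pattern of w past values with the h values that follow it."""
    return [(series[i:i + w], series[i + w:i + w + h])
            for i in range(len(series) - w - h + 1)]

def dist(a, b):
    """Euclidean distance between two equal-length patterns (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_phase(pairs, k):
    """Batch phase: for every training pattern, store the indices of its
    k nearest other training patterns."""
    neighbors = []
    for i, (pattern, _) in enumerate(pairs):
        ranked = sorted((j for j in range(len(pairs)) if j != i),
                        key=lambda j: dist(pattern, pairs[j][0]))
        neighbors.append(ranked[:k])
    return neighbors

def online_predict(window, pairs, neighbors):
    """Online phase: find the single nearest training pattern, then combine
    the futures of its precomputed neighbors. The inverse-distance weights
    are an assumption of this sketch."""
    nearest = min(range(len(pairs)), key=lambda j: dist(window, pairs[j][0]))
    idxs = neighbors[nearest]
    weights = [1.0 / (dist(window, pairs[j][0]) + 1e-9) for j in idxs]
    total = sum(weights)
    horizon = len(pairs[0][1])
    return [sum(wt * pairs[j][1][t] for wt, j in zip(weights, idxs)) / total
            for t in range(horizon)]

# Toy usage on a periodic series (hypothetical data).
series = [1.0, 2.0, 3.0, 4.0] * 6
pairs = make_windows(series, w=4, h=2)
neighbors = batch_phase(pairs, k=2)
prediction = online_predict([1.0, 2.0, 3.0, 4.0], pairs, neighbors)
```

The point of the split is that the expensive k-neighbor search runs once over the training set, while each arriving window only needs a single nearest-neighbor lookup.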

    Large scale data analysis using MLlib

    Recent advancements in the internet, social media, and internet of things (IoT) devices have significantly increased the amount of data generated in a variety of formats. The data must be converted into a format that is easily handled by data analysis techniques. It is mathematically and physically expensive to apply machine learning algorithms to big and complicated data sets. It is a resource-intensive process that necessitates a huge amount of logical and physical resources. Machine learning is a sophisticated data analytics technology that has gained in importance as a result of the massive amount of data generated daily that needs to be examined. Apache Spark machine learning library (MLlib) is one of the big data analysis platforms that provides a variety of outstanding functions for various machine learning tasks, spanning from classification to regression and dimension reduction. From a computational standpoint, this research investigated Apache Spark MLlib 2.0 as an open source, autonomous, scalable, and distributed learning library. Several real-world machine learning experiments are carried out in order to evaluate the properties of the platform on a qualitative and quantitative level. Some of the fundamental concepts and approaches for developing a scalable data model in a distributed environment are also discussed.