701 research outputs found

    Machine Learning Applications for Load Predictions in Electrical Energy Network

    In this work, operational data collected from typical urban and rural energy networks are analysed to predict energy consumption, as are data for a selected region of the Nordpool electricity markets. Regression techniques are systematically investigated for electrical energy prediction and for correlating other influencing parameters. k-Nearest Neighbour (kNN), Random Forest (RF) and Linear Regression (LR) are analysed and evaluated using both a continuous and a vertical time approach. For 30-minute predictions, RF regression gives the best results, with a mean absolute percentage error (MAPE) in the range of 1-2 %. kNN shows the best results for day-ahead forecasting, with a MAPE of 2.61 %. The presented vertical time approach outperforms the continuous time approach. To enhance the pre-processing stage, refined techniques from statistics and time series analysis are adopted in the modelling. Reducing the dimensionality through principal component analysis (PCA) improves the predictive performance of Recurrent Neural Networks (RNN). For Gated Recurrent Unit (GRU) networks, PCA improves the results for all seasons. This work also considers abnormal operation arising from various causes (e.g. random effects, intrusion, abnormal operation of smart devices, cyber-threats). The results of kNN, Isolation Forest (iForest) and Local Outlier Factor (LOF) on the urban and rural data show that anomaly detection differs between the two scenarios. For the rural region, most anomalies are concentrated in the last year of the collected data. For the urban area, the anomalies are spread over the entire timeline. The frequency of detected anomalies was considerably higher for the rural load demand than for the urban load demand.
Judging from the case scenarios considered, the detected anomalies are driven more by the data than by exceptions in the algorithms. Drawing on domain knowledge of smart energy systems, LOF is able to detect observations that could not have been detected by visual inspection alone, in contrast to kNN and iForest. Whereas kNN and iForest exclude values beyond an upper and a lower bound, LOF is density-based and separates out anomalies that lie amidst the data. This capability, combined with deep domain knowledge, is an advantage when detecting anomalies in smart meter data. This work has shown that instance-based models can compete with models of higher complexity; however, some pre-processing methods (such as circular coding) do not work for an instance-based learner such as kNN, so kNN cannot opt for this kind of complexity even in the feature engineering of the model. For future work on electrical load forecasting, it would be interesting to develop a solution that combines high complexity in the feature engineering with the explainability of instance-based models. publishedVersio
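The density-based behaviour attributed to LOF in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `lof_scores`, the toy `load` values and the choice `k=3` are assumptions made for illustration, and the sketch assumes one-dimensional data with no duplicate values (duplicates would give a zero reachability distance).

```python
def lof_scores(values, k=3):
    """Return one Local Outlier Factor score per value (1-D, distinct values)."""
    n = len(values)
    # k nearest neighbours of each point as (distance, index) pairs, nearest first.
    neighbours = []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        neighbours.append(dists[:k])
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical load readings: two dense groups and one isolated value between them.
load = [10.0, 10.1, 10.2, 10.3, 10.4, 12.0, 20.0, 20.1, 20.2, 20.3, 20.4]
scores = lof_scores(load, k=3)
most_anomalous = max(range(len(load)), key=lambda i: scores[i])
print(load[most_anomalous], round(scores[most_anomalous], 2))
```

In this toy series the value 12.0 lies between the minimum and the maximum, so a rule based only on upper and lower bounds would not flag it, while its LOF score is well above the scores of roughly 1 obtained by the points inside the two dense groups.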

    Relative evaluation of regression tools for urban area electrical energy demand forecasting

    Load forecasting is the most fundamental application in the Smart Grid; it provides essential input to Demand Response, Topology Optimization and Abnormality Detection, facilitating the integration of intermittent clean energy sources. In this work, several regression tools are analyzed using larger datasets for urban-area electrical load forecasting: Random Forest Regressor, k-Nearest Neighbour Regressor and Linear Regressor. The work explores the use of regression tools for regional electric load forecasting by correlating lower distinctive categorical levels (season, day of the week) and weather parameters. The regression analysis has been carried out on a continuous time basis as well as with a vertical time axis approach. The vertical time approach takes a sample time period (e.g. a season or a week) from four years of data and is tested on the same time period in the following year. The novelty of this work lies in electrical demand forecasting using regression tools through the vertical approach, which also considers the impact of meteorological parameters. The vertical approach uses less data than continuous time-series and neural network techniques. A correlation study of the vertical approach, using both the Pearson method and visual inspection, shows a meaningful relation between data pre-processing, test methods and results for the regressors, which are examined through the Mean Absolute Percentage Error (MAPE). The regressors are compared by examining their structure and identifying the lowest MAPE. Random Forest Regressor provides better short-term load prediction (30 min) and kNN offers relatively better long-term load prediction (24 h). acceptedVersio
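The MAPE metric used to compare the regressors in this abstract can be written out as a short sketch. The function name `mape` and the sample numbers are assumptions for illustration only and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical half-hourly loads (actual) and forecasts (predicted).
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
error = mape(actual, predicted)  # errors of 10 % and 5 % average to 7.5 %
print(round(error, 2))
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing forecasts across networks with different load levels; it is unusable when the actual load can be zero.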

    Entity-level stream classification: exploiting entity similarity to label the future observations referring to an entity

    Stream classification algorithms traditionally treat arriving instances as independent. However, in many applications, the arriving examples may depend on the “entity” that generated them, e.g. in product reviews or in the interactions of users with an application server. In this study, we investigate the potential of this dependency by partitioning the original stream of instances/“observations” into entity-centric substreams and by incorporating entity-specific information into the learning model. We propose a k-nearest-neighbour-inspired stream classification approach, in which the label of an arriving observation is predicted by exploiting knowledge on the observations belonging to this entity and to entities similar to it. For the computation of entity similarity, we consider knowledge about the observations and knowledge about the entity, potentially from a domain/feature space different from that in which predictions are made. To distinguish between cases where this knowledge transfer is beneficial for stream classification and cases where the knowledge on the entities does not contribute to classifying the observations, we also propose a heuristic approach based on random sampling of substreams using k Random Entities (kRE). Our learning scenario is not fully supervised: after acquiring labels for the initial m observations of each entity, we assume that no additional labels arrive and attempt to predict the labels of near-future and far-future observations from that initial seed. We report on our findings from three datasets

    A Survey on Data Mining Techniques Applied to Energy Time Series Forecasting

    Data mining has become an essential tool during the last decade for analyzing large sets of data. The variety of techniques it includes and the successful results obtained in many application fields make this family of approaches powerful and widely used. In particular, this work explores the application of these techniques to time series forecasting. Although classical statistical methods provide reasonably good results, data mining techniques outperform them. Hence, this work faces two main challenges: (i) to provide a compact mathematical formulation of the most widely used techniques; (ii) to review the latest works on time series forecasting and, as a case study, those related to electricity price and demand markets. Ministerio de Economía y Competitividad TIN2014-55894-C2-R; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309

    Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations

    The increased digitalisation and monitoring of the energy system opens up numerous opportunities to decarbonise the energy system. Applications on low voltage, local networks, such as community energy markets and smart storage will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate the continued improvement and advancement in this area. To this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known low voltage level open datasets to encourage further research and development. Comment: 37 pages, 6 figures, 2 tables, review pape

    Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials.

    Time series data collected in clinical trials can have varying degrees of missingness, adding challenges during statistical analyses. An additional layer of complexity is introduced for missing data in randomized controlled trials (RCT), where researchers must remain blinded between intervention and control groups. Such restriction severely limits the applicability of conventional imputation methods that would utilize other participants' data for improved performance. This paper explores and compares various methods to impute high-resolution temperature logger data in RCT settings. In addition to the conventional non-parametric approaches, we propose a spline regression (SR) approach that captures the dynamics of indoor temperature by time of day that is unique to each participant. We investigate how the inclusion of external temperature and energy use can improve the model performance. Results show that SR imputation results in 16% smaller root mean squared error (RMSE) compared to conventional imputation methods, with the gap widening to 22% when more than half of data is missing. The SR method is particularly useful in cases where missingness occurs simultaneously for multiple participants, such as concurrent battery failures. We demonstrate how proper modelling of periodic dynamics can lead to significantly improved imputation performance, even with limited data

    Contributions to time series analysis, modelling and forecasting to increase reliability in industrial environments.

    356 p. Integrating the Internet of Things into the industrial sector is key to achieving business intelligence. This study focuses on improving existing approaches, or proposing new ones, to increase the reliability of AI solutions based on time series data in industry. Three phases are addressed: improving the quality of the data, of the models and of the errors. A standard definition of quality metrics is proposed and included in the R package dqts. The steps of time series modelling are explored, from feature extraction to the selection and application of the most efficient prediction model. The KNPTS method, based on searching for patterns in the historical data, is presented as an R package for estimating future data. In addition, the use of elastic similarity measures for evaluating regression models is suggested, along with the importance of suitable metrics in class-imbalanced problems. The contributions were validated in industrial use cases from different fields: product quality, electricity consumption forecasting, porosity detection and machine diagnosis.

    Técnicas avanzadas de predicción para big data en el contexto de smart cities (Advanced prediction techniques for big data in the context of smart cities)

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Computer Engineering. Programme code: DBI. Line code: 19. More and more information is collected every day from every area of our lives. Steps per minute, pollution in the world's major cities, or electricity consumption measured at regular intervals are just a few examples. This is the setting in which Smart Cities, or connected cities, arise: as much information as possible is gathered from different IoT devices distributed across the city, in the hope of discovering knowledge in the data and even predicting certain future behaviours. However, the new time series being created are beginning to exceed the sizes considered until now, and are therefore starting to be regarded as Big Data. The machine learning and data mining techniques that had so far offered good results could not handle such a volume of information and needed to be revised. This research work therefore proposes a prediction algorithm based on nearest neighbours for forecasting Big Data time series. Relying on new data analysis frameworks such as Apache Spark, with distributed computing as their flagship feature, two algorithms are proposed: one based on kWNN for univariate time series analysis and forecasting, and MV-kWNN as its multivariate version. The work details the steps taken to adapt the algorithm to distributed computing and the results obtained when forecasting the electricity consumption of three buildings of a public university.
The work also presents the improvements made to the algorithm to optimally select its required parameters, namely the number of past values (w) used to predict the next h values and the number of nearest neighbours k considered for the prediction. Different sizes of the prediction horizon h are also assessed as input to the algorithm. The validity of these improvements is checked by forecasting a time series twice as large as the one considered initially, in this case the electricity demand in Spain recorded over 9 years. The low error rate obtained demonstrates the suitability of the algorithm, and its comparison with other methods such as deep learning or regression trees reaffirms this. Various scalability tests on a cluster with different configurations show how important it is to choose parameters properly, such as the number of cores used per machine, the number of partitions into which the dataset is divided and the number of machines in the cluster. Finally, a new algorithm is proposed that takes into account not just one variable but several exogenous series that could improve the final prediction. Through different correlation-based analyses, the minimum degree of correlation that the series must meet to improve the prediction is defined. Experiments are carried out on two real series, electricity demand in Spain and the electricity price over the same period, again achieving low error rates. Comparison with other multivariate methods, such as neural networks or random forests, places the proposed method in first place, ahead of them.
A final experiment is carried out to confirm the suitability of the algorithm for Big Data time series, showing the execution times when the original size of the series is multiplied by up to 200. Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática. Postprin
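The roles of the parameters w, k and h named in this abstract can be illustrated with a minimal single-machine sketch of a weighted nearest-neighbour forecaster. This is not the thesis's Spark implementation: the function name `kwnn_forecast`, the Euclidean distance and the inverse-squared-distance weights are assumptions chosen for illustration.

```python
import math


def kwnn_forecast(series, w, k, h):
    """Forecast the next h values from the last w values using k weighted neighbours."""
    n = len(series)
    if n < 2 * w + h:
        raise ValueError("series too short for the chosen w and h")
    pattern = series[-w:]  # the most recent w values
    # Every historical window of length w that is followed by h known values.
    candidates = []
    for start in range(n - w - h + 1):
        window = series[start:start + w]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, pattern)))
        candidates.append((dist, start))
    nearest = sorted(candidates)[:k]
    # Assumed weighting: inverse squared distance (small epsilon avoids division by zero).
    weights = [1.0 / (dist ** 2 + 1e-12) for dist, _ in nearest]
    total = sum(weights)
    forecast = []
    for step in range(h):
        value = sum(wt * series[start + w + step]
                    for wt, (_, start) in zip(weights, nearest))
        forecast.append(value / total)
    return forecast


# Hypothetical periodic demand: the pattern [1, 2, 3, 4] repeated five times.
demand = [1.0, 2.0, 3.0, 4.0] * 5
prediction = kwnn_forecast(demand, w=4, k=2, h=2)
print([round(v, 3) for v in prediction])
```

Each neighbour contributes the h values that followed it in the history, so larger w makes the pattern match stricter, larger k averages over more past situations, and larger h extends the forecast at the cost of fewer usable candidate windows.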

    25 Years of IIF Time Series Forecasting: A Selective Review

    We review the past 25 years of time series research that has been published in journals managed by the International Institute of Forecasters (Journal of Forecasting 1982-1985; International Journal of Forecasting 1985-2005). During this period, over one third of all papers published in these journals concerned time series forecasting. We also review highly influential works on time series forecasting that have been published elsewhere during this period. Enormous progress has been made in many areas, but we find that there are a large number of topics in need of further development. We conclude with comments on possible future research directions in this field. Keywords: Accuracy measures; ARCH model; ARIMA model; Combining; Count data; Densities; Exponential smoothing; Kalman Filter; Long memory; Multivariate; Neural nets; Nonlinearity; Prediction intervals; Regime switching models; Robustness; Seasonality; State space; Structural models; Transfer function; Univariate; VAR.