
    Probabilistic Anomaly Detection in Natural Gas Time Series Data

    This paper introduces a probabilistic approach to anomaly detection, specifically in natural gas time series data. In the natural gas field, there are various types of anomalies, each induced by a range of causes and sources. The causes of a set of anomalies are examined and categorized, and a Bayesian maximum likelihood classifier learns the temporal structures of known anomalies. Given previously unseen time series data, the system detects anomalies using a linear regression model with weather inputs; the detected anomalies are then tested for false positives and classified using the Bayesian classifier. The method can also identify anomalies of unknown origin. Thus, the likelihood of a data point being anomalous is given for anomalies of both known and unknown origin. This probabilistic anomaly detection method is tested on a reported natural gas consumption data set.
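The classification step can be illustrated with a minimal Gaussian Bayes classifier. This is a sketch under stated assumptions, not the paper's implementation: each known anomaly class is summarized by the mean and standard deviation of a single scalar feature (the paper learns richer temporal structures), class priors are given, and an anomaly whose best class likelihood falls below a hypothetical `unknown_threshold` is reported as "unknown". The class names and numbers in the usage example are invented for illustration.

```python
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify_anomaly(feature, classes, priors, unknown_threshold=1e-3):
    """Return (label, posterior) for one anomaly feature value.

    classes: {label: (mean, std)} estimated from labeled known anomalies (assumed).
    priors:  {label: prior probability} (assumed).
    If no known class explains the value well, the anomaly is 'unknown'.
    """
    likelihoods = {c: gaussian_pdf(feature, m, s) for c, (m, s) in classes.items()}
    if max(likelihoods.values()) < unknown_threshold:
        return "unknown", None
    # Bayes' rule: posterior is proportional to prior * likelihood
    evidence = sum(priors[c] * likelihoods[c] for c in classes)
    posteriors = {c: priors[c] * likelihoods[c] / evidence for c in classes}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]

# Hypothetical anomaly classes for illustration only
classes = {"meter_fault": (-5.0, 1.0), "data_entry_error": (8.0, 2.0)}
priors = {"meter_fault": 0.6, "data_entry_error": 0.4}
print(classify_anomaly(-4.5, classes, priors))
print(classify_anomaly(30.0, classes, priors))
```

A value near a known class mean is assigned to that class with a high posterior, while a value far from every class falls below the likelihood threshold and is reported as "unknown", mirroring the abstract's distinction between known and unknown origins.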

    Modeling and Optimization of Active Distribution Network Operation Based on Deep Learning


    Data Cleaning in the Energy Domain

    This dissertation addresses the problem of data cleaning in the energy domain, especially for natural gas and electric time series. Detecting and imputing anomalies improves the performance of the forecasting models that utilities need to lower purchasing and storage costs and to plan for peak energy loads or distribution shortages. There are various types of anomalies, each induced by diverse causes and sources depending on the field of study, and the definition of a false positive also depends on the context. The analysis focuses on energy data because data and domain information are available to make both a theoretical and a practical contribution to the field. A probabilistic approach based on hypothesis testing is developed to decide whether a data point is anomalous at a given level of significance. This probabilistic approach is then combined with statistical regression models to handle time series data. Domain knowledge of energy data and a survey of the causes and sources of anomalies in energy are incorporated into the data cleaning algorithm to improve the accuracy of the results. The data cleaning method is evaluated on simulated data sets into which anomalies were artificially inserted, and on natural gas and electric data sets. In the simulation study, both detection and imputation are evaluated for all identified causes of anomalies in energy data. Testing on utilities' data measures the percentage improvement in forecasting accuracy brought by data cleaning. A cross-validation study is also performed to demonstrate the performance of the data cleaning algorithm on smaller data sets and to calculate a confidence interval for the results. The data cleaning algorithm successfully identifies energy time series anomalies, and replacing those anomalies improves forecasting model accuracy.
    The process is automatic, which is important because many data cleaning processes require human input and become impractical for very large data sets. The techniques are also applicable to other fields such as econometrics and finance, but the exogenous factors of the time series data need to be well defined.
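The combination of regression and hypothesis testing can be sketched as follows. This is a minimal illustration, not the dissertation's algorithm. It assumes a single exogenous weather regressor (heating degree days, `hdd`), a straight-line model fitted by ordinary least squares, normally distributed residuals, and a two-sided z-test at significance level `alpha`. Flagged points are imputed with the model prediction. The function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev  # NormalDist requires Python 3.8+

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x with one weather regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def clean_series(hdd, load, alpha=0.01):
    """Flag points whose regression residual is significant at level alpha
    (two-sided z-test) and replace them with the model prediction."""
    a, b = fit_line(hdd, load)
    pred = [a + b * h for h in hdd]
    resid = [y - p for y, p in zip(load, pred)]
    sigma = pstdev(resid)  # assumes residuals are not all identical
    flags, cleaned = [], []
    for y, p, r in zip(load, pred, resid):
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(r) / sigma))
        anomalous = p_value < alpha
        flags.append(anomalous)
        cleaned.append(p if anomalous else y)
    return flags, cleaned

# Synthetic daily load driven by HDD, with one injected spike at index 10
hdd = list(range(20))
load = [10 + 2 * h + (0.3 if h % 2 == 0 else -0.3) for h in hdd]
load[10] += 40
flags, cleaned = clean_series(hdd, load)
print([i for i, f in enumerate(flags) if f])  # indices flagged as anomalous
```

Because the spike itself inflates the fitted line and the residual standard deviation, a robust scale estimate (for example, the median absolute deviation) may be a better choice when a series contains many anomalies.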

    Condition Monitoring of Wind Turbines Using Intelligent Machine Learning Techniques

    Wind turbine condition monitoring can detect anomalies in turbine performance that could lead to unexpected failure and financial loss. This study examines common Supervisory Control And Data Acquisition (SCADA) data collected over 20 months from 21 pitch-regulated 2.3 MW turbines and is presented in three manuscripts. First, power curve monitoring is targeted by applying various types of Artificial Neural Networks (ANNs) to increase modeling accuracy. It is shown how the proposed method can significantly improve network reliability compared with existing models. Then, an advanced technique is used to create a smoother dataset for network training, followed by the construction of a dynamic ANFIS (Adaptive Neuro-Fuzzy Inference System) network. At this stage, the designed network aims to predict power generation in future hours. Finally, a recursive principal component analysis is performed to extract significant features to be used as input parameters of the network. A novel fusion technique is then employed to build an advanced model that predicts turbine performance with low errors.
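The feature-extraction step can be approximated with ordinary batch principal component analysis. Note the substitution: the study uses a recursive PCA, which updates the decomposition as new SCADA samples arrive, whereas this sketch computes PCA once via the singular value decomposition. The 90% explained-variance cutoff (`var_threshold`) and the function name are assumptions for illustration.

```python
import numpy as np

def pca_features(X, var_threshold=0.9):
    """Project standardized data onto the fewest principal components whose
    cumulative explained variance reaches var_threshold."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each SCADA channel
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    return Z @ Vt[:k].T, explained[:k]

# Synthetic example: two nearly redundant channels plus one independent channel
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([base, 2 * base + 0.01 * rng.normal(size=200), rng.normal(size=200)])
feats, explained = pca_features(X)
print(feats.shape, explained.round(3))
```

Two strongly correlated channels collapse onto a single component, so three raw inputs reduce to two network input features while nearly all of the variance is retained.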

    Flight Data of Airplane for Wind Forecasting

    This research focuses solely on understanding and predicting weather behavior, one of the important factors that affect airplanes in flight. Forecast weather information is used to inform pilots about changing flight conditions. In this paper, we present a new approach to forecasting one component of weather information, wind speed, from data captured by airplanes in flight. We compare NASA's ACT-America project with NOAA's Wind Aloft program for prediction suitability. A collinearity analysis of these datasets shows better model performance and smaller test error with NASA's dataset. We then apply machine learning and a genetic algorithm to process the data further and arrive at a competitive error rate. A sliding window approach is used to find the best window size, and we then build a forecasting model that predicts wind speed at high altitudes 10 minutes ahead. Finally, a stacking-based framework outperforms the individual learning algorithms, with the best combination reaching a root mean square error (RMSE) of 0.674, which is 98.4% better than the state-of-the-art approach.
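The window-size search can be sketched as follows. The paper's models (machine learning, a genetic algorithm, and stacking) are not reproduced. Instead, this sketch assumes a plain linear autoregressive model per candidate window, a one-step-ahead target (standing in for the 10-minute horizon), and a chronological 80/20 train/test split scored by RMSE. All function names are hypothetical, and the sine series is synthetic rather than flight data.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Turn a 1-D series into (X, y): each row of X holds `window` consecutive
    samples; y is the value `horizon` steps after the end of that window."""
    s = np.asarray(series, dtype=float)
    n = len(s) - window - horizon + 1
    X = np.array([s[i:i + window] for i in range(n)])
    y = s[window + horizon - 1: window + horizon - 1 + n]
    return X, y

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def select_window(series, candidates, horizon=1, train_frac=0.8):
    """Fit a linear autoregressive model for each window size and return the
    size with the lowest hold-out RMSE, together with all scores."""
    scores = {}
    for w in candidates:
        X, y = make_windows(series, w, horizon)
        X = np.column_stack([X, np.ones(len(X))])  # intercept column
        split = int(train_frac * len(X))           # chronological split, no shuffling
        coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
        scores[w] = rmse(y[split:], X[split:] @ coef)
    best = min(scores, key=scores.get)
    return best, scores

t = np.arange(300)
series = np.sin(2 * np.pi * t / 37)
best, scores = select_window(series, [1, 2, 3, 4, 6])
print(best, {w: round(s, 4) for w, s in scores.items()})
```

The split is chronological rather than shuffled so that the model is never scored on samples that precede its training data, which is the usual precaution for time series forecasting.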

    Analysis and Prediction of Relative Humidity Level using Generalized Linear Model

    Humidity is a critical climate parameter that affects various sectors, including agriculture, health, and energy, so a comprehensive understanding of the factors that influence it is needed. This study investigates the influence of climatic variables such as temperature, rainfall, sunshine duration, wind speed, and wind direction on humidity levels in DKI Jakarta from 2019 to 2022. The objective is to develop a time-independent predictive model for humidity based on historical climate data. The methodology includes data pre-processing to impute missing values and replace outliers, followed by exploratory data analysis to determine variable distributions and inter-relationships. A regression model was initially employed, and regularization via a generalized linear model was subsequently applied to improve prediction accuracy. Results indicate that temperature, rainfall, sunshine duration, and wind direction significantly affected humidity levels during the investigated period. High inter-variable correlation caused multicollinearity and overfitting in the initial model. Regularization, trained on 75% of the historical dataset, mitigated these issues and improved model accuracy: the Elastic-Net Regression Model achieved a Mean Squared Error (MSE) of 12.2, compared with 12.5 for the initial Multiple Regression Model. These findings have potential implications for weather forecasting and climate change studies.
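The contrast between plain multiple regression and elastic-net regularization under multicollinearity can be sketched with coordinate descent. This minimizes the same objective that scikit-learn's `ElasticNet` documents, (1/2n)·‖y − Xb‖² + α·l1_ratio·‖b‖₁ + ½·α·(1 − l1_ratio)·‖b‖², but it is a from-scratch illustration. The data are synthetic stand-ins for the climate variables, the values `alpha=0.1` and `l1_ratio=0.5` are assumptions, and the study's MSE figures are not reproduced.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_sweeps=500):
    """Coordinate descent on standardized features; the intercept is the mean of y."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                 # each column now satisfies (1/n) z.z = 1
    n, p = Z.shape
    b = np.zeros(p)
    r = y - y.mean()                  # residual for b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += Z[:, j] * b[j]       # remove feature j from the current fit
            rho = Z[:, j] @ r / n
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (1.0 + alpha * (1.0 - l1_ratio))
            r -= Z[:, j] * b[j]       # add the updated contribution back
    return b, mu, sd, y.mean()

def fit_ols(X, y):
    """Plain multiple regression on the same standardized features."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    b, *_ = np.linalg.lstsq((X - mu) / sd, y - y.mean(), rcond=None)
    return b, mu, sd, y.mean()

def predict(model, X):
    b, mu, sd, intercept = model
    return ((np.asarray(X, float) - mu) / sd) @ b + intercept

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - y_pred) ** 2))

# Synthetic data: x2 is almost a copy of x1 (multicollinearity); 75/25 split
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + x3 + rng.normal(size=n)
split = int(0.75 * n)
en = fit_elastic_net(X[:split], y[:split])
ols = fit_ols(X[:split], y[:split])
print("OLS coef:", ols[0].round(2), "test MSE:", round(mse(y[split:], predict(ols, X[split:])), 3))
print("EN  coef:", en[0].round(2), "test MSE:", round(mse(y[split:], predict(en, X[split:])), 3))
```

With two nearly identical predictors, least squares can assign them large offsetting weights. The combined L1/L2 penalty instead shrinks them toward a shared, more stable value, which is the mechanism the abstract credits for reducing overfitting.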

    Implementation of Feature Engineering in Prediction of AQI in India using Machine Learning

    Predicting the Air Quality Index (AQI) is a necessity today, but before making predictions, the different preprocessing techniques that can be applied need to be analyzed. In this study, we first explored various feature engineering techniques, such as data imputation, scaling, extraction, selection, and data splitting, that can be applied before a machine learning algorithm to obtain better results. Second, we used Multiple Linear Regression (MLR) and Support Vector Regression (SVR, with linear and Gaussian kernels) to build the prediction models. Finally, we used root mean square error (RMSE), R², mean squared error (MSE), and mean absolute error (MAE) to evaluate the performance of the regression models combined with the feature engineering techniques. The results show that Linear SVR performs best when coupled with imputation and a robust scaler (R² = 0.7558), whereas Gaussian SVR performs best with imputation alone. For MLR, the results (R² = 0.7769) are almost the same in all four cases, and performance degraded when PCA was applied.
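The imputation, robust scaling, and evaluation metrics named in the abstract can be sketched in a few lines of NumPy. This is an illustration, not the study's pipeline. It assumes median imputation (the abstract does not state which imputation strategy was used) and median/IQR scaling, which is the same idea behind scikit-learn's `RobustScaler`. The function names are hypothetical.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each column with that column's median (NaNs ignored)."""
    X = np.array(X, dtype=float)
    med = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

def robust_scale(X):
    """Center each column on its median and divide by its interquartile range,
    so a few extreme pollutant readings barely affect the scale."""
    X = np.asarray(X, dtype=float)
    q1, med, q3 = np.percentile(X, [25, 50, 75], axis=0)
    iqr = np.where(q3 - q1 == 0, 1.0, q3 - q1)   # avoid division by zero
    return (X - med) / iqr

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R^2 for one set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": float(np.mean(np.abs(err))),
            "R2": 1.0 - float(np.sum(err ** 2)) / ss_tot}

X = [[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan], [100.0, 40.0]]
print(robust_scale(impute_median(X)))
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

Median-based statistics are used in both steps because the outlier in the first column (100.0) would distort a mean-based imputation or a standard-deviation-based scaling.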