Load curve data cleansing and imputation via sparsity and low rank
The smart grid vision is to build an intelligent power network with an
unprecedented level of situational awareness and controllability over its
services and infrastructure. This paper advocates statistical inference methods
to robustify power monitoring tasks against the outlier effects owing to faulty
readings and malicious attacks, as well as against missing data due to privacy
concerns and communication errors. In this context, a novel load cleansing and
imputation scheme is developed leveraging the low intrinsic-dimensionality of
spatiotemporal load profiles and the sparse nature of "bad data.'' A robust
estimator based on principal components pursuit (PCP) is adopted, which effects
a twofold sparsity-promoting regularization through an ℓ1-norm of the
outliers, and the nuclear norm of the nominal load profiles. Upon recasting the
non-separable nuclear norm into a form amenable to decentralized optimization,
a distributed (D-) PCP algorithm is developed to carry out the imputation and
cleansing tasks using networked devices comprising the so-termed advanced
metering infrastructure. If D-PCP converges and a qualification inequality is
satisfied, the novel distributed estimator provably attains the performance of
its centralized PCP counterpart, which has access to all networkwide data.
Computer simulations and tests with real load curve data corroborate the
convergence and effectiveness of the novel D-PCP algorithm.
Comment: 8 figures; submitted to IEEE Transactions on Smart Grid, Special Issue on "Optimization methods and algorithms applied to smart grid".
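A minimal sketch of the centralized PCP estimator the abstract starts from, written as a plain ADMM-style iteration that splits a load-data matrix into a low-rank nominal component and a sparse outlier component. The parameter defaults and synthetic usage are illustrative assumptions, not the authors' D-PCP implementation.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entry-wise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def pcp(X, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split X into low-rank L (nominal load profiles) + sparse S (bad data)
    by minimizing ||L||_* + lam * ||S||_1 subject to X = L + S."""
    m, n = X.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / (np.abs(X).sum() + 1e-12)
    L, S, Y = np.zeros_like(X), np.zeros_like(X), np.zeros_like(X)
    for _ in range(n_iter):
        L = svt(X - S + Y / mu, 1.0 / mu)       # update the low-rank part
        S = soft(X - L + Y / mu, lam / mu)      # update the sparse outliers
        R = X - L - S                           # primal residual
        Y = Y + mu * R                          # multiplier (dual) update
        if np.linalg.norm(R) <= tol * np.linalg.norm(X):
            break
    return L, S

# Illustrative use: a rank-2 spatiotemporal load matrix with a few gross outliers.
rng = np.random.default_rng(0)
true_L = rng.normal(size=(96, 2)) @ rng.normal(size=(2, 30))
X = true_L.copy()
X[rng.integers(0, 96, 20), rng.integers(0, 30, 20)] += 10.0  # injected "bad data"
L_hat, S_hat = pcp(X)
```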
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Naïve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered in a commit can decrease the chance of
performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs.
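As an illustration of the kind of model described above, the sketch below fits a decision tree (scikit-learn's CART implementation, used here in place of C4.5) on the three explanatory variables named in the abstract. The data are synthetic and the feature construction is an assumption for demonstration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score

rng = np.random.default_rng(0)
n = 1000
# Illustrative features named after the abstract: LOC changed, file age, file size.
X = np.column_stack([
    rng.poisson(40, n),         # lines of code changed in the commit
    rng.integers(1, 2000, n),   # file age in days
    rng.integers(50, 5000, n),  # file size in LOC
])
# Synthetic label: larger changes to larger files are more likely to inject a bug.
y = (X[:, 0] * X[:, 2] > np.percentile(X[:, 0] * X[:, 2], 85)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"recall={recall_score(y_te, pred):.2f} "
      f"accuracy={accuracy_score(y_te, pred):.2f} "
      f"precision={precision_score(y_te, pred):.2f}")
```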
Structural health monitoring of offshore wind turbines: A review through the Statistical Pattern Recognition Paradigm
Offshore wind has become the most profitable renewable energy source due to the remarkable development it has experienced in Europe over the last decade. In this paper, a review of Structural Health Monitoring Systems (SHMS) for offshore wind turbines (OWT) has been carried out, treating the topic as a Statistical Pattern Recognition problem. Each stage of this paradigm has therefore been reviewed with a focus on OWT applications. These stages are: Operational Evaluation; Data Acquisition, Normalization and Cleansing; Feature Extraction and Information Condensation; and Statistical Model Development. It is expected that, by optimizing each stage, SHMS can contribute to the development of efficient Condition-Based Maintenance strategies. Optimizing this strategy will help reduce the labor costs of OWT inspection, avoid unnecessary maintenance, identify design weaknesses before failure, and improve the availability of power production while preventing wind turbine overloading, thereby maximizing the return on investment. In the forthcoming years, a growing interest in SHM technologies for OWT is expected, further enhancing the potential of offshore wind farm deployments located farther offshore. Increasing efficiency in operational management will contribute towards achieving the UK's 2020 and 2050 targets by ultimately reducing the Levelised Cost of Energy (LCOE).
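A hedged sketch of the final stage of the paradigm (Statistical Model Development), using a generic Mahalanobis-distance novelty detector trained on baseline-condition features. The features, data, and threshold choice are illustrative assumptions, not drawn from any specific system reviewed in the paper.

```python
import numpy as np

# Baseline features from the healthy (training) condition, e.g. modal/spectral features.
rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, size=(500, 8))

# Fit a simple Mahalanobis-distance novelty detector on the baseline condition.
mean = baseline.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

def damage_index(x):
    """Mahalanobis distance of a new feature vector to the baseline distribution."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Alarm threshold taken from the baseline data itself (99th percentile).
threshold = np.percentile([damage_index(x) for x in baseline], 99)

new_sample = rng.normal(0.5, 1.0, size=8)  # feature vector from ongoing monitoring
print("damage" if damage_index(new_sample) > threshold else "healthy")
```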
Index to NASA Tech Briefs, 1975
This index contains abstracts and four indexes--subject, personal author, originating Center, and Tech Brief number--for 1975 Tech Briefs.
Robust data cleaning procedure for large scale medium voltage distribution networks feeders
Relatively little attention has been given to the short-term load forecasting problem of primary substations, mainly because load forecasts were not essential to secure the operation of passive distribution networks. With the increasing uptake of intermittent generation, distribution networks are becoming active, since power flows can change direction in a somewhat volatile fashion. The volatility of power flows introduces operational constraints on voltage control, system fault levels, thermal ratings, system losses and high reverse power flows. Today, greater observability of the networks is essential to maintain a safe overall system and to maximise the utilisation of existing assets. Hence, to identify and anticipate any forthcoming critical operational conditions, network operators are compelled to broaden their visibility of the networks to time horizons that include not only real-time information but also hour-ahead and day-ahead forecasts. With this change in paradigm, large-scale short-term load forecasters are progressively being integrated as an essential component of distribution networks' control and planning tools.
The acquisition of large-scale real-world data is prone to errors, and anomalies in data sets can lead to erroneous forecasting outcomes. Hence, data cleansing is an essential first step in data-driven learning techniques. Data cleansing is a labour-intensive and time-consuming task for the following reasons: 1) selecting a suitable cleansing method is not trivial; 2) generalising or automating a cleansing procedure is challenging; 3) there is a risk of introducing new errors into the data. This thesis attempts to maximise the performance of large-scale forecasting models by addressing the quality of the modelling data. Thus, the objectives of this research are to identify the causes of bad data quality, to design an automatic data cleansing procedure suitable for large-scale distribution network datasets, and to propose a rigorous framework for modelling MV distribution network feeder time series with deep learning architectures. The thesis discusses in detail the challenges in handling and modelling real-world distribution feeder time series. It also discusses a robust technique to detect outliers in the presence of level-shifts, and suitable missing-value imputation techniques. All the concepts have been demonstrated on large real-world distribution network data.
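A minimal sketch of one generic way to flag outliers without mistaking level-shifts for bad data, using a rolling median and MAD score on a half-hourly feeder series. This is an illustrative stand-in, not the specific robust technique developed in the thesis; the window length and threshold are assumptions.

```python
import numpy as np
import pandas as pd

def flag_outliers(series, window=48, k=5.0):
    """Flag points far from a rolling median in robust (MAD) units.
    The rolling window tracks the local level, so a genuine level-shift
    is absorbed by the median rather than flagged wholesale."""
    med = series.rolling(window, center=True, min_periods=1).median()
    mad = (series - med).abs().rolling(window, center=True, min_periods=1).median()
    score = (series - med).abs() / (1.4826 * mad.replace(0, np.nan))
    return score > k

# Illustrative half-hourly feeder load with a level shift and a few bad readings.
idx = pd.date_range("2020-01-01", periods=7 * 48, freq="30min")
load = pd.Series(5.0 + np.sin(np.arange(len(idx)) * 2 * np.pi / 48), index=idx)
load.iloc[150:] += 3.0                 # level shift (e.g. network reconfiguration)
load.iloc[[40, 200]] = [50.0, -20.0]   # spurious meter readings

bad = flag_outliers(load)
cleaned = load.mask(bad).interpolate(method="time")  # simple imputation of flagged points
print(bad.sum(), "points flagged")
```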
Analysis of building performance data
In recent years, the global trend towards digitalisation has also reached buildings and facility management. Due to the roll-out of smart meters and the retrofitting of buildings with meters and sensors, the amount of data available for a single building has increased significantly. In addition to data sets collected by measurement devices, Building Information Modelling has recently seen a strong rise. By maintaining a building model through the whole building life-cycle, the model becomes rich in information describing all major aspects of a building. This work aims to combine these data sources to gain further valuable information from data analysis. Better knowledge of the building's behaviour, based on high-quality data, leads to more efficient building operation. Eventually, this may result in a reduction of energy use and therefore lower operational costs. In this thesis, a concept for holistic data acquisition from smart meters and a methodology for the integration of further meters in the measurement concept are introduced and validated. Secondly, this thesis presents a novel algorithm designed for cleansing and interpolation of faulty data. Descriptive data is extracted from an open meta data model for buildings, which is utilised to further enrich the metered data. Additionally, this thesis presents a methodology for designing and managing all information in a unified Data Warehouse schema. The Data Warehouse developed in this work maintains compatibility with an open meta data model by adopting the model's specification into its data schema. It features the application of building-specific Key Performance Indicators (KPI) to measure building performance. In addition, a clustering algorithm based on machine learning is developed to identify behavioural patterns of buildings and their frequency of occurrence. All methodologies introduced in this work are evaluated through installations and data from three pilot buildings. The pilot buildings were selected to be of diverse types to prove the generic applicability of the above concepts. The outcome of this work successfully demonstrates that combining the data sources available for buildings enables advanced data analysis. This largely increases the understanding of buildings and their behavioural patterns. A more efficient building operation and a reduction of energy usage can be achieved with this knowledge.
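A small sketch of the pattern-clustering idea described above, applying k-means to synthetic daily meter profiles to recover behavioural patterns and their frequency of occurrence. The profiles and the cluster count are assumptions for illustration, not the thesis's actual algorithm or data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Illustrative daily profiles: one row per day, 24 hourly readings per row.
hours = np.arange(24)
weekday = 10 + 8 * np.exp(-0.5 * ((hours - 13) / 3) ** 2)   # office-hours peak
weekend = 6 + 2 * np.exp(-0.5 * ((hours - 15) / 4) ** 2)    # flat, lower use
profiles = np.vstack(
    [weekday + rng.normal(0, 0.5, 24) for _ in range(200)] +
    [weekend + rng.normal(0, 0.5, 24) for _ in range(80)]
)

# Cluster the days into behavioural patterns and count how often each occurs.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels, counts = np.unique(km.labels_, return_counts=True)
for lab, cnt in zip(labels, counts):
    print(f"pattern {lab}: occurs on {cnt} days, "
          f"peak load {km.cluster_centers_[lab].max():.1f}")
```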
Data Consistency for Data-Driven Smart Energy Assessment
In the smart grid era, the amount of data available for different applications has increased considerably. However, data may not perfectly represent the phenomenon or process under analysis, so their usability requires a preliminary validation carried out by experts of the specific domain. The process of data gathering and transmission over the communication channels has to be verified to ensure that data are provided in a useful format, and that no external effect has corrupted the data received.
Consistency of the data coming from different sources (in terms of timings and data resolution) has to be ensured and managed appropriately. Suitable procedures are needed for transforming data into knowledge in an effective way. This contribution addresses these aspects by highlighting a number of potential issues and the solutions in place in different power and energy systems, including the generation, grid and user sides. Recent references, as well as selected historical references, are listed to support the illustration of the conceptual aspects.
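A brief sketch of the consistency step discussed here: aligning two illustrative sources with different timings and resolutions onto a common grid with pandas. The source names, sampling rates, and the 15-minute grid are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd

# Two illustrative sources with different resolutions and misaligned timestamps.
scada = pd.Series(np.random.rand(360),
                  index=pd.date_range("2021-06-01 00:00:07", periods=360, freq="10s"))
meter = pd.Series(np.random.rand(4),
                  index=pd.date_range("2021-06-01 00:00", periods=4, freq="15min"))

# Bring both onto a common 15-minute grid: average the fast signal,
# forward-fill the slow one, and drop intervals where either source is missing.
aligned = pd.DataFrame({
    "scada_mean": scada.resample("15min").mean(),
    "meter": meter.resample("15min").ffill(),
}).dropna()
print(aligned)
```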