4,651 research outputs found
Three essays on applications of machine learning in problems with high dimensional data
The amount of data businesses collect from the internet is massive. Researchers and analysts can now track various data features generated from log files, such as customers’ behavior history, product descriptions, and aggregate-level data. In an ideal scenario, such data could be represented in a spreadsheet, with columns representing each dimension. In practice, the number of data dimensions can be staggering, making data processing difficult. With high dimensional data, the number of features can exceed the number of observations, a scenario that traditional econometric methods struggle to handle. My dissertation addresses this data issue by applying machine learning techniques, including LASSO (least absolute shrinkage and selection operator), decision trees, and neural networks, to help decision makers perform descriptive, predictive, and prescriptive analytics based on high dimensional data.
My dissertation comprises three essays. The first essay applies tree-based machine learning models (random forest and gradient boosting decision tree) and free text information to predict house prices and understand how certain factors could affect the prices. In the second essay, I propose a LASSO method for high dimensional data and use daily hotel prices to understand hotels’ competition patterns in a certain area. In the third essay, a word embedding and neural network model is applied to real estate data to extract free text information more efficiently, which leads to more accurate prediction of house prices.
In these essays, I apply and extend a variety of analytic tools including supervised learning, unsupervised learning, statistics, and econometric methods. These essays contribute to the applied econometric and business analytics literature and can help researchers and analysts appreciate both traditional econometrics and predictive analytics tools, and make data-driven business decisions.
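The abstract above names LASSO as a tool for the case where features outnumber observations. The following is a minimal sketch of that idea, not the dissertation's own method: a plain cyclic coordinate descent solver run on synthetic data with 20 observations and 50 features. The function names, the synthetic data, and the choice of penalty (half of the smallest penalty that zeroes every coefficient) are all assumptions made for illustration.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes j's own contribution.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def objective(X, y, beta, lam):
    """LASSO objective: mean squared error term plus L1 penalty."""
    n = len(X)
    sq = sum((y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))) ** 2
             for i in range(n))
    return sq / (2 * n) + lam * sum(abs(b) for b in beta)

# Synthetic data (assumption): 20 observations, 50 features, only the first two matter.
random.seed(0)
n_obs, n_feat = 20, 50
X = [[random.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0, 0.1) for row in X]

# Smallest penalty at which every coefficient is exactly zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n_obs))) / n_obs
              for j in range(n_feat))
beta = lasso_coordinate_descent(X, y, lam=0.5 * lam_max)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("features kept:", selected)
```

The soft-thresholding step is what sets coefficients exactly to zero, which is why LASSO can act as a feature selector even when ordinary least squares has no unique solution.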
Identifying hybrid heating systems in the residential sector from smart meter data
In this paper, we identify hybrid heating systems on a single residential customer’s premises using smart meter data. A comprehensive methodology is developed at a generic level for residential sector buildings to identify the type of primary and support heating systems. The methodology uses unsupervised and supervised learning algorithms, both separately and in combination. It is applied to two datasets that vary in size, quality of data, and availability and reliability of background information. The datasets contain hourly electricity consumption profiles of residential customers together with the outdoor temperature. The validation metrics for the developed algorithms are elaborated to provide a probabilistic evaluation of the model. The results show that it is possible to identify the types of both primary and support heating systems, expressed as the probability of having an electric or non-electric type of heating. The results obtained help estimate the flexibility domain of the residential building sector and thereby generate high value for the energy system as a whole.
Assessing the impact of employing machine learning-based baseline load prediction pipelines with sliding-window training scheme on offered flexibility estimation for different building categories
The present study assesses how the performance of baseline load prediction pipelines affects the accuracy with which the grid operator estimates the flexibility offered by different categories of buildings. Accordingly, the impact of employing different machine learning (ML) algorithms, with sliding-window and offline training schemes, for hour-ahead baseline load prediction has been investigated and compared. Using a smart meter measurements dataset, training window sizes and the most promising pipeline for each building category are first identified. Next, the consumption profiles of five buildings (belonging to each category), under regular operation (baseline load) and while offering flexibility, are physically simulated. Finally, the identified pipelines are used for predicting the baseline loads, and the resulting error in estimating the provided flexibility is determined. The results demonstrate that the most promising prediction pipeline identified (extra trees algorithm with a sliding window of 5 weeks) notably outperforms offline training (average R² score of 0.91 vs. 0.87). Employing these pipelines permits estimating the provided flexibility with acceptable accuracy (mean relative error of the flexibility index between -2.45% and +2.79%), permitting the grid operator to guarantee fair compensation for buildings' offered flexibility.
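The difference between the sliding-window and offline training schemes mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the model is a placeholder (mean load per hour of day) rather than the extra trees algorithm used in the study, the `retrain_every` parameter is hypothetical, and the load series is synthetic, with a step change in consumption after week 7 that an offline model cannot follow.

```python
HOURS_PER_WEEK = 168

def fit_hourly_mean(history):
    """Placeholder model: mean load per hour of day over the training samples."""
    sums, counts = [0.0] * 24, [0] * 24
    for t, load in history:
        sums[t % 24] += load
        counts[t % 24] += 1
    return [sums[h] / counts[h] if counts[h] else 0.0 for h in range(24)]

def sliding_window_forecast(loads, window_hours, retrain_every=24):
    """Predict each hour with a model refit on the most recent window_hours samples."""
    predictions = {}
    model = None
    for t in range(window_hours, len(loads)):
        if model is None or (t - window_hours) % retrain_every == 0:
            window = [(s, loads[s]) for s in range(t - window_hours, t)]
            model = fit_hourly_mean(window)
        predictions[t] = model[t % 24]
    return predictions

def offline_forecast(loads, train_hours):
    """Fit once on the first train_hours samples and never update the model."""
    model = fit_hourly_mean([(s, loads[s]) for s in range(train_hours)])
    return {t: model[t % 24] for t in range(train_hours, len(loads))}

def mean_abs_error(loads, predictions):
    """Average absolute difference between measured and predicted load."""
    return sum(abs(loads[t] - p) for t, p in predictions.items()) / len(predictions)

def synthetic_load(t):
    """Synthetic hourly load (assumption): evening peak plus a step change after week 7."""
    hour = t % 24
    base = 1.5 if 17 <= hour < 21 else 1.0
    shift = 1.0 if t >= 7 * HOURS_PER_WEEK else 0.0
    return base + shift

loads = [synthetic_load(t) for t in range(10 * HOURS_PER_WEEK)]
window = 5 * HOURS_PER_WEEK  # 5-week window, as in the abstract

sliding_mae = mean_abs_error(loads, sliding_window_forecast(loads, window))
offline_mae = mean_abs_error(loads, offline_forecast(loads, window))
print("sliding-window MAE:", round(sliding_mae, 3))
print("offline MAE:", round(offline_mae, 3))
```

On this synthetic series the sliding-window scheme gradually absorbs the step change as shifted days enter the window, while the offline model keeps predicting the old level.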
Residential Demand Response using Electricity Smart Meter Data
The electricity industry is currently undergoing a period of transition characterised by Energy 3D: Digitalisation, Decentralisation, and Decarbonisation. Smart meters are the vital infrastructure necessary to digitalise the energy system as well as enable advancements in decentralisation and decarbonisation. As of today, more than 500 million smart meters have been installed worldwide, with that number expected to rise to several billion installations over the decade. Smart meters enable electricity load to be measured with half-hourly granularity, providing an opportunity for demand-side management innovations that are likely to be advantageous for both utility companies and customers. Among these innovations, time-of-use (TOU) tariffs are widely considered to be the most promising solution for optimising energy consumption in the residential sector; however, actual use is still limited.
The objective of this thesis is to investigate opportunities and problems related to TOU tariffs utilising smart meter data at the national level. The authors have identified four major research gaps which need to be filled in order to expand commercial applications of TOU tariffs. These gaps are described and addressed in the following chapters: the "TOU load adaptation forecasting problem", the "TOU winner detection problem", the "TOU public dataset problem", and the "excess generation forecasting problem".
This thesis demonstrates three modelling approaches and one new TOU dataset (CAMSL). A significant contribution to the field is the discovery of new summary statistical features (statistical moments) and an assessment of their capacity to encapsulate other, more widely used explanatory variables of demand response. The thesis concludes by discussing future work and policy implications, such as the need for more tailored modelling work and a public live stream of smart meter data, which could accelerate the roll-out of demand-side management in the residential sector.
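As an illustration of the statistical moments mentioned above, the sketch below computes the first four moments of a single day's half-hourly load profile. The abstract does not say which moments the thesis uses or how they are normalised, so the choice of mean, variance, skewness, and excess kurtosis (population versions) is an assumption, as are the function name and the example profile.

```python
import math

def statistical_moments(profile):
    """Return mean, variance, skewness and excess kurtosis of a load profile."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [x - mean for x in profile]
    variance = sum(d ** 2 for d in centred) / n
    if variance == 0.0:
        # A perfectly flat profile has no shape to describe.
        return {"mean": mean, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    std = math.sqrt(variance)
    skewness = sum(d ** 3 for d in centred) / (n * std ** 3)
    kurtosis = sum(d ** 4 for d in centred) / (n * variance ** 2) - 3.0
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}

# Hypothetical day of 48 half-hourly readings (kWh): low base load with an evening peak.
daily_profile = [0.2] * 40 + [1.0] * 8
features = statistical_moments(daily_profile)
print(features)
```

A short, sharp evening peak gives a positive skewness, so a handful of such numbers can summarise the shape of a profile without keeping all 48 readings.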
A novel machine learning approach for identifying the drivers of domestic electricity users’ price responsiveness
Time-based pricing programs for domestic electricity users have been effective in reducing peak demand and facilitating renewables integration. Nevertheless, high cost, price non-responsiveness and adverse selection may pose challenges. To overcome these challenges, it can be fruitful to identify the ‘high-potential’ users, who are more responsive to price changes, and to apply time-based pricing to these users. Few studies have investigated how to identify which users are more price-responsive. We aim to fill this gap by comprehensively identifying the drivers of domestic users’ price responsiveness, in order to facilitate the selection of the high-potential users. We adopt a novel data-driven approach: a feed-forward neural network model first accurately determines the baseline monthly peak consumption of individual households, and an integrated machine-learning variable selection methodology then identifies the drivers of price responsiveness. The approach is applied to Irish smart meter data collected in 2009-10 as part of a national Time of Use trial. This methodology substantially outperforms traditional variable selection methods by combining three advanced machine-learning techniques. Our results show that the response of energy users to price change is affected by a number of factors, ranging from demographic and dwelling characteristics, psychological factors, and historical electricity consumption to appliance ownership. In particular, historical electricity consumption, income, the number of occupants, perceived behavioural control, and adoption of specific appliances, including immersion water heater and dishwasher, are found to be significant drivers of price responsiveness. We also observe that continual price increase within a moderate range does not drive additional peak demand reduction, and that there is an intention-behaviour gap, whereby stated intention does not lead to actual peak reduction behavior.
Based on our findings, we have conducted scenario analysis to demonstrate the feasibility of selecting the high-potential users to achieve significant peak reduction.
Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations
The increased digitalisation and monitoring of the energy system opens up
numerous opportunities to decarbonise the energy system. Applications on low
voltage, local networks, such as community energy markets and smart storage
will facilitate decarbonisation, but they will require advanced control and
management. Reliable forecasting will be a necessary component of many of these
systems to anticipate key features and uncertainties. Despite this urgent need,
there has not yet been an extensive investigation into the current
state-of-the-art of low voltage level forecasts, other than at the smart meter
level. This paper aims to provide a comprehensive overview of the landscape,
current approaches, core applications, challenges and recommendations. Another
aim of this paper is to facilitate the continued improvement and advancement in
this area. To this end, the paper also surveys some of the most relevant and
promising trends. It establishes an open, community-driven list of the known
low voltage level open datasets to encourage further research and development.
An assessment of the load modifying potential of model predictive controlled dynamic facades within the California context
California is making major strides towards meeting its greenhouse gas emission reduction goals with the transformation of its electrical grid to accommodate renewable generation, aggressive promotion of building energy efficiency, and increased emphasis on moving toward electrification of end uses (e.g., residential heating). As a result of this activity, the State faces significant challenges of systemwide resource adequacy, power quality and grid reliability that could be addressed in part with demand responsive (DR) load modifying strategies using controllable building technologies. Dynamic facades can potentially shift and shed loads at critical times of the day in combination with daylighting and HVAC controls. This study explores the technical potential of dynamic facades to support net load shape objectives. A model predictive controller (MPC) was designed based on reduced order thermal (Modelica) and window (Radiance) models. Using an automated workflow (involving JModelica.org and MPCPy), these models were converted and differentiated to formulate a non-linear optimization problem. A gradient-based, non-linear programming problem solver (IPOPT) was used to derive an optimal control strategy, then a post-optimization step was used to convert the solution to a discrete state for facade actuation. Continuous state modulation of the facade was also modeled. The performance of the MPC controller with and without activation of thermal mass was evaluated in a south-facing perimeter office zone with a three-zone electrochromic window for a clear sunny week during summer and winter periods in Oakland and Burbank, California. Compared to state-of-the-art heuristic control, MPC strategies reduced total energy cost by 9–28% and reduced critical coincident peak demand by up to 0.58 W/ft²-floor, or 19–43%, in the 4.6 m (15 ft) deep south zone on sunny summer days in Oakland.
Similar savings were achieved for the hotter Burbank climate in Southern California. This outcome supports the argument that MPC control of dynamic facades can provide significant electricity cost reductions and net load management capabilities of benefit to both the building owner and the evolving electrical grid.