
    Three essays on applications of machine learning in problems with high dimensional data

    The amount of data that businesses collect from the internet is massive. Researchers and analysts can now track various data features generated from log files, such as customers’ behavior history, product descriptions, and aggregate-level data. In an ideal scenario, such data could be represented in a spreadsheet, with columns representing each dimension. In practice, the number of data dimensions can be staggering, making data processing difficult. With high-dimensional data, the number of features can exceed the number of observations, a scenario that is very challenging for traditional econometric methods to handle. My dissertation addresses this data issue by applying machine learning techniques, including LASSO (least absolute shrinkage and selection operator), decision trees, and neural networks, to help decision makers perform descriptive, predictive, and prescriptive analytics based on high-dimensional data. My dissertation comprises three essays. The first essay applies tree-based machine learning models (random forest and gradient boosting decision tree) and free-text information to predict house prices and understand how certain factors could affect the prices. In the second essay, I propose a LASSO method for high-dimensional data and use daily hotel prices to understand hotels’ competition patterns in a certain area. In the third essay, a word embedding and neural network model is applied to real estate data to extract free-text information more efficiently, which leads to more accurate prediction of house prices. In these essays, I apply and extend a variety of analytic tools including supervised learning, unsupervised learning, statistics, and econometric methods. These essays contribute to the applied econometric and business analytics literature and can help researchers and analysts appreciate both traditional econometrics and predictive analytics tools, and make data-driven business decisions.

    Identifying hybrid heating systems in the residential sector from smart meter data

    In this paper, we identify hybrid heating systems on a single residential customer’s premises using smart meter data. A comprehensive methodology is developed at a generic level for residential sector buildings to identify the type of primary and support heating systems. The methodology uses unsupervised and supervised learning algorithms, both separately and in combination. It is applied to two datasets that vary in size, quality of data, and availability and reliability of background information. The datasets contain hourly electricity consumption profiles of residential customers together with the outdoor temperature. The validation metrics for the developed algorithms are elaborated to provide a probabilistic evaluation of the model. The results show that it is possible to identify the types of both primary and support heating systems, expressed as the probability of having an electric or non-electric type of heating. The results obtained help estimate the flexibility domain of the residential building sector and thereby generate high value for the energy system as a whole.

    Assessing the impact of employing machine learning-based baseline load prediction pipelines with sliding-window training scheme on offered flexibility estimation for different building categories

    The present study assesses how the performance of baseline load prediction pipelines affects the accuracy with which the grid operator estimates the flexibility offered by different categories of buildings. Accordingly, the impact of employing different machine learning (ML) algorithms, with sliding-window and offline training schemes, for hour-ahead baseline load prediction has been investigated and compared. Using a smart meter measurements dataset, training window sizes and the most promising pipeline for each building category are first identified. Next, the consumption profiles of five buildings (belonging to each category), under regular operation (baseline load) and while offering flexibility, are physically simulated. Finally, the identified pipelines are used for predicting the baseline loads, and the resulting error in estimating the provided flexibility is determined. The results demonstrate that the most promising prediction pipeline identified (extra trees algorithm with a sliding window of 5 weeks) offers notably superior performance compared to that of offline training (average R² score of 0.91 vs. 0.87). Employing these pipelines permits estimating the provided flexibility with acceptable accuracy (flexibility index's mean relative error between -2.45% and +2.79%), permitting the grid operator to guarantee fair compensation for buildings' offered flexibility.
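    The sliding-window training scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes hourly load samples and refits a predictor on only the most recent window before each hour-ahead prediction. The helper names (`sliding_window_splits`, `rolling_predictions`) are hypothetical, and a simple window-mean predictor stands in for the extra trees model named in the abstract; this is not the authors' pipeline.

```python
from typing import Iterator, List, Sequence, Tuple

HOURS_PER_WEEK = 7 * 24
FIVE_WEEK_WINDOW = 5 * HOURS_PER_WEEK  # window size reported in the abstract


def sliding_window_splits(n_samples: int, window: int) -> Iterator[Tuple[range, int]]:
    """Yield (training index range, target index) pairs.

    For each target hour, the training set is the `window` samples
    immediately preceding it, so older data is dropped as time advances.
    """
    for target in range(window, n_samples):
        yield range(target - window, target), target


def mean_baseline(history: Sequence[float]) -> float:
    """Stand-in predictor (assumption): the mean of the training window."""
    return sum(history) / len(history)


def rolling_predictions(load: Sequence[float], window: int) -> List[float]:
    """Refit on each window and predict the next hour's load."""
    predictions = []
    for train_idx, _target in sliding_window_splits(len(load), window):
        history = [load[i] for i in train_idx]
        predictions.append(mean_baseline(history))
    return predictions
```

    In an offline scheme, by contrast, the model would be fitted once on a fixed historical set and reused for every prediction; here `mean_baseline` could be replaced by any regressor that is refitted on `history` at each step.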


    Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations

    The increased digitalisation and monitoring of the energy system opens up numerous opportunities to decarbonise the energy system. Applications on low voltage, local networks, such as community energy markets and smart storage, will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate the continued improvement and advancement in this area. To this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known low voltage level open datasets to encourage further research and development. Comment: 37 pages, 6 figures, 2 tables, review paper