
    Development of an Ensemble of Models for Predicting Socio-Economic Indicators of the Russian Federation using IRT-Theory and Bagging Methods

    This article describes the application of the bagging method to building a forecasting model for the socio-economic indicators of the Russian Federation. This task is one of the priorities within the framework of the Federal Project "Strategic Planning", which calls for a unified decision support system capable of predicting socio-economic indicators. The paper discusses the relevance of developing forecasting models and reviews related work on the topic. The authors carried out computational experiments for 40 indicators of the socio-economic sphere of the Russian Federation. For each indicator, a linear multiple regression equation was constructed. The constructed equations were then verified, and the indicators with the worst forecast accuracy and quality were selected. For these indicators, neural network modeling was carried out, with multilayer perceptrons chosen as the network architecture. The accuracy and quality of the neural network models were analyzed in turn, and the indicators that could not be predicted with sufficient accuracy were passed to the bagging procedure. Bagging was used for weighted averaging of the predictions of neural networks of various configurations, with elements of Item Response Theory (IRT) used to determine the model weights.
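The weighted-averaging core of such a bagging ensemble can be sketched in a few lines. The following is a minimal pure-Python illustration on toy data: it fits simple linear models on bootstrap resamples and combines their forecasts with inverse-MSE weights, a simple stand-in for the IRT-derived weights the article describes (the data, model class, and weighting rule here are all illustrative assumptions).

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (pure Python)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bagged_forecast(xs, ys, x_new, n_models=25, seed=0):
    """Bagging: fit each model on a bootstrap resample, then combine
    predictions with accuracy-based weights (here inverse in-sample MSE,
    an illustrative stand-in for the IRT-derived weights)."""
    rng = random.Random(seed)
    preds, weights = [], []
    for _ in range(n_models):
        sample = [rng.randrange(len(xs)) for _ in xs]
        bx = [xs[i] for i in sample]
        by = [ys[i] for i in sample]
        if len(set(bx)) < 2:          # degenerate resample, skip it
            continue
        a, b = fit_line(bx, by)
        mse = statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        preds.append(a + b * x_new)
        weights.append(1.0 / (mse + 1e-9))
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, preds)) / total

# Noisy linear indicator series (toy data)
xs = list(range(10))
ys = [2.0 * x + 1.0 + 0.1 * ((-1) ** x) for x in xs]
print(round(bagged_forecast(xs, ys, 10), 1))
```

The ensemble forecast at x = 10 lands close to the underlying trend value of 21, with low-error bootstrap models contributing the most weight.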

    A Novel Feature-Engineered–NGBoost Machine-Learning Framework for Fraud Detection in Electric Power Consumption Data

    This study presents a novel feature-engineered–natural gradient descent ensemble-boosting (NGBoost) machine-learning framework for detecting fraud in power consumption data. The proposed framework was executed sequentially in three stages: data pre-processing, feature engineering, and model evaluation. It initially utilized a random forest algorithm-based imputation technique to impute the missing data entries in the acquired smart meter dataset. In the second phase, the majority weighted minority oversampling technique (MWMOTE) algorithm was used to avoid an unequal distribution of data samples among the classes. The time-series feature-extraction library and the whale optimization algorithm were utilized to extract and select the most relevant features from the kWh readings of consumers. Once the most relevant features were acquired, model training and testing was carried out with the NGBoost algorithm to classify consumers into two distinct categories ("Healthy" and "Theft"). Finally, each input feature's impact (positive or negative) on predicting the target variable was identified with the tree SHAP additive-explanations algorithm. The proposed framework achieved an accuracy of 93%, recall of 91%, and precision of 95%, exceeding all competing models and thus validating its efficacy and significance in the studied field of research.
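Class imbalance is central to this kind of pipeline: theft cases are rare relative to honest consumers. The sketch below illustrates the rebalancing idea with plain random oversampling in pure Python on made-up readings; it is not the MWMOTE algorithm the study uses, which synthesises new weighted minority samples rather than duplicating existing ones.

```python
import random

def oversample_minority(samples, labels, seed=0):
    """Rebalance a binary dataset by randomly duplicating minority-class
    samples until both classes have equal counts (a much simpler stand-in
    for MWMOTE, which generates synthetic weighted minority samples)."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    minority, majority = sorted(by_class, key=lambda c: len(by_class[c]))
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return list(samples) + extra, list(labels) + [minority] * deficit

# Toy kWh-style feature vectors, heavily skewed toward "Healthy"
X = [[1.0], [1.1], [0.9], [5.0]]
y = ["Healthy", "Healthy", "Healthy", "Theft"]
Xb, yb = oversample_minority(X, y)
print(yb.count("Healthy"), yb.count("Theft"))  # → 3 3
```

After rebalancing, a downstream classifier (NGBoost in the study) no longer sees a class distribution dominated by honest consumers.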

    Review of Low Voltage Load Forecasting: Methods, Applications, and Recommendations

    The increased digitalisation and monitoring of the energy system opens up numerous opportunities to decarbonise the energy system. Applications on low voltage, local networks, such as community energy markets and smart storage, will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges, and recommendations. Another aim is to facilitate continued improvement and advancement in this area; to this end, the paper also surveys some of the most relevant and promising trends, and it establishes an open, community-driven list of the known low voltage level open datasets to encourage further research and development. Comment: 37 pages, 6 figures, 2 tables, review paper.

    Novel Computationally Intelligent Machine Learning Algorithms for Data Mining and Knowledge Discovery

    This thesis addresses three major issues in data mining: feature subset selection in large dimensionality domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm, SAGA. SAGA combines the ability of Simulated Annealing to avoid being trapped in local minima with the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks (GRNNs). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble. The proposed ensemble consists of a committee of GRNNs trained on different subsets of features generated by SAGA, with the predictions of the base classifiers combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features which make it stand out amongst ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNN is used for both the base classifiers and the top-level combiner classifier. Because of GRNN, the proposed ensemble is a dynamic weighting scheme, in contrast to existing ensemble approaches that rely on simple voting or static weighting. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to those scenarios that are similar to the new ones. The simulation results demonstrate the validity of the proposed ensemble model.
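The dynamic-weighting idea behind a GRNN can be illustrated compactly: a GRNN is essentially Nadaraya–Watson kernel regression, where each stored sample contributes to a prediction in proportion to its similarity to the query point. Below is a minimal pure-Python sketch on toy data with an assumed bandwidth; the thesis's actual ensemble layers GRNNs on SAGA-selected feature subsets.

```python
import math

def grnn_predict(train_x, train_y, x, sigma=0.5):
    """Generalized Regression Neural Network prediction: every training
    sample votes with a Gaussian weight that decays with its squared
    distance to the query, which yields the 'dynamic weighting'
    behaviour described in the thesis."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x)) / (2 * sigma ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Toy 1-D regression: samples from y = x^2
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 4.0, 9.0]
print(round(grnn_predict(train_x, train_y, [1.5]), 2))
```

Queries near x = 1.5 are dominated by the two nearest samples (y = 1 and y = 4), so the prediction falls between them; sigma controls how sharply that locality decays.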

    CUSTOMER SEGMENTATION APPROACHES: A COMPARISON OF METHODS WITH DATA FROM THE MEDICARE HEALTH OUTCOMES SURVEY

    Model-based segmentation approaches are particularly useful in healthcare consumer research, where the primary goal is to identify groups of individuals who share similar attitudinal and behavioral characteristics in order to develop engagement strategies, create products, and allocate resources tailored to the specific needs of each segment. Despite the growing research and literature on segmentation models, many healthcare researchers continue to use demographic variables alone to classify consumers into groups, failing to uncover unique patterns, relationships, and latent traits. The primary aims of this study were to 1) examine the differences in outcomes when classification methods (K-Means and LCA) for segmentation were used in conjunction with continuous and dichotomous scales; and 2) examine the differences in outcomes when prediction methods (CHAID and Neural Networks) for segmentation were used in conjunction with binary and continuous dependent variables and a variation of the classification algorithm. For the purpose of comparison across methods, data from the Medicare Health Outcomes Survey was used in all conditions. Results indicated that the best segment class solution was dependent upon both the method and the treatment of the inputs and dependent variable for both classification and prediction problems. When the input depression scale was dichotomized, the K-Means model yielded a 6-segment best-class solution, whereas the LCA model yielded 9 distinct segment classes. On the other hand, LCA models yielded the same segment solution (9 classes) irrespective of the treatment of the depression scale. Similarly, differences in outcomes were identified between continuous and binary dependent variables when prediction models were used to segment survey respondents. When the outcome was continuous, CHAID models resulted in a 5-segment solution, compared to a 6-segment solution for Neural Networks; the binary dependent variable produced a 4-segment solution for both CHAID and Neural Network models. In addition, the interpretation of the segment class profiles is dependent upon both method and condition (input and treatment of the dependent variable).
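The K-Means side of such a comparison is straightforward to sketch. Below is plain Lloyd's algorithm on one-dimensional scale scores in pure Python; the scores, the two-segment setting, and the cluster count are toy assumptions (the survey data itself is multivariate).

```python
import random
import statistics

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's K-Means on 1-D scores: assign each respondent to the
    nearest centroid, move each centroid to the mean of its segment,
    and repeat until the assignment stabilises."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[j].append(p)
        centroids = [
            statistics.fmean(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# Two clearly separated groups of depression-scale scores (toy data)
scores = [1, 2, 1, 2, 9, 10, 9, 10]
print(kmeans(scores, 2))  # → [1.5, 9.5]
```

LCA, by contrast, fits a probabilistic mixture and assigns respondents by posterior class membership rather than by raw distance, which is one reason the two methods can disagree on the number of segments.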

    A Distributed and Real-time Machine Learning Framework for Smart Meter Big Data

    The advanced metering infrastructure allows smart meters to collect high-resolution consumption data, thereby enabling consumers and utilities to understand their energy usage at different levels, which has led to numerous smart grid applications. Smart meter data, however, poses challenges to developing machine learning frameworks that classic theoretical settings do not, due to its big data features and privacy limitations. Therefore, in this work, we aim to address the challenges of building machine learning frameworks for smart meter big data. Specifically, our work includes three parts: 1) We first analyze and compare different learning algorithms for multi-level smart meter big data. A daily activity pattern recognition model has been developed based on non-intrusive load monitoring for appliance-level smart meter data. Then, a consensus-based load profiling and forecasting system has been proposed for individual building level and higher aggregated level smart meter data analysis; 2) Following the discussion of multi-level smart meter data analysis from an offline perspective, a universal online functional analysis model has been proposed for multi-level real-time smart meter big data analysis. The proposed model consists of a multi-scale load dynamic profiling unit based on functional clustering and a multi-scale online load forecasting unit based on functional deep neural networks. The two units enable online tracking of the dynamic cluster trajectories and online forecasting of daily multi-scale demand; 3) To enable smart meter data analysis in the distributed environment, FederatedNILM was proposed, which is then combined with differential privacy to provide privacy guarantees for the appliance-level distributed machine learning framework. Based on federated deep learning enhanced with two schemes, namely the utility optimization scheme and the privacy-preserving scheme, the proposed distributed and privacy-preserving machine learning framework enables electric utilities and service providers to offer smart meter services on a large scale.
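The federated-averaging idea underlying frameworks like FederatedNILM can be sketched with a toy objective: each client updates a shared parameter on its private data, and only the parameters (never the readings) are sent to a server, which averages them weighted by dataset size. This is a minimal FedAvg illustration with a made-up 1-D objective, not the actual FederatedNILM implementation, and it omits the differential-privacy mechanism.

```python
import statistics

def local_update(w, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on a 1-D
    mean-estimation objective (a toy stand-in for the deep NILM models)."""
    for _ in range(epochs):
        grad = statistics.fmean(w - x for x in data)
        w -= lr * grad
    return w

def federated_average(global_w, client_data, rounds=10):
    """FedAvg: each client trains on its private smart-meter data, and
    the server averages the returned parameters, weighted by client
    dataset size. Raw readings never leave the clients."""
    w = global_w
    for _ in range(rounds):
        updates = [local_update(w, data) for data in client_data]
        sizes = [len(data) for data in client_data]
        w = sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)
    return w

# Three households' private readings (toy data)
clients = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(round(federated_average(0.0, clients), 2))
```

The shared parameter converges toward the size-weighted optimum over all clients' data (3.5 here), even though no client ever reveals its readings.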

    How to leverage artificial intelligence for sustainable business development

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. The field of Artificial Intelligence (AI) was founded as an academic discipline in the summer of 1956 (Muthukrishnan et al., 2020) at Dartmouth College in Hanover, New Hampshire, when several field experts gathered for a workshop focused on understanding how to humanize machine functioning (McCarthy et al., 2006). However, it was not until the beginning of the 21st century that AI research boomed, as a consequence of successful applications of machine learning algorithms across both academia and industry.

    Forecasting: theory and practice

    Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The lack of a free-lunch theorem implies the need for a diverse set of forecasting methods to tackle an array of applications. This unique article provides a non-systematic review of the theory and the practice of forecasting. We offer a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts, including operations, economics, finance, energy, environment, and social good. We do not claim that this review is an exhaustive list of methods and applications; the list was compiled based on the expertise and interests of the authors. However, we hope that our encyclopedic presentation offers a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice.

    Statistical Data Modeling and Machine Learning with Applications

    The modeling and processing of empirical data is one of the main subjects and goals of statistics. Nowadays, with the development of computer science, the extraction of useful and often hidden information and patterns from data sets of varying volume and complexity stored in warehouses has been added to these goals. New and powerful statistical techniques with machine learning (ML) and data mining paradigms have been developed. To one degree or another, all of these techniques and algorithms originate from a rigorous mathematical basis, including probability theory and mathematical statistics, operational research, mathematical analysis, numerical methods, etc. Popular ML methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees, and random forests (RF), among others, have generated models that can be considered straightforward applications of optimization theory and statistical estimation. The wide arsenal of classical statistical approaches combined with powerful ML techniques allows many challenging and practical problems to be solved. This Special Issue belongs to the section "Mathematics and Computer Science". Its aim is to establish a brief collection of carefully selected papers presenting new and original methods, data analyses, case studies, comparative studies, and other research on the topic of statistical data modeling and ML, as well as their applications. Particular attention is given, but not limited, to theories and applications in diverse areas such as computer science, medicine, engineering, banking, education, sociology, and economics, among others. The resulting palette of methods, algorithms, and applications for statistical modeling and ML presented in this Special Issue is expected to contribute to the further development of research in this area. We also believe that the new knowledge acquired here, as well as the applied results, will be attractive and useful for young scientists, doctoral students, and researchers from various scientific specialties.

    Advances in Binders for Construction Materials

    The global binder production for construction materials is approximately 7.5 billion tons per year, contributing ~6% of the global anthropogenic atmospheric CO2 emissions. Reducing this carbon footprint is a key aim of the construction industry, and current research focuses on developing new innovative ways to attain more sustainable binders and concretes/mortars as a real alternative to the current global demand for Portland cement. With this aim, several potential alternative binders are currently being investigated by scientists worldwide, based on calcium aluminate cement, calcium sulfoaluminate cement, alkali-activated binders, calcined clay limestone cements, nanomaterials, or supersulfated cements. This Special Issue presents contributions that address research and practical advances in i) alternative binder manufacturing processes; ii) chemical, microstructural, and structural characterization of unhydrated binders and of hydrated systems; iii) the properties and modelling of concretes and mortars; iv) applications and durability of concretes and mortars; and v) the conservation and repair of historic concrete/mortar structures using alternative binders. We believe this Special Issue will be of high interest to the binder industry and construction community, based upon the novelty and quality of the results and the real potential application of the findings in practice and industry.