2,773 research outputs found

    On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

    We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves. Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
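
    The two-step process in this abstract can be illustrated with a minimal single-machine sketch. It is not the authors' distributed implementation: the function names, the choice of q = 50, the five fuzzy sets, and the linear interpolation between quantiles are all assumptions made for illustration.

```python
import bisect
import random
import statistics


def fit_cut_points(train, q=100):
    """Return q + 1 cut points: the minimum, the q - 1 interior q-quantiles, the maximum."""
    inner = statistics.quantiles(train, n=q, method="inclusive")
    return [min(train)] + inner + [max(train)]


def to_uniform(x, cuts):
    """Step 1: approximate the CDF F(x) by interpolating between the cut points."""
    q = len(cuts) - 1
    if x <= cuts[0]:
        return 0.0
    if x >= cuts[-1]:
        return 1.0
    i = bisect.bisect_right(cuts, x) - 1  # cuts[i] <= x < cuts[i + 1]
    frac = (x - cuts[i]) / (cuts[i + 1] - cuts[i])
    return (i + frac) / q


def from_uniform(u, cuts):
    """Approximate quantile function: map u in [0, 1] back to the original space."""
    q = len(cuts) - 1
    pos = u * q
    i = min(int(pos), q - 1)
    return cuts[i] + (pos - i) * (cuts[i + 1] - cuts[i])


def memberships(u, n_sets=5):
    """Step 2: degrees of u in n_sets equally spaced triangular sets on [0, 1].

    Adjacent triangles cross at 0.5, so the degrees sum to 1 (a Ruspini strong partition).
    """
    width = 1.0 / (n_sets - 1)
    return [max(0.0, 1.0 - abs(u - k * width) / width) for k in range(n_sets)]


# Hypothetical skewed attribute, used only to exercise the functions.
random.seed(0)
train = [random.expovariate(1.0) for _ in range(2000)]
cuts = fit_cut_points(train, q=50)
u = to_uniform(train[0], cuts)
degrees = memberships(u)
# Positions of the five fuzzy-set centers recovered in the original space.
centers_original = [from_uniform(k / 4, cuts) for k in range(5)]
```

    Because the partition is built in the transformed space, the fuzzy-set centers are evenly spaced in probability and therefore crowd together where the training data are dense once they are mapped back through the quantile function.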

    Regional flood frequency analysis using the FCM-ANFIS algorithm : a case study in south-eastern Australia

    Regional flood frequency analysis (RFFA) is widely used to estimate design floods in ungauged catchments. Both linear and non-linear methods are adopted in RFFA. This study presents the development of a non-linear RFFA method, the Adaptive Neuro-fuzzy Inference System (ANFIS), using data from 181 gauged catchments in south-eastern Australia. Three different types of ANFIS models, Fuzzy C-mean (FCM), Subtractive Clustering (SC), and Grid Partitioning (GP), were adopted, and the results were compared with the Quantile Regression Technique (QRT). It was found that FCM performs better (with relative error (RE) values in the range of 38-60%) than the SC (RE of 44-69%) and GP (RE of 42-78%) models. The FCM performs better for small to medium ARIs (2 to 20 years), with the five-year ARI giving the best performance, and better in New South Wales than in Victoria. In many aspects, the QRT and FCM models perform very similarly. These developed RFFA models can be used in south-eastern Australia to derive more accurate flood quantiles. The developed method can easily be adapted to other parts of Australia and other countries. The results of this study will assist in updating the RFFA technique recommended by Australian Rainfall Runoff (the national guide).

    Characterisation of large changes in wind power for the day-ahead market using a fuzzy logic approach

    Wind power has become one of the fastest-growing renewable resources in the electricity market. However, due to its inherent variability, forecasting techniques are necessary for the optimum scheduling of the electric grid, especially during ramp events. These large changes in wind power may not be captured by wind power point forecasts, even with very high resolution Numerical Weather Prediction (NWP) models. In this paper, a fuzzy approach for wind power ramp characterisation is presented. The main benefit of this technique is that it avoids the binary definition of a ramp event, making it possible to identify changes in power output that can potentially turn into ramp events even when the total percentage of change required to be considered a ramp event is not met. To study the application of this technique, wind power forecasts were obtained and their corresponding error estimated using Genetic Programming (GP) and Quantile Regression Forests. The error distributions were incorporated into the characterisation process, which, according to the results, significantly improves ramp capture. Results are presented using colour maps, which provide a useful way to interpret the characteristics of the ramp events.

    Deep Generative Models for Reject Inference in Credit Scoring

    Credit scoring models based on accepted applications may be biased and their consequences can have a statistical and economic impact. Reject inference is the process of attempting to infer the creditworthiness status of the rejected applications. In this research, we use deep generative models to develop two new semi-supervised Bayesian models for reject inference in credit scoring, in which we model the data generating process to be dependent on a Gaussian mixture. The goal is to improve the classification accuracy in credit scoring models by adding rejected applications. Our proposed models infer the unknown creditworthiness of the rejected applications by exact enumeration of the two possible outcomes of the loan (default or non-default). The efficient stochastic gradient optimization technique used in deep generative models makes our models suitable for large data sets. Finally, the experiments in this research show that our proposed models perform better than classical and alternative machine learning models for reject inference in credit scoring.

    A robust fault diagnosis and forecasting approach based on Kalman filter and interval type-2 fuzzy logic for efficiency improvement of centrifugal gas compressor system

    The paper proposes a robust fault detection and forecasting approach for a centrifugal gas compressor system. The approach uses the Kalman filter to estimate and filter the unmeasured states of the studied system from input and output signal data collected experimentally on site. The intelligent fault detection expert system is designed using interval type-2 fuzzy logic. The work is completed by an important task: predicting the time remaining before the system under study reaches the danger and/or failure stage, based on the Auto-regressive Integrated Moving Average (ARIMA) model, where the objective within the industrial application is to set maintenance schedules precisely in time. The obtained results demonstrate the performance of the proposed fault diagnosis and detection approach, which can be used in several heavy industrial systems. Peer Reviewed. Postprint (published version)

    Development of Neurofuzzy Architectures for Electricity Price Forecasting

    In the 20th century, many countries liberalized their electricity markets. This liberalization has led generation companies and wholesale buyers to take on greater risk exposure than under the old centralized framework. In this setting, electricity price prediction has become crucial for market players in their decision-making and strategic planning. In this study, a prototype asymmetric-based neuro-fuzzy network (AGFINN) architecture has been implemented for short-term electricity price forecasting for the ISO New England market. The AGFINN framework has been designed with two different defuzzification schemes. Fuzzy clustering has been explored as an initial step for defining the fuzzy rules, while an asymmetric Gaussian membership function has been utilized in the fuzzification part of the model. Results for the minimum and maximum electricity prices for ISO New England emphasize the superiority of the proposed model over well-established learning-based models.
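
    The abstract does not give the exact form of the asymmetric Gaussian membership function used in AGFINN. One common form, shown here purely as an assumption, uses a separate spread on each side of the center; the function and parameter names are hypothetical.

```python
import math


def asymmetric_gaussian(x, center, sigma_left, sigma_right):
    """Membership degree of x under a two-sided Gaussian (assumed form, not from the paper).

    sigma_left controls the spread for x below the center, sigma_right for x at or above it.
    """
    sigma = sigma_left if x < center else sigma_right
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical price values: a narrow left side and a wide right side around 40.0.
low_side = asymmetric_gaussian(35.0, center=40.0, sigma_left=2.0, sigma_right=10.0)
high_side = asymmetric_gaussian(45.0, center=40.0, sigma_left=2.0, sigma_right=10.0)
```

    With different spreads, points the same distance from the center receive different degrees, which lets a single fuzzy set follow a skewed distribution such as electricity prices with occasional spikes.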

    Sustainable Assessment in Supply Chain and Infrastructure Management

    In a competitive business environment or the public domain, sustainability assessment in supply chain and infrastructure management is important for any organization. Organizations are currently striving to improve their sustainable strategies through preparedness, response, and recovery because of increasing competitiveness, community, and regulatory pressure. Thus, it is necessary to develop a meaningful and more focused understanding of sustainability in supply chain management and infrastructure management practices. In the context of a supply chain, sustainability implies that companies identify, assess, and manage impacts and risks in all the echelons of the supply chain, considering downstream and upstream activities. Similarly, sustainable infrastructure management refers to the ability of infrastructure to meet the requirements of the present without sacrificing the ability of future generations to address their needs. The complexities regarding sustainable supply chain and infrastructure management have driven managers and professionals to seek different solutions. This Special Issue aims to provide readers with the most recent research results on the aforementioned subjects. In addition, it offers some solutions and also raises some questions for further research and development toward sustainable supply chain and infrastructure management.

    Quantile regression forests-based modeling and environmental indicators for decision support in broiler farming

    Efficient and sustainable animal production requires fine-tuning and control of all the parameters involved, but this is not a simple task. Animal farming is a complex biological system in which environmental parameters and management practices interact in a dynamic way. In addition, the typical non-linear response of biological processes implies that relationships across parameters that are critical to assure animal welfare and performance are difficult to determine. In this paper, a novel decision support system based on environmental indicators and on weights, leg problems and mortality rates is proposed to address this issue. The data-driven modeling process is performed by a quantile regression forests approach that allows estimating growth, welfare and mortality parameters on the basis of environmental deviations from optimal farm conditions. The resulting models also provide confidence intervals able to deal with uncertainty. They are deployed on farm, offering an accessible tool for farmers, veterinarians and technical personnel. Experimental results involving 20 flocks of broiler meat chickens from different farms show the validity of the system, obtaining robust prediction intervals and high accuracy, namely over 81% for every model. The in-field use of the proposed approach will facilitate efficient and animal welfare-friendly production management. This project was funded by the Spanish Ministry of Economy and Competitivity, General Directorate for Science and Technology, National Research Program ’Retos de la Sociedad’ Project #AGL2013-49173-C2-1-R P.I. Inma Estevez and #AGL2013-49173-C2-2-R. The authors wish to thank AN and the farmers for facilitating access to their farms for data collection.