78 research outputs found

    Transparent but Accurate Evolutionary Regression Combining New Linguistic Fuzzy Grammar and a Novel Interpretable Linear Extension

    Get PDF
    Scientists must understand what machines do (systems should not behave like a black box), because in many cases how they predict is more important than what they predict. In this work, we propose a new extension of the fuzzy linguistic grammar and a mainly novel interpretable linear extension for regression problems, together with an enhanced new linguistic tree-based evolutionary multiobjective learning approach. This allows the general behavior of the data covered, as well as their specific variability, to be expressed as a single rule. In order to ensure the highest transparency and accuracy values, this learning process maximizes two widely accepted semantic metrics and also minimizes both the number of rules and the model mean squared error. The results obtained in 23 regression datasets show the effectiveness of the proposed method by applying statistical tests to the said metrics, which cover the different aspects of the interpretability of linguistic fuzzy models. This learning process has obtained the preservation of high-level semantics and less than 5 rules on average, while it still clearly outperforms some of the previous state-of-the-art linguistic fuzzy regression methods for learning interpretable regression linguistic fuzzy systems, and even to a competitive, pure accuracyoriented linguistic learning approach. Finally, we analyze a case study in a real problem related to childhood obesity, and a real expert carries out the analysis shown.Andalusian Government P18-RT-2248Health Institute Carlos III/Spanish Ministry of Science, Innovation and Universities PI20/00711Spanish Government PID2019-107793GB-I00 PID2020-119478GB-I0

    Self learning neuro-fuzzy modeling using hybrid genetic probabilistic approach for engine air/fuel ratio prediction

    Get PDF
    Machine Learning is concerned in constructing models which can learn and make predictions based on data. Rule extraction from real world data that are usually tainted with noise, ambiguity, and uncertainty, automatically requires feature selection. Neuro-Fuzzy system (NFS) which is known with its prediction performance has the difficulty in determining the proper number of rules and the number of membership functions for each rule. An enhanced hybrid Genetic Algorithm based Fuzzy Bayesian classifier (GA-FBC) was proposed to help the NFS in the rule extraction. Feature selection was performed in the rule level overcoming the problems of the FBC which depends on the frequency of the features leading to ignore the patterns of small classes. As dealing with a real world problem such as the Air/Fuel Ratio (AFR) prediction, a multi-objective problem is adopted. The GA-FBC uses mutual information entropy, which considers the relevance between feature attributes and class attributes. A fitness function is proposed to deal with multi-objective problem without weight using a new composition method. The model was compared to other learning algorithms for NFS such as Fuzzy c-means (FCM) and grid partition algorithm. Predictive accuracy and the complexity of the Fuzzy Rule Base System (FRBS) including number of rules and number of terms in each rule were taken as terms of evaluation. It was also compared to the original GA-FBC depending on the frequency not on Mutual Information (MI). Experimental results using Air/Fuel Ratio (AFR) data sets show that the new model participates in decreasing the average number of attributes in the rule and sometimes in increasing the average performance compared to other models. This work facilitates in achieving a self-generating FRBS from real data. The GA-FBC can be used as a new direction in machine learning research. This research contributes in controlling automobile emissions in helping the reduction of one of the most causes of pollution to produce greener environment

    Genetic optimization of fuzzy membership functions for cloud resource provisioning

    Get PDF
    The successful usage of fuzzy systems can be seen in many application domains owing to their capabilities to model complex systems by exploiting knowledge of domain experts. Their accuracy and performance are, however, primarily dependent on the design of its membership functions and control rules. The commonly employed technique to design membership functions is to exploit the knowledge of domain experts. However, in certain application domains, the knowledge of domain experts are limited and therefore, cannot be relied upon. Alternatively, optimization techniques such as genetic algorithms are utilized to optimize the various design parameters of fuzzy systems. In this paper, we report a case study of optimizing the membership functions of a fuzzy system using genetic algorithm, which is an important part of our recently developed cloud elasticity framework. This work aims to improve the overall performance of the framework. Results obtained from this research work demonstrate performance improvement in comparison with our previous experimental settings

    Multiobjective Evolutionary Optimization of Type-2 Fuzzy Rule-Based Systems for Financial Data Classification

    Get PDF
    Classification techniques are becoming essential in the financial world for reducing risks and possible disasters. Managers are interested in not only high accuracy, but in interpretability and transparency as well. It is widely accepted now that the comprehension of how inputs and outputs are related to each other is crucial for taking operative and strategic decisions. Furthermore, inputs are often affected by contextual factors and characterized by a high level of uncertainty. In addition, financial data are usually highly skewed toward the majority class. With the aim of achieving high accuracies, preserving the interpretability, and managing uncertain and unbalanced data, this paper presents a novel method to deal with financial data classification by adopting type-2 fuzzy rule-based classifiers (FRBCs) generated from data by a multiobjective evolutionary algorithm (MOEA). The classifiers employ an approach, denoted as scaled dominance, for defining rule weights in such a way to help minority classes to be correctly classified. In particular, we have extended PAES-RCS, an MOEA-based approach to learn concurrently the rule and data bases of FRBCs, for managing both interval type-2 fuzzy sets and unbalanced datasets. To the best of our knowledge, this is the first work that generates type-2 FRBCs by concurrently maximizing accuracy and minimizing the number of rules and the rule length with the objective of producing interpretable models of real-world skewed and incomplete financial datasets. The rule bases are generated by exploiting a rule and condition selection (RCS) approach, which selects a reduced number of rules from a heuristically generated rule base and a reduced number of conditions for each selected rule during the evolutionary process. The weight associated with each rule is scaled by the scaled dominance approach on the fuzzy frequency of the output class, in order to give a higher weight to the minority class. As regards the data base learning, the membership function parameters of the interval type-2 fuzzy sets used in the rules are learned concurrently to the application of RCS. Unbalanced datasets are managed by using, in addition to complexity, selectivity and specificity as objectives of the MOEA rather than only the classification rate. We tested our approach, named IT2-PAES-RCS, on 11 financial datasets and compared our results with the ones obtained by the original PAES-RCS with three objectives and with and without scaled dominance, the FRBCs, fuzzy association rule-based classification model for high-dimensional dataset (FARC-HD) and fuzzy unordered rules induction algorithm (FURIA), the classical C4.5 decision tree algorithm, and its cost-sensitive version. Using nonparametric statistical tests, we will show that IT2-PAES-RCS generates FRBCs with, on average, accuracy statistically comparable with and complexity lower than the ones generated by the two versions of the original PAES-RCS. Further, the FRBCs generated by FARC-HD and FURIA and the decision trees computed by C4.5 and its cost-sensitive version, despite the highest complexity, result to be less accurate than the FRBCs generated by IT2-PAES-RCS. Finally, we will highlight how these FRBCs are easily interpretable by showing and discussing one of them

    Improving Transparency in Approximate Fuzzy Modeling Using Multi-objective Immune-Inspired Optimisation

    Get PDF
    In this paper, an immune inspired multi-objective fuzzy modeling (IMOFM) mechanism is proposed specifically for high-dimensional regression problems. For such problems, prediction accuracy is often the paramount requirement. With such a requirement in mind, however, one should also put considerable efforts in eliciting models which are as transparent as possible, a ‘tricky’ exercise in itself. The proposed mechanism adopts a multi-stage modeling procedure and a variable length coding scheme to account for the enlarged search space due to simultaneous optimisation of the rule-base structure and its associated parameters. We claim here that IMOFM can account for both Singleton and Mamdani Fuzzy Rule-Based Systems (FRBS) due to the carefully chosen output membership functions, the inference scheme and the defuzzification method. The proposed modeling approach has been compared to other representatives using a benchmark problem, and was further applied to a high-dimensional problem, taken from the steel industry, which concerns the prediction of mechanical properties of hot rolled steels. Results confirm that IMOFM is capable of eliciting not only accurate but also transparent FRBSs from quantitative data

    Temporal Information in Data Science: An Integrated Framework and its Applications

    Get PDF
    Data science is a well-known buzzword, that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on such a consideration, the present work starts with the analysis of a real case scenario, by considering the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks have been developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although such initial applications rely on already available algorithms, as we shall see, some clever analysis workflows had also to be developed. Afterwards, continuously driven by real data and real world applications, we turned ourselves to the question of how to handle temporal information within classical decision tree models. Our research brought us the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data during the same execution cycle. The decision tree has been applied into some real world analysis tasks, proving its worthiness. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique. Next, since a lot of work concerning the management of temporal information has already been done in automated reasoning and formal verification fields, a natural direction in which to proceed was that of investigating how such solutions may be combined with machine learning, following two main tracks. First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., that of employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed, in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems.Data science is a well-known buzzword, that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on such a consideration, the present work starts with the analysis of a real case scenario, by considering the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks have been developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although such initial applications rely on already available algorithms, as we shall see, some clever analysis workflows had also to be developed. Afterwards, continuously driven by real data and real world applications, we turned ourselves to the question of how to handle temporal information within classical decision tree models. Our research brought us the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data during the same execution cycle. The decision tree has been applied into some real world analysis tasks, proving its worthiness. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique. Next, since a lot of work concerning the management of temporal information has already been done in automated reasoning and formal verification fields, a natural direction in which to proceed was that of investigating how such solutions may be combined with machine learning, following two main tracks. First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., that of employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed, in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems

    A cloned linguistic decision tree controller for real-time path planning in hostile environments

    Get PDF
    AbstractThe idea of a Cloned Controller to approximate optimised control algorithms in a real-time environment is introduced. A Cloned Controller is demonstrated using Linguistic Decision Trees (LDTs) to clone a Model Predictive Controller (MPC) based on Mixed Integer Linear Programming (MILP) for Unmanned Aerial Vehicle (UAV) path planning through a hostile environment. Modifications to the LDT algorithm are proposed to account for attributes with circular domains, such as bearings, and discontinuous output functions. The cloned controller is shown to produce near optimal paths whilst significantly reducing the decision period. Further investigation shows that the cloned controller generalises to the multi-obstacle case although this can lead to situations far outside of the training dataset and consequently result in decisions with a high level of uncertainty. A modification to the algorithm to improve the performance in regions of high uncertainty is proposed and shown to further enhance generalisation. The resulting controller combines the high performance of MPC–MILP with the rapid response of an LDT while providing a degree of transparency/interpretability of the decision making

    Uncertainty and Interpretability Studies in Soft Computing with an Application to Complex Manufacturing Systems

    Get PDF
    In systems modelling and control theory, the benefits of applying neural networks have been extensively studied. Particularly in manufacturing processes, such as the prediction of mechanical properties of heat treated steels. However, modern industrial processes usually involve large amounts of data and a range of non-linear effects and interactions that might hinder their model interpretation. For example, in steel manufacturing the understanding of complex mechanisms that lead to the mechanical properties which are generated by the heat treatment process is vital. This knowledge is not available via numerical models, therefore an experienced metallurgist estimates the model parameters to obtain the required properties. This human knowledge and perception sometimes can be imprecise leading to a kind of cognitive uncertainty such as vagueness and ambiguity when making decisions. In system classification, this may be translated into a system deficiency - for example, small input changes in system attributes may result in a sudden and inappropriate change for class assignation. In order to address this issue, practitioners and researches have developed systems that are functional equivalent to fuzzy systems and neural networks. Such systems provide a morphology that mimics the human ability of reasoning via the qualitative aspects of fuzzy information rather by its quantitative analysis. Furthermore, these models are able to learn from data sets and to describe the associated interactions and non-linearities in the data. However, in a like-manner to neural networks, a neural fuzzy system may suffer from a lost of interpretability and transparency when making decisions. This is mainly due to the application of adaptive approaches for its parameter identification. Since the RBF-NN can be treated as a fuzzy inference engine, this thesis presents several methodologies that quantify different types of uncertainty and its influence on the model interpretability and transparency of the RBF-NN during its parameter identification. Particularly, three kind of uncertainty sources in relation to the RBF-NN are studied, namely: entropy, fuzziness and ambiguity. First, a methodology based on Granular Computing (GrC), neutrosophic sets and the RBF-NN is presented. The objective of this methodology is to quantify the hesitation produced during the granular compression at the low level of interpretability of the RBF-NN via the use of neutrosophic sets. This study also aims to enhance the disitnguishability and hence the transparency of the initial fuzzy partition. The effectiveness of the proposed methodology is tested against a real case study for the prediction of the properties of heat-treated steels. Secondly, a new Interval Type-2 Radial Basis Function Neural Network (IT2-RBF-NN) is introduced as a new modelling framework. The IT2-RBF-NN takes advantage of the functional equivalence between FLSs of type-1 and the RBF-NN so as to construct an Interval Type-2 Fuzzy Logic System (IT2-FLS) that is able to deal with linguistic uncertainty and perceptions in the RBF-NN rule base. This gave raise to different combinations when optimising the IT2-RBF-NN parameters. Finally, a twofold study for uncertainty assessment at the high-level of interpretability of the RBF-NN is provided. On the one hand, the first study proposes a new methodology to quantify the a) fuzziness and the b) ambiguity at each RU, and during the formation of the rule base via the use of neutrosophic sets theory. The aim of this methodology is to calculate the associated fuzziness of each rule and then the ambiguity related to each normalised consequence of the fuzzy rules that result from the overlapping and to the choice with one-to-many decisions respectively. On the other hand, a second study proposes a new methodology to quantify the entropy and the fuzziness that come out from the redundancy phenomenon during the parameter identification. To conclude this work, the experimental results obtained through the application of the proposed methodologies for modelling two well-known benchmark data sets and for the prediction of mechanical properties of heat-treated steels conducted to publication of three articles in two peer-reviewed journals and one international conference
    • …
    corecore