521 research outputs found

    Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets

    Get PDF
    In many real application areas, the data used are highly skewed and the number of instances for some classes are much higher than that of the other classes. Solving a classification task using such an imbalanced data-set is difficult due to the bias of the training towards the majority classes. The aim of this paper is to improve the performance of fuzzy rule based classification systems on imbalanced domains, increasing the granularity of the fuzzy partitions on the boundary areas between the classes, in order to obtain a better separability. We propose the use of a hierarchical fuzzy rule based classification system, which is based on the refinement of a simple linguistic fuzzy model by means of the extension of the structure of the knowledge base in a hierarchical way and the use of a genetic rule selection process in order to get a compact and accurate model. The good performance of this approach is shown through an extensive experimental study carried out over a large collection of imbalanced data-sets.Spanish Ministry of Education and Science (MEC) under Projects TIN-2005-08386-C05-01 and TIN-2005-08386- C05-0

    Literature Review of the Recent Trends and Applications in various Fuzzy Rule based systems

    Full text link
    Fuzzy rule based systems (FRBSs) is a rule-based system which uses linguistic fuzzy variables as antecedents and consequent to represent human understandable knowledge. They have been applied to various applications and areas throughout the soft computing literature. However, FRBSs suffers from many drawbacks such as uncertainty representation, high number of rules, interpretability loss, high computational time for learning etc. To overcome these issues with FRBSs, there exists many extensions of FRBSs. This paper presents an overview and literature review of recent trends on various types and prominent areas of fuzzy systems (FRBSs) namely genetic fuzzy system (GFS), hierarchical fuzzy system (HFS), neuro fuzzy system (NFS), evolving fuzzy system (eFS), FRBSs for big data, FRBSs for imbalanced data, interpretability in FRBSs and FRBSs which use cluster centroids as fuzzy rules. The review is for years 2010-2021. This paper also highlights important contributions, publication statistics and current trends in the field. The paper also addresses several open research areas which need further attention from the FRBSs research community.Comment: 49 pages, Accepted for publication in ijf

    Automatic synthesis of fuzzy systems: An evolutionary overview with a genetic programming perspective

    Get PDF
    Studies in Evolutionary Fuzzy Systems (EFSs) began in the 90s and have experienced a fast development since then, with applications to areas such as pattern recognition, curve‐fitting and regression, forecasting and control. An EFS results from the combination of a Fuzzy Inference System (FIS) with an Evolutionary Algorithm (EA). This relationship can be established for multiple purposes: fine‐tuning of FIS's parameters, selection of fuzzy rules, learning a rule base or membership functions from scratch, and so forth. Each facet of this relationship creates a strand in the literature, as membership function fine‐tuning, fuzzy rule‐based learning, and so forth and the purpose here is to outline some of what has been done in each aspect. Special focus is given to Genetic Programming‐based EFSs by providing a taxonomy of the main architectures available, as well as by pointing out the gaps that still prevail in the literature. The concluding remarks address some further topics of current research and trends, such as interpretability analysis, multiobjective optimization, and synthesis of a FIS through Evolving methods

    Advancement of Data-Driven Short-Term Flood Predictions on an Urbanized Watershed Using Preprocessing Techniques

    Get PDF
    Supervised classification can be applied for short-term predictions of hydrological events in cases where the label of the event rather than its magnitude is crucial, as in the case of early flood warning systems. To be effective, these warning systems must be able to forecast floods accurately and to provide estimates early enough. Following the approach of transforming hydrological sensor data into a phase space using time-delay embedding, an attempt was made to improve the performance of the models and to increase the lead-time of reliable predictions. For this, the available set of attributes supplied by stream and rain gauges was extended by derivatives. In addition, imbalanced data techniques were applied at the data preprocessing step. The computational experiments were conducted on various data sets, lead-times, and years with different hydrological characteristics. The results show that especially derivatives of water level data improve model performance, increasingly when added for only one or two hours before the prediction time. In addition to that, the imbalanced data techniques allowed for overall improved prediction of floods at the cost of slight increase of misclassification of low-flow events

    Fuzzy rough and evolutionary approaches to instance selection

    Get PDF

    Multiobjective Evolutionary Optimization of Type-2 Fuzzy Rule-Based Systems for Financial Data Classification

    Get PDF
    Classification techniques are becoming essential in the financial world for reducing risks and possible disasters. Managers are interested in not only high accuracy, but in interpretability and transparency as well. It is widely accepted now that the comprehension of how inputs and outputs are related to each other is crucial for taking operative and strategic decisions. Furthermore, inputs are often affected by contextual factors and characterized by a high level of uncertainty. In addition, financial data are usually highly skewed toward the majority class. With the aim of achieving high accuracies, preserving the interpretability, and managing uncertain and unbalanced data, this paper presents a novel method to deal with financial data classification by adopting type-2 fuzzy rule-based classifiers (FRBCs) generated from data by a multiobjective evolutionary algorithm (MOEA). The classifiers employ an approach, denoted as scaled dominance, for defining rule weights in such a way to help minority classes to be correctly classified. In particular, we have extended PAES-RCS, an MOEA-based approach to learn concurrently the rule and data bases of FRBCs, for managing both interval type-2 fuzzy sets and unbalanced datasets. To the best of our knowledge, this is the first work that generates type-2 FRBCs by concurrently maximizing accuracy and minimizing the number of rules and the rule length with the objective of producing interpretable models of real-world skewed and incomplete financial datasets. The rule bases are generated by exploiting a rule and condition selection (RCS) approach, which selects a reduced number of rules from a heuristically generated rule base and a reduced number of conditions for each selected rule during the evolutionary process. The weight associated with each rule is scaled by the scaled dominance approach on the fuzzy frequency of the output class, in order to give a higher weight to the minority class. As regards the data base learning, the membership function parameters of the interval type-2 fuzzy sets used in the rules are learned concurrently to the application of RCS. Unbalanced datasets are managed by using, in addition to complexity, selectivity and specificity as objectives of the MOEA rather than only the classification rate. We tested our approach, named IT2-PAES-RCS, on 11 financial datasets and compared our results with the ones obtained by the original PAES-RCS with three objectives and with and without scaled dominance, the FRBCs, fuzzy association rule-based classification model for high-dimensional dataset (FARC-HD) and fuzzy unordered rules induction algorithm (FURIA), the classical C4.5 decision tree algorithm, and its cost-sensitive version. Using nonparametric statistical tests, we will show that IT2-PAES-RCS generates FRBCs with, on average, accuracy statistically comparable with and complexity lower than the ones generated by the two versions of the original PAES-RCS. Further, the FRBCs generated by FARC-HD and FURIA and the decision trees computed by C4.5 and its cost-sensitive version, despite the highest complexity, result to be less accurate than the FRBCs generated by IT2-PAES-RCS. Finally, we will highlight how these FRBCs are easily interpretable by showing and discussing one of them

    Survey on highly imbalanced multi-class data

    Get PDF
    Machine learning technology has a massive impact on society because it offers solutions to solve many complicated problems like classification, clustering analysis, and predictions, especially during the COVID-19 pandemic. Data distribution in machine learning has been an essential aspect in providing unbiased solutions. From the earliest literatures published on highly imbalanced data until recently, machine learning research has focused mostly on binary classification data problems. Research on highly imbalanced multi-class data is still greatly unexplored when the need for better analysis and predictions in handling Big Data is required. This study focuses on reviews related to the models or techniques in handling highly imbalanced multi-class data, along with their strengths and weaknesses and related domains. Furthermore, the paper uses the statistical method to explore a case study with a severely imbalanced dataset. This article aims to (1) understand the trend of highly imbalanced multi-class data through analysis of related literatures; (2) analyze the previous and current methods of handling highly imbalanced multi-class data; (3) construct a framework of highly imbalanced multi-class data. The chosen highly imbalanced multi-class dataset analysis will also be performed and adapted to the current methods or techniques in machine learning, followed by discussions on open challenges and the future direction of highly imbalanced multi-class data. Finally, for highly imbalanced multi-class data, this paper presents a novel framework. We hope this research can provide insights on the potential development of better methods or techniques to handle and manipulate highly imbalanced multi-class data

    A survey on pre-processing techniques: relevant issues in the context of environmental data mining

    Get PDF
    One of the important issues related with all types of data analysis, either statistical data analysis, machine learning, data mining, data science or whatever form of data-driven modeling, is data quality. The more complex the reality to be analyzed is, the higher the risk of getting low quality data. Unfortunately real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Useless models will be obtained when built over incorrect or incomplete data. As a consequence, the quality of decisions made over these models, also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not been properly systematized yet, and little research is focused on this. In this paper a survey on most popular pre-processing steps required in environmental data analysis is presented, together with a proposal to systematize it. Rather than providing technical details on specific pre-processing techniques, the paper focus on providing general ideas to a non-expert user, who, after reading them, can decide which one is the more suitable technique required to solve his/her problem.Peer ReviewedPostprint (author's final draft

    A fuzzy-XCS classifier system with linguistic hedges

    Get PDF

    Combining heterogeneous classifiers via granular prototypes.

    Get PDF
    In this study, a novel framework to combine multiple classifiers in an ensemble system is introduced. Here we exploit the concept of information granule to construct granular prototypes for each class on the outputs of an ensemble of base classifiers. In the proposed method, uncertainty in the outputs of the base classifiers on training observations is captured by an interval-based representation. To predict the class label for a new observation, we first determine the distances between the output of the base classifiers for this observation and the class prototypes, then the predicted class label is obtained by choosing the label associated with the shortest distance. In the experimental study, we combine several learning algorithms to build the ensemble system and conduct experiments on the UCI, colon cancer, and selected CLEF2009 datasets. The experimental results demonstrate that the proposed framework outperforms several benchmarked algorithms including two trainable combining methods, i.e., Decision Template and Two Stages Ensemble System, AdaBoost, Random Forest, L2-loss Linear Support Vector Machine, and Decision Tree
    corecore