1,792 research outputs found
Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets
In many real application areas, the data used are highly skewed and the number of
instances for some classes is much higher than that of the other classes. Solving a classification
task using such an imbalanced data-set is difficult due to the bias of the training
towards the majority classes.
The aim of this paper is to improve the performance of fuzzy rule based classification systems
on imbalanced domains, increasing the granularity of the fuzzy partitions on the
boundary areas between the classes, in order to obtain a better separability. We propose
the use of a hierarchical fuzzy rule based classification system, which is based on the
refinement of a simple linguistic fuzzy model by means of the extension of the structure
of the knowledge base in a hierarchical way and the use of a genetic rule selection process
in order to get a compact and accurate model.
The good performance of this approach is shown through an extensive experimental
study carried out over a large collection of imbalanced data-sets.
Spanish Ministry of Education and Science (MEC) under Projects TIN-2005-08386-C05-01 and TIN-2005-08386-C05-0
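To make the idea of partition granularity concrete, here is a minimal sketch, assuming evenly spaced triangular membership functions on a normalized [0, 1] attribute. The function names and the choice of 3 and 5 labels are illustrative assumptions; this is not the hierarchical method of the paper, only an illustration of how a finer partition describes the same value with narrower linguistic labels.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def uniform_partition(n_labels, lo=0.0, hi=1.0):
    """Return (a, b, c) triples for n_labels evenly spaced triangular sets on [lo, hi]."""
    step = (hi - lo) / (n_labels - 1)
    return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
            for i in range(n_labels)]


def memberships(x, partition):
    """Membership degree of x in every label of the partition, rounded for display."""
    return [round(triangular(x, a, b, c), 3) for a, b, c in partition]


coarse = uniform_partition(3)  # hypothetical labels: low / medium / high
fine = uniform_partition(5)    # finer partition, e.g. for a region near a class boundary

print(memberships(0.6, coarse))  # [0.0, 0.8, 0.2]
print(memberships(0.6, fine))    # [0.0, 0.0, 0.6, 0.4, 0.0]
```

With the coarse partition the value 0.6 is mostly "medium"; with the finer one it is split between two narrower labels, which is what gives rules built on the fine partition more resolution where classes overlap.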
A systematic review of data quality issues in knowledge discovery tasks
Data volumes are growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust
Literature Review of the Recent Trends and Applications in various Fuzzy Rule based systems
Fuzzy rule based systems (FRBSs) are rule-based systems that use linguistic
fuzzy variables as antecedents and consequents to represent human-understandable
knowledge. They have been applied to various applications and areas throughout
the soft computing literature. However, FRBSs suffer from many drawbacks, such
as uncertainty representation, a high number of rules, interpretability loss,
and high computational time for learning. To overcome these issues,
many extensions of FRBSs exist. This paper presents an overview and
literature review of recent trends on various types and prominent areas of
fuzzy systems (FRBSs), namely genetic fuzzy systems (GFS), hierarchical fuzzy
systems (HFS), neuro fuzzy systems (NFS), evolving fuzzy systems (eFS), FRBSs for
big data, FRBSs for imbalanced data, interpretability in FRBSs and FRBSs which
use cluster centroids as fuzzy rules. The review covers the years 2010-2021. This
paper also highlights important contributions, publication statistics and
current trends in the field. The paper also addresses several open research
areas which need further attention from the FRBSs research community.
Comment: 49 pages, accepted for publication in ijf
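As a rough illustration of what a rule with linguistic antecedents and a class consequent looks like in practice, the sketch below fires a small hand-written rule base and returns the class of the strongest rule. The labels, rules, variable names, product t-norm and winner-takes-all reasoning are all assumptions chosen for the example, not details taken from the reviewed paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic labels on a normalized [0, 1] domain.
LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical rule base: IF x1 is <label> AND x2 is <label> THEN class.
RULES = [
    ({"x1": "low", "x2": "low"}, "negative"),
    ({"x1": "high", "x2": "medium"}, "positive"),
    ({"x1": "medium", "x2": "high"}, "positive"),
]


def firing_strength(antecedent, sample):
    """Combine the antecedent membership degrees with the product t-norm."""
    strength = 1.0
    for variable, label in antecedent.items():
        strength *= triangular(sample[variable], *LABELS[label])
    return strength


def classify(sample):
    """Return (class, strength) of the strongest rule, or (None, 0.0) if no rule fires."""
    antecedent, label = max(RULES, key=lambda rule: firing_strength(rule[0], sample))
    strength = firing_strength(antecedent, sample)
    return (label, strength) if strength > 0.0 else (None, 0.0)


print(classify({"x1": 0.9, "x2": 0.4}))
print(classify({"x1": 0.1, "x2": 0.2}))
```

Because every rule is readable as a sentence, the interpretability concerns raised in the abstract follow directly: the rule count grows quickly with the number of variables and labels.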
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems
An enhanced resampling technique for imbalanced data sets
A data set is considered imbalanced if the instances of one class (the majority class) outnumber those of the other class (the minority class). The main problem with binary imbalanced data sets is that classifiers tend to ignore the minority class. Numerous resampling techniques, such as undersampling, oversampling, and combinations of both, have been widely used. However, undersampling and oversampling suffer from the elimination and addition of relevant data, respectively, which may lead to poor classification results. Hence, this study aims to improve classification metrics by enhancing the undersampling technique and combining it with an existing oversampling technique. To achieve this objective, Fuzzy Distance-based Undersampling (FDUS) is proposed. Entropy estimation is used to produce fuzzy thresholds that categorise the instances in the majority and minority classes into membership functions. FDUS is then combined with the Synthetic Minority Oversampling TEchnique (SMOTE), a combination known as FDUS+SMOTE, which is executed in sequence until a balanced data set is achieved. FDUS and FDUS+SMOTE are compared with four techniques based on classification accuracy, F-measure and G-mean. FDUS achieved better classification accuracy, F-measure and G-mean than the other techniques, with averages of 80.57%, 0.85 and 0.78, respectively. This showed that fuzzy logic, when incorporated into a distance-based undersampling technique, was able to reduce the elimination of relevant data. Further, the findings showed that FDUS+SMOTE performed better than the combinations of SMOTE with Tomek Links and SMOTE with Edited Nearest Neighbour on benchmark data sets. FDUS+SMOTE minimised the removal of relevant data from the majority class and avoided overfitting. On average, FDUS and FDUS+SMOTE were able to balance categorical, integer and real data sets and enhanced the performance of binary classification. Furthermore, the techniques performed well on small data sets with approximately 100 to 800 instances
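The three evaluation measures named in this abstract can be computed from a binary confusion matrix. The sketch below uses the common definitions (F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the abstract does not spell out its exact formulas, so treat these as an assumption, and the confusion-matrix counts as made-up illustrative numbers rather than results from the study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, F-measure and G-mean, with the minority class as the positive class."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f_measure": f_measure, "g_mean": g_mean}


# Illustrative data set: 50 minority (positive) and 950 majority (negative) instances.
# A classifier that always predicts the majority class:
majority_only = binary_metrics(tp=0, fn=50, fp=0, tn=950)
# A classifier that recovers most of the minority class at the cost of some false alarms:
balanced = binary_metrics(tp=40, fn=10, fp=95, tn=855)

print(majority_only)
print(balanced)
```

The majority-only classifier reaches 95% accuracy while its F-measure and G-mean are both zero, which is why accuracy alone is a poor guide on imbalanced data and why resampling studies report all three measures.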
Automatic synthesis of fuzzy systems: An evolutionary overview with a genetic programming perspective
Studies in Evolutionary Fuzzy Systems (EFSs) began in the 90s and have experienced a fast development since then, with applications to areas such as pattern recognition, curve‐fitting and regression, forecasting and control. An EFS results from the combination of a Fuzzy Inference System (FIS) with an Evolutionary Algorithm (EA). This relationship can be established for multiple purposes: fine‐tuning of FIS's parameters, selection of fuzzy rules, learning a rule base or membership functions from scratch, and so forth. Each facet of this relationship creates a strand in the literature, such as membership function fine‐tuning and fuzzy rule‐based learning, and the purpose here is to outline some of what has been done in each aspect. Special focus is given to Genetic Programming‐based EFSs by providing a taxonomy of the main architectures available, as well as by pointing out the gaps that still prevail in the literature. The concluding remarks address some further topics of current research and trends, such as interpretability analysis, multiobjective optimization, and synthesis of a FIS through Evolving methods