576 research outputs found
On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems
We present a new distributed fuzzy partitioning method to reduce the
complexity of multi-way fuzzy decision trees in Big Data classification
problems. The proposed algorithm builds a fixed number of fuzzy sets for all
variables and adjusts their shape and position to the real distribution of
training data. A two-step process is applied: 1) transformation of the
original distribution into a standard uniform distribution by means of the
probability integral transform. Since the original distribution is generally
unknown, the cumulative distribution function is approximated by computing the
q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy
partition in the transformed attribute space using a fixed number of equally
distributed triangular membership functions. Despite the aforementioned
transformation, the definition of every fuzzy set in the original space can be
recovered by applying the inverse cumulative distribution function (also known
as quantile function). The experimental results reveal that the proposed
methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT)
induction algorithm to maintain classification accuracy with up to 6 million
fewer leaves.
Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData
Congress). arXiv admin note: text overlap with arXiv:1902.0935
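The two-step process described in this abstract can be sketched roughly as follows. This is a minimal single-machine illustration, not the authors' distributed implementation: the function names (quantile_knots, approx_cdf, approx_quantile, triangular_memberships), the nearest-rank quantile estimate, and the piecewise-linear interpolation are all assumptions made for the example.

```python
from bisect import bisect_right

def quantile_knots(values, q):
    """Return q+1 empirical quantile knots (minimum, ..., maximum) of a sample."""
    xs = sorted(values)
    n = len(xs)
    # Nearest-rank knots: an illustrative choice, not the paper's estimator.
    return [xs[round(k * (n - 1) / q)] for k in range(q + 1)]

def approx_cdf(x, knots):
    """Piecewise-linear CDF approximation through the quantile knots (step 1)."""
    q = len(knots) - 1
    if x <= knots[0]:
        return 0.0
    if x >= knots[-1]:
        return 1.0
    i = bisect_right(knots, x) - 1          # knots[i] <= x < knots[i + 1]
    lo, hi = knots[i], knots[i + 1]
    return (i + (x - lo) / (hi - lo)) / q

def approx_quantile(u, knots):
    """Inverse of approx_cdf: map u in [0, 1] back to the original space."""
    q = len(knots) - 1
    pos = min(max(u, 0.0), 1.0) * q
    i = min(int(pos), q - 1)
    return knots[i] + (pos - i) * (knots[i + 1] - knots[i])

def triangular_memberships(u, n_sets):
    """Degrees of u in [0, 1] for n_sets evenly spaced triangles (step 2).

    Adjacent triangles cross at 0.5 and the degrees sum to 1, which is the
    defining property of a strong (Ruspini) partition.
    """
    step = 1.0 / (n_sets - 1)
    return [max(0.0, 1.0 - abs(u - j * step) / step) for j in range(n_sets)]

sample = [x ** 2 for x in range(101)]       # right-skewed toy data
knots = quantile_knots(sample, q=4)         # [0, 625, 2500, 5625, 10000]
u = approx_cdf(2500, knots)                 # 0.5, the empirical median
print(triangular_memberships(u, n_sets=3))  # [0.0, 1.0, 0.0]
# Centres of the three fuzzy sets, recovered in the original space:
print([approx_quantile(j / 2, knots) for j in range(3)])
```

In the toy data the middle fuzzy set is centred on the median (2500) rather than on the midpoint of the range (5000), which is how the partition follows the data distribution.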
A new approach to fuzzy random forest generation
Random forests have proved to be very effective classifiers that can achieve very high accuracies. Although a number of papers have discussed the use of fuzzy sets for coping with uncertain data in decision tree learning, fuzzy random forests have not been investigated in depth in the fuzzy community. In this paper, we first propose a simple method for generating fuzzy decision trees by creating fuzzy partitions for continuous variables during the learning phase. Then, we discuss how the method can be used for generating forests of fuzzy decision trees. Finally, we show that these fuzzy random forests achieve higher accuracies than two fuzzy rule-based classifiers recently proposed in the literature. We also highlight that fuzzy random forests are more tolerant to noise in datasets than classical crisp random forests.
On Distributed Fuzzy Decision Trees for Big Data
Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of fuzzy classification. The approaches proposed so far for FDT learning, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme shaped according to the MapReduce programming model for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets for evaluating the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability when varying the number of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme is suitable for managing big datasets even with modest commodity hardware support. Finally, we have used the distributed decision tree learning algorithm implemented in the MLlib library and the Chi-FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system, for comparative analysis. © 1993-2012 IEEE
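As a rough illustration of the fuzzy information gain mentioned in this abstract, the sketch below uses one common formulation in which each example contributes to a class count in proportion to its membership degree. The function names and the normalisation of child weights are assumptions made for the example. The paper's exact definitions may differ, and nothing here reproduces its MapReduce or Spark implementation.

```python
from collections import defaultdict
from math import log2

def fuzzy_entropy(memberships, labels):
    """Class entropy where example i counts with weight memberships[i]."""
    mass = defaultdict(float)
    for mu, y in zip(memberships, labels):
        mass[y] += mu                        # fuzzy cardinality of class y
    total = sum(mass.values())
    if total == 0.0:
        return 0.0
    return -sum((m / total) * log2(m / total) for m in mass.values() if m > 0.0)

def fuzzy_information_gain(partition, labels):
    """Gain of splitting a crisp parent node by a fuzzy partition.

    partition[j][i] is the membership degree of example i in fuzzy set j.
    Child weights are the fuzzy cardinalities normalised to sum to 1
    (an assumption of this sketch).
    """
    parent = fuzzy_entropy([1.0] * len(labels), labels)
    sizes = [sum(mus) for mus in partition]
    total = sum(sizes)
    children = sum(
        (size / total) * fuzzy_entropy(mus, labels)
        for mus, size in zip(partition, sizes)
        if size > 0.0
    )
    return parent - children

labels = ["a", "a", "b", "b"]
# A partition that separates the classes perfectly gives the maximal gain ...
print(fuzzy_information_gain([[1, 1, 0, 0], [0, 0, 1, 1]], labels))
# ... while one that spreads every example evenly gives no gain.
print(fuzzy_information_gain([[0.5] * 4, [0.5] * 4], labels))
```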
Global Entropy Based Greedy Algorithm for discretization
Discretization is a crucial step, not only for summarizing continuous attributes but also for improving the performance of classifiers that require discrete values as input. In this thesis, I propose a supervised discretization method, the Global Entropy Based Greedy algorithm, which is based on Information Entropy Minimization. Experimental results on well-known benchmark datasets show that the proposed method outperforms state-of-the-art methods. To further improve the proposed method, a new stopping criterion based on the rate of change of entropy was also explored. The experimental analysis indicates that a threshold on the rate of entropy decrease can be more effective than a fixed number of intervals for classifiers such as C5.0.
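The basic building block of entropy-minimizing discretization can be sketched as a search for the single cut point that minimizes the weighted class entropy of the two resulting intervals. This is a generic illustration and not the thesis's Global Entropy Based Greedy algorithm: the function names are hypothetical, and the stopping criterion based on the rate of entropy change is not implemented here.

```python
from collections import Counter
from math import log2

def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (cut, weighted_entropy) for the best binary cut on one attribute.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    If no cut lowers the entropy, the cut is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, class_entropy(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best

cut, score = best_cut([1, 2, 3, 10, 11, 12], ["n", "n", "n", "p", "p", "p"])
print(cut, score)   # the cut falls at 6.5, between the two pure groups
```

A greedy discretizer would apply such a search repeatedly and stop according to some criterion, for example a fixed number of intervals or, as the abstract suggests, a threshold on how quickly entropy is still decreasing.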
A Survey of Neural Trees
Neural networks (NNs) and decision trees (DTs) are both popular models of
machine learning, yet they come with mutually exclusive advantages and
limitations. To bring the best of the two worlds, a variety of approaches are
proposed to integrate NNs and DTs explicitly or implicitly. In this survey,
these approaches are organized into a school that we term neural trees (NTs).
This survey aims to present a comprehensive review of NTs and attempts to
identify how they enhance the model interpretability. We first propose a
thorough taxonomy of NTs that expresses the gradual integration and
co-evolution of NNs and DTs. Afterward, we analyze NTs in terms of their
interpretability and performance, and suggest possible solutions to the
remaining challenges. Finally, this survey concludes with a discussion about
other considerations like conditional computation and promising directions
for this field. A list of papers reviewed in this survey, along with their
corresponding codes, is available at:
https://github.com/zju-vipa/awesome-neural-trees
Comment: 35 pages, 7 figures and 1 tabl
Development of Machine Learning Techniques for Diabetic Retinopathy Risk Estimation
Diabetic retinopathy (DR) is a chronic illness. It is one of the main complications of
diabetes and a major cause of vision loss among people suffering from diabetes.
Diabetic patients must be screened periodically to detect signs of diabetic
retinopathy at an early stage. Early and frequent screening decreases
the risk of vision loss and minimizes the load on health care centres. The number
of diabetic patients is huge and rapidly increasing, which makes it hard and
resource-consuming to screen all of them every year.
The main goal of this Ph.D. thesis is to build a clinical decision support system
(CDSS) based on electronic health record (EHR) data. This CDSS will be used
to estimate the risk of developing DR.
In this Ph.D. thesis, I focus on developing novel interpretable machine learning
systems. Fuzzy systems based on linguistic terms are proposed. The
output of such systems lets the physician know which combinations of features
can lead to a risk of developing DR.
In this work, I propose a method to reduce the uncertainty in classifying diabetic
patients using fuzzy decision trees (FDTs). A Fuzzy Random Forest (FRF) approach is
also proposed to estimate the risk of developing DR.
Several policies are proposed to merge the classification results
achieved by different FDT models. To improve the quality of
the final decision of our models, I propose three fuzzy measures that are used with Choquet and Sugeno integrals.
The definition of these fuzzy measures is based on the confidence values of
the rules. In particular, one of them is a decomposable fuzzy measure in which the
hierarchical structure of the FDT is exploited to find the values of the fuzzy measure.
Out of this Ph.D. work, we have built CDSS software that may be installed in primary care centres and hospitals,
where general practitioners can use it for preventive assessment and early-stage screening of diabetic retinopathy.
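To illustrate how per-tree outputs might be fused with the Choquet and Sugeno integrals mentioned above, here is a minimal sketch of both discrete integrals. The tree names, the scores, and the two example measures are hypothetical. In particular, the thesis derives its fuzzy measures from rule confidence values and the FDT hierarchy, which is not reproduced here.

```python
def choquet(scores, measure):
    """Discrete Choquet integral of scores (dict: source -> value in [0, 1]).

    measure maps a frozenset of sources to [0, 1]. It is expected to be
    monotone, with measure(all sources) == 1.
    """
    order = sorted(scores, key=scores.get)    # sources by ascending score
    total, prev = 0.0, 0.0
    for i, src in enumerate(order):
        coalition = frozenset(order[i:])      # sources scoring at least scores[src]
        total += (scores[src] - prev) * measure(coalition)
        prev = scores[src]
    return total

def sugeno(scores, measure):
    """Discrete Sugeno integral: max over i of min(score_i, measure(coalition_i))."""
    order = sorted(scores, key=scores.get)
    return max(
        min(scores[src], measure(frozenset(order[i:])))
        for i, src in enumerate(order)
    )

# Hypothetical risk scores produced by three fuzzy decision trees.
tree_scores = {"tree_1": 0.2, "tree_2": 0.6, "tree_3": 0.9}

def additive(coalition):
    """Additive measure: the Choquet integral reduces to the plain mean."""
    return len(coalition) / 3

def cautious(coalition):
    """Non-additive measure that gives small coalitions little weight."""
    return (len(coalition) / 3) ** 2

print(choquet(tree_scores, additive))
print(choquet(tree_scores, cautious))
print(sugeno(tree_scores, additive))
```

With the additive measure the Choquet integral equals the arithmetic mean of the scores. The non-additive measure yields a lower value because agreement among only one or two trees counts for less.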