
    Comparison of Fuzzy Integral-Fuzzy Measure based Ensemble Algorithms with the State-of-the-art Ensemble Algorithms

    The Fuzzy Integral (FI) is a non-linear aggregation operator which enables the fusion of information from multiple sources with respect to a Fuzzy Measure (FM), which captures the worth of both the individual sources and all their possible combinations. Based on the potential of the non-linear aggregation offered by the FI, its application to decision-level fusion in ensemble classifiers, i.e. fusing the outputs of multiple classifiers into one superior decision-level output, has recently been explored. A key example of such an FI-FM ensemble classification method is the Decision-level Fuzzy Integral Multiple Kernel Learning (DeFIMKL) algorithm, which aggregates the outputs of kernel-based classifiers through the Choquet FI with respect to an FM learned through a regularised quadratic programming approach. While the approach has been validated against a number of classifiers based on multiple kernel learning, it has thus far not been compared to the state of the art in ensemble classification. Thus, this paper puts forward a detailed comparison of FI-FM based ensemble methods, specifically the DeFIMKL algorithm, with state-of-the-art ensemble methods including AdaBoost, Bagging, Random Forest and Majority Voting over 20 public datasets from the UCI machine learning repository. The results on the selected datasets suggest that the FI-based ensemble classifier performs both well and efficiently, indicating that it is a viable alternative when selecting ensemble classifiers, and that the non-linear fusion of decision-level outputs offered by the FI delivers the expected potential and warrants further study.
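The decision-level fusion described above rests on the discrete Choquet integral: classifier outputs are sorted by support and each output is weighted by the increment of fuzzy-measure worth it unlocks. A minimal sketch (not the DeFIMKL implementation itself; the dict-based interface is an illustrative choice):

```python
def choquet_integral(h, mu):
    """Discrete Choquet integral of source outputs h w.r.t. fuzzy measure mu.

    h  : dict mapping source name -> support value in [0, 1]
    mu : dict mapping frozenset of source names -> measure value,
         with mu[frozenset()] == 0 and mu[full set] == 1.
    """
    # Sort sources by descending support value.
    order = sorted(h, key=h.get, reverse=True)
    total, prev = 0.0, frozenset()
    for src in order:
        cur = prev | {src}
        # Each output is weighted by the measure increment its coalition adds.
        total += h[src] * (mu[cur] - mu[prev])
        prev = cur
    return total
```

When mu is additive the Choquet integral reduces to a weighted average; non-additive measures let it reward or penalise specific coalitions of classifiers, which is the non-linearity the abstract refers to.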

    Fuzzy Integral Driven Ensemble Classification using A Priori Fuzzy Measures

    Aggregation operators are mathematical functions that enable the fusion of information from multiple sources. Fuzzy Integrals (FIs) are widely used aggregation operators, which combine information with respect to a Fuzzy Measure (FM) that captures the worth of both the individual sources and all their possible combinations. However, FIs suffer from the potential drawback of not fusing information according to the intuitively interpretable FM, leading to non-intuitive results. The latter is particularly relevant when an FM has been defined using external information (e.g. experts). In order to address this and provide an alternative to the FI, the Recursive Average (RAV) aggregation operator was recently proposed, which enables intuitive data fusion with respect to a given FM. With an alternative fusion operator in place, in this paper we define the concept of ‘a priori’ FMs, which are generated from external information (e.g. classification accuracy) and thus provide an alternative to the traditional approaches of learning or manually specifying FMs. We proceed to develop one specific instance of such an a priori FM to support the decision-level fusion step in ensemble classification. We evaluate the resulting approach by contrasting the performance of the ensemble classifiers for different FMs, including the recently introduced Uriz measure and the Sugeno lambda-measure, as well as by employing both the Choquet FI and the RAV as possible fusion operators. Results are presented for 20 datasets from machine learning repositories and contextualised in the wider literature by comparing them to state-of-the-art ensemble classifiers such as AdaBoost, Bagging, Random Forest and Majority Voting.
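One standard way to build an FM from external per-source worths such as classification accuracies (the "a priori" idea above) is the Sugeno lambda-measure: given densities g_i for the individual sources, lambda is the unique non-zero root of 1 + lambda = prod_i(1 + lambda * g_i), and every coalition's worth follows from the densities. A sketch under that standard construction (not the paper's specific a priori FM, whose details are not given here):

```python
from itertools import combinations

def sugeno_lambda_measure(densities):
    """Build a Sugeno lambda-measure from per-source densities in (0, 1).

    lambda is the unique root != 0 of  1 + lam = prod_i (1 + lam * g_i);
    then mu(A) = (prod_{i in A} (1 + lam * g_i) - 1) / lam.
    """
    g = list(densities.values())
    s = sum(g)
    if abs(s - 1.0) < 1e-12:
        lam = 0.0                              # densities sum to 1: additive case
    else:
        def f(lam):
            p = 1.0
            for gi in g:
                p *= 1.0 + lam * gi
            return p - (1.0 + lam)
        # lambda > 0 when densities sum below 1, lambda in (-1, 0) when above.
        lo, hi = (1e-12, 1.0) if s < 1 else (-1.0 + 1e-12, -1e-12)
        if s < 1:
            while f(hi) < 0:                   # widen bracket until sign change
                hi *= 2.0
        for _ in range(200):                   # plain bisection
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        lam = 0.5 * (lo + hi)
    mu, srcs = {frozenset(): 0.0}, list(densities)
    for r in range(1, len(srcs) + 1):
        for sub in combinations(srcs, r):
            if lam == 0.0:
                val = sum(densities[x] for x in sub)
            else:
                prod = 1.0
                for x in sub:
                    prod *= 1.0 + lam * densities[x]
                val = (prod - 1.0) / lam
            mu[frozenset(sub)] = val
    return mu, lam
```

By construction the full set receives worth 1, so the result is a valid fuzzy measure ready for use with a fuzzy integral.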

    Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

    A novel hybrid data-driven approach is developed for forecasting power system parameters with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of the variables employed in the forecasting models, with the Mean Decrease Gini index employed as the impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. Section 2 presents the background and a review of several approaches to short-term forecasting of power system parameters. In Section 3, a hybrid machine learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. Section 4 describes the decision tree learning algorithms used to assess variable importance. Finally, Section 6 presents the experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting.
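The Mean Decrease Gini importance mentioned above averages, over all trees in the forest, the drop in Gini impurity produced by each split on a given variable. A minimal sketch of the quantity being averaged, for a single threshold split (illustrative only; tree libraries compute this internally):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(x, y, threshold):
    """Impurity decrease of splitting feature x at threshold: parent impurity
    minus the size-weighted impurities of the two children. Summing this over
    a variable's splits and averaging over the forest gives Mean Decrease Gini."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

A variable that never produces a large impurity decrease accumulates a low importance score and can be dropped from the forecasting model.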

    Feature and Decision Level Fusion Using Multiple Kernel Learning and Fuzzy Integrals

    The work collected in this dissertation addresses the problem of data fusion. In other words, this is the problem of making decisions (also known as the problem of classification in the machine learning and statistics communities) when data from multiple sources are available, or when decisions/confidence levels from a panel of decision-makers are accessible. This problem has become increasingly important in recent years, especially with the ever-increasing popularity of autonomous systems outfitted with suites of sensors and the dawn of the "age of big data." While data fusion is a very broad topic, the work in this dissertation considers two very specific techniques: feature-level fusion and decision-level fusion. In general, the fusion methods proposed throughout this dissertation rely on kernel methods and fuzzy integrals. Both are very powerful tools; however, they also come with challenges, some of which are summarized below. I address these challenges in this dissertation. Kernel methods for classification are a well-studied area in which data are implicitly mapped from a lower-dimensional space to a higher-dimensional space to improve classification accuracy. However, for most kernel methods, one must still choose a kernel to use for the problem. Since there is, in general, no way of knowing which kernel is best, multiple kernel learning (MKL) is a technique used to learn the aggregation of a set of valid kernels into a single (ideally) superior kernel. The aggregation can be done using weighted sums of the pre-computed kernels, but determining the summation weights is not a trivial task. Furthermore, MKL does not work well with large datasets because of limited storage space and prediction speed. These challenges are tackled by the introduction of many new algorithms in the following chapters. I also address MKL's storage and speed drawbacks, allowing MKL-based techniques to be applied to big data efficiently.
Some algorithms in this work are based on the Choquet fuzzy integral, a powerful nonlinear aggregation operator parameterized by the fuzzy measure (FM). These decision-level fusion algorithms learn a fuzzy measure by minimizing a sum of squared error (SSE) criterion based on a set of training data. The flexibility of the Choquet integral comes with a cost, however: given a set of N decision makers, the size of the FM the algorithm must learn is 2^N. This means that the training data must be diverse enough to include 2^N independent observations, though this is rarely encountered in practice. I address this in the following chapters via many different regularization functions, a popular technique in machine learning and statistics used to prevent overfitting and increase model generalization. Finally, it is worth noting that the aggregation behavior of the Choquet integral is not intuitive. I tackle this by proposing a quantitative visualization strategy allowing the FM and Choquet integral behavior to be shown simultaneously.
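The 2^N cost above can be made concrete: an FM assigns a value to every subset of the N sources, and monotonicity ties each subset to each of its one-element supersets, so the learning problem carries n * 2^(n-1) ordered constraints on top of the 2^n values. A small helper illustrating the scaling (an illustrative count, not part of the dissertation's algorithms):

```python
def fm_problem_size(n):
    """Size of the fuzzy-measure learning problem for n decision makers.

    Returns (values, constraints):
      values      = 2**n subset worths (2**n - 2 free, since mu(empty) = 0
                    and mu(full set) = 1 are fixed),
      constraints = n * 2**(n - 1) monotonicity pairs mu(A) <= mu(A | {i}),
                    one for each subset A and each source i not in A.
    """
    values = 2 ** n
    constraints = n * 2 ** (n - 1)
    return values, constraints
```

Already at n = 20 classifiers the measure has over a million values, which is why regularization is needed to keep the SSE fit from overfitting the sparsely observed coalitions.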

    Development of Machine Learning Techniques for Diabetic Retinopathy Risk Estimation

    Diabetic retinopathy (DR) is a chronic illness. It is one of the main complications of diabetes and an essential cause of vision loss among people suffering from diabetes. Diabetic patients must be periodically screened in order to detect signs of diabetic retinopathy development at an early stage. Early and frequent screening decreases the risk of vision loss and minimizes the load on the health care centres. The number of diabetic patients is huge and rapidly increasing, which makes it hard and resource-consuming to perform a yearly screening for all of them. The main goal of this Ph.D. thesis is to build a clinical decision support system (CDSS) based on electronic health record (EHR) data. This CDSS will be utilised to estimate the risk of developing DR. In this Ph.D. thesis, I focus on developing novel interpretable machine learning systems: fuzzy-based systems with linguistic terms are proposed. The knowledge expressed in this kind of rules lets the physician know which combinations of the features can cause the risk of developing DR. In this work, I propose a method to reduce the uncertainty in classifying diabetic patients using fuzzy decision trees (FDT). A Fuzzy Random Forest (FRF) approach is proposed as well to estimate the risk of developing DR. Several policies are proposed to merge the classification results achieved by the different FDT models. To improve the quality of the final decision of our models, I propose three fuzzy measures that are used with the Choquet and Sugeno integrals. 
    The definition of these fuzzy measures is based on the confidence values of the rules. In particular, one of them is a decomposable fuzzy measure in which the hierarchical structure of the FDT is exploited to find the values of the fuzzy measure. Out of this Ph.D. work, we have built a CDSS software that may be installed in health care centres and hospitals in order to evaluate and detect Diabetic Retinopathy at early stages.
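Alongside the Choquet integral, the Sugeno integral mentioned above fuses tree outputs with a max-min scheme: outputs are sorted by descending support and the result is the largest min(output, coalition worth). A minimal sketch of the standard discrete Sugeno integral (not the thesis's specific rule-confidence measures):

```python
def sugeno_integral(h, mu):
    """Discrete Sugeno integral of source outputs h w.r.t. fuzzy measure mu.

    h  : dict mapping source name -> support value in [0, 1]
    mu : dict mapping frozenset of source names -> measure value.
    """
    # Sort sources by descending support value.
    order = sorted(h, key=h.get, reverse=True)
    best, prev = 0.0, frozenset()
    for src in order:
        prev = prev | {src}
        # Cap each support by the worth of the coalition supporting it.
        best = max(best, min(h[src], mu[prev]))
    return best
```

Because it uses only max and min, the Sugeno integral is ordinal: it never produces a value outside the range of its inputs and measure values, which suits confidence-style scores from fuzzy rules.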

    Advancing ensemble learning performance through data transformation and classifiers fusion in granular computing context

    Classification is a special type of machine learning task, which is essentially achieved by training a classifier that can be used to classify new instances. In order to train a high-performance classifier, it is crucial to extract representative features from raw data, such as text and images. In reality, instances can be highly diverse even if they belong to the same class, which indicates that different instances of the same class can represent very different characteristics. For example, in a facial expression recognition task, some instances may be better described by Histogram of Oriented Gradients features, while others may be better represented by Local Binary Patterns features. From this point of view, it is necessary to adopt ensemble learning to train different classifiers on different feature sets and to fuse these classifiers towards more accurate classification of each instance. On the other hand, different algorithms are likely to show different suitability for training classifiers on different feature sets, which again shows the necessity of adopting ensemble learning to advance the classification performance. Furthermore, a multi-class classification task becomes increasingly complex as the number of classes grows, i.e. it leads to increased difficulty in discriminating between the classes. In this paper, we propose an ensemble learning framework that involves transforming a multi-class classification task into a number of binary classification tasks and fusing classifiers trained on different feature sets by using different learning algorithms. We report experimental studies on the UCI Sonar data set and the CK+ facial expression recognition data set. The results show that our proposed ensemble learning approach leads to considerable advances in classification performance, in comparison with popular learning approaches including decision tree ensembles and deep neural networks. 
    In practice, the proposed approach can be used effectively to build an ensemble of ensembles acting as a group of expert systems, which shows the capability to achieve more stable pattern recognition performance, in comparison with building a single classifier that acts as a single expert system.
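The multi-class-to-binary transformation described above can be sketched with a one-vs-rest decomposition: one binary scorer per class, fused by taking the highest-scoring class. The toy centroid scorer below stands in for any binary base learner and is an illustrative assumption, not the paper's actual classifiers:

```python
class CentroidScorer:
    """Toy binary base learner: score is the negative squared distance
    to the centroid of the positive-class training instances."""
    def fit(self, X, y01):
        pos = [x for x, yi in zip(X, y01) if yi == 1]
        self.c = [sum(col) / len(pos) for col in zip(*pos)]
        return self

    def score_one(self, x):
        return -sum((a - b) ** 2 for a, b in zip(x, self.c))

class OneVsRest:
    """Transform a multi-class task into one binary task per class, then
    fuse the binary scores by picking the highest-scoring class.
    make_clf must be a factory returning objects with fit(X, y01)
    and score_one(x) -> float."""
    def __init__(self, make_clf):
        self.make_clf = make_clf

    def fit(self, X, y):
        self.models = {}
        for c in sorted(set(y)):
            y01 = [1 if yi == c else 0 for yi in y]  # relabel: class c vs rest
            self.models[c] = self.make_clf().fit(X, y01)
        return self

    def predict(self, x):
        return max(self.models, key=lambda c: self.models[c].score_one(x))
```

In the paper's framework, each binary task could additionally use its own feature set and learning algorithm, which is where the diversity among the fused classifiers comes from.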

    Shape recognition through multi-level fusion of features and classifiers

    Shape recognition is a fundamental problem and a special type of image classification, where each shape is considered as a class. Current approaches to shape recognition mainly focus on designing low-level shape descriptors and classifying them using some machine learning approach. In order to achieve effective learning of shape features, it is essential to ensure that a comprehensive set of high-quality features can be extracted from the original shape data. Thus we have been motivated to develop methods for the fusion of features and classifiers to advance the classification performance. In this paper, we propose a multi-level framework for the fusion of features and classifiers in the setting of granular computing. The proposed framework involves the creation of diversity among classifiers, by adopting feature selection and fusion to create diverse feature sets and by training diverse classifiers using different learning algorithms. The experimental results show that the proposed multi-level framework can effectively create diversity among classifiers, leading to considerable advances in the classification performance.
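The two fusion levels above can be sketched in a few lines: feature-level diversity by projecting instances onto different feature subsets, and decision-level fusion by majority vote over the resulting classifiers' predictions. A minimal illustration (the helper names are illustrative, not from the paper):

```python
from collections import Counter

def column_subset(X, idx):
    """Feature-level diversity: project each instance onto a feature subset,
    so each classifier in the ensemble sees a different view of the data."""
    return [[row[i] for i in idx] for row in X]

def fuse_by_vote(predictions):
    """Decision-level fusion: majority vote over the per-classifier labels
    predicted for one instance."""
    return Counter(predictions).most_common(1)[0][0]
```

In the multi-level framework, each feature subset would also be paired with a different learning algorithm before the vote, compounding the diversity at both levels.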