    An Overview of Classifier Fusion Methods

    A number of classifier fusion methods have recently been developed, opening an alternative route to improved classification performance. Because there is little theory of information fusion itself, we currently face different methods designed for different problems and producing different results. This paper gives an overview of classifier fusion methods and attempts to identify new trends that may dominate this area of research in the future. It also provides a taxonomy of fusion methods that tries to bring some order into the existing “pudding of diversities”.

    Development of Machine Learning Techniques for Diabetic Retinopathy Risk Estimation

    Diabetic retinopathy (DR) is a chronic illness. It is one of the main complications of diabetes and a major cause of vision loss among people with diabetes. Diabetic patients must be screened periodically to detect signs of developing retinopathy at an early stage. Early and frequent screening decreases the risk of vision loss and reduces the load on health care centres. The number of diabetic patients is large and growing rapidly, which makes yearly screening of all of them difficult and resource-consuming. The main goal of this Ph.D. thesis is to build a clinical decision support system (CDSS) based on electronic health record (EHR) data. This CDSS will be used to estimate the risk of developing DR. The thesis studies machine learning methods for building a CDSS based on fuzzy linguistic rules, with a focus on novel interpretable systems. Knowledge expressed as rules of this kind lets the physician see which combinations of conditions can lead to a risk of developing DR. In this work, I propose a method to reduce uncertainty when classifying diabetic patients with fuzzy decision trees (FDTs). Different trees are then combined using the Fuzzy Random Forest (FRF) technique to improve prediction quality. Several aggregation policies are proposed to merge the classification results of the individual FDTs. To improve the final decision of the models, I propose three fuzzy measures that are used with the Choquet and Sugeno integrals. The definition of these fuzzy measures is based on the confidence values of the rules. In particular, one of them is a decomposable fuzzy measure in which the hierarchical structure of the FDT is exploited to find the values of the fuzzy measure. The final result of this Ph.D. work is CDSS software that can be installed in primary care centres and hospitals and used by general practitioners for preventive assessment and early-stage screening of diabetic retinopathy.

    Multiple classifier fusion using the fuzzy integral.

    Fusion of multiple classifier decisions is a powerful method for increasing classification rates in difficult pattern recognition problems. Researchers have found that in many applications it is better to fuse multiple relatively simple classifiers than to build a single sophisticated classifier to achieve better recognition rates. Ideally, the combination function should take advantage of the strengths of individual classifiers and of all possible subsets of classifiers, avoid their weaknesses, and use all the dynamically available knowledge about the inputs, the outputs, the classes, and the classifiers. Automatic reading of handwritten numerals is a difficult problem because of the great variation in the shape of the characters. In this thesis, an evidence fusion technique based on the notion of the fuzzy integral is used to combine the results of different classifiers and realize a robust algorithm for high-accuracy handwritten numeral recognition. Both source relevance and source evidence are used to achieve significant enhancements. The most important advantage of this system is that it not only combines the evidence but also considers the relative importance of the different sources. Various conventional and fuzzy-integral-based fusion methods are explained in detail, and the experimental results obtained are compared. A method is introduced to improve the fuzzy densities of the classifiers and thereby the fusion results: correction factors obtained from the performance matrices are used to alter the initial fuzzy densities. Experiments on handwritten numeral recognition are described and compared. These experiments show that very low error rates can be achieved by fusing several low-performance classifiers.
Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1999 .B45.
Source: Masters Abstracts International, Volume: 39-02, page: 0558. Adviser: M. Ahmadi. Thesis (M.A.Sc.)--University of Windsor (Canada), 1999.
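As an illustration of fuzzy-integral fusion driven by per-classifier fuzzy densities, the following is a minimal sketch and not the thesis's implementation. It assumes a Sugeno λ-measure built from the densities and the Sugeno integral as the combination rule; the function names (`solve_lambda`, `sugeno_integral`, `fuse`) and the numeric densities are hypothetical.

```python
def solve_lambda(densities, iters=200):
    """Solve prod(1 + lam*g_i) = 1 + lam for the Sugeno lambda (lam > -1)."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # densities already sum to 1: the measure is additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi, left_sign = -1.0, 0.0, 1.0   # root in (-1, 0); f > 0 left of it
    else:
        lo, hi, left_sign = 0.0, 1.0, -1.0   # root in (0, inf); f < 0 left of it
        while f(hi) <= 0.0:                  # grow the bracket until it holds the root
            hi *= 2.0
    for _ in range(iters):                   # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) * left_sign > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_integral(scores, densities):
    """Sugeno integral of classifier scores w.r.t. the lambda-measure."""
    lam = solve_lambda(densities)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    g_cum, best = 0.0, 0.0
    for i in order:
        # measure of the set of the top-ranked classifiers seen so far
        g_cum = g_cum + densities[i] + lam * g_cum * densities[i]
        best = max(best, min(scores[i], g_cum))
    return best


def fuse(support, densities):
    """support[k][c] is classifier k's confidence in class c (assumed in [0, 1])."""
    n_classes = len(support[0])
    fused = [sugeno_integral([row[c] for row in support], densities)
             for c in range(n_classes)]
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical densities for three classifiers and their two-class outputs.
label, fused = fuse([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]], [0.3, 0.4, 0.2])
print(label, fused)
```

Each class gets one fused score, and the class with the largest score is chosen. Altering the densities, for example with correction factors as the abstract describes, changes how much weight each subset of classifiers carries.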

    Managing uncertainty in sound based control for an autonomous helicopter

    In this paper we present our ongoing research using a multi-purpose, small, low-cost autonomous helicopter platform (Flyper). We build on previously achieved stable control obtained through evolutionary tuning. We propose a sound-based supervised method to localise the indoor helicopter and extract meaningful information that enables the helicopter to further stabilise its flight and correct its flight path. Because of the high uncertainty in the data, we propose using fuzzy logic in the signal processing of the sound signature. We discuss the benefits and difficulties of using type-1 and type-2 fuzzy logic in this real-time system and give an overview of our proposed system.

    From n-grams to n-sets: A Fuzzy-Logic-Based Approach to Shakespearian Authorship Attribution.

    This thesis surveys the principles of Fuzzy Logic as they have been applied in the last three decades in the micro-electronic field and, in the context of resolving problems of authorship verification and attribution, shows how these principles can assist with the detection of stylistic similarities or dissimilarities of an anonymous, disputed play to an author’s general or patterns-based known style. The main stylistic markers are the counts of semantic sets of 100 individual word-tokens and an index of counts of these words’ frequencies (a cosine index), as found in the first extract of approximately 10,000 words of each of 27 well-attributed Shakespearian plays. Based on these markers, their geometrical representation, and fuzzy modelling, and on the grounds of Set Theory and Boolean Algebra, three Mamdani (Type-1) genre-based Fuzzy Expert Systems were built in the core part of this thesis for the detection of degrees (measured on a scale from 0 to 1) of Shakespearianness of disputed and, probably, co-authored plays of the early modern English period. Each of these three expert systems is composed of seven input and two output variables that are associated through a set of approximately 30 to 40 rules. There is a detailed description of the properties of the three expert systems’ inference mechanisms and the various experimentation phases. There is also an indicative graphical analysis of the phases of the experimentation and a thorough explanation of terms, such as partial truths membership, approximate reasoning and output centroids on an X-axis of a two-dimensional space. 
Throughout the thesis there is an extensive demonstration of various Fuzzy Logic techniques, including Sugeno-ANFIS (adaptive neuro-fuzzy inference system), with which the style of Shakespeare can be modelled in order to compare it with well-attributed plays of other authors or plays that are not included in the strict Shakespearian canon of the selected 27 well-attributed, sole-authored plays. In addition, other relevant issues of stylometric concern are discussed, such as the investigation and classification of known ‘problem’ and disputed plays through holistic classifiers (irrespective of genre). The results of the experimentation support the use of this novel, automated and computer simulation-based method of classification in the stylometric field for various purposes. In fact, the three models have succeeded in detecting the low Shakespearianness of non-Shakespearian plays, and the results they provided for anonymous, disputed plays conform with the general evidence of historical scholarship. Therefore, the original contribution of this thesis is to define fully functional automated fuzzy classifiers of Shakespearianness. As a result, we now know that the principles of fuzzy modelling can be applied to create Fuzzy Expert Stylistic Classifiers and to detect degrees of similarity of a play under scrutiny with the general or patterns-based known style of a specific author (in our case, Shakespeare). Furthermore, this thesis shows that, given certain premises, counts of words’ frequencies and counts of semantic sets of words can be employed satisfactorily for stylistic discrimination.

    Efficient Data Driven Multi Source Fusion

    Data/information fusion is an integral component of many existing and emerging applications, e.g., remote sensing, smart cars, Internet of Things (IoT), and Big Data, to name a few. While fusion aims to achieve better results than any one individual input can provide, the challenge is often to determine the underlying mathematics of aggregation suitable for an application. In this dissertation, I focus on the following three aspects of aggregation: (i) efficient data-driven learning and optimization, (ii) extensions and new aggregation methods, and (iii) feature and decision level fusion for machine learning with applications to signal and image processing. The Choquet integral (ChI), a powerful nonlinear aggregation operator, is a parametric way (with respect to the fuzzy measure (FM)) to generate a wealth of aggregation operators. The FM has $2^N$ variables and $N(2^{N-1})$ constraints for $N$ inputs. As a result, learning the ChI parameters from data quickly becomes impractical for most applications. Herein, I propose a scalable learning procedure (which is linear with respect to training sample size) for the ChI that identifies and optimizes only data-supported variables. As such, the computational complexity of the learning algorithm is proportional to the complexity of the solver used. This method also includes an imputation framework to obtain scalar values for data-unsupported (aka missing) variables and a compression algorithm (lossy or lossless) for the learned variables. I also propose a genetic algorithm (GA) to optimize the ChI for non-convex, multi-modal, and/or analytical objective functions. This algorithm introduces two operators that automatically preserve the constraints; therefore there is no need to enforce the constraints explicitly, as traditional GAs require. In addition, this algorithm provides an efficient representation of the search space with the minimal set of vertices. 
Furthermore, I study different strategies for extending the fuzzy integral to missing data, and I propose a GOAL programming framework to aggregate inputs from heterogeneous sources for ChI learning. Last, my work in remote sensing involves visual-clustering-based band group selection and Lp-norm multiple-kernel-learning-based feature level fusion in hyperspectral image processing to enhance pixel level classification.
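To make the size of the fuzzy measure concrete, here is a minimal sketch of the discrete Choquet integral with the measure stored explicitly, one value per subset of the inputs. It is not the dissertation's learning procedure; the helper names (`all_subsets`, `choquet_integral`, `check_monotone`) and the example measures are assumptions chosen for illustration.

```python
from itertools import combinations


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset (2**n in total)."""
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            yield frozenset(combo)


def choquet_integral(inputs, measure):
    """Discrete Choquet integral of `inputs` w.r.t. a measure keyed by frozenset."""
    order = sorted(range(len(inputs)), key=lambda i: inputs[i], reverse=True)
    total, prev_g, seen = 0.0, 0.0, set()
    for i in order:
        seen.add(i)
        g = measure[frozenset(seen)]       # measure of the top-ranked inputs so far
        total += inputs[i] * (g - prev_g)  # weight is the gain in measure
        prev_g = g
    return total


def check_monotone(measure, n):
    """Check g(A) <= g(A + {i}) for every subset A and every i not in A.

    Returns (is_monotone, number_of_pairs_checked).
    """
    ok, checked = True, 0
    for subset in all_subsets(n):
        for i in range(n):
            if i not in subset:
                checked += 1
                if measure[subset] > measure[subset | {i}] + 1e-12:
                    ok = False
    return ok, checked


n = 3
x = [0.2, 0.9, 0.5]
# Three hand-picked measures that recover familiar operators.
mean_fm = {s: len(s) / n for s in all_subsets(n)}
max_fm = {s: (1.0 if s else 0.0) for s in all_subsets(n)}
min_fm = {s: (1.0 if len(s) == n else 0.0) for s in all_subsets(n)}

print(len(mean_fm), check_monotone(mean_fm, n))
print(choquet_integral(x, mean_fm), choquet_integral(x, max_fm), choquet_integral(x, min_fm))
```

The dictionary holds one entry per subset, so its size doubles with each extra input. This exponential growth is the reason a learner that touches only data-supported variables is attractive.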

    Replacing pooling functions in Convolutional Neural Networks by linear combinations of increasing functions

    Traditionally, Convolutional Neural Networks use the maximum or the arithmetic mean to reduce the features extracted by convolutional layers in a downsampling process known as pooling. However, there is no strong argument for settling on one of the two functions and, in practice, the choice turns out to be problem dependent. Further, both options ignore possible dependencies among the data. We believe that a combination of both functions, as well as of additional ones that may retain different information, can benefit the feature extraction process. In this work, we replace traditional pooling by several alternative functions. In particular, we consider linear combinations of order statistics and generalizations of the Sugeno integral, extending the latter’s domain to the whole real line and setting the theoretical base for their application. We present an alternative pooling layer based on this strategy, which we name the “CombPool” layer. We replace the pooling layers of three different architectures of increasing complexity by CombPool layers, and show empirically over multiple datasets that linear combinations outperform traditional pooling functions in most cases. Further, combinations with either the Sugeno integral or one of its generalizations usually yield the best results, making them a strong candidate for most architectures.
    Funding: Tracasa Instrumental (iTRACASA), Spain; Gobierno de Navarra-Departamento de Universidad, Innovacion y Transformacion Digital, Spain; Spanish Ministry of Science, Spain PID2019-108392GB-I00; Andalusian Excellence project, Spain PID2019-108392GB-I00; Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPQ) PC095-096; Fundacao de Amparo a Ciencia e Tecnologia do Estado do Rio Grande do Sul (FAPERGS) P18-FR-4961 301618/2019-4 19/2551-000 1279-
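The idea of pooling with a linear combination of functions can be sketched as follows. This is a simplified illustration and not the paper's CombPool layer: it combines only the maximum and the arithmetic mean over non-overlapping windows, uses fixed weights where a real layer would presumably learn them, and omits the Sugeno-integral generalizations. The name `comb_pool2d` and its parameters are hypothetical.

```python
def comb_pool2d(feature_map, weights=(0.5, 0.5), size=2):
    """Pool non-overlapping size x size windows with w0*max + w1*mean.

    `feature_map` is a list of equal-length rows of numbers. Rows or columns
    that do not fill a complete window are dropped.
    """
    def mean(values):
        return sum(values) / len(values)

    funcs = (max, mean)  # the increasing functions being combined
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + dr][c + dc]
                      for dr in range(size) for dc in range(size)]
            pooled_row.append(sum(w * f(window) for w, f in zip(weights, funcs)))
        pooled.append(pooled_row)
    return pooled


fm = [
    [1, 2, 0, 0],
    [3, 4, 0, 8],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
print(comb_pool2d(fm, weights=(1.0, 0.0)))  # pure max pooling
print(comb_pool2d(fm, weights=(0.0, 1.0)))  # pure average pooling
print(comb_pool2d(fm, weights=(0.5, 0.5)))  # an even blend of the two
```

Setting the weights to `(1, 0)` or `(0, 1)` recovers the two traditional pooling functions, so a blend of this kind can only widen the set of behaviours available to the network.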