56 research outputs found

    Development of Machine Learning Techniques for Diabetic Retinopathy Risk Estimation

    Get PDF
    Diabetic retinopathy (DR) is a chronic illness. It is one of the main complications of diabetes and a major cause of vision loss among people with diabetes. Diabetic patients must be screened periodically to detect signs of retinopathy development at an early stage. Early and frequent screening decreases the risk of vision loss and reduces the load on health care centres. The number of diabetic patients is large and rapidly increasing, which makes it difficult and resource-consuming to screen all of them every year. The main goal of this Ph.D. thesis is to build a clinical decision support system (CDSS) based on electronic health record (EHR) data. This CDSS is used to estimate the risk of developing DR. The thesis studies interpretable machine learning methods and proposes fuzzy rule-based systems with linguistic terms, whose output lets the physician see which combinations of features contribute to the risk of developing DR. First, I propose a method to reduce the uncertainty in classifying diabetic patients using fuzzy decision trees (FDT). A Fuzzy Random Forest (FRF) approach is then proposed to combine several trees and improve the quality of the prediction, together with several aggregation policies for merging the classification results of the individual FDTs. To improve the final decision of the models, I propose three fuzzy measures that are used with the Choquet and Sugeno integrals. The definition of these fuzzy measures is based on the confidence values of the rules; in particular, one of them is a decomposable fuzzy measure in which the hierarchical structure of the FDT is exploited to find the measure values. The research has resulted in a CDSS software tool that can be installed in primary care centres and hospitals and used by general practitioners for the preventive evaluation and screening of Diabetic Retinopathy.
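    To make the aggregation step concrete, here is a minimal sketch of a discrete Choquet integral over per-tree risk scores, assuming a simple maxitive (possibility) measure built from illustrative rule-confidence values; the fuzzy measures actually proposed in the thesis, including the decomposable one derived from the FDT hierarchy, may be defined differently, and all names and numbers below are hypothetical.

```python
import numpy as np

def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-tree scores w.r.t. a fuzzy measure.

    scores  : 1-D array, e.g. the DR-risk score returned by each fuzzy decision tree.
    measure : callable taking a frozenset of tree indices and returning mu(A) in [0, 1],
              with mu(empty set) = 0 and mu(all trees) = 1.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)               # ascending: x_(1) <= ... <= x_(n)
    result, prev = 0.0, 0.0
    for k, idx in enumerate(order):
        coalition = frozenset(order[k:])     # trees whose score is >= x_(k)
        result += (scores[idx] - prev) * measure(coalition)
        prev = scores[idx]
    return result

# Hypothetical maxitive measure built from per-tree rule confidences (made-up values).
confidences = np.array([0.9, 0.6, 0.75])
weights = confidences / confidences.max()    # ensures mu of the full set is 1

def possibility_measure(coalition):
    return 0.0 if not coalition else max(weights[i] for i in coalition)

tree_scores = [0.2, 0.8, 0.5]                # per-tree estimated DR risk (illustrative)
print(choquet_integral(tree_scores, possibility_measure))
```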

    Ensemble classification of incomplete data – a non-imputation approach with an application in ovarian tumour diagnosis support

    Get PDF
    In this doctoral dissertation I focus on the problem of classification of incomplete data. The motivation for the research comes from medicine, where missing data are commonly encountered. The most popular method of dealing with missing data is imputation, that is, filling in missing values on the basis of statistical relationships among features. In my research I choose a different strategy for dealing with this issue. Previously developed classifiers can be transformed into a form that returns an interval of possible predictions. In the next step, with the use of aggregation operators and thresholding methods, one can make a final classification. I show how to make such transformations of classifiers and how to use aggregation strategies for interval data classification. These methods improve the quality of classification of incomplete data in the problem of ovarian tumour diagnosis support. Additional analysis carried out on external datasets from the University of California, Irvine (UCI) Machine Learning Repository shows that the proposed methods are complementary to imputation.
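    A minimal sketch of the non-imputation idea under simple assumptions: each base classifier is queried over plausible completions of the missing entries (here taken column-wise from the training data), producing an interval of scores; the intervals are then aggregated (here by averaging the bounds) and the midpoint is thresholded. The dissertation's actual classifier transformation, aggregation operators, and thresholding methods may differ; all names below are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy complete training data (two features, binary label) -- illustrative only.
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# A small ensemble of base classifiers trained on bootstrap samples.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(X_train), len(X_train))
    ensemble.append(LogisticRegression().fit(X_train[idx], y_train[idx]))

def interval_score(clf, x, train_values):
    """Lower/upper probability of class 1 over plausible completions of the
    missing (NaN) entries of x; candidates come column-wise from training data."""
    missing = np.isnan(x)
    if not missing.any():
        p = clf.predict_proba(x.reshape(1, -1))[0, 1]
        return p, p
    candidates = train_values[:, missing]
    completions = np.tile(x, (len(candidates), 1))
    completions[:, missing] = candidates
    probs = clf.predict_proba(completions)[:, 1]
    return probs.min(), probs.max()

def classify_incomplete(x, threshold=0.5):
    """Aggregate per-classifier intervals (mean of each bound) and
    threshold the midpoint of the aggregated interval."""
    bounds = np.array([interval_score(clf, x, X_train) for clf in ensemble])
    lower, upper = bounds[:, 0].mean(), bounds[:, 1].mean()
    return int((lower + upper) / 2 >= threshold), (lower, upper)

x_incomplete = np.array([0.8, np.nan])
print(classify_incomplete(x_incomplete))
```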

    Editor's Note

    Get PDF
    Artificial Intelligence has nowadays become one of the most relevant technologies driving us towards a new revolution, a change in society, just as other human inventions, such as navigation, the steam engine, or electricity, did in the past. There are several ways in which AI might be developed, and the European Union has chosen a path, a way to transit through this revolution, in which Artificial Intelligence will be a tool at the service of Humanity. That was precisely the motto of the 2020 European Conference on Artificial Intelligence (“Paving the way towards Human-Centric AI”), and this special issue collects a selection of the best papers from some of the ECAI 2020 Workshops, chosen by their organizers.

    The Semantics of History. Interdisciplinary Categories and Methods for Digital Historical Research

    Get PDF
    This paper introduces and discusses the data modelling and labelling methods for interdisciplinary and digital research in History developed and used by the authors. Our approach proposes a conceptual framework for interdisciplinary research in history as a much-needed strategy to ensure that historians use all vestiges from the past, regardless of their origin or medium, in the construction of historical discourse. By labelling Units of Topography and Actors in a wide range of historical sources and exploiting the obtained data, we use the Monastery of Sant Genís de Rocafort (Martorell, Spain) as a lab example of our method. This should lead researchers to the development of an integrated historical discourse that maximizes the potential of interdisciplinary and fair research and minimizes the risks of bias.
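    One way to picture the labelling step is a uniform record for each labelled entity, regardless of the type of source it comes from. The schema below is purely hypothetical (the paper does not specify field names); it only illustrates how a Unit of Topography or an Actor could be stored together with its source reference for later joint exploitation.

```python
from dataclasses import dataclass, field

@dataclass
class LabelledUnit:
    """Hypothetical record for a labelled entity in a historical source."""
    label: str                      # e.g. "Unit of Topography" or "Actor"
    name: str                       # e.g. "Monastery of Sant Genís de Rocafort"
    source_type: str                # archival, archaeological, epigraphic, ...
    source_reference: str           # citation or inventory number
    attributes: dict = field(default_factory=dict)

unit = LabelledUnit(
    label="Unit of Topography",
    name="Monastery of Sant Genís de Rocafort",
    source_type="archaeological",
    source_reference="excavation report (illustrative)",
)
print(unit)
```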

    Incorporating fuzzy-based methods to deep learning models for semantic segmentation

    Get PDF
    This thesis focuses on improving the semantic segmentation workflow by reducing model complexity, improving segmentation accuracy, and making segmentation results more reliable and robust. Semantic segmentation refers to pixel-level classification, the objective of which is to classify each pixel of the input image into different categories. The process typically consists of three steps: model construction, training, and application. In this thesis, fuzzy-based techniques are applied in each of these three steps to improve the semantic segmentation workflow. Widely used semantic segmentation models normally extract and aggregate spatial information and channel-wise features simultaneously. Achieving good segmentation performance then requires numerous learnable parameters, which increases model complexity, so decoupling the information fusion tasks is an important direction in the exploration of semantic segmentation models. Fuzzy integrals are effective for fusing information, and some special fuzzy integral operators (OWA) are parameter-free and easy to implement in deep learning models. Therefore, a novel fuzzy integral module is designed that includes an additional convolutional layer for feature-map dimensionality reduction and an OWA layer for information fusion across feature channels. The proposed module can be flexibly integrated into existing semantic segmentation models, helping to reduce the number of parameters and save memory. Following the exploration of segmentation models, the collected data is used to train the model. The precise delineation of object boundaries is a key aspect of semantic segmentation, so a boundary-wise loss function that makes the model pay more attention to the boundary is desirable in the training phase. Fuzzy rough sets are normally used to measure the relationship between two sets. In this thesis, to improve boundary accuracy, fuzzy rough sets are leveraged to compute a boundary-wise loss, defined as the difference between the boundary sets of the predicted image and the ground truth image. After training with the proposed loss, the next step is to apply the pre-trained segmentation model to new images. One challenge is that there are no ground truth images to quantify segmentation quality in real-world applications of semantic segmentation models. It is therefore crucial to design a quality quantification algorithm that infers image-level segmentation performance and improves the credibility of segmentation models. In this thesis, a novel quality quantification algorithm based on fuzzy uncertainty is proposed as part of the model inference process, without access to ground truth images. Moreover, to further explore the practical application of the proposed algorithm in clinical settings, the thesis goes beyond public datasets and presents a real-world case study on cardiac MRI segmentation. Since clinicians also report their level of confidence when annotating ground truth images (human-based uncertainty), the correlation between human-based uncertainty and AI-based uncertainty (computed by the proposed quality quantification algorithm) is investigated in depth. Comprehensive experiments demonstrate that integrating fuzzy-based techniques can enhance the efficiency, accuracy, and reliability of semantic segmentation models compared to models without such methods.
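    As an illustration of the kind of parameter-free channel fusion described above, the following PyTorch sketch combines a 1x1 convolution for dimensionality reduction with a fixed-weight OWA operator applied group-wise across channels. The class name, the grouping scheme, and the linearly decreasing weights are assumptions made for this sketch, not the exact module proposed in the thesis.

```python
import torch
import torch.nn as nn

class OWAChannelFusion(nn.Module):
    """Sketch of a parameter-free OWA fusion across feature channels.

    A 1x1 convolution first reduces the channel dimension; channels are then
    split into groups of size `group`, the values inside each group are sorted
    per pixel, and a fixed OWA weight vector fuses each group into one channel.
    """
    def __init__(self, in_channels, reduced_channels, group=4):
        super().__init__()
        assert reduced_channels % group == 0
        self.reduce = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)
        self.group = group
        # Fixed, linearly decreasing OWA weights -- no learnable parameters here.
        w = torch.arange(group, 0, -1, dtype=torch.float32)
        self.register_buffer("owa_weights", w / w.sum())

    def forward(self, x):
        x = self.reduce(x)                                   # B x C x H x W
        b, c, h, w = x.shape
        x = x.view(b, c // self.group, self.group, h, w)     # split channels into groups
        x, _ = torch.sort(x, dim=2, descending=True)         # order values per pixel
        weights = self.owa_weights.view(1, 1, self.group, 1, 1)
        return (x * weights).sum(dim=2)                      # B x C/group x H x W

# Example: fuse a 256-channel feature map down to 16 output channels.
features = torch.randn(2, 256, 32, 32)
fusion = OWAChannelFusion(in_channels=256, reduced_channels=64, group=4)
print(fusion(features).shape)   # torch.Size([2, 16, 32, 32])
```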

    Uncertainty-wise software anti-patterns detection: A possibilistic evolutionary machine learning approach

    Get PDF
    Context: Code smells (a.k.a. anti-patterns) are manifestations of poor design solutions that can deteriorate software maintainability and evolution. Research gap: Existing works have not taken into account the issue of uncertain class labels, which is an important inherent characteristic of the smell detection problem. More precisely, two human experts may have different degrees of uncertainty about the smelliness of a particular software class, not only for the smell detection task but also for the smell type identification one. Unfortunately, existing approaches usually reject and/or ignore uncertain data, i.e. software classes (dataset instances) with uncertain labels. Throwing away or disregarding the uncertainty factor can considerably degrade the effectiveness of the detection/identification process. From a solution viewpoint, no work in the literature has proposed a method able to detect and/or identify code smells while preserving the uncertainty aspect. Objective: The main goal of our work is to handle the uncertainty factor, issued from human experts, in detecting and/or identifying code smells by proposing an evolutionary approach able to deal with anti-pattern classification under uncertain labels. Method: We propose Bi-ADIPOK, an effective search-based tool that tackles this challenge for both the detection and the identification cases. The method is an EA (Evolutionary Algorithm) that optimizes a set of detectors encoded as PK-NNs (Possibilistic K-nearest neighbors) based on a bi-level hierarchy, in which the upper level finds the optimal PK-NN parameters, while the lower level generates the PK-NNs. A new fitness function, PomAURPC-OVA_dist (Possibilistic modified Area Under Recall Precision Curve One-Versus-All_distance, abbreviated PAURPC_d in this paper), is proposed. Bi-ADIPOK deals with label uncertainty using concepts stemming from Possibility Theory, and PomAURPC-OVA_dist can handle the uncertainty issue even with imbalanced data. Bi-ADIPOK is first built and then validated using a possibilistic base of smell examples that simulates and mimics the subjectivity of software engineers' opinions. Results: The statistical analysis of the results obtained in a set of comparative experiments against four relevant state-of-the-art methods shows the merits of our proposal. The detection results demonstrate that, in the uncertain environment, the PomAURPC-OVA_dist of Bi-ADIPOK ranges between 0.902 and 0.932 and its IAC lies between 0.9108 and 0.9407, while in the certain environment the PomAURPC-OVA_dist lies between 0.928 and 0.955 and the IAC ranges between 0.9477 and 0.9622. Similarly, the identification results in the uncertain environment indicate that the PomAURPC-OVA_dist of Bi-ADIPOK varies between 0.8576 and 0.9273 and its IAC lies between 0.8693 and 0.9318; in the certain environment, the PomAURPC-OVA_dist lies between 0.8613 and 0.9351 and the IAC values are between 0.8672 and 0.9476. With uncertain data, Bi-ADIPOK finds 35% more code smells than the second-best approach (BLOP) and reduces the number of false alarms (misclassified smelly instances) by 12%. In addition, our approach identifies 43% more smell types than BLOP and reduces the number of false alarms by 32%. Similar results are obtained in the certain environment, demonstrating Bi-ADIPOK's ability to deal with such environments.
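    For intuition about the possibilistic classifiers being evolved, the following is a minimal sketch of a possibilistic K-nearest-neighbours prediction over instances with uncertain (possibility-valued) labels. The distance weighting, the aggregation rule, and all names are illustrative assumptions; Bi-ADIPOK's actual PK-NN detectors, their parameters, and the bi-level evolutionary optimization are defined in the paper.

```python
import numpy as np

def pknn_predict(X_train, poss_labels, x_query, k=5):
    """Sketch of a possibilistic K-NN prediction.

    poss_labels[i, c] is the possibility degree (in [0, 1]) that training
    instance i belongs to class c, e.g. encoding experts' uncertain opinions
    about whether a software class is smelly. The K nearest neighbours vote
    with their possibility distributions, weighted by inverse distance; the
    class with the highest aggregated possibility wins.
    """
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)
    agg = (weights[:, None] * poss_labels[nearest]).sum(axis=0)
    agg = agg / agg.max()                    # renormalise to a possibility distribution
    return agg.argmax(), agg

# Toy example: 2 features, 2 classes (0 = not smelly, 1 = smelly), uncertain labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
poss = rng.uniform(size=(50, 2))
poss = poss / poss.max(axis=1, keepdims=True)   # each row has max possibility 1
print(pknn_predict(X, poss, np.array([0.1, -0.2])))
```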