4,316 research outputs found

    An overview of decision table literature 1982-1995.

    This report gives an overview of the literature on decision tables over the past 15 years. As far as possible, each reference is accompanied by an author-supplied abstract, a number of keywords, and a classification. In some cases, our own comments are added to show where, how, and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country of origin (not necessarily country of publication), and the language of the document. After a description of the scope of the overview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification, and comments.

    Development of Machine Learning Techniques for Diabetic Retinopathy Risk Estimation

    Diabetic retinopathy (DR) is a chronic illness. It is one of the main complications of diabetes and a major cause of vision loss among people with diabetes. Diabetic patients must be screened periodically to detect signs of diabetic retinopathy at an early stage. Early and frequent screening decreases the risk of vision loss and minimizes the load on health care centres. The number of diabetic patients is huge and rapidly increasing, which makes yearly screening of all of them hard and resource-consuming. The main goal of this Ph.D. thesis is to build a clinical decision support system (CDSS) based on electronic health record (EHR) data. This CDSS will be used to estimate the risk of developing DR. The thesis focuses on developing novel interpretable machine learning systems, namely fuzzy rule-based systems with linguistic terms. The output of such systems lets the physician know which combinations of features can cause the risk of developing DR. In this work, I propose a method to reduce the uncertainty in classifying diabetic patients using fuzzy decision trees (FDT). A Fuzzy Random Forest (FRF) approach is also proposed to estimate the risk of developing DR. Several policies are proposed to merge the classification results of the different FDT models. To improve the quality of the final decision, I propose three fuzzy measures that are used with Choquet and Sugeno integrals. The definition of these fuzzy measures is based on the confidence values of the rules. In particular, one of them is a decomposable fuzzy measure in which the hierarchical structure of the FDT is exploited to find the values of the fuzzy measure. The outcome of this Ph.D. work is CDSS software that may be installed in primary care centres and hospitals and used by general practitioners for preventive assessment and screening of diabetic retinopathy at early stages.
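    As a rough illustration of the aggregation step mentioned in this abstract, the sketch below computes a discrete Choquet integral over the scores returned by several trees. It is not the thesis's implementation: the function name, the tree labels, and the three example fuzzy measures are hypothetical, and the thesis derives its measures from rule confidence values, which is not reproduced here. Scores are assumed to be non-negative.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative scores.

    values:  dict mapping a source (e.g. a tree label) to its score.
    measure: callable taking a frozenset of sources and returning a
             value in [0, 1]; assumed monotone with measure(all) == 1.
    """
    # Visit sources from the lowest score to the highest.
    ordered = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, source in enumerate(ordered):
        # Coalition of all sources whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (values[source] - previous) * measure(coalition)
        previous = values[source]
    return total


# Hypothetical scores from three trees for one patient.
scores = {"tree_a": 0.2, "tree_b": 0.7, "tree_c": 0.5}

# Three illustrative measures (assumptions, not taken from the thesis).
as_max = choquet_integral(scores, lambda s: 1.0 if s else 0.0)
as_min = choquet_integral(scores, lambda s: 1.0 if len(s) == 3 else 0.0)
as_mean = choquet_integral(scores, lambda s: len(s) / 3)
```

    With these measures the integral reduces to the maximum, the minimum, and the arithmetic mean of the scores respectively, which shows how the choice of fuzzy measure controls the aggregation behaviour.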

    TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System

    Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, ant colony algorithm, and genetic algorithm, is used to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperforms other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. Such a test has not usually been considered in IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier.

    A Review of Rule Learning Based Intrusion Detection Systems and Their Prospects in Smart Grids


    Selecting Informative Features with Fuzzy-Rough Sets and its Application for Complex Systems Monitoring

    One of the main obstacles facing current intelligent pattern recognition applications is that of dataset dimensionality. To enable these systems to be effective, a redundancy-removing step is usually carried out beforehand. Rough Set Theory (RST) has been used as such a dataset pre-processor with much success. However, it relies on a crisp dataset, so important information may be lost as a result of quantization of the underlying numerical features. This paper proposes a feature selection technique that employs a hybrid variant of rough sets, fuzzy-rough sets, to avoid this information loss. The current work retains dataset semantics, allowing for the creation of clear, readable fuzzy models. Experimental results from applying the present work to complex systems monitoring show that fuzzy-rough selection is more powerful than conventional entropy-based, PCA-based, and random-based methods. Key words: feature selection; feature dependency; fuzzy-rough sets; reduct search; rule induction; systems monitoring.

    Omnivariate rule induction using a novel pairwise statistical test

    Rule learning algorithms, for example RIPPER, induce univariate rules; that is, a propositional condition in a rule uses only one feature. In this paper, we propose an omnivariate induction of rules where, under each condition, both a univariate and a multivariate condition are trained, and the best is chosen according to a novel statistical test. This paper has three main contributions. First, we propose a novel statistical test, the combined 5 x 2 cv t test, to compare two classifiers; it is a variant of the 5 x 2 cv t test, and we give its connections to other tests such as the 5 x 2 cv F test and the k-fold paired t test. Second, we propose a multivariate version of RIPPER, where a support vector machine with a linear kernel is used to find multivariate linear conditions. Third, we propose an omnivariate version of RIPPER, where the model selection is done via the combined 5 x 2 cv t test. Our results indicate that 1) the combined 5 x 2 cv t test has higher power (lower type II error), lower type I error, and higher replicability compared to the 5 x 2 cv t test, and 2) omnivariate rules are better in that they choose whichever condition is more accurate, selecting the right model automatically and separately for each condition in a rule.
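    For orientation, the sketch below computes the standard 5 x 2 cv t statistic that this abstract uses as its baseline. It does not implement the paper's combined variant, whose definition is not given in the abstract. The function name and the sample differences are hypothetical. Each pair holds the difference in error rate between the two classifiers on the two folds of one replication.

```python
import math


def five_by_two_cv_t(diffs):
    """Standard 5 x 2 cv t statistic (baseline test, not the combined variant).

    diffs: five (p1, p2) pairs, where p1 and p2 are the differences in
           error rate between two classifiers on fold 1 and fold 2 of
           one replication of 2-fold cross-validation.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly five replications")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        # Variance estimate for this replication.
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    denominator = math.sqrt(sum(variances) / 5)
    if denominator == 0:
        raise ValueError("zero variance: statistic is undefined")
    # The numerator is the difference on the first fold of the first replication.
    return diffs[0][0] / denominator


# Hypothetical differences, identical across replications for easy checking.
t_stat = five_by_two_cv_t([(0.02, 0.04)] * 5)
```

    Under the null hypothesis that both classifiers have the same error rate, this statistic is approximately t-distributed with 5 degrees of freedom. Because the numerator uses only one of the ten differences, the result depends on the order of folds and replications.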

    Data mining in manufacturing: a review based on the kind of knowledge

    In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection, etc. Data mining has emerged as an important tool for knowledge acquisition from manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing, with a special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized in these five categories. It has been shown that there has been rapid growth in the application of data mining in the context of manufacturing processes and enterprises in the last 3 years. This review reveals the progressive applications and existing gaps identified in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type, and the applied data mining tools and techniques.

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at a very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues of spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.