11 research outputs found

    Case-based reasoning as a decision support system for cancer diagnosis: A case study

    Microarray technology can measure the expression levels of thousands of genes in an experiment. This fact makes the use of computational methods in cancer research absolutely essential. One of the possible applications is in the use of Artificial Intelligence techniques. Several of these techniques have been used to analyze expression arrays, but there is a growing need for new and effective solutions. This paper presents a Case-based reasoning (CBR) system for automatic classification of leukemia patients from microarray data. The system incorporates novel algorithms for data mining that allow filtering, classification, and knowledge extraction. The system has been tested and the results obtained are presented in this paper

    Computational Intelligence Techniques for Classification in Microarray Analysis

    During the last few years there has been a growing need for computational intelligence techniques to analyze microarray data. The aim of the system presented in this study is to provide innovative decision support techniques for classifying data from microarrays and for extracting knowledge about the classification process. The computational intelligence techniques used in this chapter follow the case-based reasoning (CBR) paradigm to emulate the steps followed in expression analysis. This work presents a novel filtering technique based on statistical methods, a new clustering technique that uses ESOINN (Enhanced Self-Organizing Incremental Neural Network), and a knowledge extraction technique based on the RIPPER algorithm. The system presented within this chapter has been applied to classify CLL patients and extract knowledge about the classification process. The results obtained permit us to conclude that the system provides a notable reduction of the dimensionality of the data obtained from microarrays. Moreover, the classification process takes the detection of relevant and irrelevant probes into account, which is fundamental for subsequent classification, and the knowledge extraction tool, which has a graphical interface to explain the classification process, has been much appreciated by the human experts. Finally, the philosophy of CBR systems facilitates the resolution of new problems using past experiences, which is very appropriate for the classification of leukemia.

    A direct approach for sparse quadratic discriminant analysis

    Quadratic discriminant analysis (QDA) is a standard tool for classification due to its simplicity and flexibility. Because the number of its parameters scales quadratically with the number of the variables, QDA is not practical, however, when the dimensionality is relatively large. To address this, we propose a novel procedure named DA-QDA for QDA in analyzing high-dimensional data. Formulated in a simple and coherent framework, DA-QDA aims to directly estimate the key quantities in the Bayes discriminant function including quadratic interactions and a linear index of the variables for classification. Under appropriate sparsity assumptions, we establish consistency results for estimating the interactions and the linear index, and further demonstrate that the misclassification rate of our procedure converges to the optimal Bayes risk, even when the dimensionality is exponentially high with respect to the sample size. An efficient algorithm based on the alternating direction method of multipliers (ADMM) is developed for finding interactions, which is much faster than its competitor in the literature. The promising performance of DA-QDA is illustrated via extensive simulation studies and the analysis of four real datasets

    Algorithmes d'estimation pour la classification parcimonieuse

    This thesis deals with the development of estimation algorithms with embedded feature selection in the context of high-dimensional data, in the supervised and unsupervised frameworks. The contributions of this work take the form of two algorithms: GLOSS for the supervised domain and Mix-GLOSS for its unsupervised counterpart. Both algorithms are based on solving an optimal scoring regression regularized with a quadratic formulation of the group-Lasso penalty, which encourages the removal of uninformative features. The theoretical foundations proving that a group-Lasso penalized optimal scoring regression can be used to solve a linear discriminant analysis problem are developed here for the first time. The theory that adapts this technique to the unsupervised domain by means of the EM algorithm is not new, but it has never been laid out in detail for sparsity-inducing penalties. This thesis demonstrates that using a group-Lasso penalized optimal scoring regression inside an EM loop is possible. Our algorithms have been tested on real and artificial high-dimensional datasets, with convincing results in terms of parsimony and without compromising prediction performance.

    Multivariate classification of gene expression microarray data

    Gene expression obtained from microarray analysis is often used to classify cells. In this thesis, a probabilistic version of the Discriminant Partial Least Squares method (p-DPLS) is used to classify samples from their gene expression. p-DPLS is based on Bayes' rule for the posterior probability. Such classifiers are forced to always assign a class. To overcome this limitation, a reject option has been implemented. This option makes it possible to reject samples with a high risk of misclassification (that is, ambiguous samples and outliers). The reject option combines criteria based on the x-residuals, the leverage, and the predicted values. In addition, a variable selection method is developed to choose the most relevant genes, since most of the genes analyzed with a microarray are irrelevant to the particular classification purpose and may confuse the classifier. Finally, DPLS is extended to multi-class classification by combining PLS with linear discriminant analysis.