
    Building interpretable fuzzy models for high dimensional data analysis in cancer diagnosis

    Background: Analysing gene expression data from microarray technologies is a very important task in biology and medicine, particularly in cancer diagnosis. Unlike most other popular methods for high dimensional bio-medical data analysis, such as microarray gene expression or proteomics mass spectroscopy data analysis, fuzzy rule-based models can not only provide good classification results but can also be easily explained and interpreted in human-understandable terms through fuzzy rules. However, the advantages offered by fuzzy-based techniques in microarray data analysis have not yet been fully explored in the literature. Although some recently developed fuzzy-based modeling approaches can provide satisfactory classification results, the rule bases generated by most of the reported fuzzy models for gene expression data are still too large to be easily comprehensible. Results: In this paper, we develop Multi-Objective Evolutionary Algorithms based Interpretable Fuzzy (MOEAIF) methods for analysing high dimensional bio-medical data sets, such as microarray gene expression data and proteomics mass spectroscopy data. We mainly focus on evaluating our proposed models on microarray gene expression cancer data sets, i.e., the lung cancer data set and the colon cancer data set, but we extend our investigations to other types of cancer data sets, such as the ovarian cancer data set. The experimental studies have shown that relatively simple and small fuzzy rule bases, with satisfactory classification performance, can be successfully obtained for challenging microarray gene expression datasets. Conclusions: We believe that fuzzy-based techniques, and in particular the methods proposed in this paper, can be very useful tools in dealing with high dimensional cancer data. We also argue that the potential of applying fuzzy-based techniques to microarray data analysis needs to be further explored.

    Fuzzy Logic for Elimination of Redundant Information of Microarray Data

    Gene subset selection is essential for classification and analysis of microarray data. However, gene selection is known to be a very difficult task, since gene expression data not only have high dimensionality but also contain redundant information and noise. To cope with these difficulties, this paper introduces a fuzzy logic based pre-processing approach composed of two main steps. First, we use fuzzy inference rules to transform the gene expression levels of a given dataset into fuzzy values. Then we apply a similarity relation to these fuzzy values to define fuzzy equivalence groups, each group containing strongly similar genes. Dimension reduction is achieved by selecting, for each group of similar genes, a single representative based on mutual information. To assess the usefulness of this approach, extensive experiments were carried out on three well-known public datasets with a combined classification model using three statistical filters and three classifiers.
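The two-step pre-processing described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the triangular low/medium/high membership functions, the L1-based similarity, the 0.9 threshold, the grouping by transitive closure, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def fuzzify(value, lo, hi):
    """Map a value to (low, medium, high) memberships (assumed triangular sets)."""
    if hi == lo:
        return (0.0, 1.0, 0.0)
    t = (value - lo) / (hi - lo)           # rescale to [0, 1]
    low = max(0.0, 1.0 - 2.0 * t)
    high = max(0.0, 2.0 * t - 1.0)
    return (low, 1.0 - low - high, high)   # memberships sum to 1

def similarity(fa, fb):
    """1 minus the mean (halved) L1 distance between membership vectors."""
    dist = sum(0.5 * sum(abs(a - b) for a, b in zip(ma, mb))
               for ma, mb in zip(fa, fb))
    return 1.0 - dist / len(fa)

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def select_representatives(expr, labels, threshold=0.9):
    """expr: dict gene -> expression levels; labels: class label per sample."""
    genes = list(expr)
    fuzzy = {g: [fuzzify(v, min(expr[g]), max(expr[g])) for v in expr[g]]
             for g in genes}

    # Group genes whose similarity exceeds the threshold (union-find gives
    # the transitive closure, i.e. an equivalence relation on genes).
    parent = {g: g for g in genes}
    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if similarity(fuzzy[g], fuzzy[h]) >= threshold:
                parent[find(h)] = find(g)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)

    # Keep, per group, the gene whose crisp fuzzy label (arg-max membership)
    # shares the most information with the class labels.
    def score(g):
        crisp = [max(range(3), key=lambda k: m[k]) for m in fuzzy[g]]
        return mutual_information(crisp, labels)
    return [max(members, key=score) for members in groups.values()]
```

With two nearly identical genes and one unrelated gene, for instance, the sketch keeps one gene from the similar pair plus the unrelated gene, so three features shrink to two.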

    Feasibility of Haralick's Texture Features for the Classification of Chromogenic In-situ Hybridization Images

    This paper presents a proof of concept for the usefulness of second-order texture features for the qualitative analysis and classification of chromogenic in-situ hybridization whole slide images in high-throughput imaging experiments. The challenge is that, currently, the gold standard for gene expression grading in such images is expert assessment. The research team's idea is to apply different approaches to the analysis of these images, which will then be used for structural segmentation and functional analysis in gene expression. The article presents the idea of selecting a number of textural features to be used for classification. In our experiment, the natural grouping of image samples (tiles) according to their local texture properties was explored in an unsupervised classification procedure. The features are reduced to two dimensions with fuzzy c-means clustering. The overall conclusion of this experiment is that Haralick features are a viable choice for classification and analysis of chromogenic in-situ hybridization image data. The principal component analysis approach produced classes that were slightly more "understandable" from an annotator's point of view. Comment: 4 pages, 1 figure
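Haralick's second-order texture features are computed from a gray-level co-occurrence matrix (GLCM). The sketch below shows how three of them might be derived for one image tile. The horizontal one-pixel offset, the symmetric counting, the choice of contrast, angular second moment, and homogeneity, and the function names are assumptions for illustration; the abstract does not state which features or offsets were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix.

    image: 2D list of integer gray levels in range(levels).
    (dx, dy): pixel offset between the two pixels of a pair (assumed).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0      # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, angular second moment, and homogeneity of a GLCM p."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }
```

A uniform tile gives zero contrast and maximal homogeneity, while a checkerboard tile gives high contrast; feature vectors of this kind are what a clustering step would then group.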

    Applications of Diffusion Maps in Gene Expression Data-Based Cancer Diagnosis Analysis

    Early detection of a tumor's site of origin is particularly important for cancer diagnosis and treatment. The employment of gene expression profiles for different cancer types or subtypes has already shown significant advantages over traditional cancer classification methods. One of the major problems in cancer type recognition-oriented gene expression data analysis is the overwhelming number of measures of gene expression levels versus the small number of samples, which causes the curse of dimensionality. Here, we use diffusion maps for dimensionality reduction; they interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set in order to obtain an efficient representation of the data's geometric descriptions. The derived data are then clustered with Fuzzy ART to form the division of the cancer samples. Experimental results on the small round blue-cell tumor data set demonstrate the effectiveness of our proposed method in addressing multidimensional gene expression data and identifying different types of tumors.
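As a rough illustration of the dimensionality-reduction step, the sketch below builds a Markov matrix from a Gaussian kernel and extracts the first non-trivial diffusion coordinate. The kernel width `eps`, the power iteration with deflation, the fixed iteration count, and the function name are assumptions chosen to keep the example free of dependencies; the paper's actual settings are not given in the abstract.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_diffusion_coordinate(points, eps=1.0, iters=200, seed=0):
    """Return (eigenvalue, coordinate per point) for the leading non-trivial
    eigenvector of the Markov matrix P = D^-1 K.

    Assumes at least two distinct points, so that this eigenvalue is non-zero.
    """
    n = len(points)
    # Gaussian kernel K and its row sums d (the diagonal of D).
    K = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / eps)
          for q in points] for p in points]
    d = [sum(row) for row in K]
    # S = D^-1/2 K D^-1/2 is symmetric and shares its eigenvalues with P.
    S = [[K[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # Its top eigenvector (eigenvalue 1) is sqrt(d), normalized.
    total = math.sqrt(sum(d))
    v0 = [math.sqrt(x) / total for x in d]

    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        proj = dot(u, v0)
        u = [a - proj * b for a, b in zip(u, v0)]    # deflate trivial vector
        norm = math.sqrt(dot(u, u))
        u = [a / norm for a in u]
        w = [dot(S[i], u) for i in range(n)]         # one power-iteration step
        lam = dot(u, w)                              # Rayleigh quotient
        u = w
    norm = math.sqrt(dot(u, u))
    u = [a / norm for a in u]
    # Right eigenvector of P is D^-1/2 u; scale by the eigenvalue.
    return lam, [lam * a / math.sqrt(di) for a, di in zip(u, d)]
```

For two well-separated groups of points, the returned coordinate takes opposite signs on the two groups, which is the kind of low-dimensional representation a clustering method such as Fuzzy ART would then operate on.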

    Clustering of Cancer Tissues Using Diffusion Maps and Fuzzy ART with Gene Expression Data

    Early detection of a tumor's site of origin is particularly important for cancer diagnosis and treatment. The employment of gene expression profiles for different cancer types or subtypes has already shown significant advantages over traditional cancer classification methods. Here, we apply a neural network clustering theory, Fuzzy ART, to generate the division of cancer samples, which is useful in investigating unknown cancer types or subtypes. In addition, we use diffusion maps for dimensionality reduction; they interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set in order to obtain an efficient representation of the data's geometric descriptions. The curse of dimensionality is a major problem in cancer type recognition-oriented gene expression data analysis due to the overwhelming number of measures of gene expression levels versus the small number of samples. Experimental results on the small round blue-cell tumor (SRBCT) data set, compared with other widely used clustering algorithms, demonstrate the effectiveness of our proposed method in addressing multidimensional gene expression data.
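A single-pass Fuzzy ART clusterer can be sketched as below, using the standard complement coding, choice function, vigilance test, and weight update. The parameter values (`rho`, `alpha`, `beta`), the single presentation of each sample, and the function name are assumptions for the example; inputs are assumed to be scaled to [0, 1] beforehand.

```python
def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster vectors with components in [0, 1].

    rho: vigilance, alpha: choice parameter, beta: learning rate (assumed).
    Returns (category index per sample, list of category weight vectors).
    """
    weights = []
    assignments = []
    for x in samples:
        # Complement coding keeps the input norm constant (= len(x)).
        i_vec = list(x) + [1.0 - v for v in x]
        norm_i = sum(i_vec)

        def overlap(j):
            # |I AND w_j| with the fuzzy AND taken as component-wise minimum.
            return sum(min(a, b) for a, b in zip(i_vec, weights[j]))

        # Try categories in decreasing order of the choice function.
        order = sorted(range(len(weights)),
                       key=lambda j: overlap(j) / (alpha + sum(weights[j])),
                       reverse=True)
        chosen = None
        for j in order:
            if overlap(j) / norm_i >= rho:      # vigilance test
                chosen = j
                break
        if chosen is None:
            # No category is close enough: commit a new one.
            weights.append(i_vec[:])
            chosen = len(weights) - 1
        else:
            w = weights[chosen]
            weights[chosen] = [beta * min(a, b) + (1.0 - beta) * b
                               for a, b in zip(i_vec, w)]
        assignments.append(chosen)
    return assignments, weights
```

Raising `rho` makes the vigilance test stricter and yields more, tighter categories, which is how the number of clusters is controlled without being fixed in advance.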

    Biomedical Knowledge Extraction Using Fuzzy Differential Profiles and Semantic Ranking

    Recently, technologies such as DNA microarrays have made it possible to generate large-scale transcriptomic data for exploring the background of genes. The analysis and interpretation of such data require large databases and efficient mining methods in order to extract the specific biological functions belonging to a group of genes in an expression profile. To this end, we propose a new approach for mining transcriptomic data that combines domain knowledge and classification methods. First, we propose the definition of Fuzzy Differential Gene Expression Profiles (FG-DEP), based on fuzzy classification and a differential definition between the considered biological situations. Second, we use our previously defined efficient semantic similarity measure (called IntelliGO), which is applied to Gene Ontology (GO) annotation terms, to compute semantic and functional similarities between the genes of the resulting FG-DEP and well-known genetic markers involved in the development of cancers. The similarity matrices are then used to introduce a novel Functional Spectral Representation (FSR), calculated through a semantic ranking of genes according to their similarities with the tumoral markers. The FSR representation should help experts interpret transcriptomic data in a new way and infer new genes having similar biological functions with regard to well-known diseases.