
    Feature subset selection problem on microarray data

    Recent advances in technology have given rise to tools such as microarray chips. Microarray chips enable scientists to measure the amount of protein production from genes in a cell, known as gene expression data. The classification of cell samples by means of their gene expression data is an active research area. The data used for the analysis are massive, so the number of features, i.e., the genes, must be reduced to a reasonable level because of the computational cost of experiments and the possibility that irrelevant genes mislead the analysis. Analyses based on the classification of cell samples therefore usually include a feature subset selection phase. This thesis aims to develop a tool that can be used during the feature subset selection phase of such analyses. Three novel algorithms are proposed for the gene selection problem based on basic association rule mining. The first algorithm starts with fuzzy partitioning of the gene expression data and discovers highly confident IF-THEN rules that enable the classification of sample tissues. The second algorithm searches the possible IF-THEN rules using a heuristic pruning approach based on the beam search algorithm. Finally, the third algorithm focuses on the hierarchical information carried through gene expressions by constructing decision trees based on different performance measures. We obtained satisfactory results on the leukemia dataset. In addition, on the colon cancer dataset, the algorithm based on the construction of decision trees showed good performance.

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used in this area include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by identifying similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some of the challenges that exist in mining skin data.

    Data mining based cyber-attack detection
