    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted, and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201

    MapReduce based Classification for Microarray data using Parallel Genetic Algorithm

    High-throughput microarray technology is used to uncover thousands of genes. Only a few of these thousands of gene expression values are used for disease prediction and disease classification in a medical environment. Supervised attribute clustering is used to find initial groups of coexpressed genes whose joint expression is strongly related to the class label. Mutual information, based on the information shared between attributes, uses the information in the sample varieties to measure the similarity among attributes. Redundant and irrelevant attributes are then removed. After the clusters are formed, a parallel genetic algorithm (PGA) is used to find the optimal features and is supplied as the mapper function to improve class separability. Because the work is done in parallel, this method makes diagnosis easier and more effective. Predictive accuracy is estimated using three classifiers: K-nearest neighbours, naive Bayes and support vector machine. The overall approach thus uses a reducer function and provides excellent predictive capability for accurate medical diagnosis
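
    The abstract above names mutual information as the similarity measure between attributes but gives no formula or implementation. As a rough illustration only, the following sketch computes the standard mutual information between two already-discretized attributes. The function name `mutual_information` and the assumption that expression values have been binned into discrete levels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences.

    Assumption: x and y hold discretized (binned) attribute values;
    the paper's own discretization and estimator are not specified here.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)            # marginal counts of x
    count_y = Counter(y)            # marginal counts of y
    count_xy = Counter(zip(x, y))   # joint counts of (x, y) pairs
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi
```

    Under this definition, two attributes with a high score carry largely the same information, so one of them could be treated as redundant; an attribute with a near-zero score against the class label could be treated as irrelevant.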

    Stable Feature Selection for Biomarker Discovery

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic as a convenient reference; (2) categorizing existing methods under an expandable framework for future research and development

    Methods and Systems for Biclustering Algorithm

    Methods and systems for improved unsupervised learning are described. The unsupervised learning can consist of biclustering a data set, e.g., by biclustering subsets of the entire data set. In an example, the biclustering does not include feeding known and proven results into the biclustering methodology or system. A hierarchical approach can be used that feeds proven clusters back into the biclustering methodology or system as the input. Data that does not cluster may be discarded. Thus, a very large unknown data set can be acted on to learn about the data. The system is also amenable to parallelization