Classification methods are commonly divided into two categories: unsupervised and supervised. Unsupervised methods have the ability to discover new classes by grouping data into clusters or tree structures without using the class labels, but they carry the risk of producing noninterpretable results. On the other hand, supervised methods always find decision rules that discriminate samples with different class labels. However, the class label information plays such an important role that it confines supervised methods by defining the possible classes. Consequently, supervised methods do not have the ability to discover new classes. To overcome the limitations of unsupervised and supervised methods, we propose a new method, which utilizes the class labels to a less important role so as to perform class discovery and classification simultaneously. The proposed method is called SPACC (SPectral Analysis for Class discovery and Classification). In SPACC, the training samples are nodes of an undirected weighted network. Using spectral analysis, SPACC iteratively partitions the network into a top-down binary tree. Each partitioning step is unsupervised, and the class labels are only used to define the stopping criterion. When the partitioning ends, the training samples have been divided into several subsets, each corresponding to one class label. Because multiple subsets can correspond to the same class label, SPACC may identify biologically meaningful subclasses, and minimize the impact of outliers and mislabeled data. We demonstrate the effectiveness of SPACC for class discovery and classification on microarray data of lymphomas and leukemias. SPACC software is available at http://icbp.stanford.edu/software/SPACC/
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.