Unsupervised Instance and Subnetwork Selection for Network Data
Unlike tabular data, features in network data are interconnected within a
domain-specific graph. Examples of this setting include gene expression
overlaid on a protein-protein interaction (PPI) network and user opinions in a social
network. Network data is typically high-dimensional (large number of nodes) and
often contains outlier snapshot instances and noise. In addition, it is often
non-trivial and time-consuming to annotate instances with global labels (e.g.,
disease or normal). How can we jointly select discriminative subnetworks and
representative instances for network data without supervision? We address these
challenges within an unsupervised framework for joint subnetwork and instance
selection in network data, called UISS, via a convex self-representation
objective. Given an unlabeled network dataset, UISS identifies representative
instances while ignoring outliers. It outperforms state-of-the-art baselines on
both discriminative subnetwork selection and representative instance selection,
achieving up to 10% accuracy improvement on all real-world data sets we use for
evaluation. When employed for exploratory analysis of RNA-seq network samples
from multiple studies, it produces interpretable and informative summaries.
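The abstract names a convex self-representation objective but does not state it. As a rough illustration of the general idea only, the sketch below assumes a common row-sparse form, minimizing ||X - XC||_F^2 + lam * sum_i ||C_i||_2, solved by proximal gradient descent. The function name, the parameter lam, and the objective itself are assumptions for illustration and are not taken from UISS.

```python
import numpy as np

def self_representation_scores(X, lam=1.0, n_iter=200):
    """Score instances by how much they help reconstruct the others.

    X has shape (n_features, n_instances). Assumed objective (not the UISS one):
        ||X - X C||_F^2 + lam * sum of the l2 norms of the rows of C
    Returns the coefficient matrix C and one score per instance (row norm of C).
    """
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix of the instances
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the gradient
    step = 1.0 / max(lipschitz, 1e-12)
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = 2.0 * (G @ C - G)                  # gradient of the reconstruction term
        Z = C - step * grad
        # Proximal step for the row-wise l2 penalty: shrink each row toward zero.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        C = shrink * Z
    return C, np.linalg.norm(C, axis=1)
```

Under this assumed objective, instances whose rows of C are shrunk to zero are never used to reconstruct other instances, so ranking by row norm gives a simple candidate list of representatives. A larger lam keeps fewer instances.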
Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification
This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates the outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods are employed to rank genes: the t-test, entropy, the receiver operating characteristic (ROC) curve, the Wilcoxon test, and the signal-to-noise ratio. The resulting rankings serve as inputs to the modified AHP. The paper also proposes a cancer classification method that applies a fuzzy standard additive model (FSAM) to the genes selected by AHP. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is inserted between the unsupervised and supervised training stages to optimize the number of fuzzy rules. Integrating the GA enables FSAM to cope with the high-dimensional, low-sample nature of microarray data and thus improves the efficiency of classification. Experiments are carried out on numerous microarray datasets. The results show that AHP-based gene selection outperforms each of the single ranking methods. Furthermore, the AHP-FSAM combination achieves high accuracy in microarray data classification compared with various competing classifiers. The proposed approach is therefore useful to medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
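The abstract does not describe how the modified AHP combines the individual rankings. To make the idea of integrating several filter rankings concrete, the sketch below computes two of the five named criteria (t-test and signal-to-noise ratio) and merges them by mean rank. Mean-rank aggregation is a simple stand-in chosen for illustration, not the paper's AHP procedure, and all function names are hypothetical.

```python
import numpy as np

def t_statistic(X, y):
    """Absolute Welch t-statistic per gene; X is (samples, genes), y is 0/1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)

def signal_to_noise(X, y):
    """Absolute difference of class means over the sum of class standard deviations."""
    a, b = X[y == 0], X[y == 1]
    spread = a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.maximum(spread, 1e-12)

def to_ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest score."""
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def aggregate_top_genes(X, y, k=10):
    """Return the indices of the k genes with the best mean rank across criteria.

    Mean-rank aggregation is an assumption used here in place of the modified AHP.
    """
    criteria = (t_statistic, signal_to_noise)
    rank_table = np.vstack([to_ranks(f(X, y)) for f in criteria])
    mean_rank = rank_table.mean(axis=0)
    return np.argsort(mean_rank)[:k]
```

Further criteria such as entropy, ROC, or Wilcoxon scores could be added to the `criteria` tuple as long as each returns one score per gene with higher values meaning more informative.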