14,818 research outputs found
A lexicographic multi-objective genetic algorithm for multi-label correlation-based feature selection
This paper proposes a new Lexicographic multi-objective Genetic Algorithm for Multi-Label Correlation-based Feature Selection (LexGA-ML-CFS), which is an extension of the previous single-objective Genetic Algorithm for Multi-label Correlation-based Feature Selection (GA-ML-CFS). This extension uses a LexGA as a global search method for generating candidate feature subsets. In our experiments, we compare the results obtained by LexGA-ML-CFS with the results obtained by the original hill climbing-based ML-CFS, the single-objective GA-ML-CFS and a baseline Binary Relevance method, using ML-kNN as the multi-label classifier. The results from our experiments show that LexGA-ML-CFS improved predictive accuracy, by comparison with other methods, in some cases, but in general there was no statistically significant different between the results of LexGA-ML-CFS and other methods
Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification
Mining discriminative subgraph patterns from graph data has attracted great
interest in recent years. It has a wide variety of applications in disease
diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the
graph representation alone. However, in many real-world applications, the side
information is available along with the graph data. For example, for
neurological disorder identification, in addition to the brain networks derived
from neuroimaging data, hundreds of clinical, immunologic, serologic and
cognitive measures may also be documented for each subject. These measures
compose multiple side views encoding a tremendous amount of supplemental
information for diagnostic purposes, yet are often ignored. In this paper, we
study the problem of discriminative subgraph selection using multiple side
views and propose a novel solution to find an optimal set of subgraph features
for graph classification by exploring a plurality of side views. We derive a
feature evaluation criterion, named gSide, to estimate the usefulness of
subgraph patterns based upon side views. Then we develop a branch-and-bound
algorithm, called gMSV, to efficiently search for optimal subgraph features by
integrating the subgraph mining process and the procedure of discriminative
feature selection. Empirical studies on graph classification tasks for
neurological disorders using brain networks demonstrate that subgraph patterns
selected by the multi-side-view guided subgraph selection approach can
effectively boost graph classification performances and are relevant to disease
diagnosis.Comment: in Proceedings of IEEE International Conference on Data Mining (ICDM)
201
Distributed Correlation-Based Feature Selection in Spark
CFS (Correlation-Based Feature Selection) is an FS algorithm that has been
successfully applied to classification problems in many domains. We describe
Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and
distributed version of the CFS algorithm, capable of dealing with the large
volumes of data typical of big data applications. Two versions of the algorithm
were implemented and compared using the Apache Spark cluster computing model,
currently gaining popularity due to its much faster processing times than
Hadoop's MapReduce model. We tested our algorithms on four publicly available
datasets, each consisting of a large number of instances and two also
consisting of a large number of features. The results show that our algorithms
were superior in terms of both time-efficiency and scalability. In leveraging a
computer cluster, they were able to handle larger datasets than the
non-distributed WEKA version while maintaining the quality of the results,
i.e., exactly the same features were returned by our algorithms when compared
to the original algorithm available in WEKA.Comment: 25 pages, 5 figure
- …