15,281 research outputs found
Reduct-based ranking of attributes
The paper is dedicated to the area of feature selection, in particular a notion of attribute rankings that allow to estimate importance of variables. In the research presented for ranking construction a new weighting factor was defined, based on relative reducts. A reduct constitutes an embedded mechanism of feature selection, specific to rough set theory. The proposed factor takes into account the number of reducts in which a given attribute exists, as well as lengths of reducts. Two approaches for reduct generation were employed and compared, with search executed by a genetic algorithm. To validate the usefulness of the reduct-based rankings in the process of feature reduction, for gradually decreasing subsets of attributes, selected through rankings, sets of decision rules were induced in classical rough set approach. The performance of all rule classifiers was evaluated, and experimental results showed that the proposed rankings led to at least the same, or even increased classification accuracy for reduced sets of features than in the case of operating on the entire set of condition attributes. The experiments were performed on datasets from stylometry domain, with treating authorship attribution as a classification task, and stylometric descriptors as characteristic features defining writing styles
Feature weighting techniques for CBR in software effort estimation studies: A review and empirical evaluation
Context : Software effort estimation is one of the most important activities in the software development process. Unfortunately, estimates are often substantially wrong. Numerous estimation methods have been proposed including Case-based Reasoning (CBR). In order to improve CBR estimation accuracy, many researchers have proposed feature weighting techniques (FWT). Objective: Our purpose is to systematically review the empirical evidence to determine whether FWT leads to improved predictions. In addition we evaluate these techniques from the perspectives of (i) approach (ii) strengths and weaknesses (iii) performance and (iv) experimental evaluation approach including the data sets used. Method: We conducted a systematic literature review of published, refereed primary studies on FWT (2000-2014). Results: We identified 19 relevant primary studies. These reported a range of different techniques. 17 out of 19 make benchmark comparisons with standard CBR and 16 out of 17 studies report improved accuracy. Using a one-sample sign test this positive impact is significant (p = 0:0003). Conclusion: The actionable conclusion from this study is that our review of all relevant empirical evidence supports the use of FWTs and we recommend that researchers and practitioners give serious consideration to their adoption
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology to identify
clusters based on their associations with subspaces in high dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but
gaining more attention in recent years due to better adaptability. In the
paper, a comprehensive survey on existing SSC algorithms and the recent
development are presented. The SSC algorithms are classified systematically
into three main categories, namely, conventional SSC (CSSC), independent SSC
(ISSC) and extended SSC (XSSC). The characteristics of these algorithms are
highlighted and the potential future development of SSC is also discussed.Comment: This paper has been published in Information Sciences Journal in 201
- …