
73,107 research outputs found

A Scalable and Effective Rough Set Theory based Approach for Big Data Pre-processing

Authors: Beck Gael
Chelly Dagdia Zaineb
Lebbah Mustapha
Zarges Christine
Publication date: 02/05/2020

International audience

A big challenge in the knowledge discovery process is performing data pre-processing, specifically feature selection, on large amounts of data with high-dimensional attribute sets. A variety of techniques have been proposed in the literature to deal with this challenge, with varying degrees of success, as most of them need additional information about the input data for thresholding, need noise levels to be specified, or rely on feature-ranking procedures. To overcome these limitations, rough set theory (RST) can be used to discover dependencies within the data and to reduce the number of attributes in an input data set using the data alone, with no supplementary information. However, when it comes to massive data sets, RST reaches its limits because it is highly computationally expensive. In this paper, we propose a scalable and effective RST-based approach for large-scale data pre-processing, specifically feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes were considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance. This makes it relevant to big data.
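The abstract relies on RST's ability to measure how strongly the decision depends on a subset of attributes. As a rough single-machine illustration only (not the authors' distributed Spark algorithm), the sketch below computes the classical RST dependency degree, the fraction of objects whose equivalence class under the chosen attributes has a single decision value, and uses it in a greedy QuickReduct-style search. The function names and the toy decision table are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, decisions, attrs):
    """Dependency degree: share of objects in classes with one decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({decisions[i] for i in block}) == 1:
            positive += len(block)  # the whole class is in the positive region
    return positive / len(rows)

def quick_reduct(rows, decisions, attrs):
    """Greedily add the attribute that most raises the dependency degree
    until it equals the dependency of the full attribute set."""
    target = dependency(rows, decisions, attrs)
    reduct = []
    current = dependency(rows, decisions, reduct)
    while current < target:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
        current = dependency(rows, decisions, reduct)
    return reduct

# Hypothetical toy decision table: the decision is fully determined by "b".
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
]
decisions = ["yes", "no", "yes", "no"]
print(quick_reduct(rows, decisions, ["a", "b", "c"]))  # → ['b']
```

Each call to `dependency` scans the whole table, and the greedy loop repeats it for every candidate attribute, which hints at why RST becomes expensive as the number of objects and attributes grows.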

Crossref

Aberystwyth Research Portal

INRIA a CCSD electronic archive server

HAL-Paris 13

Reduct-based ranking of attributes

Authors: Stańczyk Urszula
Zielosko Beata
Publication venue: Elsevier BV
Publication date: 01/01/2020

The paper is dedicated to the area of feature selection, in particular to attribute rankings, which allow the importance of variables to be estimated. In the presented research, a new weighting factor based on relative reducts was defined for ranking construction. A reduct constitutes an embedded mechanism of feature selection, specific to rough set theory. The proposed factor takes into account the number of reducts in which a given attribute occurs, as well as the lengths of those reducts. Two approaches to reduct generation were employed and compared, with the search executed by a genetic algorithm. To validate the usefulness of the reduct-based rankings in the process of feature reduction, sets of decision rules were induced with the classical rough set approach for gradually decreasing subsets of attributes selected through the rankings. The performance of all rule classifiers was evaluated, and the experimental results showed that the proposed rankings led to at least the same, or even higher, classification accuracy for reduced feature sets than when operating on the entire set of condition attributes. The experiments were performed on datasets from the stylometry domain, treating authorship attribution as a classification task and stylometric descriptors as characteristic features defining writing styles.
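The abstract says the weighting factor combines how many reducts contain an attribute with how long those reducts are, but it does not give the formula. The sketch below uses a hypothetical stand-in: each reduct R adds 1/|R| to every attribute it contains, so attributes that appear in many short reducts rank highest. The reducts are assumed to be already generated (the paper uses a genetic algorithm for that step), and all names and data here are illustrative.

```python
from fractions import Fraction

def reduct_weights(reducts, attributes):
    """Hypothetical weight: each reduct R adds 1/|R| to every attribute it contains."""
    weights = {a: Fraction(0) for a in attributes}
    for reduct in reducts:
        share = Fraction(1, len(reduct))  # shorter reducts contribute more per attribute
        for a in reduct:
            weights[a] += share
    return weights

def rank_attributes(reducts, attributes):
    """Order attributes by decreasing weight, breaking ties alphabetically."""
    weights = reduct_weights(reducts, attributes)
    return sorted(attributes, key=lambda a: (-weights[a], a))

# Illustrative reducts over five condition attributes; "v" occurs in none.
reducts = [{"x", "y"}, {"x", "z", "w"}, {"y", "z"}]
attributes = ["v", "w", "x", "y", "z"]
print(rank_attributes(reducts, attributes))  # → ['y', 'x', 'z', 'w', 'v']
```

Using `Fraction` keeps the weights exact, so attributes with equal totals (here `x` and `z`, both 5/6) tie cleanly and fall back to the alphabetical tie-break. A ranking like this can then be cut at decreasing lengths to produce the shrinking attribute subsets on which rule classifiers are re-induced.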

Repozytorium Uniwersytetu Śląskiego RE-BUŚ

Analysis of the potentials of multi criteria decision analysis methods to conduct sustainability assessment

Authors: Cinelli Marco
Coles Stuart R.
Kirwan Kerry
Publication venue: Elsevier BV
Publication date: 05/07/2014

Sustainability assessments require the management of a wide variety of information types, parameters and uncertainties. Multi criteria decision analysis (MCDA) has been regarded as a suitable set of methods for performing sustainability evaluations because of its flexibility and its potential to facilitate dialogue between stakeholders, analysts and scientists. However, it has been reported that researchers do not usually properly define their reasons for choosing one MCDA method over another; familiarity and affinity with a certain approach seem to drive the choice of procedure. This review paper presents the performance of five MCDA methods (i.e. MAUT, AHP, PROMETHEE, ELECTRE and DRSA) with respect to ten crucial criteria that sustainability assessment tools should satisfy, among which are a life cycle perspective, thresholds and uncertainty management, software support and ease of use. The review shows that MAUT and AHP are fairly simple to understand and have good software support, but they are cognitively demanding for the decision makers and can only embrace a weak sustainability perspective, as trade-offs are the norm. Mixed information and uncertainty can be managed by all the methods, while robust results can only be obtained with MAUT. ELECTRE, PROMETHEE and DRSA are non-compensatory approaches that allow a strong sustainability concept to be used and accept a variety of thresholds, but suffer from rank reversal. DRSA is less demanding in terms of preference elicitation, is very easy to understand and provides a straightforward set of decision rules expressed as elementary “if … then …” conditions. Dedicated software is available for all the approaches, with a medium to wide range of capabilities for representing results. DRSA emerges as the easiest method, followed by AHP, PROMETHEE and MAUT, while ELECTRE is regarded as fairly difficult.
Overall, the analysis has shown that most of the requirements are satisfied by the MCDA methods, although to different extents; the exceptions are the management of mixed data types and the adoption of a life cycle perspective, which are covered by all the considered approaches.
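The contrast between compensatory methods (where "trade-offs are the norm") and non-compensatory ones can be made concrete with a small example. The sketch below is a hypothetical illustration, not an implementation of MAUT, ELECTRE, PROMETHEE or DRSA: a plain weighted sum stands in for additive aggregation, and a per-criterion minimum check stands in for a non-compensatory screen. The criteria, weights, thresholds and scores are all invented.

```python
def additive_score(scores, weights):
    """Weighted sum of normalized criterion scores (a simple compensatory model)."""
    return sum(weights[c] * scores[c] for c in weights)

def meets_minimums(scores, minimums):
    """Non-compensatory screen: every criterion must reach its own minimum."""
    return all(scores[c] >= minimums[c] for c in minimums)

# Invented weights, thresholds and normalized scores in [0, 1].
weights = {"environment": 0.4, "economy": 0.3, "social": 0.3}
minimums = {"environment": 0.3, "economy": 0.3, "social": 0.3}
plan_a = {"environment": 0.1, "economy": 1.0, "social": 1.0}
plan_b = {"environment": 0.6, "economy": 0.6, "social": 0.5}

print(round(additive_score(plan_a, weights), 2), round(additive_score(plan_b, weights), 2))  # → 0.64 0.57
print(meets_minimums(plan_a, minimums), meets_minimums(plan_b, minimums))  # → False True
```

Under the weighted sum, plan A's strong economic and social scores offset its very poor environmental score, so it ranks first (a weak sustainability reading). The minimum check rejects plan A outright because no amount of performance elsewhere compensates for the environmental shortfall, which is closer in spirit to a strong sustainability stance.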

Elsevier - Publisher Connector

ZENODO

Warwick Research Archives Portal Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Partitioning Clustering Based on Support Vector Ranking

Authors: Huang Lan
Ou Ge
Pang Wei
Peng Qing
Wang Yan
Publication venue: Springer Science and Business Media LLC
Publication date: 12/11/2016

Postprint

Aberdeen University Research