Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough-Based Approaches
Abstract—Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (in the order of tens of thousands), which would be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed.

Index Terms—Dimensionality reduction, feature selection, feature transformation, rough selection, fuzzy-rough selection.
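The crisp rough-set reduction the abstract refers to is typically driven by a dependency-degree measure and a QuickReduct-style greedy search. The sketch below is a minimal illustration on a small discrete dataset; the function names and the tuple-based data layout are assumptions for the example, not the paper's implementation:

```python
from collections import defaultdict

def dependency(data, decisions, features):
    """Rough-set dependency degree: the fraction of objects whose
    equivalence class under `features` maps to a single decision value."""
    classes = defaultdict(set)
    for row, d in zip(data, decisions):
        classes[tuple(row[f] for f in features)].add(d)
    pos = sum(
        sum(1 for row in data if tuple(row[f] for f in features) == key)
        for key, ds in classes.items() if len(ds) == 1
    )
    return pos / len(data)

def quickreduct(data, decisions):
    """Greedy forward selection: repeatedly add the feature that most
    increases the dependency degree, until it matches the full set's."""
    all_feats = list(range(len(data[0])))
    target = dependency(data, decisions, all_feats)
    reduct = []
    while dependency(data, decisions, reduct) < target:
        best = max((f for f in all_feats if f not in reduct),
                   key=lambda f: dependency(data, decisions, reduct + [f]))
        reduct.append(best)
    return reduct
```

On a dataset where feature 0 alone determines the decision, the search stops after one step and returns a single-feature reduct, preserving the full set's dependency degree.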
Performing Feature Selection with ACO
Summary. The main aim of feature selection (FS) is to determine a minimal feature subset from a problem domain while retaining a suitably high accuracy in representing the original features. In real-world problems FS is essential due to the abundance of noisy, irrelevant, or misleading features. However, current methods are often inadequate at finding optimal reductions. This chapter presents a feature selection mechanism based on Ant Colony Optimization (ACO) in an attempt to combat this. The method is then applied to the problem of finding optimal feature subsets in the fuzzy-rough data reduction process. The present work is applied to two very different and challenging tasks, namely web classification and complex systems monitoring.
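As a rough illustration of how ants construct candidate feature subsets, the toy loop below samples features in proportion to pheromone and reinforces the best subset found so far. The evaluator, parameters, and reinforcement rule are simplifications assumed for the sketch (the chapter's heuristic desirability terms and fuzzy-rough subset evaluation are not reproduced), and scores are assumed non-negative:

```python
import random

def aco_feature_select(n_features, score, n_ants=10, n_iters=20,
                       evaporation=0.2, subset_size=3, seed=0):
    """Toy ACO feature selection: each ant builds a fixed-size subset by
    pheromone-proportional sampling; pheromone on the best-so-far
    subset's features is reinforced after every iteration."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            subset = set()
            while len(subset) < subset_size:
                # roulette-wheel selection over pheromone levels
                r, acc = rng.uniform(0, sum(pheromone)), 0.0
                for f, p in enumerate(pheromone):
                    acc += p
                    if r <= acc:
                        subset.add(f)
                        break
            s = score(frozenset(subset))
            if s > best_score:
                best_subset, best_score = frozenset(subset), s
        # evaporate everywhere, then deposit on the best-so-far subset
        pheromone = [(1 - evaporation) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score
    return set(best_subset), best_score
```

With a score that simply counts overlap with a known relevant set, the loop concentrates pheromone on the relevant features within a few iterations.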
Enhancing Big Data Feature Selection Using a Hybrid Correlation-Based Feature Selection
This study proposes an alternate data extraction method that combines three well-known feature selection methods for handling large and problematic datasets: correlation-based feature selection (CFS), best first search (BFS), and the dominance-based rough set approach (DRSA). This study aims to enhance the classifier's performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, with the main phases incorporating two crucial feature extraction tasks. Data reduction comes first, implementing the CFS method with a BFS algorithm. Secondly, a data selection process applies DRSA to generate the optimized dataset. The study thereby aims to reduce computational time complexity and increase classification accuracy. Several datasets with various characteristics and volumes were used in the experimental process to evaluate the proposed method's credibility. The method's performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed work proved that it could assist the classifier in returning a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared to 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) statistical result indicates that the proposed method is an alternative extraction tool for those with difficulties acquiring expensive big data analysis tools and those who are new to the data analysis field.

Funding: Ministry of Higher Education under the Fundamental Research Grant Scheme (FRGS/1/2018/ICT04/UTM/01/1); Universiti Teknologi Malaysia (UTM) under Research University Grant Vot-20H04; Malaysia Research University Network (MRUN) Vot 4L876; SPEV project, University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic (ID: 2102–2021), "Smart Solutions in Ubiquitous Computing Environments".
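The CFS merit underlying the first phase weighs average feature–class correlation against average feature–feature correlation. A minimal sketch follows, with a greedy forward search standing in for best first search and precomputed correlation tables assumed as inputs; none of this reproduces the study's actual pipeline:

```python
import math

def cfs_merit(subset, feat_class_corr, feat_feat_corr):
    """CFS merit: k*r_cf / sqrt(k + k(k-1)*r_ff), favouring features
    correlated with the class but uncorrelated with each other."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(feat_class_corr[f]) for f in subset) / k
    pairs = [(f, g) for f in subset for g in subset if f < g]
    r_ff = (sum(abs(feat_feat_corr[p]) for p in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(features, feat_class_corr, feat_feat_corr):
    """Forward search: grow the subset while the merit keeps improving."""
    subset, current = set(), 0.0
    while True:
        cands = [(cfs_merit(subset | {f}, feat_class_corr, feat_feat_corr), f)
                 for f in features if f not in subset]
        if not cands:
            break
        best, f = max(cands)
        if best <= current:
            break
        subset.add(f)
        current = best
    return subset
```

Note how redundancy is penalised: a second feature strongly correlated with both the class and the first feature can still lower the merit, so the search stops with the smaller subset.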
A Data-Driven Condition Monitoring of Product Quality Analysis System Based on RS and AHP
Mechanical and electrical products have complex structures and intelligent control systems; their reliability plays an important role in the normal operation of security facilities. However, most manufacturers usually pay more attention to product design and manufacturing quality, with little interest in intelligent fault diagnosis. The objective of this study is to develop a product quality intelligent analysis and management system based on Rough Set (RS) theory and the Analytic Hierarchy Process (AHP). Firstly, this paper reviews the principles of the hardware and software design, the monitoring platform, and the quality analysis system, using computer technology to reduce the number of information transfers. Secondly, the fault types and feature extractions of different elevator faults are presented and simplified using RS theory. Then, the objective weights of the level index model are obtained by the AHP method, and the comprehensive analysis weight of each index is obtained by combining the subjective and objective weight coefficients with the golden ratio. Finally, a comprehensive decision weight of the major indices for the quality control analysis system of many vertical elevators is presented. The results show that the data-driven condition monitoring and quality analysis system is an important means of preventing disasters in complex mechanical and electrical products.
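The weight construction described above can be illustrated with the standard column-normalisation approximation of the AHP priority vector, followed by a golden-ratio blend of subjective and objective weights. The 0.618 coefficient and the linear blending formula are assumptions for the sketch, not taken from the paper:

```python
def ahp_weights(matrix):
    """Approximate AHP priority vector: normalise each column of the
    pairwise-comparison matrix, then average across each row."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n
            for i in range(n)]

def combined_weights(subjective, objective, phi=0.618):
    """Blend subjective and objective index weights with a golden-ratio
    coefficient (phi assumed to be 0.618 for this illustration)."""
    return [phi * s + (1 - phi) * o for s, o in zip(subjective, objective)]
```

If both input weight vectors are normalised to sum to one, the blended vector is as well, since the two coefficients sum to one.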
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.

Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author.
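The score-and-search half of such a hybrid can be illustrated with a generic greedy hill-climb over DAGs using add, delete, and reverse moves under an acyclicity check. The score function is left abstract, and this sketch omits H2PC's constraint-based skeleton phase entirely:

```python
def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly strip nodes with no incoming edges;
    a cycle leaves nodes that can never be stripped."""
    remaining, es = set(nodes), set(edges)
    while remaining:
        sources = [n for n in remaining if not any(v == n for _, v in es)]
        if not sources:
            return False
        for n in sources:
            remaining.discard(n)
        es = {(u, v) for u, v in es if u not in sources}
    return True

def hill_climb(nodes, score, max_iters=100):
    """Greedy hill-climbing over DAG structures: apply the single edge
    addition, deletion, or reversal that most improves the score."""
    edges = set()
    best = score(edges)
    for _ in range(max_iters):
        candidates = []
        for u in nodes:
            for v in nodes:
                if u == v:
                    continue
                if (u, v) in edges:
                    candidates.append(edges - {(u, v)})               # delete
                    candidates.append((edges - {(u, v)}) | {(v, u)})  # reverse
                elif (v, u) not in edges:
                    candidates.append(edges | {(u, v)})               # add
        scored = [(score(c), c) for c in candidates if is_acyclic(nodes, c)]
        if not scored:
            break
        top, move = max(scored, key=lambda t: t[0])
        if top <= best:  # local optimum reached
            break
        best, edges = top, move
    return edges, best
```

In practice the score would be a decomposable fit measure such as BDeu or BIC, and the candidate moves would be restricted to the skeleton learned in the constraint-based phase.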