Combining Feature Reduction and Case Selection in Building CBR Classifiers
Abstract—CBR systems built for classification problems are called CBR classifiers. This paper presents a novel and fast approach to building efficient and competent CBR classifiers that combines feature reduction (FR) and case selection (CS). It makes three central contributions: 1) it develops a fast rough-set method based on the relative attribute dependency among features to compute the approximate reduct; 2) it constructs and compares different case selection methods based on the similarity measure and the concepts of case coverage and case reachability; and 3) CBR classifiers built using a combination of the FR and CS processes reduce the training burden as well as the need to acquire domain knowledge. Overall experimental results on four real-life data sets show that the combined FR and CS method can preserve, and may even improve, solution accuracy while substantially reducing the storage space. The case retrieval time is also greatly reduced because the resulting CBR classifier contains fewer cases with fewer features. The developed FR and CS combination method is also compared with kernel PCA and SVM techniques; their storage requirements, classification accuracy, and classification speed are presented and discussed. Index Terms—Case-based reasoning, CBR classifier, case selection, feature reduction, k-NN principle, rough sets.
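The relative-attribute-dependency idea behind the approximate reduct can be sketched as follows. This is a minimal illustration on assumed toy data, not the paper's exact algorithm: a condition subset B approximates a reduct when the number of distinct projections of the cases on B equals (or nearly equals) the number of distinct projections on B together with the decision attribute.

```python
def relative_dependency(rows, attrs, decision):
    """|distinct projections on attrs| / |distinct projections on attrs + decision|.

    Equals 1.0 when the attribute subset fully determines the decision."""
    proj = {tuple(r[a] for a in attrs) for r in rows}
    proj_d = {tuple(r[a] for a in attrs) + (r[decision],) for r in rows}
    return len(proj) / len(proj_d)

def approximate_reduct(rows, attrs, decision, beta=1.0):
    # Backward elimination: drop an attribute whenever the relative
    # dependency of the remaining subset stays at or above the threshold beta.
    reduct = list(attrs)
    for a in list(attrs):
        trial = [x for x in reduct if x != a]
        if trial and relative_dependency(rows, trial, decision) >= beta:
            reduct = trial
    return reduct
```

Lowering `beta` below 1.0 yields smaller, approximate reducts at the cost of some inconsistency, which is the trade-off the fast rough-set method exploits.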
Feature selection with test cost constraint
Feature selection is an important preprocessing step in machine learning and
data mining. In real-world applications, costs, including money, time and other
resources, are required to acquire the features. In some cases, there is a test
cost constraint due to limited resources, and we must deliberately select an
informative yet cheap feature subset for classification. This paper formulates
the feature selection with test cost constraint problem to address this issue.
The new problem takes a simple form when described as a constraint satisfaction
problem (CSP). Backtracking is a general algorithm for CSPs, and it is efficient in
solving the new problem on medium-sized data. As the backtracking algorithm is
not scalable to large datasets, a heuristic algorithm is also developed.
Experimental results show that the heuristic algorithm can find the optimal
solution in most cases. We also redefine some existing feature selection
problems in rough sets, especially in decision-theoretic rough sets, from the
viewpoint of CSP. These new definitions provide insight into new research directions.
Comment: 23 page
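The backtracking formulation can be sketched as a depth-first search over feature subsets that prunes any branch whose accumulated test cost would exceed the budget. The scoring function below (size of the rough-set positive region) and the toy data are illustrative assumptions, not the paper's exact setup.

```python
from collections import defaultdict

def positive_region_size(rows, attrs, decision):
    """Number of objects whose projection on `attrs` uniquely determines the decision."""
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[a] for a in attrs)].add(r[decision])
    return sum(1 for r in rows if len(groups[tuple(r[a] for a in attrs)]) == 1)

def backtrack(features, costs, budget, score, chosen=(), best=(None, -1)):
    # Depth-first search over feature subsets, pruning branches
    # whose total test cost would exceed the budget.
    s = score(list(chosen))
    if s > best[1]:
        best = (list(chosen), s)
    for i, f in enumerate(features):
        if sum(costs[x] for x in chosen) + costs[f] <= budget:
            best = backtrack(features[i + 1:], costs, budget, score,
                             chosen + (f,), best)
    return best
```

A heuristic variant, as the abstract notes, replaces the exhaustive recursion with a greedy or local-search step for large datasets.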
Automation of Feature Engineering for IoT Analytics
This paper presents an approach for automation of interpretable feature
selection for Internet Of Things Analytics (IoTA) using machine learning (ML)
techniques. The authors conducted a survey of practitioners involved in
different IoTA-based application development tasks. The survey reveals that
feature selection is the most time-consuming part of the entire workflow and
the one most demanding of niche skills. This paper shows how feature selection
can be automated without sacrificing decision-making accuracy, thereby reducing
project completion time and the cost of hiring expensive resources. Several
pattern recognition principles and state-of-the-art (SoA) ML techniques are
combined to design the overall approach for the proposed automation. Three data
sets are considered to establish the proof-of-concept. Experimental results
show that the proposed automation reduces the time for feature selection from
the months that would have been required without it to days. This reduction in
time is achieved without any sacrifice in the accuracy of the decision-making
process. The proposed method is also compared against a Multi-Layer Perceptron
(MLP) model, since most state-of-the-art works on IoTA use MLP-based deep
learning. Moreover, the feature selection method is compared against the SoA
feature reduction technique Principal Component Analysis (PCA) and its
variants. The results obtained show that the proposed method is effective.
Comment: AIoTAS Workshop, ISCA 2017. To be published in ACM SIGBED Review 201
Rough set based lattice structure for knowledge representation in medical expert systems: low back pain management case study
The aim of medical knowledge representation is to capture the detailed domain
knowledge in a clinically efficient manner and to offer a reliable resolution
with the acquired knowledge. The knowledge base to be used by a medical expert
system should allow incremental growth with inclusion of updated knowledge over
time. As knowledge is gathered from a variety of sources by different knowledge
engineers, redundancy becomes an important concern: it increases the processing
time of the knowledge and demands large computational storage to accommodate
everything gathered. The knowledge base may also contain much inconsistent
knowledge. In this
paper, we have proposed a rough set based lattice structure for knowledge
representation in medical expert systems which overcomes the problem of
redundancy and inconsistency in knowledge and offers computational efficiency
with respect to both time and space. We have also generated an optimal set of
decision rules that would be used directly by the inference engine. The
reliability of each rule has been measured using a new metric called
credibility factor, and the certainty and coverage factors of a decision rule
have been re-defined. With the decision rules arranged in descending order of
their reliability measures, the medical expert system considers the highly
reliable and certain rules first, and then searches the possible and uncertain
rules at a later stage, if recommended by physicians. The proposed knowledge
representation technique has been
illustrated using an example from the domain of low back pain. The proposed
scheme ensures completeness, consistency, integrity, non-redundancy, and ease
of access.
Comment: 34 pages, 2 figures, International Journa
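The certainty and coverage factors that the paper re-defines build on Pawlak's standard definitions, which can be sketched as below for a decision rule "condition → decision" over a decision table. The paper's new credibility factor is its own metric and is not reproduced here; the toy low-back-pain rows are assumptions for illustration.

```python
def rule_factors(rows, condition, decision):
    """Pawlak-style factors for a rule condition -> decision:
    certainty = P(decision | condition), coverage = P(condition | decision)."""
    cond = [r for r in rows if all(r[a] == v for a, v in condition.items())]
    dec = [r for r in rows if all(r[a] == v for a, v in decision.items())]
    both = [r for r in cond if r in dec]
    certainty = len(both) / len(cond) if cond else 0.0
    coverage = len(both) / len(dec) if dec else 0.0
    return certainty, coverage
```

Sorting rules by such reliability measures is what lets the inference engine try certain rules before merely possible ones.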
Cloud Service Provider Evaluation System using Fuzzy Rough Set Technique
Cloud Service Providers (CSPs) offer a wide variety of scalable, flexible,
and cost-efficient services to cloud users on demand and pay-per-utilization
basis. However, vast diversity in available cloud service providers leads to
numerous challenges for users in determining and selecting the most suitable
service. Also, users sometimes need to hire the required services from multiple
CSPs, which introduces difficulties in managing interfaces, accounts, security,
support, and Service Level Agreements (SLAs). To circumvent such problems, a
Cloud Service Broker (CSB) that is aware of service offerings and of users'
Quality of Service (QoS) requirements will benefit both the CSPs and the
users. In this work, we propose a Fuzzy Rough Set based Cloud Service
Brokerage Architecture, which is responsible for ranking and selecting services
based on users' QoS requirements, and finally monitoring the service execution.
We use the fuzzy rough set technique for dimension reduction and a weighted
Euclidean distance to rank the CSPs. To prioritize user QoS requests,
user-assigned weights are applied, and system-assigned weights are incorporated
to reflect the relative importance of the QoS attributes. We compared the
proposed ranking technique with an existing method based on system response
time. The case study experiments show that the proposed approach is scalable
and resilient, and produces better results with less searching time.
Comment: 12 pages, 7 figures, and 8 table
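The weighted Euclidean distance ranking step can be sketched as follows. The QoS attribute names, the ideal vector, and the weight values are assumptions for illustration; the paper's actual attributes and weight-assignment scheme may differ.

```python
import math

def rank_csps(providers, ideal, weights):
    """Rank (name, qos_dict) pairs by weighted Euclidean distance to an
    ideal QoS vector; a smaller distance means a better-matching provider."""
    def dist(qos):
        return math.sqrt(sum(w * (qos[k] - ideal[k]) ** 2
                             for k, w in weights.items()))
    return sorted(providers, key=lambda p: dist(p[1]))
```

Raising the weight of one attribute (say, latency) pushes providers that deviate on that attribute further down the ranking, which is how user- and system-assigned weights express relative QoS importance.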
Mappings on Soft Classes
In this paper, we define the notion of a mapping on soft classes and study
several properties of images and inverse images of soft sets supported by
examples and counterexamples. Finally, these notions have been applied to the
problem of medical diagnosis in medical expert systems.
Comment: Accepted manuscript, to appear in New Mathematics and Natural Computatio
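The image of a soft set under a mapping on soft classes can be sketched concretely. Here a soft set is represented as a dict from parameters to subsets of the universe, and the mappings u: U → U' and p: A → A' as dicts; this representation is an assumption for illustration, and the definition follows the usual union rule (image_F)(b) = ∪{ u(F(a)) : p(a) = b }.

```python
def soft_image(F, u, p):
    """Image of a soft set F: A -> P(U) under u: U -> U' and p: A -> A'.

    (image_F)(b) is the union of u(F(a)) over every parameter a with p(a) = b."""
    img = {}
    for a, subset in F.items():
        b = p[a]
        img.setdefault(b, set()).update(u[x] for x in subset)
    return img
```

The inverse image is defined dually, pulling subsets back along u for each parameter in the preimage of p.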
A Method for Vehicle Collision Risk Assessment through Inferring Driver's Braking Actions in Near-Crash Situations
Driving information and data under potential vehicle crashes create
opportunities for extensive real-world observations of driver behaviors and
relevant factors that significantly influence the driving safety in emergency
scenarios. Furthermore, the availability of such data also enhances collision
avoidance systems (CASs) by evaluating the driver's actions in near-crash
scenarios and providing timely warnings. These applications motivate the need
for heuristic tools capable of inferring the relationships among driving risk,
driver/vehicle characteristics, and the road environment. In this paper, we
acquired a large amount of real-world driving data and built a comprehensive
dataset containing multiple "driver-vehicle-road" attributes. The proposed method
works in two steps. In the first step, a variable precision rough set (VPRS)
based classification technique is applied to draw a reduced core subset from
the field driving dataset, representing the essential attribute set most
relevant to driving safety assessment. In the second step, we design a decision
strategy that introduces mutual information entropy to quantify the
significance of each attribute; a representative index, formed by accumulating
the weighted "driver-vehicle-road" factors, is then calculated to reflect the
driving risk of the actual situation. The performance of the proposed method is
demonstrated in an offline analysis of the driving data collected in field
trials, where the aim is to infer emergency braking actions in the near term.
The results indicate that the proposed model is a good alternative for
providing improved real-time warnings because of its high prediction accuracy
and stability.
Comment: 14 page
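The mutual-information weighting step can be sketched as below; the VPRS reduction is not reproduced, and the attribute names and the normalization of the accumulated index are assumptions for illustration only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x)p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def risk_index(record, labels_by_attr, outcomes):
    # Weight each attribute by its mutual information with the braking
    # outcome, then accumulate the weighted attribute values of one record.
    weights = {a: mutual_information(vals, outcomes)
               for a, vals in labels_by_attr.items()}
    total = sum(weights.values()) or 1.0
    return sum((weights[a] / total) * record[a] for a in weights)
```

Attributes carrying no information about braking receive zero weight and so drop out of the accumulated risk index.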
Performance analysis of unsupervised feature selection methods
Feature selection (FS) is a process that attempts to select the more informative
features. In some cases, too many redundant or irrelevant features may
overpower the main features for classification. Feature selection can remedy this
problem and therefore improve the prediction accuracy and reduce the
computational overhead of classification algorithms. The main aim of feature
selection is to determine a minimal feature subset from a problem domain while
retaining a suitably high accuracy in representing the original features. In
this paper, Principal Component Analysis (PCA), Rough PCA, Unsupervised Quick
Reduct (USQR) algorithm and Empirical Distribution Ranking (EDR) approaches are
applied to discover discriminative features that will be the most adequate ones
for classification. Efficiency of the approaches is evaluated using standard
classification metrics.
Comment: 7 pages, Conference Publication
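The PCA-based route to unsupervised feature selection can be sketched in pure Python: compute the leading principal component of the mean-centered data by power iteration, then rank the original features by the magnitude of their loadings on it. This is a minimal heuristic sketch on assumed data, not any of the surveyed algorithms (Rough PCA, USQR, EDR) exactly.

```python
def leading_pc(data, iters=200):
    """Leading principal component of mean-centered data via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def select_features(data, k=1):
    """Keep the k original features with the largest |loading| on the leading PC."""
    v = leading_pc(data)
    return sorted(range(len(v)), key=lambda j: -abs(v[j]))[:k]
```

Unlike plain PCA, which projects onto new axes, this keeps a subset of the original features, preserving interpretability.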
A Novel Rough Set Reduct Algorithm for Medical Domain Based on Bee Colony Optimization
Feature selection refers to the problem of selecting relevant features which
produce the most predictive outcome. In particular, the feature selection task
becomes challenging in datasets containing a huge number of features. Rough set
theory has
been one of the most successful methods used for feature selection. However,
this method alone is still not able to find optimal subsets. This paper
proposes a new feature selection method that hybridizes rough set theory with
Bee Colony Optimization (BCO) in an attempt to combat this. The proposed method
is applied in the medical domain to find minimal reducts and is experimentally
compared
with the Quick Reduct, Entropy Based Reduct, and other hybrid Rough Set methods
such as Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle
Swarm Optimization (PSO).
Comment: IEEE Publication Format, https://sites.google.com/site/journalofcomputing
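The rough-set/swarm hybrid idea can be sketched as a toy bee-colony-style search over attribute subsets: each "bee" holds a candidate subset, makes a local move (toggling one attribute), and keeps the move if fitness does not drop. The fitness function, neighborhood move, and parameters are illustrative assumptions, not the paper's BCO algorithm.

```python
import random

def bee_reduct(attrs, quality, n_bees=10, iters=30, seed=0):
    """Toy bee-colony search for a small, high-quality attribute subset.

    `quality(subset)` should return a rough-set dependency degree in [0, 1]."""
    rng = random.Random(seed)

    def fitness(s):
        # Favor full dependency first, then fewer attributes.
        return (quality(s), -len(s))

    best = list(attrs)  # the full attribute set is always a valid fallback
    bees = [rng.sample(attrs, rng.randint(1, len(attrs))) for _ in range(n_bees)]
    for _ in range(iters):
        for i, s in enumerate(bees):
            t = set(s)
            t.symmetric_difference_update({rng.choice(attrs)})  # toggle one attribute
            t = sorted(t)
            if t and fitness(t) >= fitness(s):
                bees[i] = t
        cand = max(bees, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand
    return best
```

Ant-colony and particle-swarm variants mentioned in the abstract differ mainly in how the local move and the information shared between agents are defined.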
Discovering Stock Price Prediction Rules of Bombay Stock Exchange Using Rough Fuzzy Multi Layer Perceptron Networks
In India, financial markets have existed for many years. A functionally
accented, diverse, efficient and flexible financial system is vital to the
national objective of creating a market driven, productive and competitive
economy. Today markets of varying maturity exist in equity, debt, commodities
and foreign exchange. In this work we attempt to generate a prediction rule
scheme for stock price movement at the Bombay Stock Exchange using an important
soft computing paradigm, viz., the Rough Fuzzy Multi Layer Perceptron. The use of
Computational Intelligence Systems such as Neural Networks, Fuzzy Sets, Genetic
Algorithms, etc. for Stock Market Predictions has been widely established. The
process is to extract knowledge in the form of rules from daily stock
movements. These rules can then be used to guide investors. To increase the
efficiency of the prediction process, rough sets are used to discretize the
data. The methodology uses a Genetic Algorithm to obtain a structured network
suitable for both classification and rule extraction. The modular concept,
based on divide and conquer strategy, provides accelerated training and a
compact network suitable for generating a minimum number of rules with high
certainty values. The concept of variable mutation operator is introduced for
preserving the localized structure of the constituting Knowledge Based
sub-networks, while they are integrated and evolved. Rough Set Dependency Rules
are generated directly from the real valued attribute table containing Fuzzy
membership values. The paradigm is thus used to develop a rule extraction
algorithm. The extracted rules are compared with some of the related rule
extraction techniques on the basis of some quantitative performance indices.
The proposed methodology extracts rules that are fewer in number, accurate,
have a high certainty factor, and exhibit low confusion with less computation time.
Comment: Book Chapter: Forecasting Financial Markets in India, Rudra P. Pradhan, Indian Institute of Technology Kharagpur (Editor), Allied Publishers, India, 200
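The fuzzy-membership preprocessing that feeds the real-valued attribute table can be sketched as a triangular low/medium/high fuzzification of each real-valued stock attribute. The bin shapes and ranges here are illustrative assumptions, not the chapter's exact membership functions.

```python
def fuzzify(x, lo, hi):
    """Map a value in [lo, hi] to (low, medium, high) triangular memberships,
    as in rough-fuzzy preprocessing of real-valued attributes."""
    mid = (lo + hi) / 2.0
    span = (hi - lo) / 2.0 or 1.0  # guard against a degenerate range
    low = max(0.0, (mid - x) / span)
    med = max(0.0, 1.0 - abs(x - mid) / span)
    high = max(0.0, (x - mid) / span)
    return low, med, high
```

Rough-set dependency rules are then generated over these membership values, giving the rule base its linguistic low/medium/high form.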