Combining Feature Reduction and Case Selection in Building CBR Classifiers
Abstract—CBR systems built for classification problems are called CBR classifiers. This paper presents a novel and fast approach to building efficient and competent CBR classifiers that combines feature reduction (FR) and case selection (CS). It makes three central contributions: 1) it develops a fast rough-set method based on the relative attribute dependency among features to compute the approximate reduct; 2) it constructs and compares different case selection methods based on the similarity measure and the concepts of case coverage and case reachability; and 3) CBR classifiers built using a combination of the FR and CS processes reduce the training burden as well as the need to acquire domain knowledge. Overall experimental results on four real-life data sets show that the combined FR and CS method can preserve, and may even improve, solution accuracy while substantially reducing the storage space. The case retrieval time is also greatly reduced because the resulting CBR classifier contains fewer cases with fewer features. The developed FR and CS combination method is also compared with kernel PCA and SVM techniques; their storage requirements, classification accuracy, and classification speed are presented and discussed. Index Terms—Case-based reasoning, CBR classifier, case selection, feature reduction, k-NN principle, rough sets.
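The relative-attribute-dependency idea behind the approximate reduct can be sketched as follows. This is a minimal illustration on assumed toy data, not the paper's exact algorithm: a condition subset B approximates a reduct when the number of distinct projections of the cases on B equals (or nearly equals) the number of distinct projections on B together with the decision attribute.

```python
def relative_dependency(rows, attrs, decision):
    """|distinct projections on attrs| / |distinct projections on attrs + decision|.

    Equals 1.0 when the attribute subset fully determines the decision."""
    proj = {tuple(r[a] for a in attrs) for r in rows}
    proj_d = {tuple(r[a] for a in attrs) + (r[decision],) for r in rows}
    return len(proj) / len(proj_d)

def approximate_reduct(rows, attrs, decision, beta=1.0):
    # Backward elimination: drop an attribute whenever the relative
    # dependency of the remaining subset stays at or above the threshold beta.
    reduct = list(attrs)
    for a in list(attrs):
        trial = [x for x in reduct if x != a]
        if trial and relative_dependency(rows, trial, decision) >= beta:
            reduct = trial
    return reduct
```

Lowering `beta` below 1.0 yields smaller, approximate reducts at the cost of some inconsistency, which is the trade-off the fast rough-set method exploits.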
Feature selection with test cost constraint
Feature selection is an important preprocessing step in machine learning and
data mining. In real-world applications, costs, including money, time and other
resources, are required to acquire the features. In some cases, there is a test
cost constraint due to limited resources, and we must deliberately select an
informative yet cheap feature subset for classification. This paper formulates
the feature selection with test cost constraint problem to address this issue.
The new problem takes a simple form when described as a constraint satisfaction
problem (CSP). Backtracking is a general algorithm for CSPs, and it is efficient in
solving the new problem on medium-sized data. As the backtracking algorithm is
not scalable to large datasets, a heuristic algorithm is also developed.
Experimental results show that the heuristic algorithm can find the optimal
solution in most cases. We also redefine some existing feature selection
problems in rough sets, especially in decision-theoretic rough sets, from the
viewpoint of CSP. These new definitions provide insight into new research directions.
Comment: 23 page
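The backtracking formulation can be sketched as a depth-first search over feature subsets that prunes any branch whose accumulated test cost would exceed the budget. The scoring function below (size of the rough-set positive region) and the toy data are illustrative assumptions, not the paper's exact setup.

```python
from collections import defaultdict

def positive_region_size(rows, attrs, decision):
    """Number of objects whose projection on `attrs` uniquely determines the decision."""
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[a] for a in attrs)].add(r[decision])
    return sum(1 for r in rows if len(groups[tuple(r[a] for a in attrs)]) == 1)

def backtrack(features, costs, budget, score, chosen=(), best=(None, -1)):
    # Depth-first search over feature subsets, pruning branches
    # whose total test cost would exceed the budget.
    s = score(list(chosen))
    if s > best[1]:
        best = (list(chosen), s)
    for i, f in enumerate(features):
        if sum(costs[x] for x in chosen) + costs[f] <= budget:
            best = backtrack(features[i + 1:], costs, budget, score,
                             chosen + (f,), best)
    return best
```

A heuristic variant, as the abstract notes, replaces the exhaustive recursion with a greedy or local-search step for large datasets.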
Automation of Feature Engineering for IoT Analytics
This paper presents an approach for automation of interpretable feature
selection for Internet Of Things Analytics (IoTA) using machine learning (ML)
techniques. The authors conducted a survey of practitioners involved in
different IoTA-based application development tasks. The survey reveals that
feature selection is the most time-consuming part of the entire workflow and
the one most demanding of niche skills. This paper shows how feature selection
can be automated without sacrificing decision-making accuracy, thereby reducing
project completion time and the cost of hiring expensive resources. Several
pattern recognition principles and state-of-the-art (SoA) ML techniques are
combined to design the overall approach for the proposed automation. Three data
sets are considered to establish the proof-of-concept. Experimental results
show that the proposed automation reduces the time for feature selection from
the months that would have been required without it to days. This reduction in
time is achieved without any sacrifice in the accuracy of the decision-making
process. The proposed method is also compared against a Multi-Layer Perceptron
(MLP) model, since most state-of-the-art works on IoTA use MLP-based deep
learning. Moreover, the feature selection method is compared against the SoA
feature reduction technique Principal Component Analysis (PCA) and its
variants. The results obtained show that the proposed method is effective.
Comment: AIoTAS Workshop, ISCA 2017. To be published in ACM SIGBED Review 201
Rough set based lattice structure for knowledge representation in medical expert systems: low back pain management case study
The aim of medical knowledge representation is to capture the detailed domain
knowledge in a clinically efficient manner and to offer a reliable resolution
with the acquired knowledge. The knowledge base to be used by a medical expert
system should allow incremental growth with inclusion of updated knowledge over
time. As knowledge is gathered from a variety of sources by different knowledge
engineers, redundancy becomes an important concern: it increases the processing
time of the knowledge and demands large computational storage to accommodate
everything gathered. The knowledge base may also contain much inconsistent
knowledge. In this
paper, we have proposed a rough set based lattice structure for knowledge
representation in medical expert systems which overcomes the problem of
redundancy and inconsistency in knowledge and offers computational efficiency
with respect to both time and space. We have also generated an optimal set of
decision rules that would be used directly by the inference engine. The
reliability of each rule has been measured using a new metric called
credibility factor, and the certainty and coverage factors of a decision rule
have been re-defined. With the decision rules arranged in descending order of
their reliability measures, the medical expert system considers the highly
reliable and certain rules first, and then searches the possible and uncertain
rules at a later stage, if recommended by physicians. The proposed knowledge
representation technique has been
illustrated using an example from the domain of low back pain. The proposed
scheme ensures completeness, consistency, integrity, non-redundancy, and ease
of access.
Comment: 34 pages, 2 figures, International Journa
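The certainty and coverage factors that the paper re-defines build on Pawlak's standard definitions, which can be sketched as below for a decision rule "condition → decision" over a decision table. The paper's new credibility factor is its own metric and is not reproduced here; the toy low-back-pain rows are assumptions for illustration.

```python
def rule_factors(rows, condition, decision):
    """Pawlak-style factors for a rule condition -> decision:
    certainty = P(decision | condition), coverage = P(condition | decision)."""
    cond = [r for r in rows if all(r[a] == v for a, v in condition.items())]
    dec = [r for r in rows if all(r[a] == v for a, v in decision.items())]
    both = [r for r in cond if r in dec]
    certainty = len(both) / len(cond) if cond else 0.0
    coverage = len(both) / len(dec) if dec else 0.0
    return certainty, coverage
```

Sorting rules by such reliability measures is what lets the inference engine try certain rules before merely possible ones.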
Cloud Service Provider Evaluation System using Fuzzy Rough Set Technique
Cloud Service Providers (CSPs) offer a wide variety of scalable, flexible,
and cost-efficient services to cloud users on demand and pay-per-utilization
basis. However, vast diversity in available cloud service providers leads to
numerous challenges for users in determining and selecting the most suitable
service. Also, users sometimes need to hire the required services from multiple
CSPs, which introduces difficulties in managing interfaces, accounts, security,
support, and Service Level Agreements (SLAs). To circumvent such problems, a
Cloud Service Broker (CSB) that is aware of service offerings and of users'
Quality of Service (QoS) requirements will benefit both the CSPs and the
users. In this work, we propose a Fuzzy Rough Set based Cloud Service
Brokerage Architecture, which is responsible for ranking and selecting services
based on users' QoS requirements, and finally monitoring the service execution.
We use the fuzzy rough set technique for dimension reduction and a weighted
Euclidean distance to rank the CSPs. To prioritize user QoS requests,
user-assigned weights are applied, and system-assigned weights are incorporated
to reflect the relative importance of the QoS attributes. We compared the
proposed ranking technique with an existing method based on system response
time. The case study experiments show that the proposed approach is scalable
and resilient, and produces better results with less searching time.
Comment: 12 pages, 7 figures, and 8 table
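The weighted Euclidean distance ranking step can be sketched as follows. The QoS attribute names, the ideal vector, and the weight values are assumptions for illustration; the paper's actual attributes and weight-assignment scheme may differ.

```python
import math

def rank_csps(providers, ideal, weights):
    """Rank (name, qos_dict) pairs by weighted Euclidean distance to an
    ideal QoS vector; a smaller distance means a better-matching provider."""
    def dist(qos):
        return math.sqrt(sum(w * (qos[k] - ideal[k]) ** 2
                             for k, w in weights.items()))
    return sorted(providers, key=lambda p: dist(p[1]))
```

Raising the weight of one attribute (say, latency) pushes providers that deviate on that attribute further down the ranking, which is how user- and system-assigned weights express relative QoS importance.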
Mappings on Soft Classes
In this paper, we define the notion of a mapping on soft classes and study
several properties of images and inverse images of soft sets supported by
examples and counterexamples. Finally, these notions have been applied to the
problem of medical diagnosis in medical expert systems.
Comment: Accepted manuscript, to appear in New Mathematics and Natural Computatio
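The image of a soft set under a mapping on soft classes can be sketched concretely. Here a soft set is represented as a dict from parameters to subsets of the universe, and the mappings u: U → U' and p: A → A' as dicts; this representation is an assumption for illustration, and the definition follows the usual union rule (image_F)(b) = ∪{ u(F(a)) : p(a) = b }.

```python
def soft_image(F, u, p):
    """Image of a soft set F: A -> P(U) under u: U -> U' and p: A -> A'.

    (image_F)(b) is the union of u(F(a)) over every parameter a with p(a) = b."""
    img = {}
    for a, subset in F.items():
        b = p[a]
        img.setdefault(b, set()).update(u[x] for x in subset)
    return img
```

The inverse image is defined dually, pulling subsets back along u for each parameter in the preimage of p.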
A Method for Vehicle Collision Risk Assessment through Inferring Driver's Braking Actions in Near-Crash Situations
Driving information and data under potential vehicle crashes create
opportunities for extensive real-world observations of driver behaviors and
relevant factors that significantly influence the driving safety in emergency
scenarios. Furthermore, the availability of such data also enhances collision
avoidance systems (CASs) by evaluating the driver's actions in near-crash
scenarios and providing timely warnings. These applications motivate the need
for heuristic tools capable of inferring the relationships among driving risk,
driver/vehicle characteristics, and the road environment. In this paper, we
acquired a large amount of real-world driving data and built a comprehensive
dataset containing multiple "driver-vehicle-road" attributes. The proposed method
works in two steps. In the first step, a variable precision rough set (VPRS)
based classification technique is applied to draw a reduced core subset from
the field driving dataset, representing the essential attribute set most
relevant to driving safety assessment. In the second step, we design a decision
strategy that introduces mutual information entropy to quantify the
significance of each attribute; a representative index, formed by accumulating
the weighted "driver-vehicle-road" factors, is then calculated to reflect the
driving risk of the actual situation. The performance of the proposed method is
demonstrated in an offline analysis of the driving data collected in field
trials, where the aim is to infer emergency braking actions in the near term.
The results indicate that the proposed model is a good alternative for
providing improved real-time warnings because of its high prediction accuracy
and stability.
Comment: 14 page
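The mutual-information weighting step can be sketched as below; the VPRS reduction is not reproduced, and the attribute names and the normalization of the accumulated index are assumptions for illustration only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x)p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def risk_index(record, labels_by_attr, outcomes):
    # Weight each attribute by its mutual information with the braking
    # outcome, then accumulate the weighted attribute values of one record.
    weights = {a: mutual_information(vals, outcomes)
               for a, vals in labels_by_attr.items()}
    total = sum(weights.values()) or 1.0
    return sum((weights[a] / total) * record[a] for a in weights)
```

Attributes carrying no information about braking receive zero weight and so drop out of the accumulated risk index.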
Performance analysis of unsupervised feature selection methods
Feature selection (FS) is a process that attempts to select the more informative
features. In some cases, too many redundant or irrelevant features may
overpower the main features for classification. Feature selection can remedy this
problem and therefore improve the prediction accuracy and reduce the
computational overhead of classification algorithms. The main aim of feature
selection is to determine a minimal feature subset from a problem domain while
retaining a suitably high accuracy in representing the original features. In
this paper, Principal Component Analysis (PCA), Rough PCA, Unsupervised Quick
Reduct (USQR) algorithm and Empirical Distribution Ranking (EDR) approaches are
applied to discover discriminative features that will be the most adequate ones
for classification. Efficiency of the approaches is evaluated using standard
classification metrics.
Comment: 7 pages, Conference Publication
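The PCA-based route to unsupervised feature selection can be sketched in pure Python: compute the leading principal component of the mean-centered data by power iteration, then rank the original features by the magnitude of their loadings on it. This is a minimal heuristic sketch on assumed data, not any of the surveyed algorithms (Rough PCA, USQR, EDR) exactly.

```python
def leading_pc(data, iters=200):
    """Leading principal component of mean-centered data via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def select_features(data, k=1):
    """Keep the k original features with the largest |loading| on the leading PC."""
    v = leading_pc(data)
    return sorted(range(len(v)), key=lambda j: -abs(v[j]))[:k]
```

Unlike plain PCA, which projects onto new axes, this keeps a subset of the original features, preserving interpretability.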
A Novel Rough Set Reduct Algorithm for Medical Domain Based on Bee Colony Optimization
Feature selection refers to the problem of selecting relevant features which
produce the most predictive outcome. In particular, the feature selection task
becomes challenging in datasets containing a huge number of features. Rough set
theory has
been one of the most successful methods used for feature selection. However,
this method alone is still not able to find optimal subsets. This paper
proposes a new feature selection method that hybridizes rough set theory with
Bee Colony Optimization (BCO) in an attempt to combat this. The proposed method
is applied in the medical domain to find minimal reducts and is experimentally
compared
with the Quick Reduct, Entropy Based Reduct, and other hybrid Rough Set methods
such as Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle
Swarm Optimization (PSO).
Comment: IEEE Publication Format, https://sites.google.com/site/journalofcomputing
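The rough-set/swarm hybrid idea can be sketched as a toy bee-colony-style search over attribute subsets: each "bee" holds a candidate subset, makes a local move (toggling one attribute), and keeps the move if fitness does not drop. The fitness function, neighborhood move, and parameters are illustrative assumptions, not the paper's BCO algorithm.

```python
import random

def bee_reduct(attrs, quality, n_bees=10, iters=30, seed=0):
    """Toy bee-colony search for a small, high-quality attribute subset.

    `quality(subset)` should return a rough-set dependency degree in [0, 1]."""
    rng = random.Random(seed)

    def fitness(s):
        # Favor full dependency first, then fewer attributes.
        return (quality(s), -len(s))

    best = list(attrs)  # the full attribute set is always a valid fallback
    bees = [rng.sample(attrs, rng.randint(1, len(attrs))) for _ in range(n_bees)]
    for _ in range(iters):
        for i, s in enumerate(bees):
            t = set(s)
            t.symmetric_difference_update({rng.choice(attrs)})  # toggle one attribute
            t = sorted(t)
            if t and fitness(t) >= fitness(s):
                bees[i] = t
        cand = max(bees, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand
    return best
```

Ant-colony and particle-swarm variants mentioned in the abstract differ mainly in how the local move and the information shared between agents are defined.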
Discovering Stock Price Prediction Rules of Bombay Stock Exchange Using Rough Fuzzy Multi Layer Perceptron Networks
In India, financial markets have existed for many years. A functionally
accented, diverse, efficient and flexible financial system is vital to the
national objective of creating a market driven, productive and competitive
economy. Today markets of varying maturity exist in equity, debt, commodities
and foreign exchange. In this work we attempt to generate a prediction rule
scheme for stock price movement at the Bombay Stock Exchange using an important
soft computing paradigm, viz., the Rough Fuzzy Multi Layer Perceptron. The use of
Computational Intelligence Systems such as Neural Networks, Fuzzy Sets, Genetic
Algorithms, etc. for Stock Market Predictions has been widely established. The
process is to extract knowledge in the form of rules from daily stock
movements. These rules can then be used to guide investors. To increase the
efficiency of the prediction process, rough sets are used to discretize the
data. The methodology uses a Genetic Algorithm to obtain a structured network
suitable for both classification and rule extraction. The modular concept,
based on divide and conquer strategy, provides accelerated training and a
compact network suitable for generating a minimum number of rules with high
certainty values. The concept of variable mutation operator is introduced for
preserving the localized structure of the constituting Knowledge Based
sub-networks, while they are integrated and evolved. Rough Set Dependency Rules
are generated directly from the real valued attribute table containing Fuzzy
membership values. The paradigm is thus used to develop a rule extraction
algorithm. The extracted rules are compared with some of the related rule
extraction techniques on the basis of some quantitative performance indices.
The proposed methodology extracts rules that are fewer in number, accurate,
have a high certainty factor, and exhibit low confusion with less computation time.
Comment: Book Chapter: Forecasting Financial Markets in India, Rudra P. Pradhan, Indian Institute of Technology Kharagpur (Editor), Allied Publishers, India, 200
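The fuzzy-membership preprocessing that feeds the real-valued attribute table can be sketched as a triangular low/medium/high fuzzification of each real-valued stock attribute. The bin shapes and ranges here are illustrative assumptions, not the chapter's exact membership functions.

```python
def fuzzify(x, lo, hi):
    """Map a value in [lo, hi] to (low, medium, high) triangular memberships,
    as in rough-fuzzy preprocessing of real-valued attributes."""
    mid = (lo + hi) / 2.0
    span = (hi - lo) / 2.0 or 1.0  # guard against a degenerate range
    low = max(0.0, (mid - x) / span)
    med = max(0.0, 1.0 - abs(x - mid) / span)
    high = max(0.0, (x - mid) / span)
    return low, med, high
```

Rough-set dependency rules are then generated over these membership values, giving the rule base its linguistic low/medium/high form.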