Case Slicing Technique for Feature Selection
One of the problems addressed by machine learning is data classification. Finding a
good classification algorithm is an important component of many data mining projects.
Since the 1960s, many algorithms for data classification have been proposed. Data mining researchers often use classifiers to identify important classes of objects within a data repository. This research undertakes two main tasks. The first task is to introduce a slicing technique
for feature subset selection. The second task is to enhance classification accuracy based
on the first task, so that it can be used to classify objects or cases based on selected
relevant features only. This new approach is called the Case Slicing Technique (CST).
Applying this technique to the classification task can further enhance case classification accuracy. CST helps in identifying the subset of
features used in computing the similarity measures needed by classification algorithms.
CST was tested on nine datasets from UCI machine learning repositories and domain
theories. The maximum and minimum accuracies obtained are 99% and 96%, respectively,
based on the evaluation approach. The most commonly used evaluation technique is
k-fold cross-validation; this technique with k = 10 has been used in this
thesis to evaluate the proposed approach.
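The 10-fold evaluation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation; `train_and_score` is a hypothetical hook standing in for any of the classifiers being evaluated:

```python
import random

def k_fold_cross_validation(examples, train_and_score, k=10, seed=0):
    """Estimate accuracy by k-fold cross-validation.

    `examples` is a list of (features, label) pairs; `train_and_score`
    is any function that trains on one split and returns accuracy on
    the other (a hypothetical classifier hook, not part of the thesis).
    """
    data = list(examples)
    random.Random(seed).shuffle(data)       # fixed seed for repeatability
    folds = [data[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k  # mean accuracy over the k held-out folds
```

Each example is held out exactly once, and the reported figure is the mean accuracy over the ten held-out folds.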
CST was compared to other selected classification methods based on feature subset
selection, such as the Induction of Decision Tree Algorithm (ID3), the Base Learning Algorithm,
the K-Nearest Neighbour Algorithm (k-NN), and the Naïve Bayes Algorithm (NB). All these
approaches are implemented with RELIEF feature selection approach.
The classification accuracy obtained from the CST method is compared to other selected
classification methods such as the Value Difference Metric (VDM), Per-Category Feature
Importance (PCF), Cross-Category Feature Importance (CCF), the Instance-Based
Algorithm (IB4), Decision Tree algorithms such as the Induction of Decision Tree
Algorithm (ID3) and the Base Learning Algorithm (C4.5), Rough Set methods such as
Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP),
and Neural Network methods such as the Multilayer Perceptron method.
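The core idea — computing the similarity measure over the selected relevant features only — can be illustrated with a small k-NN sketch. This is hypothetical code, not the CST implementation; `selected` stands for the feature indices a selector such as RELIEF might return:

```python
from collections import Counter
from math import sqrt

def subset_distance(a, b, selected):
    """Euclidean distance computed over the selected feature indices only."""
    return sqrt(sum((a[i] - b[i]) ** 2 for i in selected))

def knn_classify(query, train, selected, k=3):
    """Majority vote of the k nearest neighbours, where 'nearest'
    is measured on the selected feature subset only."""
    neighbours = sorted(train,
                        key=lambda ex: subset_distance(query, ex[0], selected))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: feature 0 separates the classes, feature 1 is noise.
train = [((0.0, 9.0), 'a'), ((0.1, -3.0), 'a'),
         ((1.0, 5.0), 'b'), ((0.9, 0.0), 'b')]
```

With `selected=[0]` the query `(0.05, 100.0)` is classified `'a'`; including the noisy feature (`selected=[0, 1]`) flips the answer to `'b'`, which is exactly the effect that restricting similarity to relevant features is meant to avoid.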
Simulated evaluation of faceted browsing based on feature selection
In this paper we explore the limitations of facet-based browsing, which uses sub-needs of an information need for querying and organising the search process in video retrieval. The underlying assumption of this approach is that search effectiveness will be enhanced if such an approach is employed for interactive video retrieval using textual and visual features. We explore the performance bounds of a faceted system by carrying out a simulated user evaluation on TRECVid data sets, and also on the logs of a prior user experiment with the system. We first present a methodology to reduce the dimensionality of features by selecting the most important ones. Then, we discuss the simulated evaluation strategies employed in our evaluation and the effect of using both textual and visual features. Facets created by users are simulated by clustering video shots using textual and visual features. The experimental results of our study demonstrate that the faceted browser can potentially improve search effectiveness.
Feature subset selection: a correlation based filter approach
Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance adversely affected, by too much data, some of which may be irrelevant or redundant to the learning task. Feature subset selection, then, is a method of enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation-based heuristic to determine the goodness of feature subsets, and evaluates its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance-based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms; C4.5 generated smaller decision trees.
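The correlation-based heuristic sketched here rewards subsets whose features correlate strongly with the class but weakly with each other (the CFS merit formulation). A minimal numeric sketch, assuming Pearson correlation as the feature-goodness measure, where `features` is a list of columns:

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) *
               sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def cfs_merit(features, labels):
    """Merit of a feature subset: k * r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean feature-class correlation and r_ff the
    mean feature-feature intercorrelation."""
    k = len(features)
    r_cf = mean(abs(pearson(f, labels)) for f in features)
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = mean(abs(pearson(features[i], features[j])) for i, j in pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

Adding a feature that is uncorrelated with the class lowers the merit, so a search over subsets guided by this score naturally discards irrelevant and redundant features.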
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems is discussed.
Decision table for classifying point sources based on FIRST and 2MASS databases
With the availability of multiwavelength, multiscale and multiepoch
astronomical catalogues, the number of features to describe astronomical
objects has increased. The better the features we select to classify objects, the
higher the classification accuracy is. In this paper, we have used data sets of
stars and quasars from the near-infrared band and the radio band. Then a best-first
search method was applied to select features. For the data with the selected
features, the decision table algorithm was implemented. The classification
accuracy is more than 95.9%. As a result, the feature selection method improves
the effectiveness and efficiency of the classification method. Moreover, the
result shows that the decision table is robust and effective for discrimination of
celestial objects and can be used to preselect quasar candidates for large survey
projects.
Comment: 10 pages; accepted by Advances in Space Research
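The paper uses best-first search to choose the feature subset; a simpler greedy forward selection conveys the same idea. This is a sketch under that simplification — in the paper's setting, `score` would be the cross-validated accuracy of the decision table on a candidate subset:

```python
def forward_select(all_features, score):
    """Greedy forward selection: repeatedly add the feature that most
    improves score(subset); stop when no addition helps.
    (A simplified stand-in for full best-first search.)"""
    selected, best = [], float('-inf')
    remaining = list(all_features)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        new = score(selected + [candidate])
        if new <= best:          # no candidate improves the score: stop
            break
        selected.append(candidate)
        best = new
        remaining.remove(candidate)
    return selected

# Toy score favouring the subset {0, 2} (a stand-in for a real evaluator).
toy = lambda s: len(set(s) & {0, 2}) - 0.5 * len(set(s) - {0, 2})
forward_select(range(4), toy)  # -> [0, 2]
```

Best-first search generalizes this by keeping a queue of partial subsets and allowing limited backtracking, which makes it less prone to getting stuck in a locally good but globally poor subset.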