Determining appropriate approaches for using data in feature selection

Author: A Kalousis
C Ambroise
DW Aha
F Wilcoxon
G Chandrashekar
H Liu
J Reunanen
JC Platt
JR Quinlan
L Yu
M Lecocke
MA Hall
P Somol
V Bolón-Canedo
Y Han
Y Saeys
Z He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/12/2015

Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using only part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the dataset is small, but starts to lose its advantage as the dataset size increases.
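As an illustration of the reliability measure named in the abstract, the following is a minimal sketch of an average Tanimoto index over feature subsets. It assumes the common definition (mean pairwise ratio of intersection size to union size); the function names and the example subsets are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity |A & B| / |A | B| between two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty subsets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def average_tanimoto_index(subsets):
    """Mean Tanimoto similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets selected by one filter on three resampled folds.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
ati = average_tanimoto_index(runs)
print(round(ati, 4))
```

A value near 1 means the selected subsets barely change across folds; a value near 0 means they share almost no features.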

Crossref

Springer - Publisher Connector

University of East Anglia digital repository

Weighted Heuristic Ensemble of Filters

Author: Aldehim Ghadah
Wang Wenjia
Publication date: 22/12/2015

Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably across datasets, and no single filter performs consistently well under all conditions. Feature selection ensembles have therefore been investigated recently to provide more reliable and effective results than any individual filter, but all existing feature selection ensembles treat their member filters equally, regardless of their performance. In this paper, we present a novel framework that applies a weighted feature selection ensemble by proposing a systematic way of assigning different weights to the feature selection methods (filters). We also investigate how to determine the appropriate weight for each filter in an ensemble. Experiments on ten benchmark datasets show that, although adding more weight to ‘good filters’ should theoretically and intuitively lead to better results, in practice the outcome is very uncertain. The assumption was found to be correct for some examples in our experiments. In other situations, however, filters that had been assumed to perform well performed badly, leading to even worse results. Therefore, adding weights to filters might not achieve much in terms of accuracy, while increasing complexity and time consumption and clearly decreasing stability.
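To make the idea of weighting filters concrete, here is a minimal sketch of one possible aggregation scheme: a weighted mean of rank positions. This is an assumption chosen for illustration, not the framework proposed in the paper; the filter names, weights and rankings are hypothetical, and every filter is assumed to rank the same set of features.

```python
def weighted_rank_aggregate(rankings, weights, k):
    """Return the top-k features by weighted mean rank position.

    rankings: dict mapping filter name -> list of feature ids, best first.
    weights:  dict mapping filter name -> non-negative weight.
    """
    total = sum(weights[name] for name in rankings)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = {}
    for name, order in rankings.items():
        w = weights[name] / total  # normalise so the weights sum to 1
        for position, feature in enumerate(order):
            scores[feature] = scores.get(feature, 0.0) + w * position
    # Lower weighted rank is better; ties are broken by feature id.
    return sorted(scores, key=lambda f: (scores[f], f))[:k]


# Hypothetical rankings of four features produced by three filters.
rankings = {
    "filter_a": [0, 1, 2, 3],
    "filter_b": [1, 0, 3, 2],
    "filter_c": [3, 2, 1, 0],
}
equal = {"filter_a": 1, "filter_b": 1, "filter_c": 1}
skewed = {"filter_a": 1, "filter_b": 1, "filter_c": 3}

top_equal = weighted_rank_aggregate(rankings, equal, k=2)
top_skewed = weighted_rank_aggregate(rankings, skewed, k=2)
print(top_equal, top_skewed)
```

In this toy case, shifting weight towards one filter changes the selected subset, which is the kind of sensitivity the abstract warns can reduce stability.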

Crossref

University of East Anglia digital repository

Sequential projection pursuit for optimal transformation of autoregressive coefficients for damage detection in an experimental wind turbine blade

Author: Hoell Simon
Omenzetter Piotr
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Peer reviewed

Publisher PD

Aberdeen University Research

Crossref

Adaptive Parallel Iterative Deepening Search

Author: Cook D. J.
Varnell R. C.
Publication venue: 'AI Access Foundation'
Publication date: 26/05/2011

Many of the artificial intelligence techniques developed to date rely on heuristic search through large spaces. Unfortunately, the size of these spaces and the corresponding computational effort reduce the applicability of otherwise novel and effective algorithms. A number of parallel and distributed approaches to search have considerably improved the performance of the search process. Our goal is to develop an architecture that automatically selects parallel search strategies for optimal performance on a variety of search problems. In this paper we describe one such architecture, realized in the Eureka system, which combines the benefits of many different approaches to parallel heuristic search. Through empirical and theoretical analyses we observe that features of the problem space directly affect the choice of optimal parallel search strategy. We then employ machine learning techniques to select the optimal parallel search strategy for a given problem space. When a new search task is input to the system, Eureka uses features describing the search space and the chosen architecture to automatically select the appropriate search strategy. Eureka has been tested on a MIMD parallel processor, a distributed network of workstations, and a single workstation using multithreading. Results generated from fifteen puzzle problems, robot arm motion problems, artificial search spaces, and planning problems indicate that Eureka outperforms any of the tested strategies used exclusively for all problem instances and is able to greatly reduce the search time for these applications.

arXiv.org e-Print Archive

Crossref

Selecting a Small Set of Optimal Gestures from an Extensive Lexicon

Author: Grosek Jacob
Kutz J. Nathan
Publication date: 30/04/2014

Finding the best set of gestures to use for a given computer recognition problem is an essential part of optimizing the recognition performance while being mindful of those who may articulate the gestures. An objective function, called the ellipsoidal distance ratio metric (EDRM), for determining the best gestures from a larger lexicon library is presented, along with a numerical method for incorporating subjective preferences. In particular, we demonstrate an efficient algorithm that chooses the best $n$ gestures from a lexicon of $m$ gestures, where typically $n \ll m$, using a weighting of both subjective and objective measures.

Comment: 27 pages, 7 figures

arXiv.org e-Print Archive

CiteSeerX