Search CORE

2,030 research outputs found

A Data-Intensive Lightweight Semantic Wrapper Approach to Aid Information Integration

Author: Bao Jie
Braines Dave
Kalfoglou Yannis
Shadbolt Nigel
Smart Paul R

We argue for the flexible use of lightweight ontologies to aid information integration. Our proposed approach is grounded on the availability and exploitation of existing data sources in a networked environment such as the World Wide Web (instance data, as it is commonly known in the description logic and ontology community). We have devised a mechanism using Semantic Web technologies that wraps each existing data source with semantic information, and we refer to this technique as SWEDER (Semantic Wrapping of Existing Data Sources with Embedded Rules). This technique provides representational homogeneity and a firm basis for information integration amongst these semantically enabled data sources. It also directly supports information integration through the use of context ontologies to align two or more semantically wrapped data sources and capture the rules that define these integrations. We have tested the proposed approach using a simple implementation in the domain of organisational and communication data, and we speculate on future directions for this lightweight approach to semantic enablement and contextual alignment of existing network-available data sources.

Southampton (e-Prints Soton)

Feature selection for chemical sensor arrays using mutual information

Author: A Kraskov
A Krause
A Rakotomamonjy
A Vergara
Amalia Z. Berna
AZ Berna
B Nelson
B Raman
C Cortes
C Guestrin
CC Chang
CE Shannon
E Llobet
H Dacres
H Koinuma
H Peng
H Zheng
I Guyon
I Rodriguez-Lujan
IJ Myung
J Fonollosa
J Gardner
James P. Brody
Joseph T. Lizier
L Breiman
L Olsson
M Aleixandre
M Pardo
Mikhail Prokopenko
MK Muezzinoglu
N Friedman
R Battiti
R Binions
S Marco
S Martínez
S Pashami
Stephen C. Trowell
T Nowotny
TC Pearce
Thomas Nowotny
TM Cover
X. Rosalind Wang
XR Wang
Y Saeys
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set and established the best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study, which used a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
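The filter idea in this abstract, scoring features by their mutual information with the class label, can be illustrated with a minimal sketch. It is not the authors' method: it assumes features that are already discretised, uses a simple plug-in (histogram) estimator, and scores each feature on its own rather than scoring feature sets jointly. The names `mutual_information` and `rank_features` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing MI with the labels, MI scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (assumed for illustration): feature 0 equals the label,
# feature 1 is constant, feature 2 is only partly informative.
rows = [(0, 5, 0), (0, 5, 0), (1, 5, 0), (1, 5, 1)]
labels = [0, 0, 1, 1]
order, scores = rank_features(rows, labels)
```

A filter of this kind never trains the classifier during selection, which is where its computational advantage over a wrapper search comes from.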

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Western Sydney ResearchDirect

Sussex Research Online

FigShare

An investigation of genetic algorithm-based feature selection techniques applied to keystroke dynamics biometrics

Author: Da Costa Abreu Marjory
Lima do Nascimento Tuany Mariah
Monteiro de Oliveira Andrelyne Vitória
Oliveira Laura
Publication venue: Brazilian Computer Society (SBC)
Publication date: 02/09/2019

Due to the continuous use of social networks, users can be vulnerable to online situations such as paedophilia threats. One way to investigate an alleged paedophile is to verify the legitimacy of the gender that they claim. One possible technique to adopt is keystroke dynamics analysis. However, this technique can extract many attributes, and the presence of redundant and irrelevant attributes has a negative impact on classifier accuracy. This work therefore uses a wrapper approach to feature selection based on genetic algorithms, with KNN, SVM and Naive Bayes as classifiers. The best result was obtained with the SVM classifier, at 90% accuracy, identifying what is most suitable for both datasets.
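A wrapper of the kind described here evaluates each candidate feature subset by training and scoring a classifier on it, with a genetic algorithm proposing the subsets. The sketch below is only an illustration under stated assumptions: the fitness is leave-one-out accuracy of a 1-nearest-neighbour classifier (a stand-in, not the KNN, SVM or Naive Bayes setups of the paper), and the population size, mutation rate, one-point crossover and elitism scheme are arbitrary choices. All function names are hypothetical.

```python
import random


def loo_1nn_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        best_d, best_label = None, None
        for k, other in enumerate(rows):
            if k == i:
                continue
            d = sum((row[j] - other[j]) ** 2 for j in cols)
            if best_d is None or d < best_d:
                best_d, best_label = d, labels[k]
        correct += best_label == labels[i]
    return correct / len(rows)


def ga_select(rows, labels, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search feature bitmasks with a small genetic algorithm; needs at least 2 features."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        return loo_1nn_accuracy(rows, labels, mask)

    # Initial population: the all-features mask plus random masks.
    pop = [tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per bit
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Because the best half of each generation is carried over unchanged and the initial population contains the all-features mask, the returned subset never scores worse than using every feature under this fitness function.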

Sheffield Hallam University Research Archive

Feature diversity for optimized human micro-doppler classification using multistatic radar

Author: Fioranelli Francesco
Griffiths Hugh
Gürbüz Sevgi Zübeyde
Ritchie Matthew
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/01/2017

This paper investigates the selection of different combinations of features at different multistatic radar nodes, depending on scenario parameters, such as aspect angle to the target and signal-to-noise ratio, and radar parameters, such as dwell time, polarisation, and frequency band. Two sets of experimental data collected with the multistatic radar system NetRAD are analysed for two separate problems, namely the classification of unarmed vs. potentially armed multiple personnel, and the recognition of individuals based on walking gait. The results show that the overall classification accuracy can be significantly improved by taking into account feature diversity at each radar node depending on the environmental parameters and target behaviour, in comparison with the conventional approach of selecting the same features for all nodes.

Crossref

UCL Discovery

Enlighten

Combining similarity in time and space for training set formation under concept drift

Author: Zliobaite Indre
Publication venue: IOS Press
Publication date: 01/01/2011

Concept drift is a challenge in supervised learning for sequential data. It describes the phenomenon in which data distributions change over time. In such cases, classifier accuracy benefits from selective sampling of the training data. We develop a method for training set selection that is particularly relevant when the expected drift is gradual. Training set selection at each time step is based on the distance to the target instance. The distance function combines similarity in space and in time. The method determines an optimal training set size online at every time step using cross-validation. It is a wrapper approach, so different base classifiers can be plugged in. The proposed method shows the best accuracy in its peer group on real and artificial drifting data. The complexity of the method is reasonable for field applications.
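The selection step described in this abstract, ranking past instances by a distance that mixes similarity in feature space with closeness in time, can be sketched as follows. The abstract does not give the exact distance function, so the weighted sum with a single weight `alpha`, the Euclidean space distance, and the fixed `size` argument are all assumptions. The cross-validated choice of training set size is not shown, and the function names are hypothetical.

```python
from math import sqrt


def combined_distance(x, t, target_x, target_t, alpha=1.0):
    """Space distance plus alpha times the time gap to the target instance (assumed form)."""
    d_space = sqrt(sum((a - b) ** 2 for a, b in zip(x, target_x)))
    d_time = abs(target_t - t)
    return d_space + alpha * d_time


def select_training_set(history, target_x, target_t, size, alpha=1.0):
    """history is a list of (t, x, y) records; return the `size` records closest to the target."""
    ranked = sorted(
        history,
        key=lambda rec: combined_distance(rec[1], rec[0], target_x, target_t, alpha),
    )
    return ranked[:size]


# Toy history (assumed for illustration): (time step, features, label)
history = [
    (0, (0.0, 0.0), "a"),
    (1, (5.0, 5.0), "b"),
    (8, (0.1, 0.0), "a"),
    (9, (3.0, 4.0), "b"),
]
space_only = select_training_set(history, (0.0, 0.0), 10, size=2, alpha=0.0)
space_and_time = select_training_set(history, (0.0, 0.0), 10, size=2, alpha=1.0)
```

With `alpha=0.0` the old but spatially identical instance from time step 0 is selected; with `alpha=1.0` it is replaced by the more recent instance from time step 9, which is the intended effect when the concept drifts gradually.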

Repository TU/e

Crossref

Pure OAI Repository

Bournemouth University Research Online

Linear Time Feature Selection for Regularized Least-Squares

Author: Airola Antti
Pahikkala Tapio
Salakoski Tapio
Publication date: 01/01/2010

We propose a novel algorithm for greedy forward feature selection for regularized least-squares (RLS) regression and classification, also known as the least-squares support vector machine or ridge regression. The algorithm, which we call greedy RLS, starts from the empty feature set, and on each iteration adds the feature whose addition provides the best leave-one-out cross-validation performance. Our method is considerably faster than the previously proposed ones, since its time complexity is linear in the number of training examples, the number of features in the original data set, and the desired size of the set of selected features. Therefore, as a side effect we obtain a new training algorithm for learning sparse linear RLS predictors which can be used for large scale learning. This speed is possible due to matrix calculus based short-cuts for leave-one-out and feature addition. We experimentally demonstrate the scalability of our algorithm and its ability to find good quality feature sets. Comment: 17 pages, 15 figures.
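The selection loop described in this abstract (start from the empty set, and on each iteration add the feature with the best leave-one-out performance) can be sketched in a deliberately naive form. This is not greedy RLS itself: it refits a ridge model from scratch for every held-out example and every candidate feature, so it has none of the linear-time matrix shortcuts the abstract refers to. The squared-error criterion, the absence of an intercept, the default `lam`, and all function names are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Weights minimising ||Xw - y||^2 + lam * ||w||^2 (no intercept, lam > 0)."""
    d = len(X[0])
    A = [
        [sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def loo_error(X, y, cols, lam):
    """Naive leave-one-out squared error using only the features in cols."""
    total = 0.0
    for i in range(len(X)):
        X_train = [[r[j] for j in cols] for k, r in enumerate(X) if k != i]
        y_train = [t for k, t in enumerate(y) if k != i]
        w = ridge_fit(X_train, y_train, lam)
        pred = sum(wj * X[i][j] for wj, j in zip(w, cols))
        total += (pred - y[i]) ** 2
    return total


def greedy_forward(X, y, k, lam=1.0):
    """Add, k times, the feature whose inclusion gives the lowest leave-one-out error."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(len(X[0])) if j not in selected]
        best = min(candidates, key=lambda j: loo_error(X, y, selected + [j], lam))
        selected.append(best)
    return selected
```

The cost of this baseline grows quickly with the number of examples and features, which is the inefficiency the shortcuts in the paper are meant to remove.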

arXiv.org e-Print Archive

CiteSeerX

An optimization-based wrapper approach for utility-based data mining

Author: José Francisco Cagigal da Silva Gomes
Publication date: 12/07/2019

Repositório Aberto da Universidade do Porto