2,030 research outputs found
A Data-Intensive Lightweight Semantic Wrapper Approach to Aid Information Integration
We argue for the flexible use of lightweight ontologies to aid information integration. Our proposed approach is grounded on the availability and exploitation of existing data sources in a networked environment such as the world wide web (instance data as it is commonly known in the description logic and ontology community). We have devised a mechanism using Semantic Web technologies that wraps each existing data source with semantic information, and we refer to this technique as SWEDER (Semantic Wrapping of Existing Data Sources with Embedded Rules). This technique provides representational homogeneity and a firm basis for information integration amongst these semantically enabled data sources. This technique also directly supports information integration though the use of context ontologies to align two or more semantically wrapped data sources and capture the rules that define these integrations. We have tested this proposed approach using a simple implementation in the domain of organisational and communication data and we speculate on the future directions for this lightweight approach to semantic enablement and contextual alignment of existing network-available data sources
Feature selection for chemical sensor arrays using mutual information
We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays
An investigation of genetic algorithm-based feature selection techniques applied to keystroke dynamics biometrics
Due to the continuous use of social networks, users can be vulnerable to online situations such as paedophilia treats. One of the ways to do the
investigation of an alleged pedophile is to verify the legitimacy of the genre that
it claims. One possible technique to adopt is keystroke dynamics analysis. However, this technique can extract many attributes, causing a negative impact on
the accuracy of the classifier due to the presence of redundant and irrelevant
attributes. Thus, this work using the wrapper approach in features selection using genetic algorithms and as KNN, SVM and Naive Bayes classifiers. Bringing
as best result the SVM classifier with 90% accuracy, identifying what is most
suitable for both bases
Feature diversity for optimized human micro-doppler classification using multistatic radar
This paper investigates the selection of different combinations of features at different multistatic radar nodes, depending on scenario parameters, such as aspect angle to the target and signal-to-noise ratio, and radar parameters, such as dwell time, polarisation, and frequency band. Two sets of experimental data collected with the multistatic radar system NetRAD are analysed for two separate problems, namely the classification of unarmed vs potentially armed multiple personnel, and the personnel recognition of individuals based on walking gait. The results show that the overall classification accuracy can be significantly improved by taking into account feature diversity at each radar node depending on the environmental parameters and target behaviour, in comparison with the conventional approach of selecting the same features for all nodes
Combining similarity in time and space for training set formation under concept drift
Concept drift is a challenge in supervised learning for sequential data. It describes a phenomenon when the data distributions change over time. In such a case accuracy of a classifier benefits from the selective sampling for training. We develop a method for training set selection, particularly relevant when the expected drift is gradual. Training set selection at each time step is based on the distance to the target instance. The distance function combines similarity in space and in time. The method determines an optimal training set size online at every time step using cross validation. It is a wrapper approach, it can be used plugging in different base classifiers. The proposed method shows the best accuracy in the peer group on the real and artificial drifting data. The method complexity is reasonable for the field applications
Linear Time Feature Selection for Regularized Least-Squares
We propose a novel algorithm for greedy forward feature selection for
regularized least-squares (RLS) regression and classification, also known as
the least-squares support vector machine or ridge regression. The algorithm,
which we call greedy RLS, starts from the empty feature set, and on each
iteration adds the feature whose addition provides the best leave-one-out
cross-validation performance. Our method is considerably faster than the
previously proposed ones, since its time complexity is linear in the number of
training examples, the number of features in the original data set, and the
desired size of the set of selected features. Therefore, as a side effect we
obtain a new training algorithm for learning sparse linear RLS predictors which
can be used for large scale learning. This speed is possible due to matrix
calculus based short-cuts for leave-one-out and feature addition. We
experimentally demonstrate the scalability of our algorithm and its ability to
find good quality feature sets.Comment: 17 pages, 15 figure
- …