
    A Data-Intensive Lightweight Semantic Wrapper Approach to Aid Information Integration

    We argue for the flexible use of lightweight ontologies to aid information integration. Our proposed approach is grounded on the availability and exploitation of existing data sources in a networked environment such as the world wide web (instance data as it is commonly known in the description logic and ontology community). We have devised a mechanism using Semantic Web technologies that wraps each existing data source with semantic information, and we refer to this technique as SWEDER (Semantic Wrapping of Existing Data Sources with Embedded Rules). This technique provides representational homogeneity and a firm basis for information integration amongst these semantically enabled data sources. This technique also directly supports information integration through the use of context ontologies to align two or more semantically wrapped data sources and capture the rules that define these integrations. We have tested this proposed approach using a simple implementation in the domain of organisational and communication data, and we speculate on future directions for this lightweight approach to semantic enablement and contextual alignment of existing network-available data sources.
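    The wrapping-and-alignment idea in this abstract can be illustrated with a minimal sketch: a record from an existing data source is lifted into subject-predicate-object triples, and a context-ontology rule maps a local predicate onto a shared vocabulary. All namespace URIs and field names below are hypothetical illustrations, not the actual SWEDER vocabulary.

    ```python
    # Hypothetical local namespace for the wrapped data source.
    ORG = "http://example.org/org#"

    def wrap_row(row, subject_id):
        """Lift one record from an existing data source into RDF-style triples."""
        subject = ORG + subject_id
        triples = [(subject, ORG + "type", ORG + "Person")]
        for field, value in row.items():
            triples.append((subject, ORG + field, value))
        return triples

    def align(triples, rule):
        """Apply a simple context-ontology rule: rewrite one predicate as another."""
        src, dst = rule
        return [(s, dst if p == src else p, o) for (s, p, o) in triples]

    row = {"name": "Ada", "dept": "Research"}
    wrapped = wrap_row(row, "emp42")
    # A context rule aligning the local 'dept' predicate with a shared vocabulary.
    aligned = align(wrapped, (ORG + "dept", "http://example.org/shared#memberOf"))
    ```

    In practice the wrapper would emit real RDF and the rules would live in an ontology, but the structure (per-source wrapping, then rule-driven alignment) is the same.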

    Feature selection for chemical sensor arrays using mutual information

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study, which used a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
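    The filter idea described above, ranking features by their mutual information with the class label, can be sketched in a few lines. The data here is a toy illustration (discretised readings), not the paper's sensor data.

    ```python
    import numpy as np

    def mutual_information(x, y):
        """I(X;Y) in nats for two discrete-valued arrays."""
        mi = 0.0
        for xv in np.unique(x):
            for yv in np.unique(y):
                pxy = np.mean((x == xv) & (y == yv))
                px, py = np.mean(x == xv), np.mean(y == yv)
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * py))
        return mi

    def mi_filter(X, y, k):
        """Filter approach: rank features by MI with the label, keep the top k."""
        scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
        return np.argsort(scores)[::-1][:k]

    # Toy discretised readings: feature 0 copies the label, feature 1 is noise.
    X = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]])
    y = np.array([0, 0, 1, 1, 0, 1])
    selected = mi_filter(X, y, k=1)   # picks feature 0
    ```

    Unlike the wrapper approach it is compared against, this scoring never trains a classifier, which is where the computational saving comes from.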

    An investigation of genetic algorithm-based feature selection techniques applied to keystroke dynamics biometrics

    Due to the continuous use of social networks, users can be vulnerable to online threats such as paedophilia. One way to investigate an alleged paedophile is to verify the legitimacy of the gender they claim. One possible technique is keystroke dynamics analysis. However, this technique can extract many attributes, and the presence of redundant and irrelevant attributes has a negative impact on classifier accuracy. This work therefore applies a wrapper approach to feature selection using genetic algorithms, with KNN, SVM and Naive Bayes as classifiers. The best result was obtained with the SVM classifier, at 90% accuracy, identifying it as the most suitable for both datasets.
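    A minimal sketch of the GA-based wrapper loop this abstract describes: candidate feature masks are evolved, and each mask's fitness is the accuracy of a classifier trained on only those features. For self-containment the fitness uses a simple nearest-centroid classifier rather than the paper's KNN/SVM/Naive Bayes, and the data is synthetic.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def accuracy(mask, X, y):
        """Wrapper fitness: leave-one-out accuracy of a nearest-centroid
        classifier restricted to the features selected by the bit mask."""
        if not mask.any():
            return 0.0
        Xs = X[:, mask]
        correct = 0
        for i in range(len(y)):
            train = np.arange(len(y)) != i
            cents = {c: Xs[train & (y == c)].mean(axis=0) for c in np.unique(y)}
            pred = min(cents, key=lambda c: np.linalg.norm(Xs[i] - cents[c]))
            correct += pred == y[i]
        return correct / len(y)

    def ga_select(X, y, pop=12, gens=15):
        """Toy genetic algorithm: tournament selection, uniform crossover,
        bit-flip mutation over feature masks, with elitism."""
        d = X.shape[1]
        popu = rng.random((pop, d)) < 0.5
        for _ in range(gens):
            fit = np.array([accuracy(m, X, y) for m in popu])
            nxt = [popu[fit.argmax()].copy()]                 # keep the elite
            while len(nxt) < pop:
                a, b = rng.integers(pop, size=2)
                p1 = popu[a] if fit[a] >= fit[b] else popu[b]
                a, b = rng.integers(pop, size=2)
                p2 = popu[a] if fit[a] >= fit[b] else popu[b]
                child = np.where(rng.random(d) < 0.5, p1, p2)  # crossover
                child ^= rng.random(d) < 0.1                   # mutation
                nxt.append(child)
            popu = np.array(nxt)
        fit = np.array([accuracy(m, X, y) for m in popu])
        return popu[fit.argmax()]

    # Toy keystroke-style data: features 0-1 separate the classes, 2-3 are noise.
    X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(0, 1, (20, 4))])
    X[20:, :2] += 4.0
    y = np.array([0] * 20 + [1] * 20)
    best_mask = ga_select(X, y)
    ```

    Because every fitness evaluation retrains the classifier, the wrapper is far more expensive than a filter, which is the usual trade-off for its higher accuracy.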

    Feature diversity for optimized human micro-doppler classification using multistatic radar

    This paper investigates the selection of different combinations of features at different multistatic radar nodes, depending on scenario parameters, such as aspect angle to the target and signal-to-noise ratio, and radar parameters, such as dwell time, polarisation, and frequency band. Two sets of experimental data collected with the multistatic radar system NetRAD are analysed for two separate problems, namely the classification of unarmed vs. potentially armed multiple personnel, and the recognition of individuals based on walking gait. The results show that the overall classification accuracy can be significantly improved by taking into account feature diversity at each radar node depending on the environmental parameters and target behaviour, in comparison with the conventional approach of selecting the same features for all nodes.
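    The core idea, letting each node keep its own locally best feature instead of forcing one common choice, can be sketched as follows. This is a schematic illustration with synthetic data and a midpoint-threshold classifier; it is not the paper's NetRAD processing chain.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def node_accuracy(x, y):
        """Accuracy of a simple midpoint-threshold classifier on one feature."""
        thr = (x[y == 0].mean() + x[y == 1].mean()) / 2.0
        sign = 1 if x[y == 1].mean() > thr else -1
        pred = (sign * (x - thr) > 0).astype(int)
        return (pred == y).mean()

    def select_per_node(node_feats, y):
        """Feature diversity: each radar node keeps its own best-performing
        feature index, rather than the same index being used at every node."""
        return [int(np.argmax([node_accuracy(f, y) for f in feats.T]))
                for feats in node_feats]

    # Toy micro-Doppler-style data: 3 nodes x 60 samples x 3 candidate features.
    # A different feature is informative at each node (mimicking aspect-angle
    # dependence); the remaining features are noise.
    y = np.array([0] * 30 + [1] * 30)
    node_feats = []
    for node, good in enumerate([0, 2, 1]):
        F = rng.normal(0, 1, (60, 3))
        F[y == 1, good] += 3.0
        node_feats.append(F)

    chosen = select_per_node(node_feats, y)   # a different index per node
    ```

    A same-features-for-all baseline would have to pick one index for all three nodes, sacrificing accuracy at the nodes where that feature carries no information.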

    Combining similarity in time and space for training set formation under concept drift

    Concept drift is a challenge in supervised learning for sequential data. It describes the phenomenon of data distributions changing over time. In such cases, the accuracy of a classifier benefits from selective sampling for training. We develop a method for training set selection that is particularly relevant when the expected drift is gradual. Training set selection at each time step is based on the distance to the target instance, where the distance function combines similarity in space and in time. The method determines an optimal training set size online at every time step using cross-validation. It is a wrapper approach and can be used with different base classifiers plugged in. The proposed method shows the best accuracy in its peer group on real and artificial drifting data, and its complexity is reasonable for field applications.
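    The selection scheme can be sketched as follows: instances are ranked by a distance that mixes feature-space similarity and temporal recency, and the training set size is picked by cross-validating candidate sizes on the selected instances. The mixing weight, candidate sizes, and 1-NN base classifier below are illustrative choices, not the paper's exact configuration.

    ```python
    import numpy as np

    def combined_distance(X, t, target_x, target_t, alpha=0.5):
        """Distance to the target instance mixing similarity in space and in time."""
        d_space = np.linalg.norm(X - target_x, axis=1)
        d_time = np.abs(t - target_t)
        return alpha * d_space / d_space.max() + (1 - alpha) * d_time / d_time.max()

    def predict_1nn(Xtr, ytr, x):
        return ytr[np.argmin(np.linalg.norm(Xtr - x, axis=1))]

    def select_and_predict(X, y, t, target_x, target_t, sizes=(5, 10, 20)):
        """Wrapper-style choice of training set size: for each candidate size,
        estimate leave-one-out accuracy on the selected instances, then predict
        the target with the best-scoring set."""
        order = np.argsort(combined_distance(X, t, target_x, target_t))
        best_size, best_acc = sizes[0], -1.0
        for k in sizes:
            idx = order[:k]
            acc = np.mean([predict_1nn(np.delete(X[idx], i, 0),
                                       np.delete(y[idx], i),
                                       X[idx][i]) == y[idx][i]
                           for i in range(k)])
            if acc > best_acc:
                best_size, best_acc = k, acc
        idx = order[:best_size]
        return predict_1nn(X[idx], y[idx], target_x)

    # Toy gradually drifting stream: the class-0 mean moves from 0 to 3 over
    # time, so old class-0 instances resemble recent class-1 ones.
    rng = np.random.default_rng(2)
    t = np.arange(200.0)
    y = rng.integers(0, 2, 200)
    drift = 3.0 * t / t.max()
    X = (rng.normal(0, 0.3, 200) + np.where(y == 1, 4.0, 0.0) + drift)[:, None]
    pred = select_and_predict(X[:-1], y[:-1], t[:-1], X[-1], t[-1])
    ```

    Under gradual drift, the temporal term discounts stale instances while the spatial term still admits older instances that remain representative, which is why the combined metric beats pure sliding windows.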

    Linear Time Feature Selection for Regularized Least-Squares

    We propose a novel algorithm for greedy forward feature selection for regularized least-squares (RLS) regression and classification, also known as the least-squares support vector machine or ridge regression. The algorithm, which we call greedy RLS, starts from the empty feature set, and on each iteration adds the feature whose addition provides the best leave-one-out cross-validation performance. Our method is considerably faster than the previously proposed ones, since its time complexity is linear in the number of training examples, the number of features in the original data set, and the desired size of the set of selected features. Therefore, as a side effect we obtain a new training algorithm for learning sparse linear RLS predictors which can be used for large-scale learning. This speed is possible due to matrix-calculus-based shortcuts for leave-one-out and feature addition. We experimentally demonstrate the scalability of our algorithm and its ability to find good quality feature sets.
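    The selection criterion can be sketched directly from the standard ridge-regression leave-one-out identity e_i = (y_i - yhat_i) / (1 - H_ii). The sketch below naively refits for every candidate feature, so it does not have the paper's linear time complexity; it only illustrates the greedy-by-LOO criterion on synthetic data.

    ```python
    import numpy as np

    def loo_error(X, y, lam=1.0):
        """Mean squared leave-one-out error of ridge regression, computed from
        the hat matrix via e_i = (y_i - yhat_i) / (1 - H_ii), without n refits."""
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
        resid = (y - H @ y) / (1.0 - np.diag(H))
        return np.mean(resid ** 2)

    def greedy_rls(X, y, k, lam=1.0):
        """Greedy forward selection: repeatedly add the feature whose inclusion
        gives the lowest LOO error. (Naive refits for clarity; the paper reaches
        linear time with further matrix-calculus shortcuts for feature addition.)"""
        selected = []
        for _ in range(k):
            rest = [j for j in range(X.shape[1]) if j not in selected]
            errs = [loo_error(X[:, selected + [j]], y, lam) for j in rest]
            selected.append(rest[int(np.argmin(errs))])
        return selected

    # Toy regression: y depends on features 0 and 3 only.
    rng = np.random.default_rng(3)
    X = rng.normal(0, 1, (80, 6))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.1, 80)
    chosen = greedy_rls(X, y, k=2)   # recovers the two informative features
    ```

    The LOO identity already avoids n explicit refits per candidate; the paper's contribution is avoiding the per-candidate refit as well, via incremental updates when a feature is added.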