    A Classification of Remote Sensing Image Based on Improved Compound Kernels of SVM

    Modeling variable dependencies between characters in Chinese information retrieval

    Abstract. Chinese IR can work on words and/or character n-grams. In previous studies, when several types of index are used, independence is usually assumed between them, which is clearly not the case in reality. In this paper, we propose a model for Chinese IR that integrates different types of dependency between Chinese characters. The role of a pair of dependent characters in the matching process is variable, depending on the pair’s ability to describe the underlying meaning and to retrieve relevant documents. The weight of each pair is learned using an SVM. Our experiments on TREC and NTCIR Chinese collections show that our model can significantly outperform most existing approaches. The results confirm the need to integrate dependent pairs of characters in Chinese IR and to use them according to their possible contribution to IR.
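
    To make the idea of matching on both single characters and character pairs concrete, the Python sketch below scores a document by counting matched query characters and adding a per-pair weight for each matched adjacent character pair. It is a simplified illustration, not the authors' model: it only considers adjacent pairs, and the hypothetical `pair_weight` dictionary stands in for the weights that the paper learns with an SVM.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def score(query, doc, pair_weight, default_pair_weight=0.0):
    """Score doc for query: 1 per matched query character plus a weight per matched pair.

    pair_weight is a hypothetical dict mapping a two-character string to its weight.
    """
    doc_chars = set(doc)
    doc_pairs = set(char_ngrams(doc, 2))
    # Unigram evidence: every query character that also occurs in the document.
    unigram_part = sum(1.0 for c in query if c in doc_chars)
    # Pair evidence: adjacent query pairs found in the document, each with its own weight.
    pair_part = sum(
        pair_weight.get(p, default_pair_weight)
        for p in char_ngrams(query, 2)
        if p in doc_pairs
    )
    return unigram_part + pair_part
```

    For example, with query 信息检索, document 信息检索系统, and weights 0.9, 1.2 and 0.1 for 信息, 检索 and 息检, all four characters and all three pairs match, so the score is 4 + 2.2 = 6.2 (up to floating-point rounding). Giving a low weight to 息检, a pair that straddles two words, shows how a learned weighting could discount pairs that carry little meaning.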

    Bayesian Reward Filtering

    A Unifying View of Multiple Kernel Learning

    Recent research on multiple kernel learning has led to a number of approaches for combining kernels in regularized risk minimization. The proposed approaches include different formulations of objectives and varying regularization strategies. In this paper, we present a unifying general optimization criterion for multiple kernel learning and show how existing formulations are subsumed as special cases. We also derive the criterion's dual representation, which is suitable for general smooth optimization algorithms. Finally, we evaluate multiple kernel learning in this framework both analytically, using a Rademacher complexity bound on the generalization error, and empirically in a set of experiments.
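
    As a minimal sketch of the kernel combinations that multiple kernel learning works with, the Python code below builds the Gram matrix of a non-negative weighted sum of a linear and an RBF kernel, and rescales a weight vector to unit l_p norm, one regularization choice used in the MKL literature. The weights are fixed by hand here; in MKL they are learned jointly with the predictor, which this sketch does not do. The helper names are hypothetical.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_gram(points, kernels, weights):
    """Gram matrix of k(x, y) = sum_m weights[m] * kernels[m](x, y).

    Non-negative weights keep the combination positive semidefinite,
    because a non-negative sum of valid kernels is again a valid kernel.
    """
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    return [
        [sum(w * k(x, y) for w, k in zip(weights, kernels)) for y in points]
        for x in points
    ]


def normalize_lp(weights, p):
    """Rescale a weight vector to unit l_p norm (p >= 1)."""
    norm = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    return [w / norm for w in weights]


beta = normalize_lp([1.0, 1.0], p=2)  # equal weights with unit l_2 norm
gram = combined_gram([[0.0, 0.0], [1.0, 0.0]], [linear_kernel, rbf_kernel], beta)
```

    Changing p changes how the weight budget is shared: p = 1 tends to favor sparse kernel selections, while larger p spreads weight more evenly across kernels.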

    Competing with stationary prediction strategies

    In this paper, we introduce the class of stationary prediction strategies and construct a prediction algorithm that asymptotically performs as well as the best continuous stationary strategy. We make mild compactness assumptions but no stochastic assumptions about the environment. In particular, no assumption of stationarity is made about the environment, and the stationarity of the considered strategies only means that they do not depend explicitly on time; we argue that it is natural to consider only stationary strategies even for highly non-stationary environments. Comment: 20 pages
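
    The setting can be illustrated with a much simpler relative: an exponentially weighted average forecaster competing with a finite pool of stationary strategies, each of which sees only the past outcomes and never the time index. This is a swapped-in textbook technique, not the paper's algorithm, which competes with the whole class of continuous stationary strategies; the three strategies below are hypothetical examples.

```python
import math


# Hypothetical stationary strategies: each maps the past outcomes to a
# prediction in [0, 1] and never looks at the time index.
def last_value(history):
    return history[-1] if history else 0.5


def window_mean(history, k=3):
    recent = history[-k:]
    return sum(recent) / len(recent) if recent else 0.5


def constant_half(history):
    return 0.5


def exp_weights_forecast(outcomes, strategies, eta=0.5):
    """Exponentially weighted average forecaster under square loss.

    Returns the forecaster's cumulative loss and each strategy's cumulative loss.
    For outcomes and predictions in [0, 1], square loss is exp-concave for
    eta <= 0.5, so the forecaster's regret is at most ln(N) / eta.
    """
    losses = [0.0] * len(strategies)
    own_loss = 0.0
    history = []
    for y in outcomes:
        preds = [s(history) for s in strategies]
        best = min(losses)
        # Subtracting the best loss avoids underflow without changing the weights' ratios.
        weights = [math.exp(-eta * (loss - best)) for loss in losses]
        forecast = sum(w * p for w, p in zip(weights, preds)) / sum(weights)
        own_loss += (forecast - y) ** 2
        losses = [loss + (p - y) ** 2 for loss, p in zip(losses, preds)]
        history.append(y)
    return own_loss, losses
```

    No stochastic assumption is needed for the guarantee: the outcome sequence may be arbitrary, for instance strictly alternating between 0 and 1, and the forecaster's cumulative loss still exceeds that of the best strategy in the pool by at most ln(N)/eta.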

    Improving the Accuracy of a Two-Stage Algorithm in Evolutionary Product Unit Neural Networks for Classification by Means of Feature Selection

    This paper introduces a methodology that improves the accuracy of a two-stage algorithm in evolutionary product unit neural networks for classification tasks by means of feature selection. A couple of filters were used to evaluate the proposal. The experiments were carried out on seven data sets from the UCI repository for which reference classifiers such as C4.5 or 1-NN report mean test error rates of about twenty percent or above. The study includes an overall empirical comparison between the models obtained with and without feature selection. Several classifiers were also tested to illustrate the performance of the different filters considered. The results were contrasted with nonparametric statistical tests and show that our proposal significantly improves the test accuracy of the previous models on the data sets considered. Moreover, the current proposal is much more efficient than a methodology we developed previously; finally, the number of inputs is reduced by more than fifty-five percent on average. Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2008-06681-C06-03; Junta de Andalucía P08-TIC-374.
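
    A filter ranks input features by a score computed independently of the classifier and keeps the best ones. The abstract does not name the filters used, so the Python sketch below assumes a Fisher-style score (spread of the class means divided by within-class variance) purely for illustration, and also reports the percentage reduction in the number of inputs.

```python
from statistics import mean, pstdev


def fisher_score(values, labels):
    """Spread of the class means around the overall mean over within-class variance."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        vc = [v for v, lab in zip(values, labels) if lab == c]
        between += len(vc) * (mean(vc) - overall) ** 2
        within += len(vc) * pstdev(vc) ** 2
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def filter_select(X, y, keep_fraction=0.5):
    """Keep the top-scoring fraction of the columns of X.

    Returns the kept column indices and the percentage reduction in inputs.
    """
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    k = max(1, round(keep_fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:k])
    reduction = 100.0 * (n_features - k) / n_features
    return kept, reduction
```

    Because the score never consults the downstream model, the same reduced input set can be handed to any classifier, which is what makes filters cheap compared with wrapper methods.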

    A Study of Machine Learning Techniques for Daily Solar Energy Forecasting using Numerical Weather Models

    Proceedings of: 8th International Symposium on Intelligent Distributed Computing (IDC'2014). Madrid, September 3-5, 2014. Forecasting solar energy is becoming an important issue in the context of renewable energy sources, and machine learning algorithms play an important role in this field. Solar energy prediction can be addressed as a time-series prediction problem using historical data. Alternatively, solar energy forecasts can be derived from numerical weather prediction (NWP) models. We focus on the latter approach, specifically on predicting solar energy from NWP output computed by GEFS, the Global Ensemble Forecast System, which predicts meteorological variables at the points of a grid. In this context, it is useful to know how prediction accuracy improves depending on the number of grid nodes used as input for the machine learning techniques. However, using the variables from a large number of grid nodes can produce many attributes, which might degrade the generalization performance of the learning algorithms. In this paper, both issues are studied using data supplied by Kaggle for the State of Oklahoma, comparing Support Vector Machines and Gradient Boosted Regression. In addition, three feature selection methods were tested: Linear Correlation, the ReliefF algorithm, and a new method based on local information analysis.
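
    Of the three feature selection methods mentioned, linear correlation is the simplest to sketch: rank each attribute (for example, one meteorological variable at one grid node) by the absolute Pearson correlation between its values and the observed solar energy, and keep the top k. The Python code below assumes attributes are given as named columns; the names are hypothetical, and ReliefF and the local-information method are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(columns, target, k):
    """Names of the k attributes with the largest |Pearson r| to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

    Taking the absolute value keeps strongly anti-correlated attributes, which are as informative for a linear model as positively correlated ones. The method only captures linear relationships, which is one reason to compare it with nonlinear selectors such as ReliefF.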

    Generation and matching of ontology data for the semantic web in a peer-to-peer framework

    The abundance of ontology data is crucial to the emerging semantic web. This paper proposes a framework that supports the generation of ontology data in a peer-to-peer environment. It not only enables users to convert existing structured data into ontology data aligned with given ontology schemas, but also allows them to publish new ontology data with ease. Besides ontology data generation, the framework addresses the common issue of data overlapping across peers through a process of ontology data matching. This process turns data that is only implicitly related across peers because of overlap into explicitly interlinked ontology data, which increases the overall quality of the ontology data. To improve matching accuracy, we explore ontology-related features in the matching process. Experiments show that adding these features achieves better overall performance than using traditional features only. © Springer-Verlag Berlin Heidelberg 2007
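
    The difference between traditional and ontology-related matching features can be sketched as follows. This Python example is hypothetical: it assumes each ontology instance is a dictionary with a `label`, a schema `class`, and a list of `properties`, and it combines label similarity with two ontology-derived features using hand-set weights and a threshold. The paper's actual features and decision procedure may differ.

```python
def jaccard(sa, sb):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match_features(a, b):
    """One traditional feature plus two ontology-related features (all hypothetical)."""
    return [
        jaccard(set(a["label"].lower().split()), set(b["label"].lower().split())),
        1.0 if a["class"] == b["class"] else 0.0,  # same schema class
        jaccard(set(a["properties"]), set(b["properties"])),  # shared properties
    ]


def is_match(a, b, weights=(0.5, 0.3, 0.2), threshold=0.6):
    """Declare a match when the weighted feature score reaches the threshold."""
    score = sum(w * f for w, f in zip(weights, match_features(a, b)))
    return score >= threshold
```

    With labels alone, "Peking University" and "Peking Opera" share a token; the class and property features separate them, which is the kind of gain ontology-related features are meant to provide.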

    Modeling User Search Behavior for Masquerade Detection

    Masquerade attacks are a common security problem that is a consequence of identity theft. This paper extends prior work by modeling user search behavior to detect deviations indicating a masquerade attack. We hypothesize that each individual user knows their own file system well enough to search in a limited, targeted, and unique fashion in order to find information germane to their current task. Masqueraders, on the other hand, will likely not know the file system and layout of another user's desktop, and would likely search more extensively and broadly, in a manner different from that of the victim user being impersonated. We identify actions linked to search and information access activities and use them to build user models. The experimental results show that modeling search behavior reliably detects all masqueraders with a very low false positive rate of 1.1%, far better than previously published results. The limited set of features used for search behavior modeling also yields large performance gains over the same modeling techniques applied to larger feature sets.
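
    The hypothesis that masqueraders search more broadly than the legitimate user suggests a simple one-sided anomaly test, sketched below in Python. It is a stand-in for illustration, not the modeling technique used in the paper: it profiles a user by the mean and standard deviation of search actions per time window, flags windows whose count lies far above that profile, and measures the false positive rate on the user's own held-out windows. The per-window counts and the `z_threshold` default are assumptions.

```python
from statistics import mean, pstdev


def build_profile(search_counts):
    """Summarize a user's normal number of search actions per time window."""
    return mean(search_counts), pstdev(search_counts)


def is_suspicious(count, profile, z_threshold=3.0):
    """One-sided test: flag windows with far more search activity than usual."""
    mu, sigma = profile
    sigma = max(sigma, 1e-9)  # guard against a perfectly regular history
    return (count - mu) / sigma > z_threshold


def false_positive_rate(legit_counts, profile, z_threshold=3.0):
    """Fraction of the user's own held-out windows that are wrongly flagged."""
    flags = [is_suspicious(c, profile, z_threshold) for c in legit_counts]
    return sum(flags) / len(flags)
```

    Raising `z_threshold` lowers the false positive rate at the cost of missing masqueraders whose search activity is only moderately elevated; a per-user profile is what lets the same threshold adapt to users with very different habits.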