
    Survey of data mining approaches to user modeling for adaptive hypermedia

    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of the information stored in each user model. User modeling faces several difficulties: the amount of data available to create user models, the adequacy of that data, the noise within it, and the need to capture the imprecise nature of human behavior. Data mining and machine learning techniques can handle large amounts of data and process uncertainty, which makes them suitable for automatically generating user models that simulate human decision making. This paper surveys data mining techniques that can be used to capture user behavior efficiently and accurately. It also presents guidelines showing which techniques may be used more efficiently according to the task implemented by the application.

    Decision Boundaries and Classification Performance Of SVM And KNN Classifiers For 2-Dimensional Dataset

    Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are two of the most popular classifiers in machine learning. In this paper, we study the generalization performance of the two classifiers by visualizing the decision boundary of each classifier on two-dimensional (2-D) datasets. Four 2-D datasets were used in this study: human eigenpostures (EPHuman), breast cancer (BCancer), the Swiss roll (SRoll), and Twinpeaks (Tpeaks). The results confirm the superior generalization performance of the SVM classifier, which yielded a lower classification error rate than the k-NN classifier during training for binary classification on all four datasets. This can be seen clearly in the plots of the decision boundaries for the binary classification task.
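
As a rough illustration of what visualizing a decision boundary involves, the sketch below labels every point of a 2-D grid with a k-NN classifier and prints the resulting regions as text. It covers only the k-NN side of the comparison. The toy points, the grid, `k=3`, and the function names `knn_predict` and `decision_grid` are all assumptions made for this example; none of them come from the paper or its datasets.

```python
from collections import Counter
import math


def knn_predict(points, labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(points[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def decision_grid(points, labels, xs, ys, k=3):
    """Predict a label at every (x, y) grid node; rows follow `ys`."""
    return [[knn_predict(points, labels, (x, y), k) for x in xs] for y in ys]


# Made-up toy data: two small clusters, one per class (hypothetical values).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
labels = [0, 0, 0, 1, 1, 1]

xs = [i / 4 for i in range(5)]  # 0.0, 0.25, 0.5, 0.75, 1.0
ys = xs
grid = decision_grid(points, labels, xs, ys)

if __name__ == "__main__":
    # Print with the largest y on top; the decision boundary is where
    # neighboring characters change from 0 to 1.
    for row in reversed(grid):
        print("".join(str(label) for label in row))
```

Repeating the same grid evaluation with a second classifier and placing the two label maps side by side gives the kind of visual comparison the abstract describes.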

    Spatial aggregation of local likelihood estimates with applications to classification

    This paper presents a new method for spatially adaptive local (constant) likelihood estimation which applies to a broad class of nonparametric models, including the Gaussian, Poisson and binary response models. The main idea of the method is, given a sequence of local likelihood estimates ("weak" estimates), to construct a new aggregated estimate whose pointwise risk is of the order of the smallest risk among all "weak" estimates. We also propose a new approach to selecting the parameters of the procedure by prescribing the behavior of the resulting estimate in the simple parametric situation. We establish a number of theoretical results concerning the optimality of the aggregated estimate. In particular, our "oracle" result states that its risk is, up to a logarithmic multiplier, equal to the smallest risk for the given family of estimates. The performance of the procedure is illustrated by application to the classification problem. A numerical study demonstrates its reasonable performance in simulated and real-life examples. Comment: Published at http://dx.doi.org/10.1214/009053607000000271 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Fitness landscape of the cellular automata majority problem: View from the Olympus

    In this paper we study cellular automata (CAs) that perform the computational Majority task. This task is a good example of the phenomenon of emergence in complex systems. We are interested in the reasons that make this particular fitness landscape a difficult one. The first goal is to study the landscape as such, so the study is ideally independent of the actual heuristics used to search the space. A second goal, however, is to understand the features a good search technique for this particular problem space should possess. We statistically quantify in various ways the degree of difficulty of searching this landscape. Due to neutrality, investigations based on sampling techniques over the whole landscape are difficult to conduct. We therefore explore the landscape from the top. Although it has been proved that no CA can perform the task perfectly, several efficient CAs for this task have been found. Exploiting similarities between these CAs and symmetries in the landscape, we define the Olympus landscape, which is regarded as the "heavenly home" of the best local optima known (blok). We then measure several properties of this subspace. Although it is easier to find relevant CAs in this subspace than in the overall landscape, there are structural reasons that prevent a searcher from finding overfitted CAs in the Olympus. Finally, we study the dynamics and performance of genetic algorithms on the Olympus in order to confirm our analysis and to find efficient CAs for the Majority problem at low computational cost.
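
To make the Majority task concrete, here is a minimal sketch of how a candidate rule could be scored. The abstract gives no lattice size, neighborhood radius, or fitness definition, so this sketch assumes a one-dimensional binary CA on a ring, a radius of 3, an odd number of cells, and a fitness equal to the fraction of random initial configurations classified correctly. The naive `local_majority` rule is only a stand-in and is not one of the CAs studied in the paper. All function names are hypothetical.

```python
import random

RADIUS = 3  # assumed neighborhood radius (not stated in the abstract)


def local_majority(neighborhood):
    """Naive stand-in rule: adopt the majority state of the neighborhood."""
    return 1 if 2 * sum(neighborhood) > len(neighborhood) else 0


def step(cells, rule, radius=RADIUS):
    """Apply `rule` synchronously to every cell of a ring of cells."""
    n = len(cells)
    return [rule([cells[(i + d) % n] for d in range(-radius, radius + 1)])
            for i in range(n)]


def classifies_correctly(initial, rule, radius=RADIUS):
    """True if, after 2*n steps, every cell equals the initial majority state.

    Assumes an odd number of cells so that the majority is never tied.
    """
    n = len(initial)
    target = 1 if 2 * sum(initial) > n else 0
    cells = list(initial)
    for _ in range(2 * n):
        cells = step(cells, rule, radius)
    return all(c == target for c in cells)


def fitness(rule, n_cells=149, n_samples=20, seed=0):
    """Fraction of random initial configurations classified correctly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        initial = [rng.randint(0, 1) for _ in range(n_cells)]
        hits += classifies_correctly(initial, rule)
    return hits / n_samples


if __name__ == "__main__":
    # Small lattice so the demo finishes quickly.
    print(fitness(local_majority, n_cells=21, n_samples=20))
```

The local-majority rule freezes into stable blocks of ones and zeros instead of reaching a uniform state. A ring of 7 ones followed by 8 zeros, for example, is a fixed point of this rule. This suggests why purely local voting is not enough for the task and why the search for good rules is hard.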