Survey of data mining approaches to user modeling for adaptive hypermedia
The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.
Decision Boundaries and Classification Performance Of SVM And KNN Classifiers For 2-Dimensional Dataset
Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are two of the most popular classifiers in machine learning. In this paper, we study the generalization performance of the two classifiers by visualizing the decision boundary of each classifier on two-dimensional (2-D) data. Four 2-D datasets were used in this study: the eigenpostures of human (EPHuman), breast cancer (BCancer), Swiss roll (SRoll) and Twinpeaks (Tpeaks). The results confirmed the SVM classifier's superb generalization performance, since it yielded a lower classification error rate than the k-NN classifier during training for binary classification on all 2-D datasets. This can be clearly seen in the plots depicting the decision boundaries of the binary classification task.
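As a rough illustration of what visualizing a decision boundary involves, the sketch below labels every point of a 2-D grid with a plain k-NN majority vote; the boundary is wherever the predicted label changes between neighbouring grid points. This is not the paper's code: the function names `knn_predict` and `decision_grid`, the Euclidean metric and the default `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of ((x, y), label) pairs. Euclidean distance is an
    assumption of this sketch, not something the abstract specifies.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def decision_grid(train, xs, ys, k=3):
    """Predict a label at every (x, y) grid point, one row per y value.

    Plotting this grid as coloured cells gives a picture of the k-NN
    decision boundary.
    """
    return [[knn_predict(train, (x, y), k) for x in xs] for y in ys]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters.
    train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
    for row in decision_grid(train, xs=range(0, 7, 2), ys=range(0, 7, 2)):
        print(" ".join(row))
```

An SVM boundary could be drawn the same way by swapping `knn_predict` for a trained SVM's prediction function.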
Spatial aggregation of local likelihood estimates with applications to classification
This paper presents a new method for spatially adaptive local (constant)
likelihood estimation which applies to a broad class of nonparametric models,
including the Gaussian, Poisson and binary response models. The main idea of
the method is, given a sequence of local likelihood estimates ("weak"
estimates), to construct a new aggregated estimate whose pointwise risk is of
order of the smallest risk among all "weak" estimates. We also propose a new
approach toward selecting the parameters of the procedure by providing the
prescribed behavior of the resulting estimate in the simple parametric
situation. We establish a number of important theoretical results concerning
the optimality of the aggregated estimate. In particular, our "oracle" result
claims that its risk is, up to some logarithmic multiplier, equal to the
smallest risk for the given family of estimates. The performance of the
procedure is illustrated by application to the classification problem. A
numerical study demonstrates its reasonable performance in simulated and
real-life examples.
Comment: Published at http://dx.doi.org/10.1214/009053607000000271 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Fitness landscape of the cellular automata majority problem: View from the Olympus
In this paper we study cellular automata (CAs) that perform the computational
Majority task. This task is a good example of the phenomenon of emergence
in complex systems. We take an interest in the reasons that make this
particular fitness landscape a difficult one. The first goal is to study the
landscape as such, and thus it is ideally independent from the actual
heuristics used to search the space. However, a second goal is to understand
the features a good search technique for this particular problem space should
possess. We statistically quantify in various ways the degree of difficulty of
searching this landscape. Due to neutrality, investigations based on sampling
techniques on the whole landscape are difficult to conduct. So, we go exploring
the landscape from the top. Although it has been proved that no CA can perform
the task perfectly, several efficient CAs for this task have been found.
Exploiting similarities between these CAs and symmetries in the landscape, we
define the Olympus landscape which is regarded as the "heavenly home" of the
best local optima known (blok). Then we measure several properties of this
subspace. Although it is easier to find relevant CAs in this subspace than in
the overall landscape, there are structural reasons that prevent a searcher
from finding overfitted CAs in the Olympus. Finally, we study dynamics and
performance of genetic algorithms on the Olympus in order to confirm our
analysis and to find efficient CAs for the Majority problem with low
computational cost.
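For readers unfamiliar with the Majority task, the sketch below shows one common way a candidate CA rule is scored: run it on random initial configurations and count how often the lattice settles to all 1s when 1s were in the majority, and to all 0s otherwise. It is not the paper's code. The radius of 3, the 149-cell lattice, the step limit, the sample size and all function names are assumptions of this sketch, and the local-majority rule is included only as a simple baseline.

```python
import random


def step(cells, rule, radius):
    """Apply one synchronous CA update with periodic boundaries.

    `rule` is a lookup table indexed by the neighbourhood read as a
    binary number, leftmost neighbour first.
    """
    n = len(cells)
    out = []
    for i in range(n):
        idx = 0
        for d in range(-radius, radius + 1):
            idx = (idx << 1) | cells[(i + d) % n]
        out.append(rule[idx])
    return out


def local_majority_rule(radius):
    """Baseline rule: each cell takes the majority value of its neighbourhood."""
    size = 2 * radius + 1
    return [1 if bin(i).count("1") > radius else 0 for i in range(2 ** size)]


def classifies_correctly(rule, cells, radius, max_steps):
    """True if, after max_steps updates, every cell holds the initial majority value."""
    target = 1 if sum(cells) * 2 > len(cells) else 0
    for _ in range(max_steps):
        cells = step(cells, rule, radius)
    return all(c == target for c in cells)


def fitness(rule, radius=3, n_cells=149, n_samples=100, max_steps=300, seed=0):
    """Fraction of random initial configurations classified correctly.

    n_cells should be odd so that the initial majority is never a tie.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        cells = [rng.randint(0, 1) for _ in range(n_cells)]
        hits += classifies_correctly(rule, cells, radius, max_steps)
    return hits / n_samples


if __name__ == "__main__":
    print(fitness(local_majority_rule(3), n_samples=20))
```

A genetic algorithm of the kind the abstract mentions would use a score such as `fitness` to rank the 128-entry rule tables it evolves.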