41 research outputs found

    A comparative study of selected classification accuracy in user profiling

    In recent years, the use of personalization in service provisioning applications has become very popular. However, effective personalization cannot be achieved without accurate user profiles. A number of classification algorithms have been used to classify user-related information to create accurate user profiles. In this study, four classification algorithms are compared using a set of user profile data: naive Bayesian (NB), Bayesian networks (BN), lazy learning of Bayesian rules (LBR) and the instance-based learner (IB1). According to our simulation results, the NB and IB1 classifiers achieve the highest classification accuracy with the lowest error rate.
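    The comparison above was run on WEKA; as a rough, hypothetical analogue, the sketch below cross-validates two of the four classifiers in Python with scikit-learn, using GaussianNB to stand in for NB and a 1-nearest-neighbour classifier for IB1. LBR and WEKA's Bayesian network learner have no direct scikit-learn counterpart and are omitted, and the iris dataset is only a placeholder for the user profile data.

```python
# Hedged sketch: cross-validated accuracy comparison, loosely mirroring
# the WEKA experiments described above. GaussianNB ~ NB, 1-NN ~ IB1;
# LBR and BayesNet have no direct scikit-learn counterpart.
from sklearn.datasets import load_iris          # placeholder for user-profile data
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)               # swap in a real profile dataset

models = {
    "NB (GaussianNB)": GaussianNB(),
    "IB1 (1-NN)": KNeighborsClassifier(n_neighbors=1),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)    # 10-fold cross-validation
    print(f"{name}: accuracy={scores.mean():.3f}  error rate={1 - scores.mean():.3f}")
```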

    Classification accuracy performance of Naïve Bayesian (NB), Bayesian Networks (BN), Lazy Learning of Bayesian Rules (LBR) and Instance-Based Learner (IB1) - comparative study

    In recent years, the use of personalization in service provisioning applications has become very popular. However, effective personalization cannot be achieved without accurate user profiles. A number of classification algorithms have been used to classify user-related information to create accurate user profiles. In this study, four classification algorithms are compared using a set of user profile data: naive Bayesian (NB), Bayesian networks (BN), lazy learning of Bayesian rules (LBR) and the instance-based learner (IB1). According to our simulation results, the NB and IB1 classifiers achieve the highest classification accuracy with the lowest error rate. The simulation results were also evaluated against existing work on support vector machines (SVMs), decision trees (DTs) and neural networks (NNs).

    A Decision tree-based attribute weighting filter for naive Bayes

    The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness: the assumption that attributes are independent given the class. All of them improve the performance of naive Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method over other approaches for improving naive Bayes are its low run-time complexity and the fact that it maintains the simplicity of the final model.
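    The abstract does not spell out the weighting filter itself, so the sketch below is only a guess at the general shape of the approach: it borrows a decision tree's impurity-based feature importances as attribute weights (an assumption, not the paper's method) and applies them to the per-attribute log-likelihoods, i.e. log P(c) + Σ_i w_i log P(x_i | c).

```python
# Hedged sketch of attribute-weighted naive Bayes. The abstract does not
# specify the weighting filter, so a decision tree's feature importances
# are used here purely as an illustrative stand-in for the paper's filter.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Illustrative weights: a decision tree's impurity-based feature importances.
weights = DecisionTreeClassifier(random_state=0).fit(X, y).feature_importances_

# Per-class Gaussian parameters for each attribute.
means = np.array([X[y == c].mean(axis=0) for c in classes])
stds = np.array([X[y == c].std(axis=0) + 1e-9 for c in classes])
priors = np.array([np.mean(y == c) for c in classes])

def predict(x):
    # Weighted NB: log P(c) + sum_i w_i * log P(x_i | c).
    log_lik = -0.5 * ((x - means) / stds) ** 2 - np.log(stds)
    scores = np.log(priors) + (weights * log_lik).sum(axis=1)
    return classes[np.argmax(scores)]

preds = np.array([predict(x) for x in X])
print("training accuracy:", np.mean(preds == y))
```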

    Multi-dimensional clustering in user profiling

    User profiling has attracted an enormous number of technological methods and applications. With the increasing number of products and services, user profiling has created opportunities to catch the attention of the user as well as to achieve high user satisfaction. Providing users with what they want, when and how they want it, depends largely on understanding them. The user profile is the representation of the user and holds the information about the user; these profiles are the outcome of user profiling. Personalization is the adaptation of services to meet the user's needs and expectations; therefore, knowledge about the user leads to a personalized user experience. In user profiling applications the major challenge is to build and handle user profiles. The literature offers two main user profiling methods: collaborative and content-based. Apart from these traditional profiling methods, a number of classification and clustering algorithms have been used to classify user-related information to create user profiles. However, the profiling achieved by these works lacks accuracy, because all information within the profile has the same influence during profiling, even though some of it is irrelevant.

    A primary aim of this thesis is to provide an insight into the concept of user profiling. For this purpose a comprehensive background study of the literature was conducted and summarized in this thesis. Existing user profiling methods, as well as classification and clustering algorithms, were investigated, and the use of these algorithms for user profiling was examined. A number of classification and clustering algorithms, such as Bayesian networks (BN) and decision trees (DTs), were simulated using user profiles, and their classification accuracy was evaluated. Additionally, a novel clustering algorithm for user profiling, Multi-Dimensional Clustering (MDC), is proposed. MDC is a modified version of the Instance-Based Learner (IBL) algorithm. In IBL every feature has an equal effect on the classification regardless of its relevance; MDC differs from IBL by assigning weights to feature values to distinguish the effect of the features on clustering. Existing feature weighting methods, for instance Cross Category Feature (CCF), have also been investigated. Three feature value weighting methods are proposed for the MDC: the MDC weight method by Cross Clustering (MDC-CC), the MDC weight method by Balanced Clustering (MDC-BC) and the MDC weight method by changing the Lower-limit to Zero (MDC-LZ). All of these weighted MDC algorithms have been tested and evaluated. Additional simulations were carried out with existing weighted and non-weighted IBL algorithms (i.e. K-Star and Locally Weighted Learning (LWL)) in order to demonstrate the performance of the proposed methods. Furthermore, a real-life scenario is implemented to show how MDC can be used for user profiling to improve personalized service provisioning in mobile environments.

    The experiments presented in this thesis were conducted using user profile datasets that reflect the user's personal information, preferences and interests. The simulations with existing classification and clustering algorithms (e.g. Bayesian networks (BN), naive Bayesian (NB), lazy learning of Bayesian rules (LBR), Iterative Dichotomiser 3 (ID3)) were performed on the WEKA (version 3.5.7) machine learning platform, which serves as a workbench for a collection of popular learning schemes implemented in Java. In addition, MDC-CC, MDC-BC and MDC-LZ were implemented in NetBeans IDE 6.1 Beta as a Java application and in MATLAB. Finally, the real-life scenario was implemented as a Java Mobile Application (Java ME) in NetBeans IDE 7.1. All simulation results were evaluated based on error rate and accuracy.
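    A minimal sketch of the feature-value weighting idea behind MDC, assuming the weights are already given: the MDC-CC, MDC-BC and MDC-LZ derivations are not described in the abstract and are not reproduced here. A 1-nearest-neighbour rule with a weighted Euclidean distance shows how weighting changes which neighbour wins.

```python
# Hedged sketch of feature-weighted instance-based classification:
# a 1-NN rule whose distance weights features by relevance, illustrating
# the idea behind MDC. The weights are an assumed input here; the thesis
# derives them with MDC-CC, MDC-BC or MDC-LZ, which are not shown.
import numpy as np

def weighted_nn_predict(X_train, y_train, x, weights):
    # Weighted Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2).
    d = np.sqrt(((X_train - x) ** 2 * weights).sum(axis=1))
    return y_train[np.argmin(d)]

# Toy data: feature 0 determines the class; feature 1 is large-scale
# noise that misleads an unweighted distance.
X_train = np.array([[0.0, 5.0], [0.1, -4.0], [1.0, -5.0], [0.9, 4.0]])
y_train = np.array([0, 0, 1, 1])

uniform = np.array([1.0, 1.0])       # plain IBL: every feature counts equally
relevant = np.array([0.99, 0.01])    # weighted: emphasize the informative feature

x = np.array([0.95, 5.0])
print("unweighted:", weighted_nn_predict(X_train, y_train, x, uniform))   # 0 (misled)
print("weighted:  ", weighted_nn_predict(X_train, y_train, x, relevant))  # 1
```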

    Robust Bayesian Linear Classifier Ensembles

    The original publication is available at http://www.springerlink.com. Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods, such as uniform averaging over a set of models, usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
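    As a rough illustration of the EM-based approach, the sketch below fits the weights of a linear mixture of fixed classifiers by maximum likelihood on held-out predictions; the paper's maximum a posteriori variant would add a prior term to the weight update, which is omitted here as an assumption.

```python
# Hedged sketch: EM for the weights of a linear mixture of fixed
# probabilistic classifiers, p(y|x) = sum_k w_k p_k(y|x). This is a plain
# maximum-likelihood version; a MAP variant would add a prior (e.g. a
# Dirichlet) to the weight update, which is omitted here.
import numpy as np

def em_mixture_weights(liks, n_iter=100):
    """liks[n, k] = p_k(y_n | x_n): each base model's probability of the
    true label of validation example n. Returns mixture weights w."""
    n, k = liks.shape
    w = np.full(k, 1.0 / k)                  # start from uniform averaging
    for _ in range(n_iter):
        r = w * liks                         # responsibilities (unnormalized)
        r /= r.sum(axis=1, keepdims=True)    # E-step: normalize per example
        w = r.mean(axis=0)                   # M-step: average responsibility
    return w

# Toy example: model 0 is reliable, model 1 is nearly random on 3 classes.
rng = np.random.default_rng(0)
good = rng.uniform(0.7, 0.95, size=200)     # p(true label) under model 0
bad = rng.uniform(0.2, 0.45, size=200)      # p(true label) under model 1
w = em_mixture_weights(np.column_stack([good, bad]))
print("learned weights:", w)                # should strongly favor model 0
```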

    Naive Feature Selection: Sparsity in Naive Bayes

    Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, l1-penalized logistic regression and LASSO, while being orders of magnitude faster. For a large data set with more than 1.6 million training points and about 12 million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.
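    The paper's exact selection rule is not reproduced in the abstract; the following sketch only illustrates the flavour of naive Bayes feature selection on binary data, scoring each feature by its gain in Bernoulli log-likelihood when modelled per class versus globally and keeping the top k. It should not be read as the paper's algorithm.

```python
# Hedged sketch: naive-Bayes-style feature scoring for binary data.
# Each feature is scored by the gain in Bernoulli log-likelihood from
# modelling it per class versus globally, and the top-k are kept. This
# illustrates the flavour of sparse naive Bayes, not the paper's exact rule.
import numpy as np

def nb_feature_scores(X, y, eps=1e-9):
    """X: (n, d) binary matrix, y: (n,) labels. Returns a score per feature."""
    def bernoulli_ll(Xc):
        # Max log-likelihood of columns of Xc under per-column Bernoulli fits.
        p = Xc.mean(axis=0).clip(eps, 1 - eps)
        return (Xc * np.log(p) + (1 - Xc) * np.log(1 - p)).sum(axis=0)

    global_ll = bernoulli_ll(X)
    class_ll = sum(bernoulli_ll(X[y == c]) for c in np.unique(y))
    return class_ll - global_ll          # gain from class-conditional modelling

# Toy data: feature 0 tracks the class, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
X = np.column_stack([
    (y ^ (rng.random(1000) < 0.1)).astype(float),   # mostly equals the class
    (rng.random(1000) < 0.5).astype(float),          # independent noise
])
scores = nb_feature_scores(X, y)
k = 1
print("scores:", scores, "-> keep top", k, ":", np.argsort(scores)[::-1][:k])
```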

    A "non-parametric" version of the naive Bayes classifier

    Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a Normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-Normal, and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be Normally distributed but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of Normality. We found that our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-Normal distributions are observed.
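    The abstract leaves open how the Normality requirement is removed while keeping the naive Bayes structure. One common construction, shown here purely as an assumption (the paper's own method may differ), replaces each Gaussian class-conditional density with a univariate kernel density estimate per class and feature.

```python
# Hedged sketch: naive Bayes with kernel density estimates replacing the
# usual Gaussian class-conditional densities. This is one standard way to
# drop the Normality assumption; the paper's own construction may differ.
import numpy as np
from scipy.stats import gaussian_kde

def fit_kde_nb(X, y):
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    # One univariate KDE per (class, feature), keeping the NB factorization.
    kdes = {c: [gaussian_kde(X[y == c][:, j]) for j in range(X.shape[1])]
            for c in classes}
    return classes, priors, kdes

def predict_kde_nb(x, classes, priors, kdes):
    def log_post(c):
        return np.log(priors[c]) + sum(
            np.log(kdes[c][j](x[j])[0] + 1e-300) for j in range(len(x)))
    return max(classes, key=log_post)

# Toy data with skewed (non-Normal) features.
rng = np.random.default_rng(0)
X0 = rng.exponential(1.0, size=(100, 2))            # class 0: skewed at zero
X1 = rng.exponential(1.0, size=(100, 2)) + 1.5      # class 1: shifted
X = np.vstack([X0, X1]); y = np.array([0] * 100 + [1] * 100)

classes, priors, kdes = fit_kde_nb(X, y)
print(predict_kde_nb(np.array([0.3, 0.5]), classes, priors, kdes))  # likely 0
print(predict_kde_nb(np.array([3.0, 2.5]), classes, priors, kdes))  # likely 1
```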