849 research outputs found

    A comparative study of selected classification accuracy in user profiling

    Get PDF
    In recent years the used of personalization in service provisioning applications has been very popular. However, effective personalization cannot be achieved without accurate user profiles. A number of classification algorithms have been used to classify user related information to create accurate user profiles. In this study four different classification algorithms which are; naive Bayesian (NB), Bayesian Networks (BN), lazy learning of Bayesian rules (LBR) and instance-based learner (IB1) are compared using a set of user profile data. According to our simulation results NB and IB1 classifiers have the highest classification accuracy with the lowest error rate

    Classification accuracy performance of NaĆÆve Bayesian (NB), Bayesian Networks (BN), Lazy Learning of Bayesian Rules(LBR) and Instance-Based Learner (IB1) - comparative study

    Get PDF
    In recent years the used of personalization in service provisioning applications has been very popular. However, effective personalization cannot be achieved without accurate user profiles. A number of classification algorithms have been used to classify user related information to create accurate user profiles. In this study four different classification algorithms which are; naive Bayesian (NB), Bayesian networks (BN), lazy learning of Bayesian rules (LBR) and instance-based learner (IB1) are compared using a set of user profile data. According to our simulation results NB and IB1 classifiers have the highest classification accuracy with the lowest error rate. The obtained simulation results have been evaluated against the existing works of support vector machines (SVMs), decision trees (DTs) and neural networks (NNs)

    A New Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes Classifier for Coping with Gene Ontology-based Features

    Get PDF
    The Tree Augmented Naive Bayes classifier is a type of probabilistic graphical model that can represent some feature dependencies. In this work, we propose a Hierarchical Redundancy Eliminated Tree Augmented Naive Bayes (HRE-TAN) algorithm, which considers removing the hierarchical redundancy during the classifier learning process, when coping with data containing hierarchically structured features. The experiments showed that HRE-TAN obtains significantly better predictive performance than the conventional Tree Augmented Naive Bayes classifier, and enhanced the robustness against imbalanced class distributions, in aging-related gene datasets with Gene Ontology terms used as features.Comment: International Conference on Machine Learning (ICML 2016) Computational Biology Worksho

    A Decision tree-based attribute weighting filter for naive Bayes

    Get PDF
    The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness--the assumption that attributes are independent given the class. All of them improve the performance of naĆÆve Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method compared to other approaches for improving naive Bayes is its run-time complexity and the fact that it maintains the simplicity of the final model

    Ontology-based annotation using naive Bayes and decision trees

    Get PDF
    The Cognitive Paradigm Ontology (CogPO) defines an ontological relationship between academic terms and experiments in the field of neuroscience. BrainMap (www.brainmap.org) is a database of literature describing these experiments, which are annotated by human experts based on the ontological framework defined in CogPO. We present a stochastic approach to automate this process. We begin with a gold standard corpus of abstracts annotated by experts, and model the annotations with a group of naive Bayes classifiers, then explore the inherent relationship among different components defined by the ontology using a probabilistic decision tree model. Our solution outperforms conventional text mining approaches by taking advantage of an ontology. We consider five essential ontological components (Stimulus Modality, Stimulus Type, Response Modality, Response Type, and Instructions) in CogPO, evaluate the probability of successfully categorizing a research paper on each component by training a basic multi-label naive Bayes classifier with a set of examples taken from the BrainMap database which are already manually annotated by human experts. According to the performance of the classifiers we create a decision tree to label the components sequentially on different levels. Each node of the decision tree is associated with a naive Bayes classifier built in different subspaces of the input universe. We first make decisions on those components whose labels are comparatively easy to predict, and then use these predetermined conditions to narrow down the input space along all tree paths, therefore boosting the performance of the naive Bayes classification upon components whose labels are difficult to predict. For annotating a new instance, we use the classifiers associated with the nodes to find labels for each component, starting from the root and then tracking down the tree perhaps on multiple paths. The annotation is completed when the bottom level is reached, where all labels produced along the paths are collected

    Applications Of Machine Learning In Biology And Medicine

    Get PDF
    Machine learning as a field is defined to be the set of computational algorithms that improve their performance by assimilating data. As such, the field as a whole has found applications in many diverse disciplines from robotics and communication in engineering to economics and finance, and also biology and medicine. It should not come as a surprise that many popular methods in use today have completely different origins. Despite this heterogeneity, different methods can be divided into standard tasks, such as supervised, unsupervised, semi-supervised and reinforcement learning. Although machine learning as a field can be formalized as methods trying to solve certain standard tasks, applying these tasks on datasets from different fields comes with certain caveats, and sometimes is fraught with challenges. In this thesis, we develop general procedures and novel solutions, dealing with practical problems that arise when modeling biological and medical data. Cost sensitive learning is an important area of research in machine learning which addresses the widespread and practical problem of dealing with different costs during the learning and deployment of classification algorithms. In many applications such as credit fraud detection, network intrusion and specifically medical diagnosis domains, prior class distributions are highly skewed, which makes the training examples very much unbalanced. Combining this with uneven misclassification costs renders standard machine learning approaches useless in learning an acceptable decision function. We experimentally show the benefits and shortcomings of various methods that convert cost blind learning algorithms to cost sensitive ones. Using the results and best practices found for cost sensitive learning, we design and develop a machine learning approach to ontology mapping. Next, we present a novel approach to deal with uncertainty in classification when costs are unknown or otherwise hard to assign. Support Vector Machines (SVM) are considered to be among the most successful approaches for classification. However prediction of instances near the decision boundary depends more on the specific parameter selection or noise in data, rather than a clear difference in features. In many applications such as medical diagnosis, these regions should be labeled as uncertain rather than assigned to any particular class. Furthermore, instances may belong to novel disease subtypes that are not from any previously known class. In such applications, declining to make a prediction could be beneficial when more powerful but expensive tests are available. We develop a novel approach for optimal selection of the threshold and show its successful application on three biological and medical datasets. The last part of this thesis provides novel solutions for handling high dimensional data. Although high-dimensional data is ubiquitously found in many disciplines, current life science research almost always involves high-dimensional genomics/proteomics data. The ``omics\u27\u27 data provide a wealth of information and have changed the research landscape in biology and medicine. However, these data are plagued with noise, redundancy and collinearity, which makes the discovery process very difficult and costly. Any method that can accurately detect irrelevant and noisy variables in omics data would be highly valuable. We present Robust Feature Selection (RFS), a randomized feature selection approach dedicated to low-sample high-dimensional data. RFS combines an embedded feature selection method with a randomization procedure for stability. Recent advances in sparse recovery and estimation methods have provided efficient and asymptotically consistent feature selection algorithms. However, these methods lack finite sample error control due to instability. Furthermore, the chances of correct recovery diminish with more collinearity among features. To overcome these difficulties, RFS uses a randomization procedure to provide an accurate and stable feature selection method. We thoroughly evaluate RFS by comparing it to a number of popular univariate and multivariate feature selection methods and show marked prediction accuracy improvement of a diagnostic signature, while preserving a good stability

    Machine learning classifiers: Evaluation of the performance in online reviews

    Get PDF
    This paper aims to evaluate the performance of the machine learning classifiers and identify the most suitable classifier for classifying sentiment value. The term ā€œsentiment valueā€ in this study is referring to the polarity (positive, negative or neutral) of the text. This work applies machine learning classifiers from WEKA (Waikato Environment for Knowledge Analysis) toolkit in order to perform their evaluation. WEKA toolkit is a great set of tools for data mining and classification. The performance of the machine learning classifiers was measured by examining overall accuracy, recall, precision, kappa statistic and applying few visualization techniques. Finally, the analysis is applied to find the most suitable classifier for classifying sentiment value. Results show that two classifiers from Rules and Trees categories of classifiers perform equally best comparing to the other classifiers from categories, such as Bayes, Functions, Lazy and Meta. This paper explores the performance of machine learning classifiers in sentiment value classification in the online reviews. Data used is never been used before to explore the performance of machine learning classifiers
    • ā€¦
    corecore