5 research outputs found

    Bayesian Classifiers Programmed In SQL Using PCA

    The Bayesian classifier is a fundamental classification technique. We also consider different concepts regarding dimensionality reduction techniques for retrieving lossless data. In this paper we propose a new architecture for pre-processing the data. We improve our Bayesian classifier to produce more accurate models on data sets with skewed distributions, missing information, and subsets of points that overlap significantly with each other, all of which are known issues for clustering algorithms. We are therefore interested in combining a dimensionality reduction technique such as PCA with Bayesian classifiers to accelerate computations and evaluate complex mathematical equations. The proposed architecture in this project contains the following stages: pre-processing of input data, Naïve Bayesian classifier, Bayesian classifier, principal component analysis, and database. Principal Component Analysis (PCA) is the process of reducing components by calculating eigenvalues and eigenvectors. We consider two algorithms in this paper: the Bayesian Classifier based on K-Means (BKM) and the Naïve Bayesian Classifier Algorithm N
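The abstract describes PCA as reducing components by computing eigenvalues and eigenvectors. As a rough illustration only (the paper programs its classifiers in SQL, and nothing below is taken from it), the following Python sketch builds a sample covariance matrix, finds its dominant eigenpair by power iteration, and projects the data onto that component. The function names, the use of power iteration, and the toy data are all assumptions made for this example.

```python
import math

def covariance_matrix(rows):
    """Return (column means, sample covariance matrix) for equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov

def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue estimate for unit v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v

def project(rows, means, component):
    """Project each mean-centred row onto a single principal component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]

# Hypothetical toy data lying exactly on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, cov = covariance_matrix(rows)
eigenvalue, component = top_eigenpair(cov)
scores = project(rows, means, component)  # one coordinate per row instead of two
```

Because the toy points are perfectly collinear, a single component captures all of their variance; on real data one would keep as many components as needed to retain an acceptable share of it.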

    A decomposition of classes via clustering to explain and improve naive bayes

    Abstract. We propose a method to improve the probability estimates made by Naive Bayes to avoid the effects of poor class conditional probabilities based on product distributions when each class spreads into multiple regions. Our approach is based on applying a clustering algorithm to each subset of examples that belong to the same class, and treating each cluster as a class of its own. Experiments on 26 real-world datasets show a significant improvement in performance when the class decomposition process is applied, particularly when the mean number of clusters per class is large.
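The class decomposition idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm or the Naive Bayes variant, so k-means with farthest-point initialisation and a Gaussian Naive Bayes with a variance floor are stand-ins, and the final step (predicting the parent class of the best-scoring cluster) is likewise an assumption. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def kmeans(points, k, iterations=20):
    """Tiny k-means with deterministic farthest-point init; returns a cluster index per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iterations):
        assign = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def decompose_labels(X, y, k):
    """Relabel each example of class c as (c, cluster), clustering each class separately."""
    new_labels = [None] * len(X)
    for c in set(y):
        idx = [i for i, label in enumerate(y) if label == c]
        clusters = kmeans([X[i] for i in idx], min(k, len(idx)))
        for i, cl in zip(idx, clusters):
            new_labels[i] = (c, cl)
    return new_labels

def fit_gaussian_nb(X, labels, min_var=1e-3):
    """Per-label prior, feature means, and feature variances (floored at min_var)."""
    groups = defaultdict(list)
    for x, label in zip(X, labels):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), min_var)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict(model, x):
    """Score every (class, cluster) label and return the parent class of the best one."""
    def log_joint(prior, means, variances):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xj - m) ** 2 / (2 * var)
            for xj, m, var in zip(x, means, variances))
    best = max(model, key=lambda label: log_joint(*model[label]))
    return best[0]

# Hypothetical XOR-like toy data: each class occupies two separate regions.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11],
     [0, 10], [1, 10], [0, 11], [1, 11], [10, 0], [11, 0], [10, 1], [11, 1]]
y = ["A"] * 8 + ["B"] * 8

sub_labels = decompose_labels(X, y, k=2)
model = fit_gaussian_nb(X, sub_labels)
predictions = [predict(model, q) for q in ([0.5, 0.5], [10.5, 10.5], [0.5, 10.5], [10.5, 0.5])]
```

In this toy set the two classes share the same per-feature means and variances, so a single Gaussian per class cannot tell them apart, while the four cluster-level sub-classes are well separated. That is the situation the abstract refers to as a class spreading into multiple regions.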

    Predictive Learning with Heterogeneity in Populations

    University of Minnesota Ph.D. dissertation. October 2017. Major: Computer Science. Advisor: Vipin Kumar. 1 computer file (PDF); x, 119 pages. Predictive learning forms the backbone of several data-driven systems powering scientific as well as commercial applications, e.g., filtering spam messages, detecting faces in images, forecasting health risks, and mapping ecological resources. However, one of the major challenges in applying standard predictive learning methods in real-world applications is the heterogeneity in populations of data instances, i.e., different groups (or populations) of data instances show predictive relationships of a different nature. For example, different populations of human subjects may show different risks for a disease even if they have similar diagnosis reports, depending on their ethnic profiles, medical history, and lifestyle choices. In the presence of population heterogeneity, a central challenge is that the training data comprises instances belonging to multiple populations, and the instances in the test set may be from a different population than that of the training instances. This limits the effectiveness of standard predictive learning frameworks that are based on the assumption that the instances are independent and identically distributed (i.i.d.), an assumption that holds only in simplistic settings. This thesis introduces several ways of learning predictive models with heterogeneity in populations, by incorporating information about the context of every data instance, which is available in varying types and formats in different application settings. It introduces a novel multi-task learning framework for problems where we have access to some ancillary variables that can be grouped to produce homogeneous partitions of data instances, thus addressing the heterogeneity in populations.
This thesis also introduces a novel strategy for constructing mode-specific ensembles in binary classification settings, where each class shows a multi-modal distribution due to the heterogeneity in its populations. When the context of data instances is implicitly defined such that the test data is known to comprise contextually similar groups, this thesis presents a novel framework for adapting classification decisions using the group-level properties of test instances. This thesis also builds the foundations of a novel paradigm of scientific discovery, termed theory-guided data science, that seeks to explore the full potential of data science methods without ignoring the treasure of knowledge contained in scientific theories and principles.