53 research outputs found

    Selective naive Bayes predictor with mixtures of truncated exponentials

    Naive Bayes models have been successfully used in classification problems where the class variable is discrete. They have also been applied to regression or prediction problems, i.e. classification problems with a continuous class, but usually under the assumption that the joint distribution of the feature variables and the class is multivariate Gaussian. In this paper we are interested in regression problems where some of the feature variables are discrete while the others are continuous. We propose a Naive Bayes predictor based on the approximation of the joint distribution by a Mixture of Truncated Exponentials (MTE). We have designed a procedure for selecting the variables that should be used in the construction of the model. This scheme is based on the mutual information between each of the candidate variables and the class. Since the mutual information cannot be computed exactly for the MTE distribution, we introduce an unbiased estimator of it, based on Monte Carlo methods. We test the performance of the proposed model in three real-life problems related to higher education management
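
    The abstract above does not give the form of its Monte Carlo estimator. As a hedged illustration only, the standard sample-mean estimator of the mutual information between a candidate variable X and the class C can be written as follows, assuming that n independent pairs (x_i, c_i) can be drawn from the fitted joint density f(x, c) and that the conditional f(x | c) and the marginal f(x) can be evaluated pointwise (the symbols n, x_i and c_i are notation introduced here, not taken from the paper):

```latex
I(X; C) = \mathbb{E}_{f(x,c)}\!\left[ \log \frac{f(x \mid c)}{f(x)} \right],
\qquad
\hat{I}(X; C) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{f(x_i \mid c_i)}{f(x_i)},
\qquad (x_i, c_i) \overset{\text{i.i.d.}}{\sim} f(x, c).
```

    Because the estimator is an average of independent draws of the quantity inside the expectation, its expected value equals I(X; C), so it is unbiased under these assumptions.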

    Learning naive Bayes regression models with missing data using mixtures of truncated exponentials

    In recent years, mixtures of truncated exponentials (MTEs) have received much attention within the context of probabilistic graphical models, as they provide a framework for hybrid Bayesian networks that is compatible with standard inference algorithms and imposes no restriction on the structure of the network. Recently, MTEs have also been successfully applied to regression problems in which the underlying network structure is a naïve Bayes or a TAN. However, the algorithms described so far in the literature operate over complete databases. In this paper we propose an iterative algorithm for constructing naïve Bayes regression models from incomplete databases. It is based on a variation of the data augmentation method in which the missing values of the explanatory variables are filled by simulating from their posterior distributions, while the missing values of the response variable are generated from its conditional expectation given the explanatory variables. We illustrate through a set of experiments with various databases that the proposed algorithm behaves reasonably well

    Learning Bayesian networks for regression from incomplete databases

    In this paper we address the problem of inducing Bayesian network models for regression from incomplete databases. We use mixtures of truncated exponentials (MTEs) to represent the joint distribution in the induced networks. We consider two particular Bayesian network structures, the so-called naïve Bayes and TAN, which have been successfully used as regression models when learning from complete data. We propose an iterative procedure for inducing the models, based on a variation of the data augmentation method in which the missing values of the explanatory variables are filled by simulating from their posterior distributions, while the missing values of the response variable are generated using the conditional expectation of the response given the explanatory variables. We also consider the refinement of the regression models by using variable selection and bias reduction. We illustrate through a set of experiments with various databases the performance of the proposed algorithms

    Regression using hybrid Bayesian networks: modelling landscape - socioeconomy relationships

    Modelling environmental systems becomes a challenge when dealing directly with continuous and discrete data simultaneously. The aim in regression is to give a prediction of a response variable given the value of some feature variables. Multiple linear regression models, commonly used in environmental science, have a number of limitations: (1) all feature variables must be instantiated to obtain a prediction, and (2) the inclusion of categorical variables usually yields more complicated models. Hybrid Bayesian networks are an appropriate approach to solve regression problems without such limitations, and they also provide additional advantages. This methodology is applied to modelling landscape - socioeconomy relationships for different types of data (continuous, discrete or hybrid). Three models relating socioeconomy and landscape are proposed, and two scenarios of socioeconomic change are introduced in each one to obtain a prediction. This proposal can be easily applied to other areas in environmental modelling

    Groundwater quality assessment using data clustering based on hybrid Bayesian networks

    Bayesian networks have become a standard in the field of Artificial Intelligence as a means of dealing with uncertainty and risk modelling. In recent years, there has been particular interest in the simultaneous use of continuous and discrete domains, obviating the need for discretization, using so-called hybrid Bayesian networks. In these hybrid environments, Mixtures of Truncated Exponentials (MTEs) provide a suitable solution for working without any restriction. The objective of this study is the assessment of groundwater quality through the design and application of a probabilistic clustering based on hybrid Bayesian networks with MTEs. Firstly, the results obtained allow the differentiation of three groups of sampling points, indicating three different classes of groundwater quality. Secondly, the probability that a sampling point belongs to each cluster allows the uncertainty in the clusters to be assessed, as well as the associated risks in terms of water quality management. The methodology developed could be applied to other fields in environmental sciences

    Directional naive Bayes classifiers

    Directional data are ubiquitous in science. These data have some special properties that rule out the use of classical statistics. Therefore, different distributions and statistics, such as the univariate von Mises and the multivariate von Mises–Fisher distributions, should be used to deal with this kind of information. We extend the naive Bayes classifier to the case where the conditional probability distributions of the predictive variables follow either of these distributions. We consider the simple scenario, where only directional predictive variables are used, and the hybrid case, where discrete, Gaussian and directional distributions are mixed. The classifier decision functions and their decision surfaces are studied at length. Artificial examples are used to illustrate the behavior of the classifiers. The proposed classifiers are then evaluated over eight datasets, showing competitive performances against other naive Bayes classifiers that use Gaussian distributions or discretization to manage directional data
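
    As a rough illustration of the directional-only scenario described above, the following sketch fits one von Mises distribution per class and per angular feature, then classifies by the largest log-posterior. It is not the authors' implementation: the names `fit_von_mises`, `log_von_mises` and `VonMisesNaiveBayes` are hypothetical, the concentration is estimated with a common closed-form approximation rather than exact maximum likelihood, and the cap on the concentration is a numerical guard added here. The hybrid case with discrete and Gaussian predictors is not covered.

```python
import numpy as np


def fit_von_mises(angles):
    """Estimate the mean direction and concentration from angles in radians."""
    c, s = np.mean(np.cos(angles)), np.mean(np.sin(angles))
    mu = np.arctan2(s, c)                     # circular mean direction
    r = min(np.hypot(c, s), 1.0 - 1e-9)       # mean resultant length, kept below 1
    kappa = r * (2.0 - r**2) / (1.0 - r**2)   # closed-form approximation (assumption)
    return mu, min(kappa, 500.0)              # cap keeps np.i0 finite (assumption)


def log_von_mises(theta, mu, kappa):
    """Log-density of the von Mises distribution on the circle."""
    return kappa * np.cos(theta - mu) - np.log(2.0 * np.pi * np.i0(kappa))


class VonMisesNaiveBayes:
    """Naive Bayes sketch in which every feature is an angle in radians."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        # One (mu, kappa) pair per class and per feature.
        self.params_ = [
            [fit_von_mises(X[y == c, j]) for j in range(X.shape[1])]
            for c in self.classes_
        ]
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.empty((X.shape[0], len(self.classes_)))
        for k, params in enumerate(self.params_):
            # Naive Bayes assumption: sum the per-feature log-likelihoods.
            scores[:, k] = self.log_prior_[k] + sum(
                log_von_mises(X[:, j], mu, kappa)
                for j, (mu, kappa) in enumerate(params)
            )
        return self.classes_[np.argmax(scores, axis=1)]


# Class 1 is centred near pi, so its samples wrap around the +/- pi boundary.
X_train = [[0.1], [-0.1], [0.2], [3.0], [-3.0], [3.1]]
y_train = [0, 0, 0, 1, 1, 1]
model = VonMisesNaiveBayes().fit(X_train, y_train)
pred = model.predict([[0.05], [-3.1]])
```

    The toy data show why a circular distribution matters here: the angles 3.0 and -3.0 are close on the circle even though they are far apart on the real line, which a Gaussian fitted to the raw values would not capture.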

    Bayesian networks in environmental modeling

    Bayesian networks (BNs), also known as Bayesian belief networks or Bayes nets, are a kind of probabilistic graphical model that has become very popular among practitioners, mainly due to the powerful probability theory involved, which makes them able to deal with a wide range of problems. The goal of this review is to show how BNs are being used in environmental modelling. We are interested in the application of BNs, from January 1990 to December 2010, in the areas of the ISI Web of Knowledge related to Environmental Sciences. It is noted that only 4.2% of the papers have been published under this item. The different steps that make up modelling via BNs have been reviewed: aim of the model, data preprocessing, model learning, validation and software. Our literature review indicates that BNs have barely been used for Environmental Science and their potential is, as yet, largely unexploited

    Machine Learning

    Machine Learning can be defined in various ways, all related to a scientific domain concerned with the design and development of theoretical and implementation tools for building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience

    Internet of Things and Machine Learning Applications for Smart Precision Agriculture

    Agriculture forms a major part of the Indian economy. In the current world, agriculture and irrigation are essential and foremost sectors. Information and communication technology needs to be applied in agricultural industries to help agriculturalists and farmers improve all stages of crop cultivation and post-harvest, which in turn helps to enhance the country's GDP. Agriculture needs to be assisted by modern automation to produce the maximum yield. Recent developments in technology have had a significant impact on agriculture. The evolution of Machine Learning (ML) and the Internet of Things (IoT) has helped researchers implement this automation in agriculture to support farmers. ML allows farmers to improve yield through effective land utilisation, soil fertility, water level, mineral deficiency and pest control, crop development and horticulture. Remote sensors for temperature, humidity, soil moisture, water level and pH provide insight into active farming, enabling accurate and practical agriculture to deal with challenges in the field. This advancement could empower agricultural management systems to handle farm data in an orchestrated manner and grow agribusiness by formulating effective strategies. This paper gives an overview of the modern technologies deployed in agriculture, outlines current and potential applications, and discusses the challenges and possible solutions and implementations. It also elucidates the problems, specific potential solutions, and future directions for the agriculture sector using Machine Learning and the Internet of Things

    Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020, Bilbao, Basque Country, Spain

    466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop for promoting statistical modelling and applications of Statistics among researchers, academics and industrialists in a broad sense. Unfortunately, the global COVID-19 pandemic has not allowed the 35th edition of the IWSM to be held in Bilbao in July 2020. Despite the situation, and following the spirit of the Workshop and the Statistical Modelling Society, we are delighted to bring you the proceedings book of extended abstracts