
    Naive possibilistic classifiers for imprecise or uncertain numerical data

    In real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in the data representation. Here the possibility distributions used are supposed to encode the family of Gaussian probability distributions compatible with the considered dataset. We consider two types of uncertainty: (i) uncertainty about the class in the training set, modeled by a possibility distribution over class labels, and (ii) imprecision pervading attribute values in the testing set, represented as intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of available data. We first adapt the possibilistic classification model, previously proposed for the certain case, to accommodate uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The reported experiments show the interest of possibilistic classifiers for handling uncertainty in data. In particular, the classifier based on the probability-to-possibility transform behaves robustly when dealing with imperfect data.
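As a rough illustration (not the paper's exact algorithm), the extension-principle step for interval-valued attributes can be sketched in Python. The names `gaussian_poss` and `interval_poss` are hypothetical, and the possibility distribution here is the optimal transform of a single Gaussian, ignoring the parameter-uncertainty refinement the paper adds:

```python
import math

def gaussian_poss(x, mu, sigma):
    """Optimal probability-to-possibility transform of N(mu, sigma^2):
    pi(x) = P(|X - mu| >= |x - mu|) = erfc(|x - mu| / (sigma * sqrt(2)));
    the mode mu gets possibility 1."""
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2.0)))

def interval_poss(lo, hi, mu, sigma):
    """Extension principle for an interval-valued attribute [lo, hi]:
    pi([lo, hi]) = sup over x in [lo, hi] of pi(x). Since the Gaussian-based
    pi is unimodal with mode mu, the sup is attained at the point of the
    interval closest to mu."""
    if lo <= mu <= hi:
        return 1.0
    nearest = lo if mu < lo else hi
    return gaussian_poss(nearest, mu, sigma)
```

An interval that contains the class mean is fully compatible with that class (possibility 1); otherwise compatibility decays with the distance of the nearest endpoint from the mean.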

    Possibilistic classifiers for numerical data

    Naive Bayesian classifiers, which rely on independence hypotheses together with a normality assumption to estimate densities for numerical data, are known for their simplicity and effectiveness. However, estimating densities, even under the normality assumption, may be problematic when data are poor. In such a situation, possibility distributions may provide a more faithful representation of the data. Naive possibilistic classifiers (NPC), based on possibility theory, have recently been proposed as a counterpart of Bayesian classifiers for classification tasks. Only a few works treat possibilistic classification, and most existing NPCs deal only with categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. We investigate two kinds of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second is based on a direct interpretation of data in possibilistic formats that exploits an idea of proximity between data values in different ways, providing a less constrained representation of them. We show that possibilistic classifiers have a better capability than Bayesian classifiers to detect new instances for which the classification is ambiguous, a situation where probabilities may be poorly estimated and illusorily precise. Moreover, we propose in this case a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve accuracy when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases.
    The reported experiments show the interest of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well on data agreeing with the normality assumption, while proximity-based possibilistic classifiers outperform the others in the remaining cases. The hybrid possibilistic classification exhibits a good ability to improve accuracy.
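A minimal sketch of the first kind of classifier, assuming the Dubois–Prade optimal transform of a Gaussian and the min t-norm for the naive combination; the function names and the parameter layout are illustrative, not the paper's code:

```python
import math

def gaussian_poss(x, mu, sigma):
    """Dubois-Prade optimal transform of N(mu, sigma^2) into a possibility
    distribution: pi(x) = erfc(|x - mu| / (sigma * sqrt(2))), with pi(mu) = 1."""
    return math.erfc(abs(x - mu) / (sigma * math.sqrt(2.0)))

def naive_poss_classify(sample, params):
    """Naive possibilistic classifier: under the attribute-independence
    assumption, combine per-attribute possibility degrees with the min
    t-norm and pick the class with the highest overall possibility.

    params maps each class label to a list of (mu, sigma) pairs,
    one per attribute; sample is a list of attribute values."""
    scores = {
        c: min(gaussian_poss(x, mu, s) for x, (mu, s) in zip(sample, dists))
        for c, dists in params.items()
    }
    return max(scores, key=scores.get), scores
```

A sample lying between two class prototypes receives a similar (low) possibility for both classes, which is how this kind of classifier can flag instances whose classification is ambiguous.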

    Possibilistic networks parameter learning: Preliminary empirical comparison

    Like Bayesian networks, possibilistic networks compactly encode joint uncertainty representations over a set of variables. Learning possibilistic networks from data in general, and from imperfect or scarce data in particular, has not received enough attention. Indeed, only a few works deal with learning the structure and the parameters of a possibilistic network from a dataset. This paper provides a preliminary comparative empirical evaluation of two approaches for learning the parameters of a possibilistic network from empirical data. The first method is a possibilistic approach, while the second first learns imprecise probability measures and then transforms them into possibility distributions by means of probability–possibility transformations. The comparative evaluation focuses on learning belief networks from datasets with missing data and from scarce datasets.
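The final step of the second approach, turning learned probabilities into possibilities, can be sketched for a precise discrete distribution (the imprecise-probability case is more involved); `prob_to_poss` is an illustrative name for the standard Dubois–Prade transform:

```python
def prob_to_poss(p):
    """Dubois-Prade probability-to-possibility transform of a discrete
    distribution: pi(x_i) = sum of all p_j with p_j <= p_i, so the most
    probable value gets possibility 1 and ties share the same degree."""
    return [sum(q for q in p if q <= p_i) for p_i in p]
```

For example, the frequencies [0.5, 0.3, 0.2] map to possibilities [1.0, 0.5, 0.2], and a uniform distribution maps to the vacuous possibility distribution (all degrees equal to 1).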

    SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers

    We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly overlapping across learners), and each trained classifier can be evaluated on a validation dataset. We propose a new approach to aggregating the learner predictions within the possibility theory framework. For each classifier prediction, we build a possibility distribution assessing how likely it is that the prediction is correct, using frequentist probabilities estimated on the validation set. The possibility distributions are aggregated using an adaptive t-norm that can accommodate dependency and poor accuracy among the classifier predictions. We prove that the proposed approach possesses a number of desirable robustness properties for classifier combination.
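A rough sketch of the aggregation scheme, with the paper's adaptive t-norm replaced by a fixed one (min by default, product as the other classical choice); the names and the final rescaling are assumptions, not the SPOCC implementation:

```python
def aggregate_possibilities(per_classifier, tnorm=min):
    """Combine one possibility distribution over class labels per classifier
    with a t-norm, then rescale so the most possible class has degree 1.
    tnorm=min works directly since min(1.0, x) == x; pass
    lambda a, b: a * b for the product t-norm."""
    labels = per_classifier[0].keys()
    combined = {}
    for c in labels:
        deg = 1.0
        for dist in per_classifier:
            deg = tnorm(deg, dist[c])
        combined[c] = deg
    top = max(combined.values())
    if top > 0:
        combined = {c: d / top for c, d in combined.items()}
    return combined
```

With min, a class is only as possible as its least supportive classifier allows, which is the cautious end of the t-norm spectrum; the paper's adaptive t-norm interpolates according to the classifiers' dependency and accuracy.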

    Ant Possibilistic Fuzzy Clustered Forecasting on High Dimensional Data

    ABSTRACT: The stock market plays a significant role in, and has great influence on, the basic economic forces of a country. Rapid changes in the stock exchange market, with high-dimensional uncertain data, lead investors to look for effective forecasting using predictive mining techniques. High-dimensional stock data are classified into profitability, stability, cash flow, and growth rate, but this classification does not deal completely with uncertain attribute values. Moreover, under a large amount of uncertainty, stock attributes and classes are not handled simultaneously with conditional probabilistic (i.e., fuzzy set) distribution functions, and possibilistic (i.e., predictive mining) approaches have not been tested on genuinely uncertain data. This research therefore focuses on solving the forecasting problem with a predictive data mining approach and on helping investors select suitable portfolios. To forecast complex, high-dimensional uncertain data, this paper proposes the Ant Possibilistic Fuzzy Clustered Forecasting (AP-FCF) method. AP-FCF avoids repeating mistakes on uncertain stock attributes and classes and provides domain knowledge to investors according to the current feature salience.

    A Novel Classification of Uncertain Stream Data using Ant Colony Optimization Based on Radial Basis Function

    There are many potential sources of data uncertainty, such as imperfect measurement or sampling, intrusive environmental monitoring, unreliable sensor networks, and inaccurate medical diagnoses. To avoid unintended results, data mining for new applications such as sensors and location-based services needs to be done with care. When attempting to classify data with a high degree of uncertainty, many researchers have turned to heuristic approaches and machine learning (ML) methods. We propose a new ML method in this paper by fusing a Radial Basis Function (RBF) network with ant colony optimization (ACO). After introducing a large amount of uncertainty into a dataset, we normalize the data and complete training on clean data. The ant colony optimization algorithm is then used to train the network. Finally, we evaluate the proposed method against some of the most popular ML methods, including k-nearest neighbors, support vector machines, random forests, decision trees, logistic regression, and extreme gradient boosting (XGBoost). Error metrics show that our model significantly outperforms the gold standard and other popular ML methods. Using industry-standard performance metrics, our experiments show that the proposed method classifies uncertain data better than the other methods.
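The ACO training loop is not reproduced here, but the model it would optimize can be sketched as a plain RBF-network forward pass; the names, Gaussian activation, and per-class linear readout are generic assumptions, not the paper's architecture:

```python
import math

def rbf_forward(x, centers, widths, weights):
    """Forward pass of a radial basis function network: one Gaussian
    activation per hidden unit (centered on `centers`, with spread
    `widths`), then a linear readout per class. The readout weights are
    the kind of parameters an optimizer such as ACO would search over."""
    acts = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * w ** 2))
        for c, w in zip(centers, widths)
    ]
    # weights[k][j] connects hidden unit j to the score of class k
    return [sum(wk[j] * acts[j] for j in range(len(acts))) for wk in weights]
```

An input close to a hidden unit's center activates that unit strongly, so classes wired to nearby centers receive the highest scores.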

    Water filtration by using apple and banana peels as activated carbon

    A water filter is an important device for reducing contaminants in raw water. Activated carbon from charcoal is used to absorb the contaminants, and fruit peels are a suitable alternative carbon source to substitute for charcoal. The main goal is to determine the role of fruit peels, namely apple and banana peel powder, as activated carbon in a water filter. The peels are dried and blended into a powder so that they can absorb the contaminants, and the readings for the raw water before and after filtering are compared. After filtering the raw water, the pH reading was 6.8, which is within the normal range, and the recorded turbidity was 658 NTU. As for colour, the water became clearer compared with the raw water. This study found that fruit peels such as banana and apple are an effective substitute for charcoal as a natural absorbent.

    Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods

    The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, machine learning scholars have recently identified new problems and challenges that may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning, as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.
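One common formalization of this split, sketched here under the assumption of an ensemble of probabilistic predictors (an entropy-based decomposition discussed in this literature, not necessarily the paper's preferred measure): total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual members, and epistemic uncertainty is their difference, i.e. the members' disagreement.

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

def uncertainty_decomposition(member_preds):
    """Ensemble-based decomposition of predictive uncertainty:
    total     = entropy of the averaged prediction,
    aleatoric = average entropy of the individual members,
    epistemic = total - aleatoric (the mutual information between
                the prediction and the choice of model)."""
    n, k = len(member_preds), len(member_preds[0])
    mean = [sum(p[i] for p in member_preds) / n for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_preds) / n
    return total, aleatoric, total - aleatoric
```

Identical ensemble members yield zero epistemic uncertainty (any remaining entropy is aleatoric), while confident but disagreeing members yield zero aleatoric and purely epistemic uncertainty.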