Naive possibilistic classifiers for imprecise or uncertain numerical data
In real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in data representation. Here the possibility distributions used are assumed to encode the family of Gaussian probability distributions compatible with the considered dataset. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of available data. We first adapt the possibilistic classification model, previously proposed for the certain case, to accommodate uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The reported experiments show the interest of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility-transform-based classifier shows robust behavior when dealing with imperfect data.
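The Gaussian-based possibility distributions and the interval-valued attributes mentioned above can be illustrated with a short sketch. This is an illustrative reading of the abstract, not the paper's exact algorithm: the 2·(1 − Φ(|x − μ|/σ)) form is the standard optimal probability-to-possibility transform for a symmetric unimodal density, and the interval handling shown is the extension-principle idea of taking the supremum of the possibility degree over the interval.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def possibility_from_gaussian(x, mu, sigma):
    """Optimal probability-to-possibility transform for N(mu, sigma^2):
    pi(x) = 1 - P(tightest interval around the mode that contains x)
          = 2 * (1 - Phi(|x - mu| / sigma))."""
    return 2.0 * (1.0 - gaussian_cdf(mu + abs(x - mu), mu, sigma))

def possibility_over_interval(lo, hi, mu, sigma):
    """Extension-principle handling of an interval-valued attribute [lo, hi]:
    the possibility of the interval is the supremum of pi over it. Since pi
    is unimodal and peaks at mu, the supremum is reached at the point of the
    interval closest to mu."""
    closest = min(max(mu, lo), hi)  # clamp mu into [lo, hi]
    return possibility_from_gaussian(closest, mu, sigma)
```

An interval that contains the mode μ receives possibility 1, reflecting the fact that imprecise data cannot rule the class out; narrower intervals far from μ receive correspondingly lower degrees.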
Possibilistic classifiers for numerical data
Naive Bayesian classifiers, which rely on independence hypotheses together with a normality assumption to estimate densities for numerical data, are known for their simplicity and effectiveness. However, estimating densities, even under the normality assumption, may be problematic with poor data. In such a situation, possibility distributions may provide a more faithful representation of the data. Naive Possibilistic Classifiers (NPCs), based on possibility theory, have recently been proposed as a counterpart of Bayesian classifiers for classification tasks. Only a few works treat possibilistic classification, and most existing NPCs deal only with categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second is based on a direct interpretation of data in possibilistic formats that exploits an idea of proximity between data values in different ways, providing a less constrained representation of them. We show that possibilistic classifiers have a better capability than Bayesian classifiers to detect new instances for which the classification is ambiguous, since probabilities may be poorly estimated and illusorily precise. Moreover, we propose in this case a hybrid possibilistic classification approach based on a nearest-neighbour heuristic to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmark databases. The reported experiments show the interest of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well on data satisfying the normality assumption, while proximity-based possibilistic classifiers outperform the others elsewhere. The hybrid possibilistic classification exhibits a good ability to improve accuracy.
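A minimal sketch of how a naive possibilistic classifier of the first kind might score classes: each attribute's possibility degree comes from a probability–possibility transform of a per-class Gaussian, and the per-attribute degrees are combined with the min t-norm (one common choice; the papers above also consider product-based combination). The data layout and helper names here are illustrative assumptions, not the authors' implementation.

```python
import math

def gaussian_possibility(x, mu, sigma):
    """Optimal prob->poss transform for N(mu, sigma^2): 2*(1 - Phi(|x-mu|/sigma))."""
    z = abs(x - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def classify(instance, class_params):
    """class_params maps a label to a list of per-attribute (mu, sigma) pairs.
    Per-attribute possibilities are combined with min; an instance is ambiguous
    when several classes reach similarly high possibility degrees."""
    scores = {}
    for label, params in class_params.items():
        scores[label] = min(gaussian_possibility(x, mu, sigma)
                            for x, (mu, sigma) in zip(instance, params))
    best = max(scores, key=scores.get)
    return best, scores
```

Unlike a probabilistic score, the possibility degrees of rival classes need not sum to one, so two classes can both be fully possible; this is precisely what lets the classifier flag ambiguous instances instead of forcing an illusorily precise choice.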
Possibilistic networks parameter learning: Preliminary empirical comparison
Like Bayesian networks, possibilistic networks compactly encode joint uncertainty representations over a set of variables. Learning possibilistic networks from data in general, and from imperfect or scarce data in particular, has not received enough attention. Indeed, only a few works deal with learning the structure and parameters of a possibilistic network from a dataset. This paper provides a preliminary comparative empirical evaluation of two approaches for learning the parameters of a possibilistic network from empirical data. The first method is a possibilistic approach, while the second first learns imprecise probability measures and then transforms them into possibility distributions by means of probability–possibility transformations. The comparative evaluation focuses on learning belief networks from datasets with missing data and from scarce datasets.
SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers
We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners), and each trained classifier can be evaluated on a validation dataset. We propose a new approach to aggregating the learners' predictions in the possibility theory framework. For each classifier prediction, we build a possibility distribution assessing how likely it is that the prediction is correct, using frequentist probabilities estimated on the validation set. The possibility distributions are aggregated using an adaptive t-norm that can accommodate dependency and poor accuracy of the classifier predictions. We prove that the proposed approach possesses a number of desirable classifier combination robustness properties.
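The pipeline described above can be sketched in two steps: turn each classifier's validation-set frequencies into a possibility distribution, then combine the distributions with a t-norm. This sketch uses the simple max-ratio normalization and the plain min t-norm; SPOCC itself uses an adaptive t-norm, so treat this only as an illustration of the general scheme.

```python
def possibility_from_confusion(confusion_row):
    """Given validation counts of true classes observed when a classifier made a
    particular prediction, turn the frequentist estimates into a possibility
    distribution by normalising so the most plausible class gets degree 1."""
    total = sum(confusion_row.values())
    probs = {c: n / total for c, n in confusion_row.items()}
    top = max(probs.values())
    return {c: p / top for c, p in probs.items()}

def aggregate(distributions):
    """Combine per-classifier possibility distributions with the min t-norm
    (the simplest instance of the t-norm family SPOCC adapts over), then
    renormalise so the result is again a possibility distribution (max = 1)."""
    classes = set().union(*distributions)
    combined = {c: min(d.get(c, 0.0) for d in distributions) for c in classes}
    top = max(combined.values())
    if top > 0:
        combined = {c: v / top for c, v in combined.items()}
    return combined
```

With min as the t-norm, a class stays plausible only if every classifier finds it plausible, which is a conservative (dependency-tolerant) way to pool the individual opinions.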
Ant Possibilistic Fuzzy Clustered Forecasting on High Dimensional Data
ABSTRACT: The stock market plays a significant role in, and has a strong influence on, the basic economic forces of a country. Rapid changes in the stock exchange market, involving high-dimensional uncertain data, lead investors to seek effective forecasting through predictive mining techniques. High-dimensional stock data are classified into profitability, stability, cash flow and growth rate, but this classification does not deal completely with uncertain attribute values. Moreover, under large amounts of uncertainty, stock attributes and classes are not handled simultaneously by conditional probabilistic (i.e., fuzzy set) distribution functions, and existing possibilistic (i.e., predictive mining) approaches are not tested on genuinely uncertain data. This research therefore addresses the forecasting problem with a predictive data mining approach and helps investors select suitable portfolios. To forecast complex high-dimensional uncertain data, the Ant Possibilistic Fuzzy Clustered Forecasting (AP-FCF) method is proposed in this paper. AP-FCF avoids repeating mistakes on uncertain stock attributes and classes and provides domain knowledge to investors according to the current feature salience.
A Novel Classification of Uncertain Stream Data using Ant Colony Optimization Based on Radial Basis Function
There are many potential sources of data uncertainty, such as imperfect measurement or sampling, intrusive environmental monitoring, unreliable sensor networks, and inaccurate medical diagnoses. To avoid unintended results, data mining in new applications such as sensors and location-based services needs to be done with care. When attempting to classify data with a high degree of uncertainty, many researchers have turned to heuristic approaches and machine learning (ML) methods. We propose a new ML method in this paper by fusing a Radial Basis Function (RBF) network with ant colony optimization (ACO). After introducing a large amount of uncertainty into a dataset, we normalize the data and complete training on clean data. The ant colony optimization algorithm is then used to train the network. Finally, we evaluate our proposed method against some of the most popular ML methods, including k-nearest neighbors, support vector machine, random forest, decision tree, logistic regression, and extreme gradient boosting (XGBoost). Error metrics show that our model significantly outperforms these popular ML baselines. Using industry-standard performance metrics, our experiments show that the proposed method classifies uncertain data better than the other methods.
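For readers unfamiliar with the RBF side of the hybrid, a minimal forward pass looks as follows. This is a generic sketch, not the paper's model: in an ACO-trained variant, the ant colony would search over the centers, widths, and output weights instead of using gradient descent, and all names here are illustrative.

```python
import math

def rbf_forward(x, centers, widths, weights):
    """Minimal RBF network: Gaussian hidden units followed by a linear output.
    Each hidden unit fires according to the squared distance between the input
    and its center, scaled by its width."""
    hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * w ** 2))
              for c, w in zip(centers, widths)]
    return sum(h * wt for h, wt in zip(hidden, weights))
```

Because the hidden activations are local (near-zero far from a center), an ACO search only has to place a modest number of centers well for the network to fit the data, which is what makes population-based training of RBF networks tractable.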
Water filtration by using apple and banana peels as activated carbon
A water filter is an important device for reducing contaminants in raw water. Activated carbon from charcoal is used to adsorb the contaminants, and fruit peels are a suitable alternative carbon source that can substitute for charcoal. The main goal is to determine the role of apple and banana peel powder as activated carbon in a water filter. The peels are dried and blended into a powder so that they can adsorb contaminants, and the raw water is compared before and after filtering. After filtering the raw water, the pH reading was 6.8, which is within the normal range, and the recorded turbidity was 658 NTU. As for the colour, the water became clearer compared to the raw water. This study found that fruit peels such as banana and apple are an effective substitute for charcoal as a natural adsorbent.
Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods
The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these problems may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning as well as an overview of attempts so far at handling uncertainty in general and formalizing this distinction in particular.