404 research outputs found

    Possibilistic classifiers for numerical data

    Get PDF
    International audienceNaive Bayesian Classifiers, which rely on independence hypotheses, together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic in case of poor data. In such a situation, possibility distributions may provide a more faithful representation of these data. Naive Possibilistic Classifiers (NPC), based on possibility theory, have been recently proposed as a counterpart of Bayesian classifiers to deal with classification tasks. There are only few works that treat possibilistic classification and most of existing NPC deal only with categorical attributes. This work focuses on the estimation of possibility distributions for continuous data. In this paper we investigate two kinds of possibilistic classifiers. The first one is derived from classical or flexible Bayesian classifiers by applying a probability–possibility transformation to Gaussian distributions, which introduces some further tolerance in the description of classes. The second one is based on a direct interpretation of data in possibilistic formats that exploit an idea of proximity between data values in different ways, which provides a less constrained representation of them. We show that possibilistic classifiers have a better capability to detect new instances for which the classification is ambiguous than Bayesian classifiers, where probabilities may be poorly estimated and illusorily precise. Moreover, we propose, in this case, an hybrid possibilistic classification approach based on a nearest-neighbour heuristics to improve the accuracy of the proposed possibilistic classifiers when the available information is insufficient to choose between classes. Possibilistic classifiers are compared with classical or flexible Bayesian classifiers on a collection of benchmarks databases. The experiments reported show the interest of possibilistic classifiers. In particular, flexible possibilistic classifiers perform well for data agreeing with the normality assumption, while proximity-based possibilistic classifiers outperform others in the other cases. The hybrid possibilistic classification exhibits a good ability for improving accuracy

    Naive possibilistic classifiers for imprecise or uncertain numerical data

    Get PDF
    International audienceIn real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have been recently adapted to numerical data, in order to cope with uncertainty in data representation. Here the possibility distributions that are used are supposed to encode the family of Gaussian probabilistic distributions that are compatible with the considered dataset. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set represented under the form of intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of data available. We first adapt the possibilistic classification model, previously proposed for the certain case, in order to accommodate the uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The experiments reported show the interest of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility transform-based classifier shows a robust behavior when dealing with imperfect data

    Land cover classification using fuzzy rules and aggregation of contextual information through evidence theory

    Full text link
    Land cover classification using multispectral satellite image is a very challenging task with numerous practical applications. We propose a multi-stage classifier that involves fuzzy rule extraction from the training data and then generation of a possibilistic label vector for each pixel using the fuzzy rule base. To exploit the spatial correlation of land cover types we propose four different information aggregation methods which use the possibilistic class label of a pixel and those of its eight spatial neighbors for making the final classification decision. Three of the aggregation methods use Dempster-Shafer theory of evidence while the remaining one is modeled after the fuzzy k-NN rule. The proposed methods are tested with two benchmark seven channel satellite images and the results are found to be quite satisfactory. They are also compared with a Markov random field (MRF) model-based contextual classification method and found to perform consistently better.Comment: 14 pages, 2 figure

    Possibilistic networks parameter learning: Preliminary empirical comparison

    Get PDF
    International audienceLike Bayesian networks, possibilistic ones compactly encode joint uncertainty representations over a set of variables. Learning possibilistic networks from data in general and from imperfect or scarce data in particular, has not received enough attention. Indeed, only few works deal with learning the structure and the parameters of a possibilistic network from a dataset. This paper provides a preliminary comparative empirical evaluation of two approaches for learning the parameters of a possibilistic network from empirical data. The first method is a possibilistic approach while the second one first learns imprecise probability measures then transforms them into possibility distributions by means of probability-possibility transformations. The comparative evaluation focuses on learning belief networks on datasets with missing data and scarce datasets

    SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers

    Full text link
    We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners) but each trained classifier can be evaluated on a validation dataset. We propose a new approach to aggregate the learner predictions in the possibility theory framework. For each classifier prediction, we build a possibility distribution assessing how likely the classifier prediction is correct using frequentist probabilities estimated on the validation set. The possibility distributions are aggregated using an adaptive t-norm that can accommodate dependency and poor accuracy of the classifier predictions. We prove that the proposed approach possesses a number of desirable classifier combination robustness properties

    A Possibilistic Query Translation Approach for Cross-Language Information Retrieval

    Get PDF
    International audienceIn this paper, we explore several statistical methods to find solutions to the problem of query translation ambiguity. Indeed, we propose and compare a new possibilistic approach for query translation derived from a probabilistic one, by applying a classical probability-possibility transformation of probability distributions, which introduces a certain tolerance in the selection of word translations. Finally, the best words are selected based on a similarity measure. The experiments are performed on CLEF-2003 French-English CLIR collection, which allowed us to test the effectiveness of the possibilistic approach

    The Prototyping and Focused Discriminating Strategy for Pattern Recognition and one Instantiation: the MELIDIS System

    Get PDF
    This paper presents the Prototyping and Focused Discriminating (PFD) strategy for pattern recognition. This strategy takes benefits from the duality between model generation and discrimination. Both collaborate through a focusing mechanism that detects the conflicts between the class models and drive the discrimination. Classifiers based on this collaboration benefit from a set of useful properties. The MĂ©lidis system illustrates this strategy and extends its possibilities, using a fuzzy framework. As shown by experiments, the resulting system provides an interesting compromise between accuracy and compactness. Experiments also demonstrate the interest of the new strategy and of its focusing mechanism

    Conservative and aggressive rough SVR modeling

    Get PDF
    AbstractSupport vector regression provides an alternative to the neural networks in modeling non-linear real-world patterns. Rough values, with a lower and upper bound, are needed whenever the variables under consideration cannot be represented by a single value. This paper describes two approaches for the modeling of rough values with support vector regression (SVR). One approach, by attempting to ensure that the predicted high value is not greater than the upper bound and that the predicted low value is not less than the lower bound, is conservative in nature. On the contrary, we also propose an aggressive approach seeking a predicted high which is not less than the upper bound and a predicted low which is not greater than the lower bound. The proposal is shown to use ϵ-insensitivity to provide a more flexible version of lower and upper possibilistic regression models. The usefulness of our work is realized by modeling the rough pattern of a stock market index, and can be taken advantage of by conservative and aggressive traders

    Uncertainty-wise software anti-patterns detection: A possibilistic evolutionary machine learning approach

    Get PDF
    Context: Code smells (a.k.a. anti-patterns) are manifestations of poor design solutions that can deteriorate software maintainability and evolution. Research gap: Existing works did not take into account the issue of uncertain class labels, which is an important inherent characteristic of the smells detection problem. More precisely, two human experts may have different degrees of uncertainty about the smelliness of a particular software class not only for the smell detection task but also for the smell type identification one. Unluckily, existing approaches usually reject and/or ignore uncertain data that correspond to software classes (i.e. dataset instances) with uncertain labels. Throwing away and/or disregarding the uncertainty factor could considerably degrade the detection/identification process effectiveness. From a solution approach viewpoint, there is no work in the literature that proposed a method that is able to detect and/or identify code smells while preserving the uncertainty aspect. Objective: The main goal of our research work is to handle the uncertainty factor, issued from human experts, in detecting and/or identifying code smells by proposing an evolutionary approach that is able to deal with anti-patterns classification with uncertain labels. Method: We suggest Bi-ADIPOK, as an effective search-based tool that is capable to tackle the previously mentioned challenge for both detection and identification cases. The proposed method corresponds to an EA (Evolutionary Algorithm) that optimizes a set of detectors encoded as PK-NNs (Possibilistic K-nearest neighbors) based on a bi-level hierarchy, in which the upper level role consists on finding the optimal PK-NNs parameters, while the lower level one is to generate the PK-NNs. A newly fitness function has been proposed fitness function PomAURPC-OVA_dist (Possibilistic modified Area Under Recall Precision Curve One-Versus-All_distance, abbreviated PAURPC_d in this paper). Bi-ADIPOK is able to deal with label uncertainty using some concepts stemming from the Possibility Theory. Furthermore, the PomAURPC-OVA_dist is capable to process the uncertainty issue even with imbalanced data. We notice that Bi-ADIPOK is first built and then validated using a possibilistic base of smell examples that simulates and mimics the subjectivity of software engineers opinions. Results: The statistical analysis of the obtained results on a set of comparative experiments with respect to four relevant state-of-the-art methods shows the merits of our proposal. The obtained detection results demonstrate that, for the uncertain environment, the PomAURPC-OVA_dist of Bi-ADIPOK ranges between 0.902 and 0.932 and its IAC lies between 0.9108 and 0.9407, while for the certain environment, the PomAURPC-OVA_dist lies between 0.928 and 0.955 and the IAC ranges between 0.9477 and 0.9622. Similarly, the identification results, for the uncertain environment, indicate that the PomAURPC-OVA_dist of Bi-ADIPOK varies between 0.8576 and 0.9273 and its IAC is between 0.8693 and 0.9318. For the certain environment, the PomAURPC-OVA_dist lies between 0.8613 and 0.9351 and the IAC values are between 0.8672 and 0.9476. With uncertain data, Bi-ADIPOK can find 35% more code smells than the second best approach (i.e., BLOP). Furthermore, Bi-ADIPOK has succeeded to reduce the number of false alarms (i.e., misclassified smelly instances) by 12%. In addition, our proposed approach can identify 43% more smell types than BLOP and reduces the number of false alarms by 32%. The same results have been obtained for the certain environment, demonstrating Bi-ADIPOK's ability to deal with such environment
    • …
    corecore