22 research outputs found

    Statistical reasoning with set-valued information: Ontic vs. epistemic views

    In information processing tasks, sets may have a conjunctive or a disjunctive reading. In the conjunctive reading, a set represents an object of interest and its elements are subparts of the object, forming a composite description. In the disjunctive reading, a set contains mutually exclusive elements and represents incomplete knowledge: it does not model an actual object or quantity, but partial information about an underlying object or a precise quantity. This distinction between what we call ontic vs. epistemic sets remains valid for fuzzy sets, whose membership functions, in the disjunctive reading, are possibility distributions over deterministic or random values. This paper examines the impact of this distinction in statistics. We show that it matters, because there is a risk of misusing basic notions and tools, such as conditioning, distance between sets, variance, and regression, when data are set-valued. We discuss several examples where the ontic and epistemic points of view lead to different approaches to these concepts.
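    A rough illustration (not taken from the paper): the Python sketch below treats a small sample of interval-valued data under both readings. Under the ontic view each interval is itself the object of interest, so a single scalar variance can be defined from a distance between intervals; under the epistemic view each interval only brackets an unknown precise value, so the natural outcome is a range of possible variances. The particular ontic distance and the endpoint enumeration are illustrative assumptions, not the paper's definitions.

```python
import itertools
import numpy as np

# Interval-valued sample: each observation is an interval [lo, hi].
data = [(1.0, 2.0), (2.5, 3.0), (4.0, 5.5)]

# Ontic view: intervals are the objects themselves.  Represent each one by
# (midpoint, half-width); with the squared distance
# d(A, B)^2 = (mid_A - mid_B)^2 + (half_A - half_B)^2,
# a Frechet-type variance becomes:
mids = np.array([(lo + hi) / 2 for lo, hi in data])
half = np.array([(hi - lo) / 2 for lo, hi in data])
ontic_variance = np.var(mids) + np.var(half)

# Epistemic view: each interval only constrains an unknown precise value,
# so the variance is itself imprecisely known.  Enumerating the interval
# endpoints gives the exact maximum (the sample variance is convex in the
# data vector), but only an upper bound on the minimum, which may lie
# inside the box (e.g. zero when all intervals overlap).
vertex_vars = [np.var(sel) for sel in itertools.product(*data)]
epistemic_bounds = (min(vertex_vars), max(vertex_vars))

print("ontic variance:", ontic_variance)
print("epistemic variance (vertex bounds):", epistemic_bounds)
```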

    Why Fuzzy Decision Trees are Good Rankers


    Preface


    Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods

    The notion of uncertainty is of major importance in machine learning and constitutes a key element of machine learning methodology. In line with the statistical tradition, uncertainty has long been perceived as almost synonymous with standard probability and probabilistic predictions. Yet, due to the steadily increasing relevance of machine learning for practical applications and related issues such as safety requirements, new problems and challenges have recently been identified by machine learning scholars, and these may call for new methodological developments. In particular, this includes the importance of distinguishing between (at least) two different types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we provide an introduction to the topic of uncertainty in machine learning, as well as an overview of attempts made so far at handling uncertainty in general and at formalizing this distinction in particular.
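    The paper itself is a conceptual survey; as a purely illustrative sketch (using made-up ensemble predictions, not anything from the paper), one common heuristic formalization decomposes the predictive entropy of an ensemble into an aleatoric part (the average entropy of the individual members) and an epistemic part (the remainder, i.e. the disagreement between members):

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats, guarding against log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

# Hypothetical class-probability predictions of three ensemble members
# for a single instance over three classes.
member_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.4, 0.5, 0.1],
])

total = entropy(member_probs.mean(axis=0))         # entropy of the averaged prediction
aleatoric = entropy(member_probs, axis=-1).mean()  # average member entropy
epistemic = total - aleatoric                      # disagreement between members

print(f"total={total:.3f}  aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}")
```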

    Bipartite Ranking through Minimization of Univariate Loss

    Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by instead minimizing a simple univariate loss function as a surrogate, as done by standard classification methods. In this paper, we first note that minimization of the 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of the 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only on the rank risk but, more importantly, also on the rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis.
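    For context on the quantities mentioned in the abstract (this is not code from the paper), the sketch below computes the empirical rank loss, i.e. the fraction of positive-negative pairs that a scoring function orders incorrectly, whose complement is the AUC. Its pairwise definition is what makes direct minimization quadratic in the number of training examples and motivates univariate surrogates.

```python
import numpy as np

def rank_loss(scores, labels):
    """Fraction of positive-negative pairs ranked incorrectly (ties count 1/2)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]            # one entry per pos-neg pair
    return float(np.mean((diffs < 0) + 0.5 * (diffs == 0)))

# Hypothetical scores produced by some classifier trained with a univariate loss.
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.2, 0.1])
labels = np.array([1,   1,   1,   0,   0,   0  ])

rl = rank_loss(scores, labels)
print("empirical rank loss:", rl)       # 1/9: one discordant pair out of nine
print("AUC = 1 - rank loss:", 1 - rl)
```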

    Correlation-based embedding of pairwise score data

    Strickert M, Bunte K, Schleif F-M, Huellermeier E. Correlation-based embedding of pairwise score data. Neurocomputing. 2014;141:97-109.
    Neighbor-preserving embedding of relational data in low-dimensional Euclidean spaces is studied. Contrary to variants of stochastic neighbor embedding, which minimize divergence measures between estimated neighborhood probability distributions, the proposed approach fits configurations in the output space by maximizing correlation with potentially asymmetric or missing relationships in the input space. In addition to the linear Pearson correlation measure, the use of soft formulations of Spearman and Kendall rank correlation is investigated for optimizing embeddings such as 2D point-cloud configurations. We illustrate how this scale-invariant, correlation-based framework of multidimensional scaling (cbMDS) helps to go beyond distance-preserving scaling approaches, and how the embedding results differ characteristically from recent neighborhood embedding techniques.
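    As a rough sketch of the general idea only (it is not the authors' cbMDS algorithm, and the data are synthetic), one can fit 2D coordinates by maximizing the Pearson correlation between the given pairwise scores and the pairwise distances in the embedding, here with an off-the-shelf optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Synthetic stand-in for the pairwise input scores of 20 items.
hidden = rng.normal(size=(20, 5))
scores = pdist(hidden)                 # condensed vector of pairwise dissimilarities

def neg_pearson(flat_coords):
    """Negative Pearson correlation between input scores and embedding distances."""
    coords = flat_coords.reshape(-1, 2)
    dists = pdist(coords)
    return -np.corrcoef(scores, dists)[0, 1]

result = minimize(neg_pearson, rng.normal(size=20 * 2), method="L-BFGS-B")
embedding = result.x.reshape(-1, 2)    # 2D point-cloud configuration
print("achieved correlation:", -result.fun)
```

    The soft Spearman and Kendall variants mentioned in the abstract would replace the Pearson objective with a smoothed rank-based correlation; since correlation is invariant to scaling of the distances, the fitted configuration is determined only up to scale.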