964 research outputs found

    Joint Reasoning for Multi-Faceted Commonsense Knowledge

    Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints to couple the inference over concepts that are related in a taxonomic hierarchy. The reasoning is cast as an integer linear program (ILP), and we leverage the theory of reduced costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de

    SHAPNN: Shapley Value Regularized Tabular Neural Network

    We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backpropagation optimization methods, and is regularized with Shapley values estimated in real time. Our method offers several advantages, including the ability to provide valid explanations with no computational overhead for data instances and datasets. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, and robustness to streaming data. Comment: 9 pages, 8 figures

    A Survey of Neural Trees

    Neural networks (NNs) and decision trees (DTs) are both popular machine learning models, yet they come with mutually exclusive advantages and limitations. To bring together the best of the two worlds, a variety of approaches have been proposed to integrate NNs and DTs explicitly or implicitly. In this survey, these approaches are organized into a school which we term neural trees (NTs). This survey aims to present a comprehensive review of NTs and attempts to identify how they enhance model interpretability. We first propose a thorough taxonomy of NTs that expresses the gradual integration and co-evolution of NNs and DTs. Afterward, we analyze NTs in terms of their interpretability and performance, and suggest possible solutions to the remaining challenges. Finally, this survey concludes with a discussion of other considerations, such as conditional computation, and promising directions for this field. A list of papers reviewed in this survey, along with their corresponding code, is available at: https://github.com/zju-vipa/awesome-neural-trees Comment: 35 pages, 7 figures and 1 table

    A Bayesian framework for concept learning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 1999. Includes bibliographical references (p. 297-314). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Human concept learning presents a version of the classic problem of induction, which is made particularly difficult by the combination of two requirements: the need to learn from a rich (i.e. nested and overlapping) vocabulary of possible concepts and the need to be able to generalize concepts reasonably from only a few positive examples. I begin this thesis by considering a simple number concept game as a concrete illustration of this ability. On this task, human learners can with reasonable confidence lock in on one out of a billion billion billion logically possible concepts, after seeing only four positive examples of the concept, and can generalize informatively after seeing just a single example. Neither of the two classic approaches to inductive inference (hypothesis testing in a constrained space of possible rules, and computing similarity to the observed examples) can provide a complete picture of how people generalize concepts in even this simple setting. This thesis proposes a new computational framework for understanding how people learn concepts from examples, based on the principles of Bayesian inference. By imposing the constraints of a probabilistic model of the learning situation, the Bayesian learner can draw out much more information about a concept's extension from a given set of observed examples than either rule-based or similarity-based approaches do, and can use this information in a rational way to infer the probability that any new object is also an instance of the concept.
There are three components of the Bayesian framework: a prior probability distribution over a hypothesis space of possible concepts; a likelihood function, which scores each hypothesis according to its probability of generating the observed examples; and the principle of hypothesis averaging, under which the learner computes the probability of generalizing a concept to new objects by averaging the predictions of all hypotheses weighted by their posterior probability (proportional to the product of their priors and likelihoods). The likelihood, under the assumption of randomly sampled positive examples, embodies the size principle for scoring hypotheses: smaller consistent hypotheses are more likely than larger hypotheses, and they become exponentially more likely as the number of observed examples increases. The principle of hypothesis averaging allows the Bayesian framework to accommodate both rule-like and similarity-like generalization behavior, depending on how peaked the posterior probability is. Together, the size principle plus hypothesis averaging predict a convergence from similarity-like generalization (due to a broad posterior distribution) after very few examples are observed to rule-like generalization (due to a sharply peaked posterior distribution) after sufficiently many examples have been observed. The main contributions of this thesis are as follows. First and foremost, I show how it is possible for people to learn and generalize concepts from just one or a few positive examples (Chapter 2). Building on that understanding, I then present a series of case studies of simple concept learning situations where the Bayesian framework yields both qualitative and quantitative insights into the real behavior of human learners (Chapters 3-5). These cases each focus on a different learning domain. 
Chapter 3 looks at generalization in continuous feature spaces, a typical representation of objects in psychology and machine learning with the virtues of being analytically tractable and empirically accessible, but the downside of being highly abstract and artificial. Chapter 4 moves to the more natural domain of learning words for categories of objects and shows the relevance of the same phenomena and explanatory principles introduced in the more abstract setting of Chapters 1-3 for real-world learning tasks like this one. In each of these domains, both similarity-like and rule-like generalization emerge as special cases of the Bayesian framework in the limits of very few or very many examples, respectively. However, the transition from similarity to rules occurs much faster in the word learning domain than in the continuous feature space domain. I propose a Bayesian explanation of this difference in learning curves that places crucial importance on the density or sparsity of overlapping hypotheses in the learner's hypothesis space. To test this proposal, a third case study (Chapter 5) returns to the domain of number concepts, in which human learners possess a more complex body of prior knowledge that leads to a hypothesis space with both sparse and densely overlapping components. Here, the Bayesian theory predicts, and human learners produce, either rule-based or similarity-based generalization from a few examples, depending on the precise examples observed. I also discuss how several classic reasoning heuristics may be used to approximate the much more elaborate computations of Bayesian inference that this domain requires. In each of these case studies, I confront some of the classic questions of concept learning and induction: Is the acquisition of concepts driven mainly by pre-existing knowledge or the statistical force of our observations? Is generalization based primarily on abstract rules or similarity to exemplars?
I argue that in almost all instances, the only reasonable answer to such questions is "Both." More importantly, I show how the Bayesian framework allows us to answer much more penetrating versions of these questions: How does prior knowledge interact with the observed examples to guide generalization? Why does generalization appear rule-based in some cases and similarity-based in others? Finally, Chapter 6 summarizes the major contributions in more detailed form and discusses how this work fits into the larger picture of contemporary research on human learning, thinking, and reasoning. By Joshua B. Tenenbaum. Ph.D.
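
The three components named in the abstract (prior, size-principle likelihood, hypothesis averaging) can be illustrated with a minimal Python sketch. It is not taken from the thesis: the five-hypothesis space over the integers 1 to 100, the uniform prior, and the function names `posterior` and `generalization_probability` are all assumptions chosen for illustration. Exact fractions are used so the numbers can be checked by hand.

```python
from fractions import Fraction

# Hypothetical toy hypothesis space over the integers 1..100.
# These five concepts are illustrative assumptions, not the thesis's own space.
DOMAIN = range(1, 101)
HYPOTHESES = {
    "even": frozenset(n for n in DOMAIN if n % 2 == 0),
    "odd": frozenset(n for n in DOMAIN if n % 2 == 1),
    "powers_of_2": frozenset(2 ** k for k in range(1, 7)),  # 2, 4, ..., 64
    "multiples_of_10": frozenset(range(10, 101, 10)),
    "all": frozenset(DOMAIN),
}


def posterior(examples, hypotheses=HYPOTHESES, prior=None):
    """Posterior over hypotheses given positive examples.

    Likelihood follows the size principle: each example is assumed to be
    sampled uniformly from the concept's extension, so a consistent
    hypothesis h scores (1 / |h|) ** n and an inconsistent one scores 0.
    """
    if prior is None:
        # Assumed uniform prior; a real model would encode prior knowledge here.
        prior = {name: Fraction(1, len(hypotheses)) for name in hypotheses}
    scores = {}
    for name, extension in hypotheses.items():
        consistent = all(x in extension for x in examples)
        likelihood = (
            Fraction(1, len(extension)) ** len(examples)
            if consistent
            else Fraction(0)
        )
        scores[name] = prior[name] * likelihood
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the examples")
    return {name: score / total for name, score in scores.items()}


def generalization_probability(y, examples, hypotheses=HYPOTHESES, prior=None):
    """Hypothesis averaging: total posterior mass of hypotheses containing y."""
    post = posterior(examples, hypotheses, prior)
    return sum(p for name, p in post.items() if y in hypotheses[name])


if __name__ == "__main__":
    for examples in ([16], [16, 8, 2, 64]):
        p = generalization_probability(10, examples)
        print(examples, "-> P(10 in concept) =", float(p))
```

In this toy setting, a single example (16) leaves the posterior spread over "even", "powers_of_2", and "all", so the probability of generalizing to 10 is graded. After four powers of two, the smallest consistent hypothesis dominates and generalization to 10 becomes nearly zero, which mirrors the shift from similarity-like to rule-like behavior described above.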