
    Concepts in a Probabilistic Language of Thought

    Note: The book chapter is reprinted courtesy of The MIT Press, from the forthcoming edited collection “The Conceptual Mind: New Directions in the Study of Concepts,” edited by Eric Margolis and Stephen Laurence, print date Spring 2015.

    Knowledge organizes our understanding of the world, determining what we expect given what we have already seen. Our predictive representations have two key properties: they are productive, and they are graded. Productive generalization is possible because our knowledge decomposes into concepts—elements of knowledge that are combined and recombined to describe particular situations. Gradedness is the observable effect of accounting for uncertainty—our knowledge encodes degrees of belief that lead to graded probabilistic predictions. To put this a different way, concepts form a combinatorial system that enables description of many different situations; each such situation specifies a distribution over what we expect to see in the world, given what we have seen. We may think of this system as a probabilistic language of thought (PLoT) in which representations are built from language-like composition of concepts and the content of those representations is a probability distribution on world states. The purpose of this chapter is to formalize these ideas in computational terms, to illustrate key properties of the PLoT approach with a concrete example, and to draw connections with other views of conceptual structure.

    This work was supported by ONR awards N00014-09-1-0124 and N00014-13-1-0788, by a John S. McDonnell Foundation Scholar Award, and by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
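
    The chapter's concrete illustration is a generative model expressed as a probabilistic program, in which stochastic functions play the role of concepts and their composition defines a distribution over situations. The sketch below is a rough Python analogue of that idea rather than the chapter's own code; the tug-of-war setup, the concept names (strength, laziness), and all numeric choices are illustrative assumptions.

```python
# A minimal sketch of the PLoT idea: concepts as small stochastic functions that
# compose into the description of a situation, whose content is the distribution
# over outcomes obtained by sampling. All names and numbers are illustrative.
import random
from collections import Counter

def strength():             # concept: an agent's latent strength
    return random.gauss(10, 3)

def lazy():                 # concept: whether the agent slacks off on a given pull
    return random.random() < 0.3

def sample_agent():
    return {"strength": strength(), "lazy": lazy()}

def pulling_power(agent):   # composition: combine concepts about one agent
    return agent["strength"] * (0.5 if agent["lazy"] else 1.0)

def team_beats(team_a, team_b):   # a situation described by composing concepts
    return sum(map(pulling_power, team_a)) > sum(map(pulling_power, team_b))

# The "meaning" of the composed description is a graded, probabilistic prediction.
wins = Counter(team_beats([sample_agent(), sample_agent()], [sample_agent()])
               for _ in range(10_000))
print("P(two-person team beats a lone opponent) ~", wins[True] / 10_000)
```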

    Causal Responsibility and Robust Causation

    How do people judge the degree of causal responsibility that an agent has for the outcomes of her actions? We show that a relatively unexplored factor – the robustness (or stability) of the causal chain linking the agent’s action and the outcome – influences judgments of causal responsibility of the agent. In three experiments, we vary robustness by manipulating the number of background circumstances under which the action causes the effect, and find that causal responsibility judgments increase with robustness. In the first experiment, the robustness manipulation also raises the probability of the effect given the action. Experiments 2 and 3 control for probability-raising and show that robustness still affects judgments of causal responsibility. In particular, Experiment 3 introduces an Ellsberg-type scenario to manipulate robustness, while keeping the conditional probability and the skill deployed in the action fixed. Experiment 4 replicates the results of Experiment 3, while contrasting judgments of causal strength with judgments of causal responsibility. The results show that in all cases, the perceived degree of responsibility (but not of causal strength) increases with the robustness of the action-outcome causal chain.
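
    Read concretely, the robustness manipulation can be thought of as varying the fraction of background circumstances under which the action would still bring about the outcome. The snippet below is only an illustrative sketch of that operationalization; the circumstance counts and the low/high contrast are invented and are not the experiments' materials.

```python
# Hypothetical sketch: robustness as the fraction of background circumstances
# under which the agent's action still causes the effect. Values are illustrative.
def robustness(effect_occurs_under):
    """effect_occurs_under: one boolean per possible background circumstance."""
    return sum(effect_occurs_under) / len(effect_occurs_under)

low_robustness = robustness([True, False, False, False])   # works in 1 of 4 circumstances
high_robustness = robustness([True, True, True, False])    # works in 3 of 4 circumstances

# The reported pattern: judged causal responsibility tracks this quantity even
# when the realized probability of the effect given the action is held fixed.
print(f"low = {low_robustness:.2f}, high = {high_robustness:.2f}")
```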

    Understanding Social Reasoning in Language Models with Language Models

    As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) the presence of inconsistent results from previous evaluations, and (2) concerns surrounding the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs, which consists of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performance with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though they are less reliable, while other LLMs struggle.
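
    The generation procedure can be pictured as filling the slots of a causal template (who acts, what they perceive, and therefore what they should believe) and deriving the correct answer from that causal structure. The sketch below is a hypothetical miniature of this idea; the template wording, slot fillers, and condition labels are assumptions, not BigToM's actual templates.

```python
# Hypothetical sketch of populating a causal template to generate a ToM item.
# The template, fillers, and condition names are illustrative only.
TEMPLATE = ("{agent} puts the {obj} in the {location}. "
            "{event} "
            "Where does {agent} think the {obj} is?")

def make_item(agent, obj, loc_before, loc_after, agent_sees_change):
    event = (f"While {agent} is {'watching' if agent_sees_change else 'away'}, "
             f"the {obj} is moved to the {loc_after}.")
    story = TEMPLATE.format(agent=agent, obj=obj, location=loc_before, event=event)
    # The causal structure fixes the correct answer: beliefs track perceived evidence.
    answer = loc_after if agent_sees_change else loc_before
    condition = "true belief" if agent_sees_change else "false belief"
    return {"story": story, "answer": answer, "condition": condition}

print(make_item("Noor", "coffee beans", "cupboard", "drawer", agent_sees_change=False))
```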

    What's fair? How children assign reward to members of teams with differing causal structures

    How do children reward individual members of a team that has just won or lost a game? We know that from pre-school age, children consider agents’ performance when allocating reward. Here we assess whether children can go further and appreciate performance in context: The same pattern of performance can contribute to a team outcome in different ways, depending on the underlying rule framework. Two experiments, with three age groups (4/5-year-olds, 6/7-year-olds, and adults), varied performance of team members, with the same performance patterns considered under three different game rules for winning or losing. These three rules created distinct underlying causal structures (additive, conjunctive, disjunctive) for how individual performance affected the overall team outcome. Even the youngest children differentiated between the different game rules in their reward allocations. Rather than only rewarding individual performance, or whether the team won/lost, children were sensitive to the team structure and how players’ performance contributed to the win/loss under each of the three game rules. Not only do young children consider it fair to allocate resources based on merit, but they are also sensitive to the causal structure of the situation, which dictates how individual contributions combine to determine the team outcome.
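
    The three rule frameworks amount to different ways of combining individual performances into a team outcome, so the same performance pattern can yield a win under one rule and a loss under another. The snippet below is an illustrative sketch of additive, conjunctive, and disjunctive structures; the scoring threshold and example scores are made up and are not the experimental stimuli.

```python
# Hypothetical sketch of the three causal structures for combining individual
# performance into a team outcome. Threshold and scores are illustrative.
THRESHOLD = 3

def additive(scores):      # team wins if combined performance is high enough
    return sum(scores) >= THRESHOLD * len(scores)

def conjunctive(scores):   # team wins only if every member performs well enough
    return all(s >= THRESHOLD for s in scores)

def disjunctive(scores):   # team wins if at least one member performs well enough
    return any(s >= THRESHOLD for s in scores)

scores = [5, 1]            # the same performance pattern...
for rule in (additive, conjunctive, disjunctive):
    # ...contributes to the team outcome differently under each rule
    print(rule.__name__, "->", "win" if rule(scores) else "loss")
```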

    Social Contract AI: Aligning AI Assistants with Implicit Group Norms

    We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions. To validate our proposal, we run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players. We find that the AI assistant accurately aligns its behavior to match standard policies from the economic literature (e.g., selfish, altruistic). However, the assistant's learned policies lack robustness and exhibit limited generalization in an out-of-distribution setting when confronted with a currency (e.g., grams of medicine) that was not included in the assistant's training distribution. Additionally, we find that when there is inconsistency in the relationship between language use and an unknown policy (e.g., an altruistic policy combined with rude language), the assistant's learning of the policy is slowed. Overall, our preliminary results suggest that developing simulation frameworks in which AI assistants need to infer preferences from diverse users can provide a valuable approach for studying practical alignment questions. Comment: SoLaR NeurIPS 2023 Workshop (https://solar-neurips.github.io/).
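
    The core move is to treat the user's policy as a latent variable and infer it from observed ultimatum-game behavior. The sketch below is a simplified, hypothetical version of that inference; the two candidate policies, their offer distributions, and the discrete offer categories are assumptions rather than the paper's setup.

```python
# Hypothetical sketch: Bayesian inference over a user's ultimatum-game policy
# from observed offers. Policies and their offer distributions are illustrative.
POLICIES = {
    "selfish":    {"low": 0.80, "fair": 0.15, "generous": 0.05},
    "altruistic": {"low": 0.05, "fair": 0.35, "generous": 0.60},
}

def posterior(observed_offers):
    scores = {}
    for policy, offer_probs in POLICIES.items():
        score = 1.0 / len(POLICIES)          # uniform prior over policies
        for offer in observed_offers:        # multiply in each observation's likelihood
            score *= offer_probs[offer]
        scores[policy] = score
    total = sum(scores.values())
    return {policy: score / total for policy, score in scores.items()}

print(posterior(["generous", "fair", "generous"]))   # mass shifts toward "altruistic"
```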

    Euphrasia Eye Drops in Preterm Neonates With Ocular Discharge: A Randomized Double-Blind Placebo-Controlled Trial

    Aim: To investigate whether the early administration of Euphrasia eye drops® in preterm neonates presenting with ocular discharge fosters the resolution of the ocular discharge and reduces the need for topical antibiotic therapy, as compared to placebo. Methods: We conducted a randomized double-blind placebo-controlled trial at the University Children's Hospital Bern, Switzerland. Preterm neonates with white, yellow, or green ocular discharge were included. Infants were randomly assigned (1:1) to the Euphrasia arm (Euphrasia eye drops®, Weleda AG, Arlesheim) or the placebo arm (NaCl 0.9%). Euphrasia or placebo was administered at a dose of one drop in each eye four times a day over a period of 96 h. The primary outcome was treatment success, defined as no ocular discharge at 96 h and no use of topical antibiotic therapy during the 96-h intervention. Results: A total of 114 neonates were screened and 84 were randomized. Among neonates in the Euphrasia arm, 22 (55.0%) achieved our primary outcome compared to 21 (51.2%) in the placebo arm (p = 0.85). In the Euphrasia arm, time to resolution of reddening tended to fall within the shorter bracket of 24 to 48 h [24 (92.3%) vs. 12 (80.0%) in the placebo arm, p = 0.34], and relapse or first signs of reddening during the 96-h intervention tended to be lower [3 (7.9%) eyes vs. 8 (18.2%) eyes in the placebo arm, p = 0.17]. Tearing at 96 h tended to be lower in the Euphrasia arm [5 (12.8%) eyes in the Euphrasia arm vs. 12 (27.3%) eyes in the placebo arm, p = 0.10]. Discussion: Euphrasia did not significantly improve treatment success, defined as no ocular discharge at 96 h and no use of topical antibiotic therapy during the 96-h intervention. However, the results suggest that Euphrasia may be of benefit for symptoms such as reddening and tearing, and thus improve the comfort of patients. Trial Registration: The trial is registered at the US National Institutes of Health (ClinicalTrials.gov) NCT04122300 and at the portal for human research in Switzerland SNCTP000003490. Keywords: Euphrasia drops; complementary medicine; congenital nasolacrimal duct obstruction; ocular discharge; preterm neonate.

    An IRT Analysis of Motive Questionnaires: The Unified Motive Scales

    Multiple inventories claiming to assess the same explicit motive (achievement, power, or affiliation) show only mediocre convergent validity. In three studies (N = 1685), the structure, nomological net, and content coverage of multiple existing motive scales were investigated with exploratory factor analyses. The analyses revealed four approach factors (achievement, power, affiliation, and intimacy) and a general avoidance factor with a facet structure. New scales (the Unified Motive Scales; UMS) reflecting these underlying dimensions were developed using IRT. In comparison to existing questionnaires, the UMS have the highest measurement precision and provide short (6-item) and ultra-short (3-item) scales. In a fourth study (N = 96), the UMS demonstrated incremental validity over existing motive scales with respect to several outcome criteria.
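
    The claim about measurement precision rests on standard IRT quantities: each item's information function and the test information obtained by summing over items, which determines the standard error at a given trait level. The snippet below illustrates this with a two-parameter logistic (2PL) model; the item parameters are invented for illustration and are not UMS estimates.

```python
# Hypothetical 2PL illustration of measurement precision in IRT.
# Item parameters are made up, not UMS estimates.
import math

def p_endorse(theta, a, b):          # 2PL item response function
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):   # Fisher information contributed by one item
    p = p_endorse(theta, a, b)
    return a ** 2 * p * (1.0 - p)

items = [(1.8, -0.5), (1.2, 0.0), (2.1, 0.7)]   # (discrimination, difficulty) pairs
theta = 0.0
test_info = sum(item_information(theta, a, b) for a, b in items)
standard_error = 1.0 / math.sqrt(test_info)     # higher information -> smaller SE
print(f"test information at theta = 0: {test_info:.2f}, SE = {standard_error:.2f}")
```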

    The Search for Invariance: Repeated Positive Testing Serves the Goals of Causal Learning

    Positive testing is characteristic of exploratory behavior, yet it seems to be at odds with the aim of information seeking. After all, repeated demonstrations of one’s current hypothesis often produce the same evidence and fail to distinguish it from potential alternatives. Research on the development of scientific reasoning and on adult rule learning has both documented and attempted to explain this behavior. The current chapter reviews this prior work and introduces a novel theoretical account—the Search for Invariance (SI) hypothesis—which suggests that producing multiple positive examples serves the goals of causal learning. This hypothesis draws on the interventionist framework of causal reasoning, which suggests that causal learners are concerned with the invariance of candidate hypotheses. In a probabilistic and interdependent causal world, our primary goal is to determine whether, and in what contexts, our causal hypotheses provide accurate foundations for inference and intervention—not to disconfirm their alternatives. By recognizing the central role of invariance in causal learning, the phenomenon of positive testing may be reinterpreted as a rational information-seeking strategy.
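
    One way to picture the SI hypothesis is that each repeated positive test asks whether the hypothesized cause still produces its effect in a new or noisy context, so the run of outcomes estimates how invariant the hypothesis is. The simulation below is a hypothetical sketch of that idea; the probabilities and test counts are illustrative and not taken from the chapter.

```python
# Hypothetical sketch: repeated positive tests as estimates of a hypothesis's
# invariance across contexts. Probabilities and counts are illustrative.
import random

def positive_test(p_effect_given_action):
    """Intervene with the hypothesized cause; observe whether the effect occurs."""
    return random.random() < p_effect_given_action

def estimated_invariance(p_effect_given_action, n_tests=20):
    outcomes = [positive_test(p_effect_given_action) for _ in range(n_tests)]
    return sum(outcomes) / n_tests   # how reliably the action brings about the effect

random.seed(0)
print("fragile cause:  ", estimated_invariance(0.55))
print("invariant cause:", estimated_invariance(0.95))
```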