Does ChatGPT resemble humans in language use?
Large language models (LLMs) and LLM-driven chatbots such as ChatGPT have
shown remarkable capacities in comprehending and producing language. However,
their internal workings remain a black box in cognitive terms, and it is
unclear whether LLMs and chatbots can develop humanlike characteristics in
language use. Cognitive scientists have devised many experiments that probe,
and have made great progress in explaining, how people process language. We
subjected ChatGPT to 12 of these experiments, pre-registered and with 1,000
runs per experiment. In 10 of them, ChatGPT replicated the human pattern of
language use. It associated unfamiliar words with different meanings depending
on their forms, continued to access recently encountered meanings of ambiguous
words, reused recent sentence structures, reinterpreted implausible sentences
that were likely to have been corrupted by noise, glossed over errors, drew
reasonable inferences, associated causality with different discourse entities
according to verb semantics, and accessed different meanings and retrieved
different words depending on the identity of its interlocutor. However, unlike
humans, it did not prefer using shorter words to convey less informative
content and it did not use context to disambiguate syntactic ambiguities. We
discuss how these convergences and divergences may occur in the transformer
architecture. Overall, these experiments demonstrate that LLM-driven chatbots
like ChatGPT are capable of mimicking human language processing to a great
extent, and that they have the potential to provide insights into how people
learn and use language.
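A minimal sketch of the repeated-sampling protocol this abstract describes, assuming a hypothetical query_chatbot stand-in for a real chat API; the stimulus and the response classifier are placeholders, not the study's actual materials:

```python
import random

def query_chatbot(prompt: str) -> str:
    """Placeholder for a real chat-API call; here it just simulates
    a binary outcome so the script runs standalone."""
    return random.choice(["humanlike", "other"])

def run_experiment(prompt: str, n_runs: int = 1000) -> float:
    """Return the proportion of runs whose response matches the
    human pattern, mirroring the 1,000-runs-per-experiment design."""
    matches = sum(query_chatbot(prompt) == "humanlike" for _ in range(n_runs))
    return matches / n_runs

if __name__ == "__main__":
    # One hypothetical stimulus, e.g. from a structural-priming item.
    rate = run_experiment("Describe the picture: a girl handing a book to a boy.")
    print(f"Humanlike responses: {rate:.1%}")
```

Comparing such proportions against human baselines, experiment by experiment, is what distinguishes the 10 convergent results from the 2 divergent ones reported above.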
Causally-Inspired Generalizable Deep Learning Methods under Distribution Shifts
Deep learning methods have achieved remarkable success in various areas of artificial intelligence, owing to their powerful distribution-matching capabilities. However, these successes rely heavily on the i.i.d. assumption, i.e., that the data distributions of the training and test sets are the same. Consequently, current deep learning methods typically generalize poorly under distribution shift, performing poorly on test data whose distribution differs from that of the training data. This significantly hinders the application of deep learning methods to real-world scenarios, as in our rapidly evolving world the test distribution is not always the same as the training distribution.
This thesis discusses how to construct generalizable deep learning methods under distribution shifts. To this end, it first models a prediction task as a structural causal model (SCM), which represents the relationships between variables as a directed acyclic graph. In an SCM, some variables change easily across domains while others do not. Deep learning methods, however, often unintentionally mix invariant variables with easily changed ones, which deviates the learned model from the true one and results in poor generalization under distribution shift.
To remedy this issue, we propose specific algorithms that model the invariant part of the SCM with deep learning methods, and experimentally show that this helps the trained model generalize well to different distributions of the same task. Finally, we propose to identify and model the variant information in the new test distribution so that the trained deep learning model can be fully adapted accordingly.
We show that the method can be extended to several practical applications, such as classification under label shift, image translation under semantic shift, robotics control under dynamics generalization, and generalizing large language models to visual question-answering tasks.
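As one concrete (and hypothetical) instance of modelling only the invariant part of an SCM, the sketch below uses an IRMv1-style penalty (Arjovsky et al., 2019) rather than the thesis's own algorithms; the toy data, penalty weight, and architecture are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """IRMv1 penalty: squared gradient of the risk w.r.t. a dummy scale;
    near zero when the classifier is simultaneously optimal in every
    environment, i.e. when it relies on invariant variables only."""
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(200):
    total = torch.zeros(())
    for env, p in enumerate([0.9, 0.6]):     # spurious correlation differs per environment
        core = torch.randn(64, 1)
        y = (core > 0).float()               # the label depends only on the invariant feature
        spurious = torch.where(torch.rand(64, 1) < p, y, 1 - y)
        x = torch.cat([core, spurious], dim=1)
        logits = model(x)
        total = total + F.binary_cross_entropy_with_logits(logits, y) \
                      + 1.0 * irm_penalty(logits, y)
    opt.zero_grad(); total.backward(); opt.step()
print(model.weight)  # the weight on the spurious (variant) feature should shrink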
Probabilistic coherence, logical consistency, and Bayesian learning: Neural language models as epistemic agents
It is argued that suitably trained neural language models exhibit key properties of epistemic agency: they hold probabilistically coherent and logically consistent degrees of belief, which they can rationally revise in the face of novel evidence. To this end, we conduct computational experiments with rankers: T5 models [Raffel et al. 2020] that are pretrained on carefully designed synthetic corpora. Moreover, we introduce a procedure for eliciting a model's degrees of belief, and define numerical metrics that measure the extent to which given degrees of belief violate (probabilistic, logical, and Bayesian) rationality constraints. While pretrained rankers are found to suffer from global inconsistency (in agreement with, e.g., [Jang et al. 2021]), we observe that subsequent self-training on auto-generated texts allows rankers to gradually obtain a probabilistically coherent belief system that is aligned with logical constraints. Such self-training also plays a pivotal role in rational evidential learning, as it seems to enable rankers to propagate a novel evidence item through their belief systems, successively re-adjusting individual degrees of belief. All this, we conclude, confirms the Rationality Hypothesis, i.e., the claim that suitably trained NLMs may exhibit advanced rational skills. We suggest that this hypothesis has empirical, but also normative and conceptual, ramifications far beyond the practical linguistic problems NLMs were originally designed to solve.
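A minimal sketch of what a probabilistic-coherence metric could look like, assuming elicited degrees of belief for a statement and its negation; the numbers and the violation measure are illustrative assumptions, not the paper's exact metrics:

```python
def negation_coherence_violation(p_s: float, p_not_s: float) -> float:
    """Distance from the probabilistic constraint P(s) + P(not s) = 1.
    Zero for a coherent belief pair; larger values mean worse violation."""
    return abs(p_s + p_not_s - 1.0)

# Hypothetical elicited degrees of belief for a statement and its negation.
beliefs = {"Salt dissolves in water": 0.85,
           "Salt does not dissolve in water": 0.30}
violation = negation_coherence_violation(*beliefs.values())
print(f"Negation-coherence violation: {violation:.2f}")  # prints 0.15
```

Averaging such violations over many statement pairs before and after self-training is one way to quantify the gradual move toward coherence described above.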
Inductive Bias and Modular Design for Sample-Efficient Neural Language Learning
Most of the world's languages suffer from the paucity of annotated data. This curbs the effectiveness of supervised learning, the most widespread approach to modelling language. Instead, an alternative paradigm could take inspiration from the propensity of children to acquire language from limited stimuli, in order to enable machines to learn any new language from a few examples. The abstract mechanisms underpinning this ability include 1) a set of in-born inductive biases and 2) the deep entrenchment of language in other perceptual and cognitive faculties, combined with the ability to transfer and recombine knowledge across these domains. The main contribution of my thesis is giving concrete form to both these intuitions.
Firstly, I argue that endowing a neural network with the correct inductive biases is equivalent to constructing a prior distribution over its weights and its architecture (including connectivity patterns and non-linear activations). This prior is inferred by "reverse-engineering" a representative set of observed languages and harnessing typological features documented by linguists. Thus, I provide a unified framework for cross-lingual transfer and architecture search by recasting them as hierarchical Bayesian neural models.
Secondly, the skills relevant to different language varieties and different tasks in natural language processing are deeply intertwined. Hence, the neural weights modelling the data for each of their combinations can be imagined as lying in a structured space. I introduce a Bayesian generative model of this space, which is factorised into latent variables representing each language and each task. By virtue of this modular design, predictions can generalise to unseen combinations by extrapolating from the data of observed combinations.
The proposed models are empirically validated on a spectrum of language-related tasks (character-level language modelling, part-of-speech tagging, named entity recognition, and common-sense reasoning) and a typologically diverse sample of about a hundred languages. Compared to a series of competitive baselines, they achieve better performance on new languages in zero-shot and few-shot learning settings. In general, they hold promise to extend state-of-the-art language technology to under-resourced languages by means of sample efficiency and robustness to cross-lingual variation.
Funding: ERC Consolidator Grant 648909 (Lexical); Google Research Faculty Award 201
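A hedged sketch of the factorised latent-variable idea above: a hypothetical hypernetwork composes per-(language, task) weights from separate language and task embeddings, so unseen combinations can be extrapolated from observed ones; all names and dimensions are assumptions, not the thesis's exact model:

```python
import torch

class FactorisedAdapter(torch.nn.Module):
    """Generate per-(language, task) weights from separate latent
    embeddings, so an unseen (language, task) combination can be
    composed from factors learned on other combinations."""

    def __init__(self, n_langs: int, n_tasks: int, d_latent: int,
                 d_in: int, d_out: int):
        super().__init__()
        self.lang = torch.nn.Embedding(n_langs, d_latent)
        self.task = torch.nn.Embedding(n_tasks, d_latent)
        # Maps the combined latent code to a flattened weight matrix.
        self.hyper = torch.nn.Linear(2 * d_latent, d_in * d_out)
        self.d_in, self.d_out = d_in, d_out

    def forward(self, x, lang_id, task_id):
        code = torch.cat([self.lang(lang_id), self.task(task_id)], dim=-1)
        w = self.hyper(code).view(self.d_out, self.d_in)
        return x @ w.T

model = FactorisedAdapter(n_langs=100, n_tasks=4, d_latent=16, d_in=32, d_out=8)
x = torch.randn(5, 32)
out = model(x, torch.tensor(3), torch.tensor(1))  # e.g. an unseen (language, task) pair
print(out.shape)  # torch.Size([5, 8])
```

The design choice worth noting is the factorisation itself: because language and task factors are learned independently, a combination never seen in training still receives sensible weights.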
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
Autonomous agents must learn to collaborate. It is not scalable to develop a
new centralized agent every time a task's difficulty outpaces a single agent's
abilities. While multi-agent collaboration research has flourished in
gridworld-like environments, relatively little work has considered visually
rich domains. Addressing this, we introduce the novel task FurnMove in which
agents work together to move a piece of furniture through a living room to a
goal. Unlike existing tasks, FurnMove requires agents to coordinate at every
timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies, and, in tasks requiring close coordination, failed actions come to vastly outnumber successful ones. To confront these challenges we
introduce SYNC-policies (synchronize your actions coherently) and CORDIAL
(coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58%
completion rate on FurnMove, an impressive absolute gain of 25 percentage
points over competitive decentralized baselines. Our dataset, code, and
pretrained models are available at https://unnat.github.io/cordial-sync.
Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-sync
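A simplified sketch of the synchronisation idea behind SYNC-policies: decentralized agents draw from a shared random stream so they pick the same mixture component without communicating, then each samples its own action from that component's marginal; the shapes and probabilities here are hypothetical, not the paper's exact setup:

```python
import numpy as np

def sync_sample(mixture_probs, marginals, shared_seed: int):
    """Each agent samples the SAME mixture component via shared
    randomness, then its own action from that component's marginal.
    mixture_probs: (K,) shared across agents; marginals: one
    (K, n_actions) array per agent, rows summing to 1."""
    rng = np.random.default_rng(shared_seed)   # identical stream on every agent
    k = rng.choice(len(mixture_probs), p=mixture_probs)
    actions = []
    for agent_marginal in marginals:
        # Per-agent randomness may differ once the component is fixed.
        actions.append(np.random.default_rng().choice(
            agent_marginal.shape[1], p=agent_marginal[k]))
    return k, actions

mixture = np.array([0.7, 0.3])                 # K = 2 policy components
agent_a = np.array([[0.9, 0.1], [0.2, 0.8]])
agent_b = np.array([[0.8, 0.2], [0.1, 0.9]])
print(sync_sample(mixture, [agent_a, agent_b], shared_seed=42))
```

Because the joint policy is a mixture of products of marginals rather than a single product, it can express the coordinated action combinations that naive independent sampling cannot.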
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification, covering historical and recent trends. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research identified by our taxonomy.
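To make one taxonomy entry concrete, here is a minimal, hypothetical sketch of cross-modal alignment via attention, with text tokens attending to image regions; the dimensions are illustrative only:

```python
import torch

# Cross-attention aligns elements of one modality (words) with elements
# of another (image regions): queries come from text, keys/values from vision.
d = 64
attn = torch.nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
text = torch.randn(1, 12, d)      # 12 word embeddings (hypothetical)
image = torch.randn(1, 49, d)     # 49 region features, e.g. a 7x7 grid
aligned, weights = attn(query=text, key=image, value=image)
print(aligned.shape, weights.shape)  # (1, 12, 64) and (1, 12, 49)
```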