152 research outputs found
From task structures to world models: What do LLMs know?
In what sense does a large language model have knowledge? The answer to this
question extends beyond the capabilities of a particular AI system, and
challenges our assumptions about the nature of knowledge and intelligence. We
answer by granting LLMs "instrumental knowledge"; knowledge defined by a
certain set of abilities. We then ask how such knowledge is related to the more
ordinary, "worldly" knowledge exhibited by human agents, and explore this in
terms of the degree to which instrumental knowledge can be said to incorporate
the structured world models of cognitive science. We discuss ways LLMs could
recover degrees of worldly knowledge, and suggest such recovery will be
governed by an implicit, resource-rational tradeoff between world models and
task demands.
Self-Supervised Intrinsic Image Decomposition
Intrinsic decomposition from a single image is a highly challenging task, due
to its inherent ambiguity and the scarcity of training data. In contrast to
traditional fully supervised learning approaches, in this paper we propose
learning intrinsic image decomposition by explaining the input image. Our
model, the Rendered Intrinsics Network (RIN), joins together an image
decomposition pipeline, which predicts reflectance, shape, and lighting
conditions given a single image, with a recombination function, a learned
shading model used to recompose the original input based off of intrinsic image
predictions. Our network can then use unsupervised reconstruction error as an
additional signal to improve its intermediate representations. This allows
large-scale unlabeled data to be useful during training, and also enables
transferring learned knowledge to images of unseen object categories, lighting
conditions, and shapes. Extensive experiments demonstrate that our method
performs well on both intrinsic image decomposition and knowledge transfer.
Comment: NIPS 2017 camera-ready version, project page: http://rin.csail.mit.edu
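The reconstruction signal described in the abstract above can be illustrated with a minimal sketch. It assumes a fixed Lambertian shading rule in place of the paper's learned shading model, a single grayscale value per pixel, and one directional light; the function names are hypothetical and are not taken from the RIN code.

```python
def lambertian_shading(normal, light):
    """Shading for one pixel: clamped dot product of unit normal and light direction."""
    dot = sum(n * l for n, l in zip(normal, light))
    return max(0.0, dot)


def recompose(reflectance, normals, light):
    """Recombine intrinsic predictions into an image: reflectance times shading.

    Assumption: a fixed Lambertian rule stands in for the learned shading model.
    """
    return [r * lambertian_shading(n, light) for r, n in zip(reflectance, normals)]


def reconstruction_error(image, reflectance, normals, light):
    """Mean squared error between the input image and its re-rendering.

    No ground-truth reflectance, shape, or lighting labels are needed to compute it.
    """
    rendered = recompose(reflectance, normals, light)
    return sum((a - b) ** 2 for a, b in zip(image, rendered)) / len(image)


# Two pixels: one facing the light, one facing away from it.
reflectance = [0.5, 0.8]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
light = (0.0, 0.0, 1.0)
rendered = recompose(reflectance, normals, light)
```

Because the error depends only on the input image and the predicted intrinsics, it can be evaluated on unlabeled images, which is the property the abstract relies on.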
Differentially Private Accelerated Optimization Algorithms
We present two classes of differentially private optimization algorithms
derived from the well-known accelerated first-order methods. The first
algorithm is inspired by Polyak's heavy ball method and employs a smoothing
approach to decrease the accumulated noise on the gradient steps required for
differential privacy. The second class of algorithms is based on Nesterov's
accelerated gradient method and its recent multi-stage variant. We propose a
noise dividing mechanism for the iterations of Nesterov's method in order to
improve the error behavior of the algorithm. Convergence rate analyses are
provided for both the heavy ball and Nesterov's accelerated gradient methods
with the help of dynamical system analysis techniques. Finally, we conclude
with our numerical experiments showing that the presented algorithms have
advantages over the well-known differentially private algorithms.
Comment: 28 pages, 4 figures
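As a rough illustration of the first class of algorithms, here is a generic heavy ball iteration with clipped gradients and Gaussian noise. It is a sketch under stated assumptions, not the paper's method: the smoothing approach mentioned in the abstract is not reproduced, the noise scale `sigma` is a free parameter that is not calibrated to any privacy budget, and every name below is hypothetical.

```python
import random


def clip(g, max_norm):
    """Scale the gradient so its Euclidean norm is at most max_norm."""
    norm = sum(x * x for x in g) ** 0.5
    if norm <= max_norm or norm == 0.0:
        return list(g)
    scale = max_norm / norm
    return [x * scale for x in g]


def noisy_heavy_ball(grad, x0, steps, lr, momentum, clip_norm, sigma, seed=0):
    """Heavy ball update with a noisy, clipped gradient:

        x_next = x - lr * (clip(grad(x)) + noise) + momentum * (x - x_prev)

    Assumption: independent Gaussian noise of standard deviation sigma is
    added to each coordinate; choosing sigma for a given (epsilon, delta)
    is outside the scope of this sketch.
    """
    rng = random.Random(seed)
    x_prev = list(x0)
    x = list(x0)
    for _ in range(steps):
        g = clip(grad(x), clip_norm)
        noisy = [gi + rng.gauss(0.0, sigma) for gi in g]
        x_next = [
            xi - lr * ni + momentum * (xi - pi)
            for xi, ni, pi in zip(x, noisy, x_prev)
        ]
        x_prev, x = x, x_next
    return x


def quadratic_grad(x):
    """Gradient of f(x) = 0.5 * ||x||^2, used only as a toy objective."""
    return list(x)


# With sigma = 0 the iteration reduces to the plain heavy ball method.
x_final = noisy_heavy_ball(quadratic_grad, [1.0, -2.0], steps=200, lr=0.1,
                           momentum=0.5, clip_norm=10.0, sigma=0.0)
```

Setting `sigma` above zero shows the effect the abstract is concerned with: the momentum term carries earlier noise forward, so the noise accumulates across iterations.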
Modeling human intuitions about liquid flow with particle-based simulation
Humans can easily describe, imagine, and, crucially, predict a wide variety
of behaviors of liquids--splashing, squirting, gushing, sloshing, soaking,
dripping, draining, trickling, pooling, and pouring--despite tremendous
variability in their material and dynamical properties. Here we propose and
test a computational model of how people perceive and predict these liquid
dynamics, based on coarse approximate simulations of fluids as collections of
interacting particles. Our model is analogous to a "game engine in the head",
drawing on techniques for interactive simulations (as in video games) that
optimize for efficiency and natural appearance rather than physical accuracy.
In two behavioral experiments, we found that the model accurately captured
people's predictions about how liquids flow among complex solid obstacles, and
was significantly better than two alternatives based on simple heuristics and
deep neural networks. Our model was also able to explain how people's
predictions varied as a function of the liquids' properties (e.g., viscosity
and stickiness). Together, the model and empirical results extend the recent
proposal that human physical scene understanding for the dynamics of rigid,
solid objects can be supported by approximate probabilistic simulation, to the
more complex and unexplored domain of fluid dynamics.
Comment: Under review at PLOS Computational Biology
Nonlinear Processing with Linear Optics
Deep neural networks have achieved remarkable breakthroughs by leveraging
multiple layers of data processing to extract hidden representations, albeit at
the cost of large electronic computing power. To enhance energy efficiency and
speed, the optical implementation of neural networks aims to harness the
advantages of optical bandwidth and the energy efficiency of optical
interconnections. In the absence of low-power optical nonlinearities, the
challenge in the implementation of multilayer optical networks lies in
realizing multiple optical layers without resorting to electronic components.
In this study, we present a novel framework that uses multiple scattering that
is capable of synthesizing programmable linear and nonlinear transformations
concurrently at low optical power by leveraging the nonlinear relationship
between the scattering potential, represented by data, and the scattered field.
Theoretical and experimental investigations show that repeating the data by
multiple scattering enables nonlinear optical computing with low-power
continuous-wave light.
Comment: 20 pages, 9 figures and 1 table
Evaluating Spatial Understanding of Large Language Models
Large language models (LLMs) show remarkable capabilities across a variety of
tasks. Despite the models only seeing text in training, several recent studies
suggest that LLM representations implicitly capture aspects of the underlying
grounded concepts. Here, we explore LLM representations of a particularly
salient kind of grounded knowledge -- spatial relationships. We design
natural-language navigation tasks and evaluate the ability of LLMs, in
particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and
reason about spatial structures, and compare these abilities to human
performance on the same tasks. These tasks reveal substantial variability in
LLM performance across different spatial structures, including square,
hexagonal, and triangular grids, rings, and trees. We also discover that,
similar to humans, LLMs utilize object names as landmarks for maintaining
spatial maps. Finally, in extensive error analysis, we find that LLMs' mistakes
reflect both spatial and non-spatial factors. These findings suggest that LLMs
appear to capture certain aspects of spatial structure implicitly, but room for
improvement remains.
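To make the task format concrete, the following sketch generates one natural-language navigation prompt on a square grid, with object names serving as landmarks. The prompt wording, the function name, and the choice to ask about a revisited cell are all assumptions made for illustration; the paper's actual prompts may differ.

```python
import random

# Moves on a square grid: name -> (dx, dy).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def make_square_grid_task(size, objects, n_steps, seed=0):
    """Return (prompt, answer) for a random walk on a size x size grid.

    Each visited cell reveals an object name. The final move steps back to
    the previous cell without naming its object, so answering requires
    remembering the layout rather than reading the last line.
    """
    if size < 2 or n_steps < 1:
        raise ValueError("need size >= 2 and n_steps >= 1")
    if len(objects) < size * size:
        raise ValueError("need one object name per grid cell")
    rng = random.Random(seed)
    names = list(objects)
    rng.shuffle(names)
    grid = {(c, r): names[r * size + c] for r in range(size) for c in range(size)}

    pos = (0, 0)
    previous = pos
    last_move = None
    lines = [f"You are on a {size} by {size} grid and start where you find the {grid[pos]}."]
    for _ in range(n_steps):
        legal = []
        for name, (dx, dy) in MOVES.items():
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in grid:  # stay inside the grid
                legal.append((name, nxt))
        last_move, nxt = rng.choice(legal)
        previous, pos = pos, nxt
        lines.append(f"You move {last_move} by one step and find the {grid[pos]}.")
    lines.append(f"You move {OPPOSITE[last_move]} by one step. What do you find?")
    return "\n".join(lines), grid[previous]


objects = ["lamp", "drum", "kite", "sofa", "vase", "bell", "coin", "fork", "rope"]
prompt, answer = make_square_grid_task(3, objects, n_steps=4, seed=1)
```

The same generator could be adapted to the other structures the abstract lists (hexagonal and triangular grids, rings, trees) by swapping the move set and the cell layout.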