Consciousness is learning: predictive processing systems that learn by binding may perceive themselves as conscious
Machine learning algorithms have achieved superhuman performance in specific
complex domains. Yet learning online from few examples and efficiently
generalizing across domains remains elusive. In humans such learning proceeds
via declarative memory formation and is closely associated with consciousness.
Predictive processing has been advanced as a principled Bayesian inference
framework for understanding the cortex as implementing deep generative
perceptual models for both sensory data and action control. However, predictive
processing offers little direct insight into fast compositional learning or the
mystery of consciousness. Here we propose that through implementing online
learning by hierarchical binding of unpredicted inferences, a predictive
processing system may flexibly generalize in novel situations by forming
working memories for perceptions and actions from single examples, which can
become short- and long-term declarative memories retrievable by associative
recall. We argue that the contents of such working memories are unified yet
differentiated, can be maintained by selective attention and are consistent
with observations of masking, postdictive perceptual integration, and other
paradigm cases of consciousness research. We describe how the brain could have
evolved to use perceptual value prediction for reinforcement learning of
complex action policies simultaneously implementing multiple survival and
reproduction strategies. 'Conscious experience' is how such a learning system
perceptually represents its own functioning, suggesting an answer to the
meta-problem of consciousness. Our proposal naturally unifies feature binding,
recurrent processing, and predictive processing with global workspace, and, to
a lesser extent, the higher order theories of consciousness.
Comment: This version adds 5 figures (new) and only modifies the text to
reference the figure
Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery
Robot assembly discovery is a challenging problem that lives at the
intersection of resource allocation and motion planning. The goal is to combine
a predefined set of objects to form something new while considering task
execution with the robot-in-the-loop. In this work, we tackle the problem of
building arbitrary, predefined target structures entirely from scratch using a
set of Tetris-like building blocks and a robotic manipulator. Our novel
hierarchical approach aims at efficiently decomposing the overall task into
three feasible levels that benefit mutually from each other. On the high level,
we run a classical mixed-integer program for global optimization of block-type
selection and the blocks' final poses to recreate the desired shape. Its output
is then exploited to efficiently guide the exploration of an underlying
reinforcement learning (RL) policy. This RL policy draws its generalization
properties from a flexible graph-based representation that is learned through
Q-learning and can be refined with search. Moreover, it accounts for the
necessary conditions of structural stability and robotic feasibility that
cannot be effectively reflected in the previous layer. Lastly, a grasp and
motion planner transforms the desired assembly commands into robot joint
movements. We demonstrate our proposed method's performance on a set of
competitive simulated RAD environments, showcase real-world transfer, and
report performance and robustness gains compared to an unstructured end-to-end
approach. Videos are available at https://sites.google.com/view/rl-meets-milp
Towards practical reinforcement learning for tokamak magnetic control
Reinforcement learning (RL) has shown promising results for real-time control
systems, including the domain of plasma magnetic control. However, there are
still significant drawbacks compared to traditional feedback control approaches
for magnetic confinement. In this work, we address key drawbacks of the RL
method: achieving higher control accuracy for desired plasma properties,
reducing the steady-state error, and decreasing the required time to learn new
tasks. We build on top of Degrave et al. (2022), and present algorithmic
improvements to the agent architecture and training procedure. We present
simulation results that show up to 65% improvement in shape accuracy, achieve
substantial reduction in the long-term bias of the plasma current, and
additionally reduce the training time required to learn new tasks by a factor
of 3 or more. We present new experiments using the upgraded RL-based
controllers on the TCV tokamak, which validate the simulation results achieved,
and point the way towards routinely achieving accurate discharges using the RL
approach.