
    Intentions and Creative Insights: a Reinforcement Learning Study of Creative Exploration in Problem-Solving

    Insight is perhaps the cognitive phenomenon most closely associated with creativity. People engaged in problem-solving sometimes experience a sudden transformation: they see the problem in a radically different manner, and simultaneously feel with great certainty that they have found the right solution. The change of problem representation is called "restructuring", and the affective changes associated with sudden progress are called the "Aha!" experience. Together, restructuring and the "Aha!" experience characterize insight. Reinforcement Learning is both a theory of biological learning and a subfield of machine learning. In its psychological and neuroscientific guise, it is used to model habit formation, and, increasingly, executive function. In its artificial intelligence guise, it is currently the favored paradigm for modeling agents interacting with an environment. Reinforcement learning, I argue, can serve as a model of insight: its foundation in learning coincides with the role of experience in insight problem-solving; its use of an explicit "value" provides the basis for the "Aha!" experience; and finally, in a hierarchical form, it can achieve a sudden change of representation resembling restructuring. An experiment helps confirm some parallels between reinforcement learning and insight. It shows how transfer from prior tasks results in considerably accelerated learning, and how the increase in the value function resembles the sense of progress corresponding to the "Aha!"-moment. However, a model of insight based on hierarchical reinforcement learning did not display the expected "insightful" behavior. A second model of insight is presented, in which temporal abstraction is based on self-prediction: by predicting its own future decisions, an agent adjusts its course of action on the basis of unexpected events. This kind of temporal abstraction, I argue, corresponds to what we call "intentions", and offers a promising model for biological insight. It explains the "Aha!" experience as resulting from a temporal difference error, whereas restructuring results from an adjustment of the agent's internal state on the basis of either new information or a stochastic interpretation of stimuli. The model is called the actor-critic-intention (ACI) architecture. Finally, the relationship between intentions, insight, and creativity is extensively discussed in light of these models: other works in the philosophical and scientific literature are related to, and sometimes illuminated by, the ACI architecture.
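    The abstract's link between the "Aha!" experience and a temporal-difference error can be illustrated with a toy actor-critic learner. The sketch below is not the paper's ACI architecture; the chain environment, the learning rates, and the TD-error threshold used to flag a "surprise" are all assumptions made purely for illustration.

```python
# Minimal illustrative sketch (not the paper's ACI architecture): a tabular
# actor-critic on a toy chain task, where an unusually large positive
# temporal-difference error is flagged as a crude proxy for the sudden sense
# of progress the abstract associates with the "Aha!" moment.
import numpy as np

n_states, n_actions = 10, 2                 # chain world: actions move left/right
alpha_v, alpha_pi, gamma = 0.1, 0.05, 0.95  # assumed learning rates and discount
V = np.zeros(n_states)                      # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))     # actor: action preferences (softmax policy)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1

rng = np.random.default_rng(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()
        a = rng.choice(n_actions, p=p)
        s2, r, done = step(s, a)
        td_error = r + (0.0 if done else gamma * V[s2]) - V[s]
        if td_error > 0.5:                  # arbitrary threshold: large surprise as "Aha!" proxy
            print(f"episode {episode}: large TD error {td_error:.2f} at state {s}")
        V[s] += alpha_v * td_error          # critic update
        prefs[s, a] += alpha_pi * td_error  # actor update
        s = s2
```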

    Incentive Mechanisms for Hierarchical Spectrum Markets

    In this paper, we study spectrum allocation mechanisms in hierarchical multi-layer markets, which are expected to proliferate in the near future under current spectrum policy reform proposals. We consider a setting where a state agency sells spectrum channels to Primary Operators (POs), who subsequently resell them to Secondary Operators (SOs) through auctions. We show that these hierarchical markets do not yield the socially efficient spectrum allocation sought by the agency, due to the lack of coordination among entities in different layers and the inherently selfish, revenue-maximizing strategy of the POs. In order to reconcile these opposing objectives, we propose an incentive mechanism that aligns the strategy and actions of the POs with the objective of the agency, and thus improves system performance in terms of social welfare. This pricing-based scheme constitutes a method for hierarchical market regulation. A basic component of the proposed incentive mechanism is a novel auction scheme which enables POs to allocate their spectrum by balancing their derived revenue and the welfare of the SOs.
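    A toy calculation can make the tension between PO revenue and social welfare concrete. The sketch below is only an illustration under assumed valuations and a simple posted-price rule, not the paper's auction or pricing mechanism; the welfare weight `lam` stands in for the kind of regulatory incentive payment the abstract describes.

```python
# Illustrative toy model (not the paper's mechanism): a Primary Operator (PO)
# posts a price for its channels, and Secondary Operators (SOs) whose
# valuations exceed the price each buy one channel. A welfare-linked payment
# with weight `lam` (a hypothetical regulatory incentive) pulls the PO's
# chosen price toward the socially efficient allocation.
def po_outcome(valuations, channels, price):
    served = sorted([v for v in valuations if v >= price], reverse=True)[:channels]
    return price * len(served), sum(served)          # (PO revenue, social welfare)

def best_price(valuations, channels, lam):
    def objective(price):
        revenue, welfare = po_outcome(valuations, channels, price)
        return revenue + lam * welfare               # PO payoff including incentive payment
    return max(sorted(set(valuations)), key=objective)

sos = [9.0, 7.0, 5.0, 3.0, 1.0]                      # assumed SO valuations
for lam in (0.0, 0.5, 2.0):
    price = best_price(sos, channels=4, lam=lam)
    revenue, welfare = po_outcome(sos, 4, price)
    print(f"lam={lam}: price={price}, revenue={revenue}, welfare={welfare}")
```

    With these example numbers, the unregulated PO (lam = 0) prices some SOs out of the market; as lam grows, the chosen price drops and welfare rises toward the efficient level.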

    Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

    Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces.
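    The general idea of a coarsened MDP can be sketched with a simple state-aggregation example. The code below is not the paper's homogenization procedure: it takes the Markov chain induced by some fixed policy, together with a hand-chosen partition of states into blocks, and forms block-averaged transitions and rewards.

```python
# Generic state-aggregation sketch (not the paper's homogenization procedure):
# given the Markov chain induced by a fixed policy and an assumed partition of
# states into blocks ("sub-tasks"), build a coarse chain whose transition
# probabilities and rewards are block averages.
import numpy as np

P = np.array([                      # transition matrix of a 4-state chain under some policy
    [0.7, 0.3, 0.0, 0.0],
    [0.4, 0.5, 0.1, 0.0],
    [0.0, 0.1, 0.5, 0.4],
    [0.0, 0.0, 0.3, 0.7],
])
r = np.array([0.0, 0.0, 0.0, 1.0])  # per-state reward
blocks = [[0, 1], [2, 3]]           # assumed two-block partition

def coarsen(P, r, blocks):
    k = len(blocks)
    P_coarse, r_coarse = np.zeros((k, k)), np.zeros(k)
    for i, block_i in enumerate(blocks):
        r_coarse[i] = np.mean([r[s] for s in block_i])
        for j, block_j in enumerate(blocks):
            # average probability of jumping from block i into block j
            P_coarse[i, j] = np.mean([P[s, block_j].sum() for s in block_i])
    return P_coarse, r_coarse

P_c, r_c = coarsen(P, r, blocks)
print(P_c)                          # 2x2 coarse transition matrix (rows sum to 1)
print(r_c)
```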

    Cognitive Structures of Good and Poor Novice Problem Solvers in Physics

    The way knowledge is organized in memory is generally expected to relate to the degree of success in problem solving. In the present study, we investigated whether good novice problem solvers have their knowledge arranged around problem types to a greater extent than poor problem solvers have. In the subject of physics (electricity and magnetism), 12 problem types were distinguished according to their underlying physics principles. For each problem type, a set of elements of knowledge containing characteristics of the problem situation, declarative knowledge, and procedural knowledge was constructed. All of the resulting 65 elements were printed on cards, and first-year university students in physics (N = 47) were asked to sort these cards into coherent piles shortly after they had taken an examination on electricity and magnetism. Essentially, good novice problem solvers sorted the cards according to problem types; the sorting by the poor problem solvers seemed to be determined to a greater extent by the surface characteristics of the elements. We concluded that an organization of knowledge around problem types might be highly conducive to good performance in problem solving by novice problem solvers.

    Learning Representations in Model-Free Hierarchical Reinforcement Learning

    Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse, delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be achieved by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with the learning of corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal marking subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences (trajectories) of the agent. When combined with an intrinsic motivation learning mechanism, this method learns both subgoals and skills, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment, making it suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse delayed feedback: a variant of the rooms environment and the first screen of the ATARI 2600 game Montezuma's Revenge.
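    The recipe described above, clustering a small memory of recent experiences to propose subgoals and then rewarding the agent intrinsically for reaching them, can be sketched as follows. The particular choices here (online k-means, the number of centroids, the distance threshold) are assumptions made for illustration, not the authors' exact method.

```python
# Hedged sketch of the general recipe (online k-means and the thresholds below
# are assumptions, not the paper's exact method): cluster a small memory of
# recently visited states to propose candidate subgoals, and emit an intrinsic
# reward whenever the agent gets close to one of them.
from collections import deque
import numpy as np

class SubgoalDiscovery:
    def __init__(self, n_centroids=4, memory_size=200, lr=0.05, radius=0.5):
        self.memory = deque(maxlen=memory_size)   # small memory of recent states
        self.centroids = None                     # candidate subgoals
        self.n, self.lr, self.radius = n_centroids, lr, radius

    def observe(self, state):
        state = np.asarray(state, dtype=float)
        self.memory.append(state)
        if self.centroids is None and len(self.memory) >= self.n:
            self.centroids = np.stack(list(self.memory)[-self.n:])   # initialize centroids
        elif self.centroids is not None:
            i = np.argmin(np.linalg.norm(self.centroids - state, axis=1))
            self.centroids[i] += self.lr * (state - self.centroids[i])  # online k-means step

    def intrinsic_reward(self, state):
        if self.centroids is None:
            return 0.0
        d = np.min(np.linalg.norm(self.centroids - np.asarray(state, dtype=float), axis=1))
        return 1.0 if d < self.radius else 0.0    # reward for reaching a candidate subgoal

sd = SubgoalDiscovery()
for _ in range(500):                              # fake trajectory in a 2-D state space
    sd.observe(np.random.rand(2) * 10)
print(sd.centroids, sd.intrinsic_reward([5.0, 5.0]))
```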

    Fusion of Head and Full-Body Detectors for Multi-Object Tracking

    In order to track all persons in a scene, the tracking-by-detection paradigm has proven to be a very effective approach. Yet, relying solely on a single detector is also a major limitation, as useful image information might be ignored. Consequently, this work demonstrates how to fuse two detectors into a tracking system. To obtain the trajectories, we propose to formulate tracking as a weighted graph labeling problem, resulting in a binary quadratic program. As such problems are NP-hard, the solution can only be approximated. Based on the Frank-Wolfe algorithm, we present a new solver that is crucial for handling such difficult problems. Evaluation on pedestrian tracking is provided for multiple scenarios, showing superior results over single-detector tracking and standard QP solvers. Finally, our tracker ranks 2nd on the MOT16 benchmark and 1st on the new MOT17 benchmark, outperforming over 90 trackers.
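    For readers unfamiliar with it, the Frank-Wolfe (conditional gradient) method needs only a linear minimization oracle over the feasible set, which is what makes it attractive for structured problems like the labeling QP above. The sketch below is a generic Frank-Wolfe loop on a convex quadratic over the box [0, 1]^n, a standard relaxation of a binary labeling problem, and not the specialized solver proposed in the paper; the matrix Q and vector c are arbitrary examples.

```python
# Generic Frank-Wolfe sketch (not the paper's specialized solver): minimize
# 0.5 * x'Qx + c'x over the box [0, 1]^n, a common relaxation of a binary
# quadratic labeling problem. Q is made positive definite so the problem is
# convex for this illustration.
import numpy as np

def frank_wolfe_box(Q, c, iters=200):
    n = len(c)
    x = np.full(n, 0.5)                      # start in the middle of the box
    for t in range(iters):
        grad = Q @ x + c
        s = (grad < 0).astype(float)         # linear minimization oracle over [0, 1]^n
        gamma = 2.0 / (t + 2.0)              # standard diminishing step size
        x = x + gamma * (s - x)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = A.T @ A + np.eye(5)                      # positive definite quadratic term
c = rng.standard_normal(5)
x = frank_wolfe_box(Q, c)
print(np.round(x, 3))                        # approximate minimizer; round to recover a labeling
```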