460 research outputs found

New prioritized value iteration for Markov decision processes

Author: Ma. de Guadalupe Garcia-Hernandez
Jose Ruiz-Pinales
Eva Onaindia
J. Gabriel Aviña-Cervantes
Sergio Ledesma-Orozco
Edgar Alvarado-Mendez
Alberto Reyes-Ballesteros
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2012

The problem of solving large Markov decision processes accurately and quickly is challenging. Since the computational effort incurred is considerable, current research focuses on finding superior acceleration techniques. For instance, the convergence properties of current solution methods depend, to a great extent, on the order of backup operations. On one hand, algorithms such as topological sorting are able to find good orderings but their overhead is usually high. On the other hand, shortest path methods, such as Dijkstra's algorithm which is based on priority queues, have been applied successfully to the solution of deterministic shortest-path Markov decision processes. Here, we propose an improved value iteration algorithm based on Dijkstra's algorithm for solving shortest path Markov decision processes. The experimental results on a stochastic shortest-path problem show the feasibility of our approach. © Springer Science+Business Media B.V. 2011.

García Hernández, MDG.; Ruiz Pinales, J.; Onaindia De La Rivaherrera, E.; Aviña Cervantes, JG.; Ledesma Orozco, S.; Alvarado Mendez, E.; Reyes Ballesteros, A. (2012). New prioritized value iteration for Markov decision processes. Artificial Intelligence Review. 37(2):157-167. doi:10.1007/s10462-011-9224-z
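
To illustrate what ordering backups with a priority queue can look like, here is a minimal Python sketch. It is not the algorithm proposed in the paper above: it is a generic prioritized-backup scheme in which the state with the largest Bellman error is backed up first and its predecessors are re-queued. The function name, the toy problem, and the data layout (`actions[s]` as a list of `(cost, [(probability, next_state), ...])` pairs) are all assumptions made for this example.

```python
import heapq


def prioritized_value_iteration(states, actions, goal, eps=1e-9):
    """Cost-minimising value iteration with priority-ordered backups.

    Hypothetical layout: actions[s] is a list of
    (cost, [(probability, next_state), ...]) pairs; the goal has no actions.
    """
    value = {s: 0.0 for s in states}

    # predecessors[t] holds every state that can reach t in one step.
    predecessors = {s: set() for s in states}
    for s in states:
        for _cost, outcomes in actions.get(s, []):
            for _prob, nxt in outcomes:
                predecessors[nxt].add(s)

    def backup(s):
        # One Bellman backup: cheapest expected cost-to-go from s.
        if s == goal:
            return 0.0
        return min(
            cost + sum(p * value[t] for p, t in outcomes)
            for cost, outcomes in actions[s]
        )

    # heapq is a min-heap, so errors are negated to pop the largest first.
    heap = [(-abs(backup(s) - value[s]), s) for s in states if s != goal]
    heapq.heapify(heap)

    while heap:
        _, s = heapq.heappop(heap)
        new_value = backup(s)
        if abs(new_value - value[s]) <= eps:
            continue  # stale entry: this state is already consistent
        value[s] = new_value
        # Only predecessors of s can have a changed Bellman error.
        for p in predecessors[s]:
            if p != goal:
                error = abs(backup(p) - value[p])
                if error > eps:
                    heapq.heappush(heap, (-error, p))
    return value


# Toy stochastic shortest-path problem (invented for this example).
states = ["A", "B", "C", "G"]
actions = {
    "A": [(1.0, [(0.5, "B"), (0.5, "C")]), (4.0, [(1.0, "G")])],
    "B": [(1.0, [(1.0, "C")]), (2.5, [(1.0, "G")])],
    "C": [(1.0, [(1.0, "G")])],
}
values = prioritized_value_iteration(states, actions, goal="G")
print(values)
```

The sketch keeps duplicate heap entries and discards stale ones when they are popped, which avoids needing a decrease-key operation at the price of a somewhat larger queue.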

Crossref

RiuNet

Understanding Aesthetic Evaluation using Deep Learning

Author: A Blair
CG Johnson
D Singh
DL Donoho
E Brunswik
H Leder
H Leder
J McCormack
J McCormack
L Maaten van der
N Jausovec
P Bontrager
PJ Bentley
R Marimont
RE Bellman
S Todd
Publication venue: Springer Science and Business Media LLC
Publication date: 09/04/2020

A bottleneck in any evolutionary art system is aesthetic evaluation. Many different methods have been proposed to automate the evaluation of aesthetics, including measures of symmetry, coherence, complexity, contrast and grouping. The interactive genetic algorithm (IGA) relies on human-in-the-loop, subjective evaluation of aesthetics, but limits possibilities for large search due to user fatigue and small population sizes. In this paper we look at how recent advances in deep learning can assist in automating personal aesthetic judgement. Using a leading artist's computer art dataset, we use dimensionality reduction methods to visualise both genotype and phenotype space in order to support the exploration of new territory in any generative system. Convolutional Neural Networks trained on the user's prior aesthetic evaluations are used to suggest new possibilities similar to or between known high-quality genotype-phenotype mappings.

arXiv.org e-Print Archive

Goldsmiths Research Online

Crossref

Structure Learning in Human Sequential Decision-Making

Author: A Fel'dbaum
A Gelman
A Johnson
A Smith
AC Courville
AD Horowitz
AJ Yu
C Anderson
C Watkins
D Acuna
D Heckerman
DA Braun
Daniel E. Acuña
I Erev
J Anderson
J Banks
JB Tenenbaum
JB Tenenbaum
JC Gittins
JC Gittins
L Kaelbling
M Steyvers
M Steyvers
MD Lee
MJA Strens
MS Yi
N Gans
ND Daw
P Poupart
P Whittle
Paul Schrater
R Dearden
R Howard
RE Bellman
RE Bellman
RE Neapolitan
RJ Meyer
RS Sutton
SJ Gershman
TEJ Behrens
Tim Behrens
W Edwards
W Edwards
W Schultz
W Schultz
Y Brackbill
Y Sakai
Y Sakai
Publication venue: Public Library of Science
Publication date: 01/12/2010

Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Join forces or cheat: evolutionary analysis of a consumer-resource system

Author: A Houston
AA Melikyan
AR Akhmetzhanov
C Carathéodory
F Dercole
F Hamelin
J Maynard Smith
JD Murray
L Carroll
L Mailleret
L Valen Van
LS Pontryagin
N Perrin
N Perrin
P Auger
RE Bellman
SD Mylius
TL Vincent
WM Schaffer
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

In this contribution we consider a seasonal consumer-resource system and focus on the evolution of consumer behavior. It is assumed that consumer and resource individuals live and interact during seasons of fixed lengths separated by winter periods. All individuals die at the end of the season and the size of the next generation is determined by the consumer-resource interaction which took place during the season. Resource individuals are assumed to reproduce at a constant rate, while consumers have to trade off between foraging for resources, which increases their reproductive abilities, and reproducing. Firstly, we assume that consumers cooperate in such a way that they maximize each consumer's individual fitness. Secondly, we consider the case where such a population is challenged by selfish mutants who do not cooperate. Finally, we study the system dynamics over many seasons and show that mutants eventually replace the original cooperating population, but are finally as vulnerable as the initial cooperating consumers.

Crossref

INRIA a CCSD electronic archive server

HAL-INSU

Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

Author: A Moore
B Tanner
D Floreano
D Floreano
D Goldberg
D Pratihar
DE Moriarty
G Harik
G Tesauro
J Gauci
J Hurst
K.O Stanley
KO Stanley
L Cardamone
LP Kaelbling
LP Kaelbling
M Schmidt
N Hansen
O Maron
O Mihatsch
O Sigaud
P Abbeel
P De Boer
P Geibel
P Nordin
P Ponterosso
P Schroder
P Stagge
P Stone
R Brafman
R Regis
RE Bellman
RE Bellman
Rogier Koppejan
RS Sutton
RS Sutton
S Chen
S Whiteson
S Whiteson
S Whiteson
S Whiteson
S Wilson
Shimon Whiteson
X Yao
Y Jin
Y Jin
Y Ong
Publication venue: Springer-Verlag
Publication date: 01/01/2011

This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation, and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

UvA-DARE

International Migration, Integration and Social Cohesion online publications

An intuitionistic fuzzy programming method for group decision making with interval-valued fuzzy preference relations

Author: B Zhu
D Hauser
F Liu
F Liu
F Shen
Feng Wang
Gai-li Xu
H Chen
H Ishibuchi
J Pang
J Wang
Jing Tang
Jiu-ying Dong
KT Atanassov
L Chen
L Mikhailov
L Mikhailov
L Mikhailov
RE Bellman
RE Moore
S Genç
SA Orlovsky
Shu-Ping Wan
TL Saaty
Y Xu
YM Wang
YM Wang
ZJ Wang
ZS Xu
ZS Xu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Crossref

Northumbria Research Link

Teeside University's Research Repository

An advanced Bayesian model for the visual tracking of multiple interacting objects

Author: A Doucet
A Doucet
B-N Vo
C Cuevas
C Hue
CM Bishop
CR del Blanco
CR del Blanco
D MacKay
D Reid
D Salmond
E Maggio
G Pulford
H Gauvrit
IJ Cox
LY Pao
M Piccardi
N Gordon
R Mahler
R Streit
RE Bellman
S Arulampalam
S Blackman
S Blackman
S Lauritzen
S Särkkä
T Fortmann
Y Ma
Z Khan
Z Khan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Visual tracking of multiple objects is a key component of many vision-based systems. While there are reliable algorithms for tracking a single object in constrained scenarios, object tracking is still a challenge in uncontrolled situations involving multiple interacting objects that have complex dynamics. In this article, a novel Bayesian model for tracking multiple interacting objects in unrestricted situations is proposed. This is accomplished by means of an advanced object dynamic model that predicts possible interactive behaviors, which in turn depend on the inference of potential events of object occlusion. The proposed tracking model can also handle false and missing detections, which are typical of visual object detectors operating in uncontrolled scenarios. In addition, a Rao-Blackwellization technique has been used to improve the accuracy of the estimated object trajectories, which is a fundamental aspect in the tracking of multiple objects due to its high dimensionality. Excellent results have been obtained using a publicly available database, proving the efficiency of the proposed approach.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Archivo Digital UPM

Identification problems for systems of nonlinear evolution equations and functional equations

Author: AW Leung
GS Jones
İsmet Gölgeleyen
L Debnath
M Kuczma
M Kuczma
Mustafa Yildiz
RA Poluektov
RE Bellman
S Zheng
YE Anikonov
Yurii E Anikonov
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

Automatic Design of Synthetic Gene Circuits through Mixed Integer Non-linear Programming

Author: A Mukhopadhyay
B Goodwin
Braun
C Moler
CH Wu
D Densmore
G Rodrigo
G Rodrigo
Ilias Tagkopoulos
J Beal
J García-Ojalvo
JH Davis
JN Weiss
John Kececioglu
JS Griffith
JS Griffith
K Nath
Linh Huynh
LM Tuttle
M Dasika
M Pedersen
MA Marchisio
Matthias Köppe
MB Elowitz
Mukund Thattai
P Belotti
P Francois
Rachael
RE Bellman
S Basu
S Basu
S Hooshangi
T Ellis
T Gardner
T Kuhlman
T Lu
V Bansal
WE Boyce
XJ Feng
Publication venue: Public Library of Science
Publication date: 01/01/2012

Automatic design of synthetic gene circuits poses a significant challenge to synthetic biology, primarily due to the complexity of biological systems and the lack of rigorous optimization methods that can cope with the combinatorial explosion as the number of biological parts increases. Current optimization methods for synthetic gene design rely on heuristic algorithms that are usually not deterministic, deliver sub-optimal solutions, and provide no guarantees on convergence or error bounds. Here, we introduce an optimization framework for the problem of part selection in synthetic gene circuits that is based on mixed integer non-linear programming (MINLP), which is a deterministic method that finds the globally optimal solution and guarantees convergence in finite time. Given a synthetic gene circuit, a library of characterized parts, and user-defined constraints, our method can find the optimal selection of parts that satisfies the constraints and best approximates the objective function given by the user. We evaluated the proposed method in the design of three synthetic circuits (a toggle switch, a transcriptional cascade, and a band detector), with both experimentally constructed and synthetic promoter libraries. Scalability and robustness analysis shows that the proposed framework scales well with the library size and the solution space. The work described here is a step towards a unifying, realistic framework for the automated design of biological circuits.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Fuzzy positive primitive formulas

Author: A Atserias
B Rossman
B Rossman
CC Chang
D Geiger
F Esteva
H Fargier
MS Pini
P Cintula
P Dellunde
P Dellunde
P Dellunde
Pedro Meseguer
PG Kolaitis
RE Bellman
S Bistarelli
V Torra
W Hodges
Publication venue
Publication date: 01/01/2018

Can non-classical logic contribute to the analysis of complexity in computer science? In this paper, we take a step towards the solution of this open problem, taking a logical model-theoretic approach to the analysis of complexity in fuzzy constraint satisfaction. We study fuzzy positive-primitive sentences, and we present an algebraic characterization of classes axiomatized by this kind of sentence in terms of homomorphisms and finite direct products. The ultimate goal is to study the expressiveness and reasoning mechanisms of non-classical languages, with respect to constraint satisfaction problems and, in general, in modelling decision scenarios.

Crossref

Diposit Digital de Documents de la UAB