Search CORE


Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system

Author: B Mesot
C Van den Broeck
CA Reiter
E Di Paulo
HP Schwefel
J Di
JE Moody
JL Elman
L Bull
L Glass
Larry Bull
M Sipper
MC Su
N Lemke
PL Lanzi
Richard J. Preen
SW Wilson
T Werner
TE Ingerson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

A number of representation schemes have been presented for use within learning classifier systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF learning classifier system. In particular, asynchronous random Boolean networks are used to represent the traditional condition-action production system rules in the discrete case, and asynchronous fuzzy logic networks in the continuous-valued case. It is shown that self-adaptive, open-ended evolution can be used to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems.
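As a rough illustration of the discrete representation described above, the following Python sketch builds an asynchronous random Boolean network with n nodes and k inputs per node and uses it as a match condition. It is a minimal sketch under stated assumptions, not the paper's implementation: the update scheme (one randomly chosen node per step), the class name `AsyncRBN`, and the encoding in `matches` (the first nodes start at the input bits, the last node is read as the match signal) are hypothetical choices made for illustration.

```python
import random
from typing import List


class AsyncRBN:
    """Asynchronous random Boolean network with n nodes and k inputs per node."""

    def __init__(self, n: int, k: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.n, self.k = n, k
        # Each node reads k distinct, randomly chosen nodes (possibly itself).
        self.inputs: List[List[int]] = [self.rng.sample(range(n), k) for _ in range(n)]
        # Each node has a random Boolean lookup table with 2**k entries.
        self.tables: List[List[int]] = [
            [self.rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)
        ]
        self.state: List[int] = [0] * n

    def step(self) -> None:
        # Asynchronous update (assumed scheme): pick one node at random and
        # apply its rule to the *current* states of its inputs.
        i = self.rng.randrange(self.n)
        idx = 0
        for j in self.inputs[i]:
            idx = (idx << 1) | self.state[j]
        self.state[i] = self.tables[i][idx]

    def matches(self, x: List[int], cycles: int = 10) -> bool:
        # Hypothetical encoding: the first len(x) nodes start at the environmental
        # input, the rest at 0; after the run, the last node is the "match" output.
        if len(x) >= self.n:
            raise ValueError("network must have more nodes than input bits")
        self.state = list(x) + [0] * (self.n - len(x))
        for _ in range(cycles * self.n):
            self.step()
        return self.state[-1] == 1


rule = AsyncRBN(n=8, k=2, seed=42)
print(rule.matches([1, 0, 1]))
```

In an evolutionary setting such as the one the abstract describes, the wiring (`inputs`) and rules (`tables`) would be the genome being mutated; here they are simply drawn at random from a seed so that a given network is reproducible.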

arXiv.org e-Print Archive

Crossref

UWE Bristol Research Repository

Q-learning with Nearest Neighbors

Author: Shah Devavrat
Xie Qiaomin
Publication date: 22/10/2018

We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using nearest neighbor regression. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a $d$-dimensional state space and discount factor $\gamma \in (0,1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}\big(1/\varepsilon^d\big)$, so the sample complexity scales as $\tilde{O}\big(1/\varepsilon^{d+3}\big)$. Indeed, we establish a lower bound that argues that a dependence of $\tilde{\Omega}\big(1/\varepsilon^{d+2}\big)$ is necessary. Comment: Accepted to NIPS 201
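The sample-complexity example follows by substitution: with $L = \tilde{O}(1/\varepsilon^d)$ and $\gamma$ fixed, $L/\varepsilon^3 = \tilde{O}(1/\varepsilon^{d+3})$. To make the nearest-neighbour idea concrete, here is a heavily simplified Python sketch. It uses a uniform grid of centres over a 1-D state space $[0,1]$ as the $\varepsilon$-net, maps each observed state to its single nearest centre, and applies a Q-learning update with step size $1/N(c,a)$ along one sample path generated by a purely random policy. The class name `NearestNeighborQ`, the toy dynamics, and the reward are hypothetical, and the paper's NNQL may differ in its regression weights and update schedule, so this is best read as nearest-neighbour state aggregation rather than NNQL itself.

```python
import numpy as np


class NearestNeighborQ:
    """Q-values stored at the centres of a uniform grid over a 1-D state space [0, 1]."""

    def __init__(self, n_centers: int, n_actions: int, gamma: float) -> None:
        self.centers = (np.arange(n_centers) + 0.5) / n_centers
        self.q = np.zeros((n_centers, n_actions))
        self.counts = np.zeros((n_centers, n_actions), dtype=int)
        self.gamma = gamma

    def nearest(self, s: float) -> int:
        # Index of the grid centre closest to state s (1-nearest-neighbour lookup).
        return int(np.argmin(np.abs(self.centers - s)))

    def value(self, s: float) -> float:
        # Estimated optimal value at s: max over actions at the nearest centre.
        return float(self.q[self.nearest(s)].max())

    def update(self, s: float, a: int, r: float, s_next: float) -> None:
        c = self.nearest(s)
        self.counts[c, a] += 1
        alpha = 1.0 / self.counts[c, a]  # step size 1/N(c, a)
        target = r + self.gamma * self.value(s_next)
        self.q[c, a] += alpha * (target - self.q[c, a])


# One sample path under a purely random behaviour policy on a toy (hypothetical) MDP.
rng = np.random.default_rng(0)
agent = NearestNeighborQ(n_centers=20, n_actions=2, gamma=0.9)
s = 0.5
for _ in range(10_000):
    a = int(rng.integers(2))
    step = 0.05 if a == 1 else -0.05
    s_next = float(np.clip(s + step + rng.normal(0.0, 0.02), 0.0, 1.0))
    r = -abs(s_next - 0.8)  # hypothetical reward for staying near 0.8
    agent.update(s, a, r, s_next)
    s = s_next
print(np.round(agent.q[:, 1] - agent.q[:, 0], 2))
```

A finer grid (more centres) reduces the approximation error but means the random path needs longer to visit every centre, which is the covering-time trade-off the bound captures.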

arXiv.org e-Print Archive

DSpace@MIT

An implementation of genetic-based learning classifier system on a wet clutch system

Author: De Keyser Robain
Pinte Gregory
Stoev Julian
Wyns Bart
Zhong Yu
Publication date: 01/01/2011

Ghent University Academic Bibliography

Multiagent Maximum Coverage Problems: The Trade-off Between Anarchy and Stability

Author: alexis
blume
brown
fudenberg
gairing
koutsoupias
kozai
krause
marden
nemhauser
paccagnan
ramaswamy
spieser
von falkenhausen
young
Publication date: 14/03/2020

The price of anarchy and price of stability are two well-studied performance metrics that seek to characterize the inefficiency of equilibria in distributed systems. The distinction between these two metrics centers on the equilibria they focus on: the price of anarchy characterizes the quality of the worst-performing equilibria, while the price of stability characterizes the quality of the best-performing equilibria. While much of the literature studies these metrics from an analysis perspective, in this work we consider them from a design perspective. Specifically, we focus on the setting where a system operator is tasked with designing local utility functions to optimize these performance metrics in a class of games termed covering games. Our main result characterizes a fundamental trade-off between the price of anarchy and price of stability in the form of a fully explicit Pareto frontier. Within this setup, optimizing the price of anarchy comes directly at the expense of the price of stability (and vice versa). Our second result demonstrates how a system operator could incorporate an additional piece of system-level information into the design of the agents' utility functions to overcome these limitations and improve the system's performance. This piece of system-level information pertains to the performance of the worst-performing agent in the system. Comment: 14 pages, 4 figures
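The two metrics can be illustrated on a single small covering game by brute force. In the Python sketch below, each agent selects one resource, welfare is the total value of the covered resources, and each agent's utility is the value of its chosen resource scaled by a distribution rule `f(k)` of the number of agents `k` sharing it; choosing `f` stands in for the utility design discussed above. The instance, the two rules, and the function names are hypothetical. Note that the paper's price of anarchy and price of stability are worst-case quantities over a whole class of games, whereas this code only reports, for one instance, the ratios of worst and best pure-equilibrium welfare to optimal welfare.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Joint = Tuple[str, ...]


def welfare(joint: Joint, values: Dict[str, float]) -> float:
    # A resource counts once, however many agents cover it.
    return sum(values[r] for r in set(joint))


def utility(i: int, joint: Joint, values: Dict[str, float], f: Callable[[int], float]) -> float:
    r = joint[i]
    return values[r] * f(joint.count(r))


def is_nash(joint: Joint, action_sets: Sequence[Sequence[str]],
            values: Dict[str, float], f: Callable[[int], float]) -> bool:
    # Pure Nash equilibrium: no agent gains by a unilateral deviation.
    for i, options in enumerate(action_sets):
        current = utility(i, joint, values, f)
        for alt in options:
            deviation = joint[:i] + (alt,) + joint[i + 1:]
            if utility(i, deviation, values, f) > current + 1e-12:
                return False
    return True


def anarchy_and_stability_ratios(action_sets, values, f) -> Tuple[float, float]:
    # Assumes a pure equilibrium exists; these single-resource games are
    # congestion games, so one always does.
    joints = list(product(*action_sets))
    opt = max(welfare(j, values) for j in joints)
    eq = [welfare(j, values) for j in joints if is_nash(j, action_sets, values, f)]
    return min(eq) / opt, max(eq) / opt


values = {"a": 1.0, "b": 0.6}
sets = [("a", "b"), ("a", "b")]
print(anarchy_and_stability_ratios(sets, values, lambda k: 1.0 / k))  # equal sharing
print(anarchy_and_stability_ratios(sets, values, lambda k: 1.0))      # no sharing
```

With equal sharing the only pure equilibria put the two agents on different resources, so both ratios are 1.0; without sharing both agents pile onto resource `a`, and both ratios drop to 1/1.6 = 0.625.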

arXiv.org e-Print Archive

Crossref

Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems

Author: Ho Chien-Ju
Slivkins Aleksandrs
Vaughan Jennifer Wortman
Publication date: 02/09/2015

Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level, which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of the art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing. Comment: This is the full version of a paper in the ACM Conference on Economics and Computation (ACM-EC), 201
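The adaptive-discretization idea can be sketched as follows, assuming a one-dimensional contract space (a single bonus level x in [0, 1]) and a hypothetical noisy requester utility. This is a generic zooming-style bandit written for illustration, not the AgnosticZooming algorithm from the paper: each region is treated as one arm with an optimistic index (empirical mean + confidence radius + region width), and a region is split in half once its confidence radius falls below its width, so regions that keep getting chosen end up discretized more finely. The names `Region`, `adaptive_discretization`, and `requester_utility` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """An interval [lo, hi) of the contract space, treated as a single arm."""
    lo: float
    hi: float
    n: int = 0          # number of times this region has been played
    total: float = 0.0  # sum of observed rewards

    @property
    def width(self) -> float:
        return self.hi - self.lo

    def mean(self) -> float:
        return self.total / self.n if self.n else 0.0


def adaptive_discretization(
    reward: Callable[[float], float], horizon: int, seed: int = 0
) -> List[Region]:
    rng = random.Random(seed)
    regions = [Region(0.0, 1.0)]
    log_t = math.log(horizon)

    def index(reg: Region) -> float:
        # Optimistic index: unplayed regions first, then mean + confidence radius + width.
        if reg.n == 0:
            return math.inf
        return reg.mean() + math.sqrt(2 * log_t / reg.n) + reg.width

    for _ in range(horizon):
        best = max(regions, key=index)
        x = rng.uniform(best.lo, best.hi)  # play a random contract inside the region
        best.n += 1
        best.total += reward(x)
        # Refine: once the confidence radius is smaller than the region's width,
        # replace the region by its two halves (children start with fresh statistics).
        if math.sqrt(2 * log_t / best.n) < best.width:
            mid = (best.lo + best.hi) / 2
            regions = [r for r in regions if r is not best]
            regions += [Region(best.lo, mid), Region(mid, best.hi)]
    return regions


# Hypothetical noisy requester utility, maximised by a bonus level of 0.3.
noise = random.Random(1)


def requester_utility(x: float) -> float:
    return 1.0 - abs(x - 0.3) + noise.gauss(0.0, 0.1)


regions = adaptive_discretization(requester_utility, horizon=2000)
most_played = max(regions, key=lambda r: r.n)
print(f"{len(regions)} regions; most played: [{most_played.lo:.3f}, {most_played.hi:.3f})")
```

Discarding a parent's statistics on a split keeps the sketch short; a more sample-efficient variant would let the children inherit the parent's observations.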

arXiv.org e-Print Archive

CiteSeerX