Discrete and fuzzy dynamical genetic programming in the XCSF learning classifier system
A number of representation schemes have been presented for use within
learning classifier systems, ranging from binary encodings to neural networks.
This paper presents results from an investigation into using discrete and fuzzy
dynamical system representations within the XCSF learning classifier system. In
particular, asynchronous random Boolean networks are used to represent the
traditional condition-action production system rules in the discrete case and
asynchronous fuzzy logic networks in the continuous-valued case. It is shown
that self-adaptive, open-ended evolution can be used to design an ensemble of
such dynamical systems within XCSF to solve a number of well-known test
problems.
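The following minimal Python sketch shows what an asynchronous random Boolean network looks like in the discrete case. It has N nodes, each wired to K randomly chosen nodes and updated through its own random truth table, and one randomly chosen node is updated per step. The abstract does not say how inputs are presented or how an action is read out. Clamping the first nodes to the input bits and reading one output node after a fixed number of updates is a hypothetical mapping chosen only for illustration. XCSF's evolutionary machinery is not shown, and the names `AsyncRBN`, `run` and `out_node` are illustrative.

```python
import random
from typing import List


class AsyncRBN:
    """Asynchronous random Boolean network: n nodes, each reading k random
    inputs through its own randomly drawn truth table."""

    def __init__(self, n: int, k: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.n = n
        self.k = k
        # For each node, the indices of the k nodes it reads from.
        self.inputs = [[self.rng.randrange(n) for _ in range(k)] for _ in range(n)]
        # For each node, a truth table with 2**k Boolean entries.
        self.tables = [[self.rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
        self.state = [0] * n

    def _next_value(self, node: int) -> int:
        # Pack the node's k input bits into an index into its truth table.
        idx = 0
        for src in self.inputs[node]:
            idx = (idx << 1) | self.state[src]
        return self.tables[node][idx]

    def step(self) -> None:
        # Asynchronous update: a single randomly chosen node changes in place.
        node = self.rng.randrange(self.n)
        self.state[node] = self._next_value(node)

    def run(self, x: List[int], steps: int, out_node: int) -> int:
        """Hypothetical condition/action mapping: clamp the first len(x) nodes
        to the input bits, update asynchronously, then read one output node."""
        self.state = [0] * self.n
        for i, bit in enumerate(x):
            self.state[i] = bit
        for _ in range(steps):
            self.step()
            # Re-clamp the input nodes so the environment keeps driving them.
            for i, bit in enumerate(x):
                self.state[i] = bit
        return self.state[out_node]
```

For example, `AsyncRBN(n=8, k=2, seed=1).run([1, 0, 1], steps=50, out_node=7)` returns a single bit that could serve as a rule's action. The fuzzy, continuous-valued variant is not covered by this sketch.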
Q-learning with Nearest Neighbors
We consider model-free reinforcement learning for infinite-horizon discounted
Markov Decision Processes (MDPs) with a continuous state space and unknown
transition kernel, when only a single sample path under an arbitrary policy of
the system is available. We consider the Nearest Neighbor Q-Learning (NNQL)
algorithm, which learns the optimal Q-function using a nearest-neighbor regression
method. As our main contribution, we provide a tight finite-sample analysis of
the convergence rate. In particular, for MDPs with a -dimensional state
space and the discounted factor , given an arbitrary sample
path with "covering time" , we establish that the algorithm is guaranteed
to output an -accurate estimate of the optimal Q-function using
samples. For instance, for a
well-behaved MDP, the covering time of the sample path under the purely random
policy scales as so the sample
complexity scales as Indeed, we
establish a lower bound that argues that the dependence of is necessary.
Comment: Accepted to NIPS 201
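The nearest-neighbor representation can be illustrated on a toy problem. This sketch assumes a one-dimensional MDP invented for the purpose: the state lives in [0, 1], the two actions move it left or right by 0.1 plus small noise, and the reward is the new state. Q-values are stored only at a fixed grid of anchor points, and any continuous state is mapped to its nearest anchor. A single sample path is generated by a purely random policy, matching the setting above. The update is plain constant-step-size Q-learning at the nearest anchor, not the paper's NNQL update schedule. It therefore illustrates the representation only, not the analyzed algorithm or its sample-complexity guarantee. All function names and parameter values are hypothetical.

```python
import random
from typing import List, Tuple


def nearest(anchors: List[float], s: float) -> int:
    """Index of the anchor point closest to state s (1-nearest neighbor)."""
    return min(range(len(anchors)), key=lambda i: abs(anchors[i] - s))


def nn_q_learning(
    n_steps: int = 50_000,
    gamma: float = 0.9,
    alpha: float = 0.1,
    n_anchors: int = 11,
    seed: int = 0,
) -> Tuple[List[float], List[List[float]]]:
    rng = random.Random(seed)
    actions = [-0.1, 0.1]  # hypothetical toy MDP: move left or move right
    anchors = [i / (n_anchors - 1) for i in range(n_anchors)]  # grid on [0, 1]
    q = [[0.0 for _ in actions] for _ in anchors]  # Q stored at anchors only

    s = 0.5
    for _ in range(n_steps):
        a = rng.randrange(len(actions))  # purely random behavior policy
        s_next = s + actions[a] + rng.uniform(-0.02, 0.02)
        s_next = min(1.0, max(0.0, s_next))
        r = s_next  # hypothetical reward: being further right is better
        i, j = nearest(anchors, s), nearest(anchors, s_next)
        # Q-learning target, with Q(s_next, .) estimated by its nearest anchor.
        target = r + gamma * max(q[j])
        q[i][a] += alpha * (target - q[i][a])
        s = s_next
    return anchors, q


def greedy_action(anchors: List[float], q: List[List[float]], s: float) -> int:
    """Greedy action at any continuous state via nearest-neighbor lookup."""
    i = nearest(anchors, s)
    return max(range(len(q[i])), key=lambda a: q[i][a])
```

In this toy problem, moving right (action index 1) is optimal everywhere. After training, the greedy action at interior states such as 0.3 or 0.5 is expected to be 1.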
Multiagent Maximum Coverage Problems: The Trade-off Between Anarchy and Stability
The price of anarchy and price of stability are two well-studied
performance metrics that seek to characterize the inefficiency of equilibria in
distributed systems. The distinction between these two performance metrics
centers on the equilibria that they focus on: the price of anarchy
characterizes the quality of the worst-performing equilibria, while the price
of stability characterizes the quality of the best-performing equilibria. While
much of the literature focuses on these metrics from an analysis perspective,
in this work we consider these performance metrics from a design perspective.
Specifically, we focus on the setting where a system operator is tasked with
designing local utility functions to optimize these performance metrics in a
class of games termed covering games. Our main result characterizes a
fundamental trade-off between the price of anarchy and price of stability in
the form of a fully explicit Pareto frontier. Within this setup, optimizing the
price of anarchy comes directly at the expense of the price of stability (and
vice versa). Our second result demonstrates how a system operator could
incorporate an additional piece of system-level information into the design of
the agents' utility functions to overcome these limitations and improve the
system's performance. This valuable piece of system-level information pertains
to the performance of the worst-performing agent in the system.
Comment: 14 pages, 4 figures
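Both metrics can be computed by brute force on a tiny instance. This sketch assumes a simple covering game in which each agent selects one resource. System welfare is the total value of the resources covered by at least one agent. Agent i's designed utility is the value of its resource times f(k), where k is the number of agents on that resource and f is a hypothetical design parameter. Efficiency is reported as equilibrium welfare divided by optimal welfare. Under that convention, the price of anarchy is the worst ratio over pure Nash equilibria and the price of stability is the best. The paper's metrics are worst cases over an entire class of games, which enumerating a single instance cannot capture.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[str, ...]
Rule = Callable[[int], float]  # hypothetical design parameter f(k)


def welfare(profile: Profile, values: Dict[str, float]) -> float:
    """System objective: total value of resources covered by at least one agent."""
    return sum(values[r] for r in set(profile))


def utility(i: int, profile: Profile, values: Dict[str, float], f: Rule) -> float:
    """Agent i's designed utility: its resource's value scaled by f(#agents on it)."""
    r = profile[i]
    return values[r] * f(profile.count(r))


def is_nash(profile: Profile, actions: Sequence[Sequence[str]],
            values: Dict[str, float], f: Rule, tol: float = 1e-12) -> bool:
    """True if no agent can strictly gain by a unilateral deviation."""
    for i, options in enumerate(actions):
        current = utility(i, profile, values, f)
        for alt in options:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if utility(i, deviated, values, f) > current + tol:
                return False
    return True


def anarchy_and_stability(actions: Sequence[Sequence[str]],
                          values: Dict[str, float], f: Rule) -> Tuple[float, float]:
    """(worst, best) equilibrium welfare as a fraction of optimal welfare."""
    profiles = list(product(*actions))
    optimum = max(welfare(p, values) for p in profiles)
    eq = [welfare(p, values) for p in profiles if is_nash(p, actions, values, f)]
    return min(eq) / optimum, max(eq) / optimum
```

Take two agents that may each pick resource A (value 2) or B (value 1). Under the equal-share rule f(k) = 1/k, the profile where both agents pick A is an equilibrium. The worst ratio is then 2/3 and the best is 1. Under the rule f(1) = 1, f(k > 1) = 0, only the optimal profiles are equilibria. Each agent's payoff depends only on its resource and on how many agents share it, so this is a congestion game. A pure equilibrium therefore always exists, and the `min`/`max` over equilibria is well defined.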
Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems
Crowdsourcing markets have emerged as a popular platform for matching
available workers with tasks to complete. The payment for a particular task is
typically set by the task's requester, and may be adjusted based on the quality
of the completed work, for example, through the use of "bonus" payments. In
this paper, we study the requester's problem of dynamically adjusting
quality-contingent payments for tasks. We consider a multi-round version of the
well-known principal-agent model, whereby in each round a worker makes a
strategic choice of the effort level which is not directly observable by the
requester. In particular, our formulation significantly generalizes the
budget-free online task pricing problems studied in prior work.
We treat this problem as a multi-armed bandit problem, with each "arm"
representing a potential contract. To cope with the large (and in fact,
infinite) number of arms, we propose a new algorithm, AgnosticZooming, which
discretizes the contract space into a finite number of regions, effectively
treating each region as a single arm. This discretization is adaptively
refined, so that more promising regions of the contract space are eventually
discretized more finely. We analyze this algorithm, showing that it achieves
regret sublinear in the time horizon and substantially improves over
non-adaptive discretization (which is the only competing approach in the
literature).
Our results advance the state of the art on several different topics: the theory
of crowdsourcing markets, principal-agent problems, multi-armed bandits, and
dynamic pricing.
Comment: This is the full version of a paper in the ACM Conference on
Economics and Computation (ACM-EC), 201
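Adaptive refinement of a discretized contract space can be sketched with a generic zooming-style bandit. This is not the AgnosticZooming algorithm itself. The sketch assumes the contract is a single number in [0, 1] and that the requester's expected utility is 1-Lipschitz in it, so a region's width bounds how much better its best contract can be than its average. It also uses a made-up noisy utility function for testing. Each region of the contract space is treated as one arm with an optimistic index (empirical mean + confidence radius + width). A region is split in half once its confidence radius falls below its width, so promising regions end up discretized more finely. `Region`, `adaptive_discretization` and the reward function are hypothetical names.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """An interval [lo, hi) of the contract space treated as a single arm."""
    lo: float
    hi: float
    n: int = 0          # times this region has been played
    total: float = 0.0  # sum of observed rewards

    @property
    def width(self) -> float:
        return self.hi - self.lo


def adaptive_discretization(
    reward: Callable[[float, random.Random], float], horizon: int, seed: int = 0
) -> List[Region]:
    rng = random.Random(seed)
    log_t = math.log(horizon)
    regions = [Region(0.0, 1.0)]

    def radius(r: Region) -> float:
        # Hoeffding-style confidence radius; infinite for unplayed regions.
        return math.inf if r.n == 0 else math.sqrt(2 * log_t / r.n)

    def index(r: Region) -> float:
        # Optimistic index: mean + radius + width (width covers the gap to the
        # best contract inside the region under the 1-Lipschitz assumption).
        if r.n == 0:
            return math.inf
        return r.total / r.n + radius(r) + r.width

    for _ in range(horizon):
        best = max(regions, key=index)
        p = rng.uniform(best.lo, best.hi)  # play a contract drawn from the region
        best.n += 1
        best.total += reward(p, rng)
        if radius(best) < best.width:
            # Confident enough about this region: refine it into two halves.
            mid = (best.lo + best.hi) / 2
            regions.remove(best)
            regions += [Region(best.lo, mid), Region(mid, best.hi)]
    return sorted(regions, key=lambda r: r.lo)
```

For example, take the hypothetical utility 0.8 - |p - 0.3| plus uniform noise. After 20,000 rounds, the region containing p = 0.3 typically ends up narrower than the one containing p = 0.9. A region's statistics are discarded when it is split. This keeps the sketch short at the cost of some wasted samples.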