184 research outputs found
Solving stable matching problems using answer set programming
Since the introduction of the stable marriage problem (SMP) by Gale and
Shapley (1962), several variants and extensions have been investigated. While
this variety is useful to widen the application potential, each variant
requires a new algorithm for finding the stable matchings. To address this
issue, we propose an encoding of the SMP using answer set programming (ASP),
which can straightforwardly be adapted and extended to suit the needs of
specific applications. The use of ASP also means that we can take advantage of
highly efficient off-the-shelf solvers. To illustrate the flexibility of our
approach, we show how our ASP encoding naturally allows us to select optimal
stable matchings, i.e. matchings that are optimal according to some
user-specified criterion. To the best of our knowledge, our encoding offers the
first exact implementation to find sex-equal, minimum regret, egalitarian or
maximum cardinality stable matchings for SMP instances in which individuals may
designate unacceptable partners and ties between preferences are allowed.
This paper is under consideration in Theory and Practice of Logic Programming
(TPLP).
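The classical SMP that this ASP encoding generalises can be illustrated with a minimal Gale-Shapley sketch in Python; this is the Gale and Shapley (1962) proposal algorithm for the strict-preference, complete-list case, not the paper's ASP encoding, and all names are illustrative:

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Classical Gale-Shapley: men propose, women tentatively accept.

    men_prefs / women_prefs: dict mapping each person to a
    preference-ordered list of the other side.
    Returns a stable matching as a dict woman -> man.
    """
    # rank[w][m] = position of m in w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    free = deque(men_prefs)                # men not yet matched
    next_idx = {m: 0 for m in men_prefs}   # next woman each man proposes to
    match = {}                             # woman -> man

    while free:
        m = free.popleft()
        w = men_prefs[m][next_idx[m]]
        next_idx[m] += 1
        if w not in match:
            match[w] = m                   # w was free: accept
        elif rank[w][m] < rank[w][match[w]]:
            free.append(match[w])          # w trades up; old partner is free
            match[w] = m
        else:
            free.append(m)                 # w rejects m
    return match
```

The variants the paper targets (ties, unacceptable partners, optimality criteria) are exactly where this procedural approach breaks down and a declarative ASP encoding pays off.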
Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Many real-world decision problems are characterized by multiple conflicting
objectives which must be balanced based on their relative importance. In the
dynamic weights setting the relative importance changes over time and
specialized algorithms that deal with such change, such as a tabular
Reinforcement Learning (RL) algorithm by Natarajan and Tadepalli (2005), are
required. However, this earlier work is not feasible for RL settings that
necessitate the use of function approximators. We generalize across weight
changes and high-dimensional inputs by proposing a multi-objective Q-network
whose outputs are conditioned on the relative importance of objectives and we
introduce Diverse Experience Replay (DER) to counter the inherent
non-stationarity of the Dynamic Weights setting. We perform an extensive
experimental evaluation and compare our methods to adapted algorithms from Deep
Multi-Task/Multi-Objective Reinforcement Learning and show that our proposed
network in combination with DER dominates these adapted algorithms across
weight change scenarios and problem domains.
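The core idea of conditioning action selection on the current weights can be sketched in a few lines of Python: the same multi-objective Q-estimates yield different greedy actions as the weight vector changes (a linear-scalarisation sketch, not the paper's network architecture):

```python
def greedy_action(mo_q_values, weights):
    """Pick the action maximising the weighted sum of per-objective
    Q-estimates (linear scalarisation under the current weights).

    mo_q_values: one list per action, with one value per objective --
    the kind of output a weight-conditioned multi-objective Q-network
    would produce. Illustrative only.
    """
    scalarised = [sum(w * q for w, q in zip(weights, qs))
                  for qs in mo_q_values]
    return max(range(len(scalarised)), key=scalarised.__getitem__)
```

Because the greedy policy flips as weights shift, experience gathered under old weights no longer matches the current objective, which is the non-stationarity that Diverse Experience Replay is designed to counter.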
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to
producing optimal solution sets that contain an optimal policy for every
possible user preference profile. We argue that the step that follows, i.e.,
determining which policy to execute by maximising the user's intrinsic utility
function over this (possibly infinite) set, is under-studied. This paper aims
to fill this gap. We build on previous work on Gaussian processes and pairwise
comparisons for preference modelling, extend it to the multi-objective decision
support scenario, and propose new ordered preference elicitation strategies
based on ranking and clustering. Our main contribution is an in-depth
evaluation of these strategies using computer and human-based experiments. We
show that our proposed elicitation strategies outperform the currently used
pairwise methods, and find that users prefer ranking most. Our experiments
further show that utilising monotonicity information in GPs, via a linear
prior mean at the start and virtual comparisons to the nadir and ideal points,
increases performance. We demonstrate our decision support framework in a
real-world study on traffic regulation, conducted with the city of Amsterdam.
(AAMAS 2018; source code at https://github.com/lmzintgraf/gp_pref_elici)
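The virtual-comparison idea can be sketched directly: pairwise preference data of the form (winner, loser) is augmented so that every observed point beats the nadir point and loses to the ideal point, injecting monotonicity information into the preference model (function and data layout are illustrative, not the paper's implementation):

```python
def add_virtual_comparisons(comparisons, points, nadir, ideal):
    """Augment pairwise preference data with virtual comparisons.

    comparisons: list of (winner, loser) pairs already elicited.
    points: the observed points in objective space.
    Every point is asserted to beat the nadir and lose to the ideal,
    so the learned utility respects monotonicity at the extremes.
    """
    augmented = list(comparisons)
    for p in points:
        augmented.append((p, nadir))   # p preferred over the nadir point
        augmented.append((ideal, p))   # ideal point preferred over p
    return augmented
```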
Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets
Many real-world reinforcement learning problems have a hierarchical nature,
and often exhibit some degree of partial observability. While hierarchy and
partial observability are usually tackled separately (for instance by combining
recurrent neural networks and options), we show that addressing both problems
simultaneously is simpler and more efficient in many cases. More specifically,
we make the initiation set of options conditional on the previously-executed
option, and show that options with such Option-Observation Initiation Sets
(OOIs) are at least as expressive as Finite State Controllers (FSCs), a
state-of-the-art approach for learning in POMDPs. OOIs are easy to design based
on an intuitive description of the task, lead to explainable policies and keep
the top-level and option policies memoryless. Our experiments show that OOIs
allow agents to learn optimal policies in challenging POMDPs, while being much
more sample-efficient than a recurrent neural network over options.
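The mechanism itself is simple to state in code: each option's initiation set lists the options after which it may start, so the agent's choice at each decision point is restricted by the previously executed option alone, keeping the top-level policy memoryless (a structural sketch with hypothetical option names, not the paper's learning algorithm):

```python
def runnable_options(ooi, prev_option):
    """Return the options whose initiation set contains the previously
    executed option.

    ooi: dict mapping option -> set of options after which it may be
    initiated (Option-Observation Initiation Sets, sketched).
    """
    return {opt for opt, inits in ooi.items() if prev_option in inits}
```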
Dealing with Expert Bias in Collective Decision-Making
Many real-world problems can be formulated as decision-making problems
wherein one must repeatedly make an appropriate choice from a set of
alternatives. Expert judgements, whether human or artificial, can help in
taking correct decisions, especially when exploration of alternative solutions
is costly. As expert opinions might deviate, the problem of finding the right
alternative can be approached as a collective decision making problem (CDM).
Current state-of-the-art approaches to solve CDM are limited by the quality of
the best expert in the group, and perform poorly if experts are not qualified
or if they are overly biased, thus potentially derailing the decision-making
process. In this paper, we propose a new algorithmic approach based on
contextual multi-armed bandit problems (CMAB) to identify and counteract such
biased expertise. We explore homogeneous, heterogeneous and polarised expert
groups and show that this approach is able to effectively exploit the
collective expertise, irrespective of whether the provided advice is directly
conducive to good performance, outperforming state-of-the-art methods,
especially when the quality of the provided expertise degrades. Our novel
CMAB-inspired approach achieves a higher final performance and does so while
converging more rapidly than previous adaptive algorithms, especially when
heterogeneous expertise is readily available.
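The general principle of discounting biased experts can be sketched with a multiplicative-weights update over expert advice: experts whose recommendations lead to reward gain influence, others lose it. This is a generic bandit-style sketch, not the paper's CMAB algorithm:

```python
import math

def update_expert_weights(weights, advice, chosen, reward, lr=0.5):
    """Multiplicative-weights update over experts.

    weights: dict expert -> current weight.
    advice: dict expert -> the alternative that expert recommended.
    chosen: the alternative actually taken; reward: its observed payoff.
    Experts whose advice matched the rewarded choice gain weight;
    the result is renormalised to sum to one.
    """
    new = {}
    for e, w in weights.items():
        gain = reward if advice[e] == chosen else 0.0
        new[e] = w * math.exp(lr * gain)
    total = sum(new.values())
    return {e: w / total for e, w in new.items()}
```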
Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making
Experts advising decision-makers are likely to display expertise which varies
as a function of the problem instance. In practice, this may lead to
sub-optimal or discriminatory decisions against minority cases. In this work we
model such changes in depth and breadth of knowledge as a partitioning of the
problem space into regions of differing expertise. We provide here new
algorithms that explicitly consider and adapt to the relationship between
problem instances and experts' knowledge. We first propose and highlight the
drawbacks of a naive approach based on nearest neighbor queries. To address
these drawbacks we then introduce a novel algorithm - expertise trees - that
constructs decision trees enabling the learner to select appropriate models. We
provide theoretical insights and empirically validate the improved performance
of our novel approach on a range of problems for which existing methods proved
to be inadequate. (Proceedings of the 40th International Conference on Machine
Learning, 2023)
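The selection step of an expertise tree can be sketched as walking a decision tree over the problem-instance features until a leaf names the model (or expert) judged competent in that region of the problem space; the tuple encoding and model names here are purely illustrative:

```python
def select_model(node, x):
    """Walk a (hypothetical) expertise tree to select a model.

    Internal nodes are tuples (feature_idx, threshold, left, right)
    splitting the problem space; leaves are model identifiers for the
    region of expertise they cover.
    """
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] <= thr else right
    return node
```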
- …