
    Constructive Preference Elicitation over Hybrid Combinatorial Spaces

    Preference elicitation is the task of suggesting a highly preferred configuration to a decision maker. The preferences are typically learned by querying the user for choice feedback over pairs or sets of objects. In its constructive variant, new objects are synthesized "from scratch" by maximizing an estimate of the user utility over a combinatorial (possibly infinite) space of candidates. In the constructive setting, most existing elicitation techniques fail because they rely on exhaustive enumeration of the candidates. A previous solution explicitly designed for constructive tasks comes with no formal performance guarantees, and can be very expensive in (or inapplicable to) problems with non-Boolean attributes. We propose the Choice Perceptron, a Perceptron-like algorithm for learning user preferences from set-wise choice feedback over constructive domains and hybrid Boolean-numeric feature spaces. We provide a theoretical analysis of the attained regret that holds for a large class of query selection strategies, and devise a heuristic strategy that aims at optimizing the regret in practice. Finally, we demonstrate its effectiveness by empirical evaluation against existing competitors on constructive scenarios of increasing complexity.
    Comment: AAAI 2018. Keywords: computing methodologies, machine learning, learning paradigms, supervised learning, structured output.
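    The abstract above describes a Perceptron-style weight update driven by set-wise choice feedback. Purely as an illustration of that idea (not the algorithm from the paper: the feature space, query-set construction, learning rate, and simulated user below are all assumptions, and the real method synthesizes candidates by constrained optimization over a hybrid space rather than scoring a fixed pool), a sketch might look like this:

```python
# Minimal sketch of a Perceptron-style learner for set-wise choice feedback.
# Everything here (features, query strategy, update rule) is an assumption for
# illustration only, not the Choice Perceptron as specified in the paper.
import numpy as np

def choose_query_set(w, candidates, k=3, rng=None):
    """Pick k candidates: the current utility maximizer plus random challengers."""
    rng = rng or np.random.default_rng(0)
    best = int(np.argmax(candidates @ w))
    others = rng.choice([i for i in range(len(candidates)) if i != best],
                        size=k - 1, replace=False)
    return [best, *others]

def perceptron_update(w, candidates, query, user_choice, lr=1.0):
    """Shift w toward the chosen item and away from the rejected ones."""
    chosen = candidates[user_choice]
    rejected = np.mean([candidates[i] for i in query if i != user_choice], axis=0)
    return w + lr * (chosen - rejected)

# Toy simulation with a hidden "true" user utility.
rng = np.random.default_rng(42)
candidates = rng.random((200, 5))          # stand-in for a combinatorial candidate space
w_true = np.array([0.7, -0.2, 0.4, 0.1, -0.5])
w = np.zeros(5)
for _ in range(30):
    query = choose_query_set(w, candidates, k=3, rng=rng)
    user_choice = max(query, key=lambda i: candidates[i] @ w_true)  # simulated user
    w = perceptron_update(w, candidates, query, user_choice, lr=0.1)

recommended = candidates[int(np.argmax(candidates @ w))]
regret = (candidates @ w_true).max() - recommended @ w_true
print(f"utility regret of recommended item: {regret:.3f}")
```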

    Efficiently Finding Approximately-Optimal Queries for Improving Policies and Guaranteeing Safety

    When a computational agent (called the “robot”) takes actions on behalf of a human user, it may be uncertain about the human’s preferences. The human may initially specify her preferences incompletely or inaccurately, in which case the robot’s performance may be unsatisfactory or may even cause negative side effects in the environment. The literature offers several ways to address this problem. For example, the human can provide demonstrations that resolve the robot’s uncertainty, give real-time feedback on the robot’s behavior, or monitor the robot and stop it before it does anything dangerous. These methods, however, typically demand a great deal of the human’s attention. Alternatively, the robot may estimate the human’s true preferences from the specified preferences, but this is error-prone and requires assumptions about how the human specifies her preferences. In this thesis, I consider a querying approach. Before taking any actions, the robot has a chance to query the human about her preferences. For example, the robot may ask which trajectory in a set of trajectories she likes the most, or whether she cares about certain side effects in the domain. After the human responds to the query, the robot expects to improve its performance and/or guarantee that its behavior is considered safe by the human. If we impose no constraint on the number of queries the robot can pose, the robot may keep posing queries until it is absolutely certain about the human’s preferences; this places too heavy a cognitive load on the human, and the information obtained from some of the responses may only marginally improve the robot’s performance, which is not worth the human’s attention. So in the problems considered in this thesis, I constrain the number of queries the robot can pose, or associate each query with a cost. The research question is how to efficiently find the most useful query under such constraints. Finding a provably optimal query can be challenging, since it is usually a combinatorial optimization problem. In this thesis, I contribute efficient query selection algorithms under uncertainty. I first formulate the robot’s uncertainty as reward uncertainty and safety-constraint uncertainty. Under only reward uncertainty, I provide a query selection algorithm that finds approximately-optimal k-response queries. Under only safety-constraint uncertainty, I provide a query selection algorithm that finds an optimal k-element query to improve a known safe policy, and an algorithm that uses a set-cover-based query selection strategy to find an initial safe policy. Under both types of uncertainty simultaneously, I provide a batch-query-based querying method that empirically outperforms other baseline querying methods.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163125/1/shunzh_1.pd
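    A recurring ingredient above is selecting a k-response query under reward uncertainty. Below is a minimal sketch of that idea, assuming a small finite set of candidate policies, a discrete prior over reward hypotheses, and a greedy construction of the query set; the thesis gives specific algorithms with approximation guarantees, which are not reproduced here.

```python
# Hedged sketch: greedy selection of a k-element policy query under reward
# uncertainty. Values, prior, and the greedy construction are illustrative
# assumptions made for this toy example only.
import numpy as np

rng = np.random.default_rng(0)
n_policies, n_rewards, k = 8, 5, 3
V = rng.random((n_policies, n_rewards))       # V[p, r]: value of policy p under reward r
prior = np.full(n_rewards, 1.0 / n_rewards)   # belief over reward hypotheses

def query_value(query):
    """Expected value if the user points to her favorite policy in `query`."""
    return float(np.sum(prior * V[list(query)].max(axis=0)))

# Best the robot can do without asking anything.
baseline = max(float(prior @ V[p]) for p in range(n_policies))

# Greedy construction of a k-element query (a common approximation when exact
# search over all k-subsets of policies is too expensive).
query = []
for _ in range(k):
    best_p = max((p for p in range(n_policies) if p not in query),
                 key=lambda p: query_value(query + [p]))
    query.append(best_p)

print("selected query:", query)
print("value of asking it (EVOI-style gain):", query_value(query) - baseline)
```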

    Evaluating sets of multi-attribute alternatives with uncertain preferences

    In a decision-making problem, there can be uncertainty regarding the user's preferences over the available alternatives. For a decision support system, it is therefore essential to analyse the user's preferences in order to make personalised recommendations. In this thesis we focus on Multiattribute Utility Theory (MAUT), which aims to define user preference models and elicitation procedures for alternatives evaluated on a fixed number of conflicting criteria. In this context, a preference model is usually represented by a real-valued function over the criteria used to evaluate alternatives, and an elicitation procedure is a process for defining such a value function. The most preferred alternative is then the one that maximises the value function. With MAUT models, it is common to represent the uncertainty over the user's preferences with a parameterised value function; each instantiation of the parameters then represents a user preference model compatible with the preference information collected so far. A common linear value function, for example, is the weighted sum of the criteria evaluating an alternative, parameterised by the set of weights. We focus on this type of preference model and in particular on value functions evaluating sets of alternatives rather than single alternatives. Such value functions can be used, for example, to determine whether one set of alternatives is preferred to another, or what the worst-case loss (in utility units) of recommending a set of alternatives is. We define the concept of setwise minimal equivalent subset (SME) and algorithms for its computation. Briefly, the SME of an input set of alternatives is its minimum-cardinality subset with an equivalent setwise value. We generalise the standard preference relations used to compare single alternatives so that they can compare sets of alternatives, and we provide computational procedures to compute the SME and to evaluate these preference relations, with particular focus on linear value functions. We make extensive use of the Minimax Regret criterion, a common method for evaluating alternatives, potential questions, and recommendations under an uncertain value function: it prescribes an outcome that minimises the worst-case loss with respect to all possible parameterisations of the value function. In particular, we focus on its setwise generalisation, Setwise Minimax Regret (SMR), which is the worst-case loss of recommending a set of alternatives, and we provide a novel and efficient procedure for computing the SMR under a linear value function. We also present a novel incremental preference elicitation framework for a supplier selection process, where the constraints and objectives of the underlying optimization problem are inspired by a realistic medium-size factory. This preference elicitation framework applies to generic multiattribute combinatorial problems based on a linear preference model, and it is particularly useful when computing the set of Pareto optimal alternatives is practically infeasible.
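    To make the setwise notions above concrete, the following sketch evaluates the setwise max regret of a recommendation set under a weighted-sum value function, approximating the space of feasible weight vectors with a finite sample. The sampling, data, and function names are assumptions made for illustration; the thesis works with the exact weight polytope.

```python
# Hedged sketch of setwise max regret (SMR) for a weighted-sum value function,
# with the uncertainty over weights approximated by a finite sample of feasible
# weight vectors. Enumeration is used only to make the definition concrete.
import numpy as np

def setwise_max_regret(recommended, alternatives, weight_samples):
    """Worst-case utility loss of recommending the set `recommended`,
    assuming the user picks her favorite item from it."""
    worst = 0.0
    for w in weight_samples:
        best_overall = max(alt @ w for alt in alternatives)
        best_in_set = max(alt @ w for alt in recommended)
        worst = max(worst, best_overall - best_in_set)
    return worst

rng = np.random.default_rng(1)
alternatives = rng.random((50, 4))              # items scored on 4 criteria
weights = rng.dirichlet(np.ones(4), size=200)   # sampled normalized weight vectors
candidate_set = alternatives[:3]                # a candidate recommendation set
print("SMR of the candidate set:",
      setwise_max_regret(candidate_set, alternatives, weights))
```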

    Efficient exact computation of setwise minimax regret for interactive preference elicitation

    A key issue in artificial intelligence methods for interactive preference elicitation is choosing, at each stage, an appropriate query to put to the user, in order to find a near-optimal solution as quickly as possible. A theoretically attractive method is to choose the query that minimises max setwise regret (which corresponds to the worst-case loss over possible responses, in terms of value of information). We focus here on the situation in which the choices are represented explicitly in a database and user utility is modelled as a weighted sum of the criteria; in this case, when the user makes a choice, the agent learns a linear constraint on the unknown vector of weights. We develop an algorithmic method for computing minimax setwise regret for this form of preference model, making use of a SAT solver with cardinality constraints to prune the search space and computing max setwise regret with an extreme-points method. Our experimental results demonstrate the feasibility of the approach and a very substantial speed-up over the state of the art.
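    A minimal sketch of the linear-programming core of such a computation is given below, under the assumptions that utilities are weighted sums, past user choices give linear constraints on the weights, and the setwise max regret of a recommendation set is obtained by solving one LP per adversarial alternative. The SAT-based pruning and extreme-points machinery of the paper are not reproduced; the function names and toy data are assumptions.

```python
# Hedged sketch: setwise max regret of a recommendation set via one LP per
# adversarial alternative, for a weighted-sum utility with linear constraints
# on the (normalized, non-negative) weight vector.
import numpy as np
from scipy.optimize import linprog

def set_regret_vs_item(recommended, y, choice_constraints, d):
    """max over feasible w of  min_{a in recommended} w.(y - a):
    regret of the set against adversary item y.  LP variables: [w_1..w_d, t]."""
    c = np.zeros(d + 1)
    c[-1] = -1.0                                        # maximize t <=> minimize -t
    A_ub, b_ub = [], []
    for a in recommended:                               # t <= w.(y - a) for every a
        A_ub.append(np.append(-(y - a), 1.0))
        b_ub.append(0.0)
    for better, worse in choice_constraints:            # past choices: w.(better - worse) >= 0
        A_ub.append(np.append(-(better - worse), 0.0))
        b_ub.append(0.0)
    A_eq = np.array([np.append(np.ones(d), 0.0)])       # weights sum to 1
    bounds = [(0, None)] * d + [(None, None)]           # w >= 0, t free
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return -res.fun

def setwise_max_regret(recommended, alternatives, choice_constraints):
    d = alternatives.shape[1]
    return max(set_regret_vs_item(recommended, y, choice_constraints, d)
               for y in alternatives)

rng = np.random.default_rng(2)
alts = rng.random((30, 3))        # 30 items scored on 3 criteria
constraints = []                  # (better, worse) pairs from past user choices would go here
print("setwise max regret of {item 0, item 1}:",
      setwise_max_regret(alts[:2], alts, constraints))
```

    Because w.(y - a) minimised over the items a in the set is a minimum of linear functions of w, maximising it over the weight polytope is a linear program, which is what set_regret_vs_item solves.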

    Preference Learning

    This report documents the program and the outcomes of Dagstuhl Seminar 14101 “Preference Learning”. Preferences have recently received considerable attention in disciplines such as machine learning, knowledge discovery, information retrieval, statistics, social choice theory, multiple criteria decision making, decision under risk and uncertainty, operations research, and others. The motivation for this seminar was to showcase recent progress in these different areas with the goal of working towards a common basis of understanding, which should help to facilitate future synergies.

    APRIL: Active Preference-learning based Reinforcement Learning

    This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Despite this limited expertise, the human expert is often still able to express preferences and rank the agent's demonstrations. Earlier work presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, thus enabling the agent to perform direct policy search. Iteratively, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration against the previous best one; this ranking feedback lets the agent refine the approximate policy return, and the process is repeated. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and cancer treatment testbeds show that a few dozen rankings are enough to learn a competent policy.
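    As a rough illustration of the iterative loop described above (not APRIL itself: the surrogate model, perturbation-based candidate generation, noiseless expert, and toy trajectory features below are assumptions, and the paper's active ranking criterion for choosing what to demonstrate is omitted), a sketch might look like this:

```python
# Hedged sketch of an iterative preference-based policy search loop on a toy
# problem where "policies" are parameter vectors and the expert's criterion
# over trajectory features is hidden from the learner.
import numpy as np

rng = np.random.default_rng(3)
dim = 4
true_pref = rng.normal(size=dim)        # hidden expert criterion over trajectory features

def demonstrate(policy_params):
    """Stand-in for running the policy: returns noisy trajectory features."""
    return policy_params + 0.05 * rng.normal(size=dim)

w = np.zeros(dim)                       # learned surrogate of the policy return
best_params, best_feats = rng.normal(size=dim), None
for it in range(40):
    # Direct policy search step: perturb the incumbent and keep the
    # perturbation the surrogate return currently likes best.
    proposals = best_params + 0.3 * rng.normal(size=(20, dim))
    cand_params = proposals[np.argmax(proposals @ w)]
    cand_feats = demonstrate(cand_params)
    if best_feats is None:
        best_feats, best_params = cand_feats, cand_params
        continue
    # Expert ranks the new demonstration against the previous best one.
    expert_prefers_candidate = (cand_feats @ true_pref) > (best_feats @ true_pref)
    preferred, other = ((cand_feats, best_feats) if expert_prefers_candidate
                        else (best_feats, cand_feats))
    w += 0.5 * (preferred - other)      # refine the approximate policy return
    if expert_prefers_candidate:
        best_feats, best_params = cand_feats, cand_params

print("expert score of final policy:", float(best_feats @ true_pref))
```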

    Maximizing Expected Value of Information in Decision Problems by Querying on a Wish-to-Know Basis.

    An agent acting under uncertainty about how it should complete the task assigned to it by its human user can learn more about how it should behave by posing queries to the user. Asking too many queries, however, risks placing undue attentional demands on the user, so the agent should prioritize asking the most valuable queries. For decision-making agents, Expected Value of Information (EVOI) measures the value of a query; given a set of queries the agent can ask, it should ask the query with the highest EVOI in that set, the one expected to maximally improve its performance. Unfortunately, to compute the EVOI of a query, the agent must consider how each possible response would influence its future behavior. This makes query selection particularly challenging in settings where planning the agent's behavior would be expensive even without the added complication of considering queries to ask, especially when there are many potential queries to consider. The focus of this dissertation is on developing query selection algorithms that can feasibly be applied to such settings. The main novel approach studied, Wishful Query Projection (WQP), is based on the intuition that the agent should consider which query to ask on the basis of obtaining knowledge that would help it resolve a particular dilemma it wishes could be resolved, rather than blindly searching its entire query set in the hope of finding one that would yield valuable knowledge. In implementing WQP, this dissertation contributes algorithms founded on the following novel result: in myopic settings, when the agent can ask any query with no more than a set number of possible responses, the best query takes the form of asking the user to choose from a specified subset of ways for the agent to behave. The work presented shows that WQP selects queries with near-optimal EVOI when the agent's query set is (1) balanced in the range of queries it contains, and (2) rich in terms of the highest query EVOI it contains.
    PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120772/1/rwcohn_1.pd
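    For reference, here is a minimal sketch of what computing the EVOI of a single query involves, assuming a discrete belief over utility hypotheses and a query that asks the user to choose among a subset of the agent's possible actions. The numbers and the query model are illustrative assumptions; WQP's contribution is precisely to avoid exhaustively scoring every candidate query this way.

```python
# Hedged sketch: EVOI of an action-choice query under a discrete belief over
# hypotheses about the user's utility function.
import numpy as np

U = np.array([[5.0, 0.0, 1.0],      # U[a, h]: utility of action a if hypothesis h
              [2.0, 2.0, 2.0],      # is the user's true preference
              [0.0, 4.0, 0.0]])
belief = np.array([0.4, 0.4, 0.2])  # prior over hypotheses

def prior_value(belief):
    """Best expected utility achievable without asking anything."""
    return (U @ belief).max()

def evoi_of_action_choice_query(actions, belief):
    """Query: 'which of these actions do you prefer?'  The user answers with the
    action that is best under her true hypothesis; we average over hypotheses."""
    value = 0.0
    for h, p_h in enumerate(belief):
        response = max(actions, key=lambda a: U[a, h])          # answer if h is true
        # Posterior keeps the hypotheses consistent with that answer, renormalized.
        consistent = np.array([max(actions, key=lambda a: U[a, hh]) == response
                               for hh in range(len(belief))])
        posterior = belief * consistent
        posterior = posterior / posterior.sum()
        value += p_h * (U @ posterior).max()
    return value - prior_value(belief)

print("EVOI of asking about actions {0, 2}:",
      evoi_of_action_choice_query([0, 2], belief))
```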

    qEUBO: A Decision-Theoretic Acquisition Function for Preferential Bayesian Optimization

    Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradient acquisition function. We also show that qEUBO enjoys an additive constant approximation guarantee to the one-step Bayes-optimal policy when the decision maker's responses are corrupted by noise. We provide an extensive evaluation of qEUBO and demonstrate that it outperforms the state-of-the-art acquisition functions for PBO across many settings. Finally, we show that, under sufficient regularity conditions, qEUBO's Bayesian simple regret converges to zero at a rate o(1/n) as the number of queries, n, goes to infinity. In contrast, we show that simple regret under qEI, a popular acquisition function for standard BO often used for PBO, can fail to converge to zero. Enjoying superior performance, simple computation, and a grounded decision-theoretic justification, qEUBO is a promising acquisition function for PBO.
    Comment: In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 202
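    The acquisition value itself has a simple Monte Carlo form: qEUBO(X) is the expectation of the maximum of the latent utility f over the points in the query X, taken under the posterior on f. A minimal sketch follows, with a toy Gaussian posterior standing in for the preferential-GP posterior a real PBO loop would maintain; the means, covariances, and sample size are assumptions for illustration.

```python
# Hedged sketch: Monte Carlo estimate of qEUBO(X) = E[max_i f(x_i)] for the
# joint posterior of the latent utility at the q points of a single query.
import numpy as np

def qeubo(mean, cov, n_samples=4096, rng=None):
    """qEUBO for a Gaussian posterior over the q query points
    (mean: shape (q,), cov: shape (q, q))."""
    rng = rng or np.random.default_rng(0)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)   # (n_samples, q)
    return samples.max(axis=1).mean()

# Compare two candidate pairs under a toy posterior: the acquisition favors the
# pair whose best element is expected to be higher once posterior correlation
# between the two points is taken into account.
pair_a = qeubo(mean=np.array([0.2, 0.3]), cov=np.array([[0.5, 0.4], [0.4, 0.5]]))
pair_b = qeubo(mean=np.array([0.1, 0.1]), cov=np.array([[0.5, -0.3], [-0.3, 0.5]]))
print(f"qEUBO(pair A) = {pair_a:.3f}, qEUBO(pair B) = {pair_b:.3f}")
```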