82 research outputs found

    Resource Allocation Among Agents with MDP-Induced Preferences

    Allocating scarce resources among agents to maximize global utility is, in general, computationally challenging. We focus on problems where resources enable agents to execute actions in stochastic environments, modeled as Markov decision processes (MDPs), such that the value of a resource bundle is defined as the expected value of the optimal MDP policy realizable given these resources. We present an algorithm that simultaneously solves the resource-allocation and the policy-optimization problems. This allows us to avoid explicitly representing utilities over exponentially many resource bundles, leading to drastic (often exponential) reductions in computational complexity. We then use this algorithm in the context of self-interested agents to design a combinatorial auction for allocating resources. We empirically demonstrate the effectiveness of our approach by showing that it can, in minutes, optimally solve problems for which a straightforward combinatorial resource-allocation technique would require the agents to enumerate up to 2^100 resource bundles and the auctioneer to solve an NP-complete problem with an input of that size.
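
As a rough illustration of the definition in the abstract above (the value of a resource bundle is the expected value of the best MDP policy executable with that bundle), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two-state toy MDP, the action names, the resources `r1` and `r2`, the discount factor, and the function name `bundle_value` are hypothetical and not taken from the paper. The sketch also uses the brute-force enumeration over all bundles that the paper's algorithm is designed to avoid.

```python
from itertools import combinations

GAMMA = 0.9  # discount factor (assumed for this toy example)
STATES = (0, 1)  # state 0 is the start, state 1 is absorbing

# Hypothetical actions: name -> (required resources, per-state model),
# where the per-state model is {state: (reward, {next_state: probability})}.
ACTIONS = {
    "wait": (frozenset(), {0: (0.0, {0: 1.0}), 1: (0.0, {1: 1.0})}),
    "a": (frozenset({"r1"}), {0: (1.0, {0: 0.5, 1: 0.5}), 1: (0.0, {1: 1.0})}),
    "b": (frozenset({"r1", "r2"}), {0: (3.0, {1: 1.0}), 1: (0.0, {1: 1.0})}),
}


def bundle_value(bundle, start=0, sweeps=500):
    """Value iteration restricted to actions whose resources are in `bundle`."""
    # "wait" needs no resources, so `allowed` is never empty.
    allowed = [model for need, model in ACTIONS.values() if need <= bundle]
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {
            s: max(
                r + GAMMA * sum(p * v[t] for t, p in nxt.items())
                for r, nxt in (model[s] for model in allowed)
            )
            for s in STATES
        }
    return v[start]


# Brute force: evaluate every one of the 2^n bundles (here n = 2).
resources = ("r1", "r2")
values = {
    frozenset(c): bundle_value(frozenset(c))
    for k in range(len(resources) + 1)
    for c in combinations(resources, k)
}

for bundle, value in values.items():
    print(sorted(bundle), round(value, 4))
```

In this toy case `r2` is worthless alone but raises the value of `r1`, which is the kind of complementarity that makes per-bundle utilities expensive to enumerate as the number of resources grows.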

    Techniques for the allocation of resources under uncertainty

    Resource allocation is a ubiquitous problem that arises whenever limited resources have to be distributed among multiple autonomous entities (e.g., people, companies, robots). The standard approaches to determining the optimal resource allocation are computationally prohibitive. The goal of this thesis is to propose computationally efficient algorithms for allocating consumable and non-consumable resources among autonomous agents whose preferences for these resources are induced by a stochastic process. To this end, we have developed new models of planning problems, based on the framework of Markov Decision Processes (MDPs), in which the action sets are explicitly parameterized by the available resources. Given these models, we have designed algorithms based on dynamic programming and real-time heuristic search to formulate resource allocations for agents evolving in stochastic environments. In particular, we have used the acyclic property of task creation to decompose the resource allocation problem. We have also proposed an approximate decomposition strategy in which the agents consider positive and negative interactions as well as simultaneous actions among the agents managing the resources. However, the main contribution of this thesis is the adoption of stochastic real-time heuristic search for resource allocation. To this end, we have developed an approach based on distributed Q-values (Q-decomposition) with tight bounds to drastically reduce the planning time needed to formulate the optimal policy. These tight bounds make it possible to prune the action space for the agents.
We show analytically and empirically that our proposed approaches lead to drastic (in many cases, exponential) improvements in computational efficiency over standard planning methods. Finally, we have tested real-time heuristic search in the SADM simulator, a simulator for resource allocation on a platform (a frigate).

    Formal Modelling for Multi-Robot Systems Under Uncertainty

    Purpose of Review: To effectively synthesise and analyse multi-robot behaviour, we require formal task-level models which accurately capture multi-robot execution. In this paper, we review modelling formalisms for multi-robot systems under uncertainty, and discuss how they can be used for planning, reinforcement learning, model checking, and simulation. Recent Findings: Recent work has investigated models which more accurately capture multi-robot execution by considering different forms of uncertainty, such as temporal uncertainty and partial observability, and modelling the effects of robot interactions on action execution. Other strands of work have presented approaches for reducing the size of multi-robot models to admit more efficient solution methods. This can be achieved by decoupling the robots under independence assumptions, or reasoning over higher-level macro actions. Summary: Existing multi-robot models demonstrate a trade-off between accurately capturing robot dependencies and uncertainty, and being small enough to tractably solve real-world problems. Therefore, future research should exploit realistic assumptions over multi-robot behaviour to develop smaller models which retain accurate representations of uncertainty and robot interactions; and exploit the structure of multi-robot problems, such as factored state spaces, to develop scalable solution methods. Comment: 23 pages, 0 figures, 2 tables. Current Robotics Reports (2023). This version of the article has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://dx.doi.org/10.1007/s43154-023-00104-

    Operational Decision Making under Uncertainty: Inferential, Sequential, and Adversarial Approaches

    Modern security threats are characterized by a stochastic, dynamic, partially observable, and ambiguous operational environment. This dissertation addresses such complex security threats using operations research techniques for decision making under uncertainty in operations planning, analysis, and assessment. First, this research develops a new method for robust queue inference with partially observable, stochastic arrival and departure times, motivated by cybersecurity and terrorism applications. In the dynamic setting, this work develops a new variant of Markov decision processes and an algorithm for robust information collection in dynamic, partially observable and ambiguous environments, with an application to a cybersecurity detection problem. In the adversarial setting, this work presents a new application of counterfactual regret minimization and robust optimization to a multi-domain cyber and air defense problem in a partially observable environment.

    Contextual Models for Sequential Recommendation

    Recommender systems aim to capture the interests of users in order to provide them with tailored recommendations for items or services they might like. User interests are often unique and depend on many unobservable factors including internal moods or external events. This phenomenon creates a broad range of tasks for recommendation systems that are difficult to address all together. Nevertheless, analyzing the historical activities of users sheds light on the characteristic traits of individual behaviors in order to enable qualified recommendations. In this thesis, we deal with the problem of comprehending the interests of users, searching for pertinent items, and ranking them to recommend the most relevant items to the users given different contexts and situations. We focus on recommendation problems in sequential scenarios, where a series of past events influences the future decisions of users. These events are either the developed preferences of users over a long span of time or highly influenced by the zeitgeist and common trends. We are among the first to model recommendation systems in a sequential fashion by exploiting the short-term interests of users in session-based scenarios. We leverage reinforcement learning techniques to capture underlying short- and long-term user interests in the absence of explicit feedback and develop novel contextual approaches for sequential recommendation systems. These approaches are designed to efficiently learn models for different types of recommendation tasks and are extended to continuous and multi-agent settings. All the proposed methods are empirically studied on large-scale real-world scenarios ranging from e-commerce to sports and demonstrate excellent performance in comparison to baseline approaches.

    Resource allocation problems in stochastic sequential decision making

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2009. Includes bibliographical references (p. 159-162). In this thesis, we study resource allocation problems that arise in the context of stochastic sequential decision making problems. The practical utility of optimal algorithms for these problems is limited due to their high computational and storage requirements. Also, an increasing number of applications require a decentralized solution. We develop computationally efficient techniques for approximately solving a certain class of resource allocation problems that arise in this context, with a focus on decentralized algorithms where appropriate. The first resource allocation problem that we study is a stochastic sequential decision making problem with multiple decision makers (agents) and two main features: (1) partial observability: each agent may not have complete information regarding the system; and (2) limited communication: each agent may not be able to communicate with all other agents at all times. We formulate a Markov Decision Process (MDP) for this problem. The features of partial observability and limited communication impose additional computational constraints on the exact solution of the MDP. We propose a scheme for approximating the optimal Q function and the optimal value function associated with this MDP as a linear combination of preselected basis functions. We show that the proposed approximation scheme leads to decentralization of the agents' decisions, thereby enabling their implementation under limited communication. We propose a linear program, ALP, for selecting the parameters for combining the basis functions. We establish bounds relating the approximation error due to the choice of the parameters selected by the ALP with the best possible error given the choice of basis functions.
Motivated by the need for a decentralized solution to the ALP, which is equivalent to a resource allocation problem with a separable, concave objective function, we analyze a general class of resource allocation problems with separable concave objective functions. We propose a distributed algorithm for this class of problems when the objective function is differentiable and establish its convergence and convergence rate properties. We develop a smoothing scheme for non-differentiable objective functions and extend the algorithm to this case. Finally, we build on these results to extend the decentralized algorithm to accommodate non-negativity constraints on the resources. Numerical investigations of the performance of the developed algorithm show that it is competitive with its centralized counterpart. The second resource allocation problem that we study is the problem of optimally accepting or rejecting arriving orders in a Make-To-Order (MTO) manufacturing firm. We model the production facility of the MTO manufacturing firm as a queue and view the time of the production facility as a resource that needs to be optimally allotted between current and future orders. We formulate the Order Acceptance Problem under two arrival processes, a Poisson process (OAP-P) and a Bernoulli process (OAP-B), and formulate both problems as MDPs. We provide insights into the structure of the optimal order acceptance policy for OAP-B under the assumption of First Come First Served (FCFS) scheduling of accepted orders. We investigate a class of randomized order acceptance policies for OAP-B, called static policies, that are practically relevant due to their ease of implementation, and develop a procedure for computing the policy gradient for any static policy. Using these results for OAP-B, we propose four heuristics for OAP-P. We numerically investigate the performance of the proposed heuristics and compare their performance with other heuristics reported in the literature.
One of our proposed heuristics, FCFS-ValueFunction, outperforms other heuristics under a variety of conditions while also being easy to implement. By Hariharan Lakshmanan. Ph.D.
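
To make the separable concave resource allocation problem mentioned in the abstract above more concrete, here is a small Python sketch of a textbook price-coordination (dual bisection) scheme. It is not the distributed algorithm developed in the thesis, whose details are not given here. The logarithmic utilities `w_i * log(x_i)`, the function name `price_based_allocation`, and the example numbers are assumptions chosen only because each agent's best response to a price then has a closed form.

```python
import math


def price_based_allocation(weights, budget, rounds=200):
    """Split `budget` to maximise sum_i w_i * log(x_i) via a shared price.

    Each agent responds locally to a price with demand x_i = w_i / price
    (the maximiser of w_i * log(x_i) - price * x_i). A coordinator only
    adjusts the price until total demand matches the budget.
    """
    lo, hi = 1e-9, 1e9  # assumed bracket for the market-clearing price
    for _ in range(rounds):
        price = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        total_demand = sum(w / price for w in weights)
        if total_demand > budget:
            lo = price  # price too low: agents ask for too much
        else:
            hi = price  # price high enough: demand fits the budget
    return [w / hi for w in weights]


allocation = price_based_allocation([1.0, 2.0, 3.0], budget=12.0)
print([round(x, 6) for x in allocation])
```

For these utilities the exact optimum is proportional to the weights, so the result can be checked by hand against `budget * w_i / sum(weights)`. Demands are positive by construction, so non-negativity holds without extra handling in this particular case.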