26,200 research outputs found

    Bayesian multitask inverse reinforcement learning

    We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to differentiate effectively between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers. Comment: Corrected version. 13 pages, 8 figure

    Game Networks

    We introduce Game networks (G nets), a novel representation for multi-agent decision problems. Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria. Comment: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000

    A cooperative cellular and broadcast conditional access system for Pay-TV systems

    This is the author's accepted manuscript. The final published article is available from the link below. Copyright © 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    The lack of interoperability between Pay-TV service providers and a horizontally integrated business transaction model have compromised the competition in the Pay-TV market. In addition, the lack of interactivity with customers has resulted in a high churn rate, and improper security measures have contributed to considerable business losses. These issues are the main cause of high operational costs and subscription fees in Pay-TV systems. In response, this paper presents the Mobile Conditional Access System (MICAS) as an end-to-end access control solution for Pay-TV systems. It incorporates the mobile and broadcasting systems and provides a platform whereby service providers can effectively interact with their customers, personalize their services and adopt appropriate security measures. This would result in a decrease in operating expenses and an increase in customer satisfaction with the system. The paper provides an overview of state-of-the-art conditional access solutions, followed by a detailed description of the design, reference model implementation and analysis of possible MICAS security architectures.
    Strategy & Technology (S&T) Lt

    Every Smile is Unique: Landmark-Guided Diverse Smile Generation

    Each smile is unique: one person surely smiles in different ways (e.g., closing/opening the eyes or mouth). Given one input image of a neutral face, can we generate multiple smile videos with distinctive characteristics? To tackle this one-to-many video generation problem, we propose a novel deep learning architecture named Conditional Multi-Mode Network (CMM-Net). To better encode the dynamics of facial expressions, CMM-Net explicitly exploits facial landmarks for generating smile sequences. Specifically, a variational auto-encoder is used to learn a facial landmark embedding. This single embedding is then exploited by a conditional recurrent network which generates a landmark embedding sequence conditioned on a specific expression (e.g., spontaneous smile). Next, the generated landmark embeddings are fed into a multi-mode recurrent landmark generator, producing a set of landmark sequences still associated with the given smile class but clearly distinct from each other. Finally, these landmark sequences are translated into face videos. Our experimental results demonstrate the effectiveness of our CMM-Net in generating realistic videos of multiple smile expressions. Comment: Accepted as a poster in Conference on Computer Vision and Pattern Recognition (CVPR), 201

    On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization

    Revealed preference theory studies the possibility of modeling an agent's revealed preferences and the construction of a consistent utility function. However, modeling an agent's choices over preference orderings is not always practical and demands strong assumptions on human rationality and data-acquisition abilities. Therefore, we propose a simple generative choice model where agents are assumed to generate the choice probabilities based on latent factor matrices that capture their choice evaluation across multiple attributes. Since the multi-attribute evaluation is typically hidden within the agent's psyche, we consider a signaling mechanism where agents are provided with choice information through private signals, so that the agent's choices provide more insight about his/her latent evaluation across multiple attributes. We estimate the choice model via a novel multi-stage matrix factorization algorithm that minimizes the average deviation of the factor estimates from choice data. Simulation results are presented to validate the estimation performance of our proposed algorithm. Comment: 6 pages, 2 figures, to be presented at CISS conferenc

    An efficient and versatile approach to trust and reputation using hierarchical Bayesian modelling

    In many dynamic open systems, autonomous agents must interact with one another to achieve their goals. Such agents may be self-interested and, when trusted to perform an action, may betray that trust by not performing the action as required. Due to the scale and dynamism of these systems, agents will often need to interact with other agents with which they have little or no past experience. Each agent must therefore be capable of assessing and identifying reliable interaction partners, even if it has no personal experience with them. To this end, we present HABIT, a Hierarchical And Bayesian Inferred Trust model for assessing how much an agent should trust its peers based on direct and third party information. This model is robust in environments in which third party information is malicious, noisy, or otherwise inaccurate. Although existing approaches claim to achieve this, most rely on heuristics with little theoretical foundation. In contrast, HABIT is based exclusively on principled statistical techniques: it can cope with multiple discrete or continuous aspects of trustee behaviour; it does not restrict agents to using a single shared representation of behaviour; it can improve assessment by using any observed correlation between the behaviour of similar trustees or information sources; and it provides a pragmatic solution to the whitewasher problem (in which unreliable agents assume a new identity to avoid bad reputation). In this paper, we describe the theoretical aspects of HABIT, and present experimental results that demonstrate its ability to predict agent behaviour in both a simulated environment, and one based on data from a real-world webserver domain. 
    In particular, these experiments show that HABIT can predict trustee performance based on multiple representations of behaviour, and is up to twice as accurate as BLADE, an existing state-of-the-art trust model that is both statistically principled and has been previously shown to outperform a number of other probabilistic trust models

    The fundamental problem of command: plan and compliance in a partially centralised economy

    When a principal gives an order to an agent and advances resources for its implementation, the temptations for the agent to shirk or steal from the principal rather than comply constitute the fundamental problem of command. Historically, partially centralised command economies enforced compliance in various ways, assisted by nesting the fundamental problem of exchange within that of command. The Soviet economy provides some relevant data. The Soviet command system combined several enforcement mechanisms in an equilibrium that shifted as agents learned and each mechanism's comparative costs and benefits changed. When the conditions for an equilibrium disappeared, the system collapsed. Comparative Economic Studies (2005) 47, 296–314. doi:10.1057/palgrave.ces.810011