
    Multi-agent Deep Covering Option Discovery

    The use of options can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning (MARL) settings, discovering collaborative options that can coordinate the behavior of multiple agents and encourage them to visit the under-explored regions of their joint state space has not been considered. To address this, we propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space. We also propose a novel framework for adopting the multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into sub-tasks, each of which can be completed by a sub-group of the agents. Therefore, our algorithm framework first leverages an attention mechanism to find collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn the multi-agent options for each sub-group to complete their sub-tasks first, and then to integrate them through a high-level policy as the solution of the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works using single-agent options or no options, in terms of both faster exploration and higher task rewards. Comment: This paper was presented in part at the ICML Reinforcement Learning for Real Life Workshop, July 202

    Beyond A/B Testing: Sequential Randomization for Developing Interventions in Scaled Digital Learning Environments

    Randomized experiments ensure robust causal inference that is critical to effective learning analytics research and practice. However, traditional randomized experiments, like A/B tests, are limiting in large-scale digital learning environments. While traditional experiments can accurately compare two treatment options, they are less able to inform how to adapt interventions to continually meet learners' diverse needs. In this work, we introduce a trial design for developing adaptive interventions in scaled digital learning environments -- the sequential randomized trial (SRT). With the goal of improving learner experience and developing interventions that benefit all learners at all times, SRTs inform how to sequence, time, and personalize interventions. In this paper, we provide an overview of SRTs, and we illustrate the advantages they hold compared to traditional experiments. We describe a novel SRT run in a large-scale data science MOOC. The trial results contextualize how learner engagement can be addressed through inclusive, culturally targeted reminder emails. We also provide practical advice for researchers who aim to run their own SRTs to develop adaptive interventions in scaled digital learning environments.

    Heuristic usability evaluation on games: a modular approach

    Heuristic evaluation is the preferred method to assess usability in games when experts conduct this evaluation. Many heuristic guidelines have been proposed that attend to the specificities of games, but they only focus on specific subsets of games or platforms. In fact, to date the most used guideline to evaluate game usability is still Nielsen’s proposal, which is focused on generic software. As a result, most evaluations do not cover important aspects of games such as mobility, multiplayer interactions, enjoyability, and playability. To promote the usage of new heuristics adapted to different game and platform aspects, we propose a modular approach based on the classification of existing game heuristics using metadata, and a tool, MUSE (Meta-heUristics uSability Evaluation tool) for games, which allows heuristic guidelines to be rebuilt based on metadata selection in order to obtain a customized list for every real evaluation case. The usage of these rebuilt heuristic guidelines allows explicit attention to a wide range of usability aspects in games and better detection of usability issues. We preliminarily evaluate MUSE with an analysis of two different games, using both Nielsen’s heuristics and the customized heuristic lists generated by our tool. Unión Europea PI055-15/E0

    Socio-technical transition processes: A real option based reasoning.

    Using a real option reasoning perspective we study the uncertainties and irreversibilities that impact the investment decisions of firms during the different phases of technological transitions. The analysis of transition dynamics via real options reasoning allows the provision of an alternative and more qualified explanation of investment decisions according to the sequentiality of pathways considered. In our framework, flexibility management through option investments concerns both the incumbent and the future technological regime. In the first case it refers to ex-post flexibility management and in the second case to ex-ante flexibility management.

    Planning as Optimization: Dynamically Discovering Optimal Configurations for Runtime Situations

    The large number of possible configurations of modern software-based systems, combined with the large number of possible environmental situations of such systems, prohibits enumerating all adaptation options at design time and necessitates planning at run time to dynamically identify an appropriate configuration for a situation. While numerous planning techniques exist, they typically assume a detailed state-based model of the system and that the situations that warrant adaptations are known. Both of these assumptions can be violated in complex, real-world systems. As a result, adaptation planning must rely on simple models that capture what can be changed (input parameters) and observed in the system and environment (output and context parameters). We therefore propose planning as optimization: the use of optimization strategies to discover optimal system configurations at run time for each distinct situation, which is itself dynamically identified at run time. We apply our approach to CrowdNav, an open-source traffic routing system with the characteristics of a real-world system. We identify situations via clustering and conduct an empirical study that compares Bayesian optimization and two types of evolutionary optimization (NSGA-II and novelty search) in CrowdNav.

    Boolean Matrix Factorization Meets Consecutive Ones Property

    Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layouts, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard, but we cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best 1-rank factorization. Since even obtaining a 1-rank factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using pq-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
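    As a rough illustration of the two notions this abstract combines, the sketch below computes a Boolean matrix product and checks whether the ones in a 0/1 vector form one contiguous run. It is not the paper's greedy or pq-tree algorithm. The function names are hypothetical, and it assumes a fixed ordering of rows and columns rather than searching for an ordering.

```python
def boolean_product(B, C):
    """Boolean product of an n x r matrix B and an r x m matrix C.

    Entry (i, j) is 1 if there is some k with B[i][k] = C[k][j] = 1.
    """
    n, r, m = len(B), len(C), len(C[0])
    return [
        [int(any(B[i][k] and C[k][j] for k in range(r))) for j in range(m)]
        for i in range(n)
    ]


def has_consecutive_ones(vector):
    """True if the ones in a 0/1 vector form a single contiguous run (or none)."""
    ones = [i for i, v in enumerate(vector) if v]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def factors_have_consecutive_ones(B, C):
    """Assumed convention for this sketch: check the columns of B and the
    rows of C under the given (fixed) ordering."""
    return all(has_consecutive_ones(col) for col in zip(*B)) and all(
        has_consecutive_ones(row) for row in C
    )


# Illustrative factor matrices, made up for this example.
B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 1, 0], [0, 1, 1]]
product = boolean_product(B, C)
```

    Here each column of B and each row of C covers a contiguous block of indices, so the product is a union of contiguous rectangular blocks of ones.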

    Futures Exchange Innovations: Reinforcement versus Cannibalism

    Futures exchanges are in constant search of futures contracts that will generate a profitable level of trading volume. In this context, it would be interesting to determine what effect the introduction of new futures contracts has on the trading volume of the contracts already listed. The introduction of new futures contracts may lead to a volume increase for those contracts already listed and hence contribute to the success of a futures exchange. On the other hand, the introduction of new futures contracts could lead to a volume decrease for the contracts already listed, thereby undermining the success of the futures exchange. Using a multi-product hedging model in which the perspective has been shifted from portfolio to exchange management, we study these effects. Using data from two exchanges that differ in market liquidity (Amsterdam Exchanges versus Chicago Board of Trade), we show the usefulness of the proposed tool. Our findings have several important implications for a futures exchange's innovation policy.