Multi-agent Deep Covering Option Discovery
The use of options can greatly accelerate exploration in reinforcement
learning, especially when only sparse reward signals are available. While
option discovery methods have been proposed for individual agents, in
multi-agent reinforcement learning settings, discovering collaborative options
that can coordinate the behavior of multiple agents and encourage them to visit
the under-explored regions of their joint state space has not been considered.
To address this gap, we propose Multi-agent Deep Covering Option Discovery, which
constructs the multi-agent options through minimizing the expected cover time
of the multiple agents' joint state space. Also, we propose a novel framework
to adopt the multi-agent options in the MARL process. In practice, a
multi-agent task can usually be divided into some sub-tasks, each of which can
be completed by a sub-group of the agents. Therefore, our algorithm framework
first leverages an attention mechanism to find collaborative agent sub-groups
that would benefit most from coordinated actions. Then, a hierarchical
algorithm, namely HA-MSAC, is developed to learn the multi-agent options for
each sub-group to complete their sub-tasks first, and then to integrate them
through a high-level policy as the solution of the whole task. This
hierarchical option construction allows our framework to strike a balance
between scalability and effective collaboration among the agents. The
evaluation based on multi-agent collaborative tasks shows that the proposed
algorithm can effectively capture the agent interactions with the attention
mechanism, successfully identify multi-agent options, and significantly
outperform prior works using single-agent options or no options, in terms of
both faster exploration and higher task rewards.
Comment: This paper was presented in part at the ICML Reinforcement Learning
for Real Life Workshop, July 202
Beyond A/B Testing: Sequential Randomization for Developing Interventions in Scaled Digital Learning Environments
Randomized experiments ensure robust causal inference, which is critical to
effective learning analytics research and practice. However, traditional
randomized experiments, like A/B tests, are limiting in large-scale digital
learning environments. While traditional experiments can accurately compare two
treatment options, they are less able to inform how to adapt interventions to
continually meet learners' diverse needs. In this work, we introduce a trial
design for developing adaptive interventions in scaled digital learning
environments -- the sequential randomized trial (SRT). With the goal of
improving learner experience and developing interventions that benefit all
learners at all times, SRTs inform how to sequence, time, and personalize
interventions. In this paper, we provide an overview of SRTs, and we illustrate
the advantages they hold compared to traditional experiments. We describe a
novel SRT run in a large-scale data science MOOC. The trial results
contextualize how learner engagement can be addressed through inclusive,
culturally targeted reminder emails. We also provide practical advice for
researchers who aim to run their own SRTs to develop adaptive interventions in
scaled digital learning environments.
Heuristic usability evaluation on games: a modular approach
Heuristic evaluation is the preferred method to assess usability in games when experts conduct this
evaluation. Many heuristic guidelines have been proposed that address the specificities of games, but
they focus only on specific subsets of games or platforms. In fact, to date the most widely used guideline
for evaluating game usability is still Nielsen’s proposal, which is focused on generic software. As a
result, most evaluations do not cover important aspects of games such as mobility, multiplayer
interactions, enjoyability, and playability. To promote the usage of new heuristics adapted to
different game and platform aspects, we propose a modular approach based on the classification of
existing game heuristics using metadata and a tool, MUSE (Meta-heUristics uSability Evaluation
tool) for games, which rebuilds heuristic guidelines based on metadata selection in order
to obtain a customized list for each real evaluation case. The usage of these new rebuilt heuristic
guidelines allows explicit attention to a wide range of usability aspects in games and a better
detection of usability issues. We preliminarily evaluate MUSE with an analysis of two different
games, using both Nielsen’s heuristics and the customized heuristic lists generated by our tool.
Unión Europea PI055-15/E0
Socio-technical transition processes: A real option based reasoning.
Using a real options reasoning perspective, we study the uncertainties and irreversibilities that impact the investment decisions of firms during the different phases of technological transitions. Analyzing transition dynamics via real options reasoning provides an alternative and more qualified explanation of investment decisions according to the sequentiality of the pathways considered. In our framework, flexibility management through option investments concerns both the incumbent and the future technological regime. In the first case, it refers to ex-post flexibility management, and in the second case, to ex-ante flexibility management.
Planning as Optimization: Dynamically Discovering Optimal Configurations for Runtime Situations
The large number of possible configurations of modern software-based systems,
combined with the large number of possible environmental situations of such
systems, prohibits enumerating all adaptation options at design time and
necessitates planning at run time to dynamically identify an appropriate
configuration for a situation. While numerous planning techniques exist, they
typically assume a detailed state-based model of the system and that the
situations that warrant adaptations are known. Both of these assumptions can be
violated in complex, real-world systems. As a result, adaptation planning must
rely on simple models that capture what can be changed (input parameters) and
observed in the system and environment (output and context parameters). We
therefore propose planning as optimization: the use of optimization strategies
to discover optimal system configurations at runtime for each distinct
situation that is also dynamically identified at runtime. We apply our approach
to CrowdNav, an open-source traffic routing system with the characteristics of
a real-world system. We identify situations via clustering and conduct an
empirical study that compares Bayesian optimization and two types of
evolutionary optimization (NSGA-II and novelty search) in CrowdNav.
Boolean Matrix Factorization Meets Consecutive Ones Property
Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layouts, where nodes are ordered in a circle or on a line. A common problem with visualizing graphs is clutter due to too many edges. The standard approach to deal with this is to bundle edges together and represent them as ribbons. We show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate that not only is this problem NP-hard, but we also cannot have a polynomial-time algorithm that yields a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm where at each step we look for the best rank-1 factorization. Since even obtaining a rank-1 factorization is NP-hard, we propose an iterative algorithm where we fix one side and find the other, reverse the roles, and repeat. We show that this step can be done in linear time using PQ-trees. We also extend the problem to the cyclic ones property and symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well.
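To make the terms in this abstract concrete, the following is a minimal Python sketch of a Boolean matrix product, a reconstruction-error count, and a consecutive-ones check. It is not the authors' algorithm: the function names and the toy matrices are hypothetical, and the check assumes a fixed row or column order, whereas the property in general asks whether some ordering exists (the part the abstract attributes to PQ-trees, which is not shown here).

```python
from typing import List, Sequence

Matrix = List[List[int]]


def boolean_product(b: Matrix, c: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is 1 if b[i][k] and c[k][j] are both 1 for some k."""
    inner = len(c)
    cols = len(c[0]) if c else 0
    return [
        [int(any(row[k] and c[k][j] for k in range(inner))) for j in range(cols)]
        for row in b
    ]


def has_consecutive_ones(vec: Sequence[int]) -> bool:
    """True if the 1s in vec form a single contiguous run (or vec has no 1s).

    Assumption: the order of vec is fixed; no reordering is attempted.
    """
    ones = [i for i, v in enumerate(vec) if v]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def columns_have_consecutive_ones(m: Matrix) -> bool:
    """Apply the fixed-order check to every column of m."""
    return all(has_consecutive_ones(col) for col in zip(*m))


def reconstruction_error(a: Matrix, b: Matrix, c: Matrix) -> int:
    """Number of entries where a differs from the Boolean product of b and c."""
    product = boolean_product(b, c)
    return sum(x != y for ra, rp in zip(a, product) for x, y in zip(ra, rp))


# Hypothetical toy data: a 4x3 binary matrix and a rank-2 Boolean factorization.
A = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1]]
B = [[1, 0], [1, 0], [0, 1], [0, 1]]  # 4x2 left factor
C = [[1, 1, 0], [0, 1, 1]]            # 2x3 right factor

error = reconstruction_error(A, B, C)
left_ok = columns_have_consecutive_ones(B)
right_ok = all(has_consecutive_ones(row) for row in C)
```

In this toy case each column of `B` and each row of `C` is a single run of ones, so every rank-1 term covers a contiguous block of rows and columns under the given ordering.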
Futures Exchange Innovations: Reinforcement versus Cannibalism
Futures exchanges are in constant search of futures contracts that will generate a profitable level of trading volume. In this context, it would be interesting to determine what effect the introduction of new futures contracts has on the trading volume of the contracts already listed. The introduction of new futures contracts may lead to a volume increase for those contracts already listed and hence contribute to the success of a futures exchange. On the other hand, the introduction of new futures contracts could lead to a volume decrease for the contracts already listed, thereby undermining the success of the futures exchange. Using a multi-product hedging model in which the perspective has been shifted from portfolio to exchange management, we study these effects. Using data from two exchanges that differ in market liquidity (Amsterdam Exchanges versus Chicago Board of Trade), we show the usefulness of the proposed tool. Our findings have several important implications for a futures exchange's innovation policy.