16 research outputs found

    Tree-Independent Dual-Tree Algorithms

    Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, the traversal, the point-to-point base case, and the pruning rule. We provide a meta-algorithm which allows development of dual-tree algorithms in a tree-independent manner and easy extension to entirely new types of trees. Representations are provided for five common algorithms; for k-nearest neighbor search, this leads to a novel, tighter pruning bound. The meta-algorithm also allows straightforward extensions to massively parallel settings. Comment: accepted in ICML 201
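The four-part split named in this abstract can be illustrated with a small sketch of dual-tree nearest-neighbor search. This is not the paper's implementation: the `Node` class, the function names, and the simple bounding-box tree are assumptions made for illustration, and the pruning rule is the plain "worst current candidate" bound rather than the tighter bound the abstract mentions.

```python
import math

class Node:
    """Part 1, the tree: a binary space tree with axis-aligned bounding boxes."""
    def __init__(self, points, idx, leaf_size=4):
        self.idx = idx  # indices of every point held under this node
        dims = range(len(points[0]))
        self.lo = [min(points[i][d] for i in idx) for d in dims]
        self.hi = [max(points[i][d] for i in idx) for d in dims]
        self.children = []
        if len(idx) > leaf_size:
            # Split at the median along the widest dimension.
            widest = max(dims, key=lambda d: self.hi[d] - self.lo[d])
            order = sorted(idx, key=lambda i: points[i][widest])
            mid = len(order) // 2
            self.children = [Node(points, order[:mid], leaf_size),
                             Node(points, order[mid:], leaf_size)]

def min_dist(a, b):
    """Smallest possible distance between the bounding boxes of two nodes."""
    gaps = (max(0.0, al - bh, bl - ah)
            for al, ah, bl, bh in zip(a.lo, a.hi, b.lo, b.hi))
    return math.sqrt(sum(g * g for g in gaps))

def nearest_neighbors(queries, refs, leaf_size=4):
    best_d = [math.inf] * len(queries)
    best_i = [-1] * len(queries)

    def base_case(qi, ri):
        """Part 3, the point-to-point base case."""
        d = math.dist(queries[qi], refs[ri])
        if d < best_d[qi]:
            best_d[qi], best_i[qi] = d, ri

    def prune(nq, nr):
        """Part 4, the pruning rule: skip the node pair when no reference
        point in nr can beat the worst current candidate in nq."""
        return min_dist(nq, nr) > max(best_d[i] for i in nq.idx)

    def traverse(nq, nr):
        """Part 2, the traversal: a depth-first walk over pairs of nodes."""
        if prune(nq, nr):
            return
        if not nq.children and not nr.children:
            for qi in nq.idx:
                for ri in nr.idx:
                    base_case(qi, ri)
            return
        for cq in (nq.children or [nq]):
            for cr in (nr.children or [nr]):
                traverse(cq, cr)

    traverse(Node(queries, list(range(len(queries))), leaf_size),
             Node(refs, list(range(len(refs))), leaf_size))
    return best_i, best_d
```

Because the traversal only touches the tree through `children`, `idx`, and `min_dist`, a different tree type could in principle be swapped in without changing the base case or the pruning rule, which is the kind of independence the abstract describes.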

    The Parallel Problems Server

    Introduction. We describe a novel architecture for a "linear algebra server" that operates in parallel on extremely large matrices. Matrices are created by the server and distributed across many machines. All operations therefore take place automatically in parallel. The server is extensible and includes a general application interface to clients. This project is motivated by three observations. First, many widely-used algorithms in Computer Science can be realized as operations on matrices. Common techniques in smart text retrieval, object recognition and machine learning can all be described in this framework. Second, it is of utmost importance to be able to test new ideas quickly in an interactive setting. Finally, in order to understand the computational issues that arise with many algorithms it is necessary to test them on very large problems. There are commonly available solutions that address this general problem, but there are several difficulties. Interactive and easi

    On the Difficulty of Modular Reinforcement Learning for Real-World Partial Programming

    In recent years there has been a great deal of interest in "modular reinforcement learning" (MRL). Typically, problems are decomposed into concurrent subgoals, allowing increased scalability and state abstraction. An arbitrator combines the subagents' preferences to select an action. In this work, we contrast treating an MRL agent as a set of subagents with the same goal with treating an MRL agent as a set of subagents who may have different, possibly conflicting goals. We argue that the latter is a more realistic description of real-world problems, especially when building partial programs. We address a range of algorithms for single-goal MRL, and leveraging social choice theory, we present an impossibility result for applications of such algorithms to multigoal MRL. We suggest an alternative formulation of arbitration as scheduling that avoids the assumptions of comparability of preference that are implicit in single-goal MRL. A notable feature of this formulation is the explicit codification of the tradeoffs between the subproblems. Finally, we introduce A²BL, a language that encapsulates many of these ideas
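The comparability assumption mentioned in this abstract can be made concrete with a minimal sketch of one common single-goal arbitration scheme: summing the subagents' Q-values and taking the best total. The function name, the two subagents, and their values are hypothetical and are not taken from the paper.

```python
def sum_arbitrator(q_values):
    """Pick the action with the largest summed Q-value across subagents.

    q_values: list of dicts mapping action -> Q-value, one dict per subagent.
    Summing only makes sense if all subagents report values on one shared scale.
    """
    actions = q_values[0].keys()
    totals = {a: sum(q[a] for q in q_values) for a in actions}
    return max(totals, key=totals.get)

# Two hypothetical subagents with conflicting preferences.
eat = {"left": 1.0, "right": 0.0}
flee = {"left": 0.0, "right": 0.6}
choice_before = sum_arbitrator([eat, flee])

# Rescaling one subagent's reward units leaves its own preference ordering
# unchanged, yet it flips the arbitrator's decision.
flee_rescaled = {a: 10 * v for a, v in flee.items()}
choice_after = sum_arbitrator([eat, flee_rescaled])
```

Here `choice_before` is `"left"` and `choice_after` is `"right"`, although neither subagent changed which action it prefers. That sensitivity to each module's reward scale is one way to see why value-combining arbitration presumes comparable preferences.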

    Object Focused Q-Learning for Autonomous Agents

    © ACM 2013. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in AAMAS '13 Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems. We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders

    Automatic Task Decomposition and State Abstraction from Demonstration

    Presented at the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012) 4-8 June 2012, Valencia, Spain. © 2012, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks both of these techniques may be ineffective. LfD may require many more demonstrations than it is feasible to obtain, and RL can take an inadmissible amount of time to converge. We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into several subtasks, finding state abstractions for each one of these subtasks. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved using RL algorithms to surpass the performance of the human teacher. We find empirically that ADA can find satisficing policies for problems that are too complex to be solved with traditional LfD and RL algorithms. In particular, we show that we can use mutual information across state features to leverage human demonstrations to reduce the effects of the curse of dimensionality by finding subtasks and abstractions in sequential decision processes
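As a rough illustration of the mutual information measure this abstract relies on, the sketch below estimates the mutual information between one discrete state feature and the demonstrated action from a list of samples. It is not the ADA algorithm itself; the function name and the toy demonstrations are assumptions for illustration only.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information, in bits, between the two components
    of a list of (feature_value, action) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    feature_counts = Counter(x for x, _ in pairs)
    action_counts = Counter(y for _, y in pairs)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum((c / n) * math.log2(c * n / (feature_counts[x] * action_counts[y]))
               for (x, y), c in joint.items())

# Hypothetical demonstrations: the action is fully determined by the feature...
informative = [(0, "up"), (1, "down")] * 4
# ...versus a feature that tells us nothing about the demonstrated action.
uninformative = [(0, "up"), (0, "down"), (1, "up"), (1, "down")]

mi_informative = mutual_information(informative)
mi_uninformative = mutual_information(uninformative)
```

A feature with high mutual information with the demonstrator's actions is a candidate to keep in a state abstraction, while a feature scoring near zero is a candidate to drop, which is the intuition behind using such measures to fight the curse of dimensionality.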

    Autonomous Nondeterministic Tour Guides: Improving Quality of Experience with TTD-MDPs

    In this paper, we address the problem of building a system of autonomous tour guides for a complex environment, such as a museum with many visitors. Visitors may have varying preferences for types of art or may wish to visit different areas across multiple visits. Often, these goals conflict. For example, many visitors may wish to see the museum's most popular work, but that could cause congestion, ruining the experience. Thus, our task is to build a set of agents that can satisfy their visitors' goals while simultaneously providing quality experiences for all. We use targeted trajectory distribution MDPs (TTD-MDPs), a technology developed to guide players in an interactive entertainment setting. The solution to a TTD-MDP is a probabilistic policy that results in a specific distribution of trajectories through a state space. We motivate TTD-MDPs for the museum tour problem, then describe the development of a number of models of museum visitors. Additionally, we propose a museum model and simulate tours using personalized TTD-MDP tour guides for each kind of visitor. We explain how the use of probabilistic policies reduces the congestion experienced by visitors while preserving their ability to pursue and realize goals

    From devices to tasks: . . .

    One of the driving applications of ubiquitous computing is universal appliance interaction: the ability to use arbitrary mobile devices to interact with arbitrary appliances, such as TVs, printers, and lights. Because of limited screen real estate and the plethora of devices and commands available to the user, a central problem in achieving this vision is predicting which appliances and devices the user wishes to use next in order to make interfaces for those devices available. We believe that universal appliance interaction is best supported through the deployment of appliance user interfaces (UIs) that are personalized to a user's habits and information needs. In this paper, we suggest that, in a truly ubiquitous computing environment, the user will not necessarily think of devices as separate entities; therefore, rather than focus on which device the user may want to use next, we present a method for automatically discovering the user's common tasks (e.g., watching a movie, or surfing TV channels), predicting the task that the user wishes to engage in, and generating an appropriate interface that spans multiple devices. We have several results. We show that it is possible to discover and cluster collections of commands that represent tasks and to use history to predict the next task reliably. In fact, we show that moving from devices to tasks is not only a useful way of representing our core problem, but that it is, in fact, an easier problem to solve. Finally, we show that tasks can vary from user to user

    Reinforcement Learning for Declarative Optimization-Based Drama Management

    A long-standing challenge in interactive entertainment is the creation of story-based games with dynamically responsive story-lines. Such games are populated by multiple objects and autonomous characters, and must provide a coherent story experience while giving the player freedom of action. To maintain coherence, the game author must provide for modifying the world in reaction to the player's actions, directing agents to act in particular ways (overriding or modulating their autonomy), or causing inanimate objects to reconfigure themselves "behind the player's back". Declarativ