Tree-Independent Dual-Tree Algorithms
Dual-tree algorithms are a widely used class of branch-and-bound algorithms.
Unfortunately, developing dual-tree algorithms for use with different trees and
problems is often complex and burdensome. We introduce a four-part logical
split: the tree, the traversal, the point-to-point base case, and the pruning
rule. We provide a meta-algorithm which allows development of dual-tree
algorithms in a tree-independent manner and easy extension to entirely new
types of trees. Representations are provided for five common algorithms; for
k-nearest neighbor search, this leads to a novel, tighter pruning bound. The
meta-algorithm also allows straightforward extensions to massively parallel
settings. Comment: accepted in ICML 201
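The four-part split above can be illustrated with a minimal sketch: a generic dual depth-first traversal parameterized by a point-to-point base case and a pruning rule, so that the same traversal serves any tree type. All names here (`Node`, `dual_tree`, the 1-D bounding intervals) are illustrative assumptions, not the paper's API; the example problem is a simple range count.

```python
class Node:
    """A generic tree node: points live in the node, children in 'children'."""
    def __init__(self, points=(), children=()):
        self.points = list(points)
        self.children = list(children)
        pts = self.points or [p for c in self.children for p in c.all_points()]
        self.lo, self.hi = min(pts), max(pts)   # 1-D bounding interval

    def all_points(self):
        return self.points + [p for c in self.children for p in c.all_points()]

def dual_tree(q, r, base_case, score):
    """Tree-independent dual traversal: prune, run base cases, recurse."""
    if score(q, r) is None:                     # pruning rule: skip this pair
        return
    for pq in q.points:                         # point-to-point base case
        for pr in r.points:
            base_case(pq, pr)
    if q.children or r.children:                # the traversal itself
        for cq in (q.children or [q]):
            for cr in (r.children or [r]):
                dual_tree(cq, cr, base_case, score)

# Example problem: count ordered point pairs within distance 1.0.
count = 0
def base_case(pq, pr):
    global count
    if abs(pq - pr) <= 1.0:
        count += 1

def score(q, r):
    # Prune when the node intervals are more than 1.0 apart.
    gap = max(q.lo - r.hi, r.lo - q.hi, 0.0)
    return None if gap > 1.0 else gap

left = Node(points=[0.0, 0.5])
right = Node(points=[3.0, 3.4])
root = Node(children=[left, right])
dual_tree(root, root, base_case, score)
print(count)  # → 8 (the cross pairs between the two leaves are pruned)
```

Swapping in a different `score` (e.g. one maintaining per-point nearest-neighbor bounds) changes the problem without touching the traversal, which is the point of the split.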
The Parallel Problems Server
We describe a novel architecture for a "linear algebra server" that operates in parallel on extremely large matrices. Matrices are created by the server and distributed across many machines. All operations therefore take place automatically in parallel. The server is extensible and includes a general application interface to clients. This project is motivated by three observations. First, many widely-used algorithms in Computer Science can be realized as operations on matrices. Common techniques in smart text retrieval, object recognition and machine learning can all be described in this framework. Second, it is of utmost importance to be able to test new ideas quickly in an interactive setting. Finally, in order to understand the computational issues that arise with many algorithms it is necessary to test them on very large problems. There are commonly available solutions that address this general problem, but there are several difficulties. Interactive and easi
On the Difficulty of Modular Reinforcement Learning for Real-World Partial Programming
In recent years there has been a great deal of interest in "modular reinforcement learning" (MRL). Typically, problems are decomposed into concurrent subgoals, allowing increased scalability and state abstraction. An arbitrator combines the subagents' preferences to select an action. In this work, we contrast treating an MRL agent as a set of subagents with the same goal with treating an MRL agent as a set of subagents who may have different, possibly conflicting goals. We argue that the latter is a more realistic description of real-world problems, especially when building partial programs. We address a range of algorithms for single-goal MRL, and leveraging social choice theory, we present an impossibility result for applications of such algorithms to multigoal MRL. We suggest an alternative formulation of arbitration as scheduling that avoids the assumptions of comparability of preference that are implicit in single-goal MRL. A notable feature of this formulation is the explicit codification of the tradeoffs between the subproblems. Finally, we introduce ABL, a language that encapsulates many of these ideas.
Object Focused Q-Learning for Autonomous Agents
© ACM 2013. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in AAMAS '13 Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems. We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders.
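The per-object-class decomposition described above can be sketched as one Q-table per object class, each updated independently, with a simple arbitration that sums Q-values across the objects currently present. This is a hypothetical illustration only; OF-Q's actual risk-aware control policy is richer than summation, and all names here are assumptions.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9
ACTIONS = ("left", "right")

# One Q-table per object class: class -> (local_state, action) -> Q-value.
q_tables = defaultdict(lambda: defaultdict(float))

def update(obj_class, s, a, reward, s_next):
    """Standard Q-learning update, but scoped to a single object class."""
    q = q_tables[obj_class]
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])

def act(objects):
    """objects: list of (obj_class, local_state); pick the action whose
    summed Q-value across all visible objects is highest."""
    def total(a):
        return sum(q_tables[c][(s, a)] for c, s in objects)
    return max(ACTIONS, key=total)

# Tiny example: for a hypothetical 'coin' class, moving right once paid off.
update("coin", 0, "right", 1.0, 1)
print(act([("coin", 0)]))  # → 'right'
```

Because each Q-table sees only one object's local state, the tables stay small even when the full joint state space is exponential in the number of objects, which is where the claimed speed-up comes from.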
Automatic Task Decomposition and State Abstraction from Demonstration
Presented at the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012) 4-8 June 2012, Valencia, Spain. © 2012, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks both of these techniques may be ineffective. LfD may require many more demonstrations than it is feasible to obtain, and RL can take an inordinate amount of time to converge. We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into several subtasks, finding state abstractions for each one of these subtasks. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved using RL algorithms to surpass the performance of the human teacher. We find empirically that ADA can find satisficing policies for problems that are too complex to be solved with traditional LfD and RL algorithms. In particular, we show that we can use mutual information across state features to leverage human demonstrations to reduce the effects of the curse of dimensionality by finding subtasks and abstractions in sequential decision processes.
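The mutual-information idea above can be sketched as follows: score each state feature by its mutual information with the demonstrated action, and keep only the informative features as the abstraction. This is a hedged illustration under simplifying assumptions (discrete features, a single subtask); ADA's actual decomposition procedure is more involved, and the names here are invented for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in nats from paired samples of discrete variables."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def abstract_features(demos, threshold=0.1):
    """demos: list of (state_tuple, action) pairs from demonstrations.
    Return the indices of features informative about the chosen action."""
    states = [s for s, _ in demos]
    actions = [a for _, a in demos]
    n_feats = len(states[0])
    return [i for i in range(n_feats)
            if mutual_information([s[i] for s in states], actions) > threshold]

# Feature 0 determines the demonstrated action; feature 1 is pure noise
# and gets abstracted away.
demos = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"), ((1, 1), "b")]
print(abstract_features(demos))  # → [0]
```

Projecting the demonstrations onto the kept features then yields a much smaller state space in which a policy can be fit and later refined with RL.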
Autonomous Nondeterministic Tour Guides: Improving Quality of Experience with TTD-MDPs
In this paper, we address the problem of building a system of autonomous tour
guides for a complex environment, such as a museum with many visitors. Visitors
may have varying preferences for types of art or may wish to visit different areas
across multiple visits. Often, these goals conflict. For example, many visitors may
wish to see the museum's most popular work, but that could cause congestion,
ruining the experience. Thus, our task is to build a set of agents that can satisfy
their visitors' goals while simultaneously providing quality experiences for all.
We use targeted trajectory distribution MDPs (TTD-MDPs), a technology developed
to guide players in an interactive entertainment setting. The solution to a
TTD-MDP is a probabilistic policy that results in a specific distribution of trajectories
through a state space. We motivate TTD-MDPs for the museum tour problem,
then describe the development of a number of models of museum visitors.
Additionally, we propose a museum model and simulate tours using personalized
TTD-MDP tour guides for each kind of visitor. We explain how the use of probabilistic
policies reduces the congestion experienced by visitors while preserving
their ability to pursue and realize goals.
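The core TTD-MDP idea, a probabilistic policy that induces a target distribution over trajectories, can be illustrated under strong simplifying assumptions (deterministic transitions, trajectories forming a tree): the policy at each prefix weights each next step by the total target mass of the complete trajectories extending it. This is a minimal sketch with invented names, not the paper's solver.

```python
from collections import defaultdict

def ttd_policy(target):
    """target: {trajectory_tuple: probability}. Return a policy mapping a
    trajectory prefix and candidate next steps to action probabilities."""
    mass = defaultdict(float)               # prefix -> target mass through it
    for traj, p in target.items():
        for i in range(len(traj) + 1):
            mass[traj[:i]] += p

    def policy(prefix, options):
        weights = [mass[prefix + (o,)] for o in options]
        total = sum(weights)
        return [w / total for w in weights]

    return policy

# Target: tour gallery A then B with prob 0.75, B then A with prob 0.25.
policy = ttd_policy({("A", "B"): 0.75, ("B", "A"): 0.25})
print(policy((), ["A", "B"]))  # → [0.75, 0.25]
```

Sampling from such a policy spreads visitors across trajectories in the target proportions, which is how congestion is reduced without forbidding any visitor's preferred route.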
From devices to tasks: . . .
One of the driving applications of ubiquitous computing is universal appliance interaction: the ability to use arbitrary mobile devices to interact with arbitrary appliances, such as TVs, printers, and lights. Because of limited screen real estate and the plethora of devices and commands available to the user, a central problem in achieving this vision is predicting which appliances and devices the user wishes to use next in order to make interfaces for those devices available. We believe that universal appliance interaction is best supported through the deployment of appliance user interfaces (UIs) that are personalized to a user's habits and information needs. In this paper, we suggest that, in a truly ubiquitous computing environment, the user will not necessarily think of devices as separate entities; therefore, rather than focus on which device the user may want to use next, we present a method for automatically discovering the user's common tasks (e.g., watching a movie, or surfing TV channels), predicting the task that the user wishes to engage in, and generating an appropriate interface that spans multiple devices. We have several results. We show that it is possible to discover and cluster collections of commands that represent tasks and to use history to predict the next task reliably. In fact, we show that moving from devices to tasks is not only a useful way of representing our core problem, but that it is, in fact, an easier problem to solve. Finally, we show that tasks can vary from user to user
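History-based next-task prediction as described above can be sketched with a first-order Markov model over discovered tasks: count observed task transitions and predict the most frequent successor of the current task. The task labels and the `TaskPredictor` API are illustrative assumptions, not the paper's method.

```python
from collections import Counter, defaultdict

class TaskPredictor:
    """Predict the next task from a first-order model of task transitions."""
    def __init__(self):
        self.counts = defaultdict(Counter)   # prev task -> next-task counts

    def observe(self, history):
        """history: chronological list of task labels for one user."""
        for prev, nxt in zip(history, history[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, current_task):
        nxt = self.counts[current_task]
        return nxt.most_common(1)[0][0] if nxt else None

p = TaskPredictor()
p.observe(["watch_movie", "surf_channels", "watch_movie",
           "surf_channels", "lights_off"])
print(p.predict("watch_movie"))  # → 'surf_channels'
```

Since the model is trained per user, it naturally captures the paper's observation that tasks, and their sequencing, vary from user to user.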
Reinforcement Learning for Declarative Optimization-Based Drama Management
A long-standing challenge in interactive entertainment is the creation of story-based games with dynamically responsive story-lines. Such games are populated by multiple objects and autonomous characters, and must provide a coherent story experience while giving the player freedom of action. To maintain coherence, the game author must provide for modifying the world in reaction to the player's actions, directing agents to act in particular ways (overriding or modulating their autonomy), or causing inanimate objects to reconfigure themselves "behind the player's back". Declarativ