Planning with imperfect information : interceptor assignment
Thesis (S.M.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006. Includes bibliographical references (p. 121-123). We consider the problem of assigning a scarce number of interceptors to a wave of incoming atmospheric re-entry vehicles (RVs). In this single wave, there is time to assign interceptors to the incoming RVs, gain information on the intercept status, and then, if necessary, assign interceptors once more. However, the status information of these RVs may not be reliable. This problem becomes challenging when considering the small inventory of interceptors, imperfect information from sensors, and the possibility of future waves of RVs. This work formulates the problem as a partially observable Markov decision process (POMDP) in order to account for the uncertainty in information. We use a POMDP solution algorithm to find an optimal policy for assigning interceptors to RVs in a single wave. From there, three cases are compared in a simulation of a single wave. These cases are perfect information from sensors; imperfect information from sensors, but acting as if it were perfect; and accounting for imperfect information from sensors using the POMDP formulation. Using a variety of parameter variation tests, we examine the performance of the POMDP formulation by comparing the probability of an incoming RV avoiding intercept and the interceptor inventory remaining. We vary the reliability of the sensors, as well as the number of interceptors in inventory and the number of incoming RVs in the wave. The POMDP formulation consistently provides a policy that conserves more interceptors and approaches the probability of intercept of the other cases. However, situations do exist where the POMDP formulation produces a policy that performs less effectively than a strategy assuming perfect information. by Daniel B. McAllister. S.M.
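To illustrate what "accounting for imperfect information from sensors" can mean in a POMDP setting, the following is a minimal sketch of a single Bayesian belief update on the intercept status of one RV. It is not taken from the thesis: the function name, the parameter names, and the two sensor error rates are assumptions chosen for illustration only.

```python
def belief_rv_destroyed(prior_destroyed: float,
                        p_kill_report_if_destroyed: float,
                        p_kill_report_if_alive: float,
                        sensor_reports_kill: bool) -> float:
    """Posterior probability that an RV is destroyed after one sensor report.

    All names and parameters are hypothetical:
      prior_destroyed            belief before the report (e.g. the
                                 interceptor's single-shot kill probability)
      p_kill_report_if_destroyed sensor says "kill" given the RV is destroyed
      p_kill_report_if_alive     sensor says "kill" given the RV survived
    """
    if sensor_reports_kill:
        like_destroyed = p_kill_report_if_destroyed
        like_alive = p_kill_report_if_alive
    else:
        like_destroyed = 1.0 - p_kill_report_if_destroyed
        like_alive = 1.0 - p_kill_report_if_alive

    # Bayes' rule: weight each hypothesis by the likelihood of the report.
    numerator = like_destroyed * prior_destroyed
    evidence = numerator + like_alive * (1.0 - prior_destroyed)
    if evidence == 0.0:
        raise ValueError("report has zero probability under this sensor model")
    return numerator / evidence


if __name__ == "__main__":
    # Assumed numbers: 0.8 kill probability, sensor correct 90% of the time.
    print(belief_rv_destroyed(0.8, 0.9, 0.1, sensor_reports_kill=True))
    print(belief_rv_destroyed(0.8, 0.9, 0.1, sensor_reports_kill=False))
```

Under this sketch, a policy that treats the sensor as perfect would act on the raw report, whereas a belief-based policy would decide whether to commit a second interceptor from the posterior probability.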
Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Unlike perfect information games, where all elements are known to every
player, imperfect information games emulate the real-world complexities of
decision-making under uncertain or incomplete information. GPT-4, the recent
breakthrough in large language models (LLMs) trained on massive passive data,
is notable for its knowledge retrieval and reasoning abilities. This paper
delves into the applicability of GPT-4's learned knowledge for imperfect
information games. To achieve this, we introduce Suspicion-Agent, an
innovative agent that leverages GPT-4's capabilities for performing in
imperfect information games. With proper prompt engineering to achieve
different functions, Suspicion-Agent based on GPT-4 demonstrates remarkable
adaptability across a range of imperfect information card games. Importantly,
GPT-4 displays a strong high-order theory of mind (ToM) capacity, meaning it
can understand others and intentionally impact others' behavior. Leveraging
this, we design a planning strategy that enables GPT-4 to competently play
against different opponents, adapting its gameplay style as needed, while
requiring only the game rules and descriptions of observations as input. In the
experiments, we qualitatively showcase the capabilities of Suspicion-Agent
across three different imperfect information games and then quantitatively
evaluate it in Leduc Hold'em. The results show that Suspicion-Agent can
potentially outperform traditional algorithms designed for imperfect
information games, without any specialized training or examples. In order to
encourage and foster deeper insights within the community, we make our
game-related data publicly available.
Sampling-based reactive motion planning with temporal logic constraints and imperfect state information
© 2017, Springer International Publishing AG. This paper presents a method that allows mobile systems with uncertainty in motion and sensing to react to unknown environments while high-level specifications are satisfied. Although previous works have addressed the problem of synthesising controllers under uncertainty constraints and temporal logic specifications, reaction to dynamic environments has not been considered under this scenario. The method uses feedback-based information roadmaps (FIRMs) to break the curse of history associated with partially observable systems. A transition system is incrementally constructed based on the idea of FIRMs by adding nodes on the belief space. Then, a policy is found in the product Markov decision process created between the transition system and a Rabin automaton representing a linear temporal logic formula. The proposed solution allows the system to react to previously unknown elements in the environment. To achieve fast reaction time, a FIRM considering the probability of violating the specification in each transition is used to drive the system towards local targets or to avoid obstacles. The method is demonstrated with an illustrative example
Safe Sequential Path Planning Under Disturbances and Imperfect Information
Multi-UAV systems are safety-critical, and guarantees must be made to ensure
no unsafe configurations occur. Hamilton-Jacobi (HJ) reachability is ideal for
analyzing such safety-critical systems; however, its direct application is
limited to small-scale systems of no more than two vehicles due to an
exponentially-scaling computational complexity. Previously, the sequential path
planning (SPP) method, which assigns strict priorities to vehicles, was
proposed; SPP allows multi-vehicle path planning to be done with a
linearly-scaling computational complexity. However, the previous formulation
assumed that there are no disturbances, and that every vehicle has perfect
knowledge of higher-priority vehicles' positions. In this paper, we make SPP
more practical by providing three different methods to account for disturbances
in dynamics and imperfect knowledge of higher-priority vehicles' states. Each
method has different assumptions about information sharing. We demonstrate our
proposed methods in simulations. Comment: American Control Conference, 201
Provably Safe Robot Navigation with Obstacle Uncertainty
As drones and autonomous cars become more widespread it is becoming
increasingly important that robots can operate safely under realistic
conditions. The noisy information fed into real systems means that robots must
use estimates of the environment to plan navigation. Efficiently guaranteeing
that the resulting motion plans are safe under these circumstances has proved
difficult. We examine how to guarantee that a trajectory or policy is safe with
only imperfect observations of the environment. We examine the implications of
various mathematical formalisms of safety and arrive at a mathematical notion
of safety of a long-term execution, even when conditioned on observational
information. We present efficient algorithms that can prove that trajectories
or policies are safe with much tighter bounds than in previous work. Notably,
the complexity of the environment does not affect our methods' ability to
evaluate if a trajectory or policy is safe. We then use these safety checking
methods to design a safe variant of the RRT planning algorithm. Comment: RSS 201
The Update Equivalence Framework for Decision-Time Planning
The process of revising (or constructing) a policy immediately prior to
execution -- known as decision-time planning -- is key to achieving superhuman
performance in perfect-information settings like chess and Go. A recent line of
work has extended decision-time planning to more general imperfect-information
settings, leading to superhuman performance in poker. However, these methods
require considering subgames whose sizes grow quickly in the amount of
non-public information, making them unhelpful when the amount of non-public
information is large. Motivated by this issue, we introduce an alternative
framework for decision-time planning that is not based on subgames but rather
on the notion of update equivalence. In this framework, decision-time planning
algorithms simulate updates of synchronous learning algorithms. This framework
enables us to introduce a new family of principled decision-time planning
algorithms that do not rely on public information, opening the door to sound
and effective decision-time planning in settings with large amounts of
non-public information. In experiments, members of this family produce
comparable or superior results compared to state-of-the-art approaches in
Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.
The Specification of the Planning Systems; Report of the 1993 Questionnaire Survey
This report describes a survey carried out as part of a research project undertaken by the Institute for Transport Studies and the School of Computer Studies at the University of Leeds, funded by the Science and Engineering Research Council. The project was concerned with the specification of trip planning systems, which provide information to travellers and potential travellers about all aspects of their journey, but in this case principally route and timetable information for public transport users and route information for car travellers before the journey is made.
Previous evidence had suggested that travellers may make sub-optimal travel decisions, meaning that they may make longer, slower or more expensive journeys than necessary because of imperfect information. Other parts of the project addressed sub-optimality in the choice of mode and time of travel but a main objective of the survey described in this report was to examine sub-optimality of route choice separately for journeys with which respondents were familiar and journeys with which they were unfamiliar. Other objectives were concerned with the travel information currently used, or desired by, the respondents, who were randomly-selected travellers from the West Yorkshire town of Mirfield.
Maps were widely used by car drivers: about one in five used them for familiar trips and about three quarters used them for unfamiliar trips. For public transport trips, timetables were used by about half of the travellers making familiar trips and 95 per cent of those making unfamiliar trips. Information on delays would have been welcomed by both private and public transport travellers: travellers on nearly three-quarters of familiar and unfamiliar car trips would have liked congestion information, as would a significant minority of bus users. Most of all, public transport users would have welcomed information on service delays and cancellations. It would seem that real-time information on delays would be a key feature of a successful trip planning system.
The sub-optimality for car journeys averaged 2.6 minutes per trip for familiar trips and 6 minutes per trip for unfamiliar trips. Sub-optimality was directly related to trip distance for familiar trips but not for unfamiliar trips. This indicates that a modest but significant reduction in car journey times could be brought about by trip planning systems. Public transport trip sample sizes were too small to permit reliable estimates to be made of their sub-optimality.