The Optimal Reward Problem: Designing Effective Reward for Bounded Agents.
In the field of reinforcement learning, agent designers build agents which seek to maximize reward. In standard practice, one reward function serves two purposes: it is used both to evaluate the agent and to directly guide agent behavior in the agent's learning algorithm.
This dissertation makes four main contributions to the theory and practice of reward function design. The first is a demonstration that if an agent is bounded (that is, limited in its ability to maximize expected reward), the designer may benefit by considering two reward functions. A designer reward function is used to evaluate the agent, while a separate agent reward function is used to guide agent behavior. The designer can then solve the Optimal Reward Problem (ORP): choose the agent reward function which leads to the greatest expected reward for the designer.
The second contribution is the demonstration through examples that good reward functions are chosen by assessing an agent's limitations and how they interact with the environment. An agent which maintains knowledge of the environment in the form of a Bayesian posterior distribution, but lacks adequate planning resources, can be given a reward proportional to the variance of the posterior, resulting in provably efficient exploration. An agent with poor modeling assumptions can be punished for visiting the areas of the state space it has trouble modeling, resulting in better performance.
The third contribution is the Policy Gradient for Reward Design (PGRD) algorithm, a convergent gradient ascent algorithm for learning good reward functions. Experiments in multiple environments demonstrate that using PGRD for reward optimization yields better agents than using the designer's reward directly as the agent's reward. It also outperforms the use of an evaluation function at the leaf-states of the planning tree.
Finally, this dissertation shows that the ORP differs from the popular work on potential-based reward shaping. Shaping rewards are constrained by properties of the environment and the designer's reward function, but they generally are defined irrespective of properties of the agent. The best shaping reward functions are suboptimal for some agents and environments.
Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89705/1/jdsorg_1.pd
Learning plan networks in conversational video games
Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 121-123).
We look forward to a future where robots collaborate with humans in the home and workplace, and virtual agents collaborate with humans in games and training simulations. A representation of common ground for everyday scenarios is essential for these agents if they are to be effective collaborators and communicators. Effective collaborators can infer a partner's goals and predict future actions. Effective communicators can infer the meaning of utterances based on semantic context. This thesis introduces a computational cognitive model of common ground called a Plan Network. A Plan Network is a statistical model that provides representations of social roles, object affordances, and expected patterns of behavior and language. I describe a methodology for unsupervised learning of a Plan Network using a multiplayer video game, visualization of this network, and evaluation of the learned model with respect to human judgment of typical behavior. Specifically, I describe learning the Restaurant Plan Network from data collected from over 5,000 players of an online game called The Restaurant Game.
By Jeffrey David Orkin. S.M.
Combining SOA and BPM Technologies for Cross-System Process Automation
This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing custom-built solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed solution. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation.
State of New Hampshire. Reports, 1907-1908, volume IV.- Biennial
Sometimes issued both annually and biennially; Each vol. contains the reports of various departments of the government of the state of New Hampshire; Includes attorneys general's opinion
Hands-on Science. Advancing Science. Improving Education
This book aims to contribute to the advancement of Science, to the
improvement of Science Education, and to an effective implementation of a
sound widespread scientific literacy at all levels of society. Its chapters bring
together a variety of diverse and valuable works presented in this line of thought at the
15th International Conference on Hands-on Science “Advancing Science.
Improving Education”.
Security Analysis of System Behaviour - From "Security by Design" to "Security at Runtime" -
The Internet today provides the environment for novel applications and
processes which may evolve way beyond pre-planned scope and
purpose. Security analysis is growing in complexity with the increase
in functionality, connectivity, and dynamics of current electronic
business processes. Technical processes within critical
infrastructures also have to cope with these developments. To tackle
the complexity of the security analysis, the application of models is
becoming standard practice. However, model-based support for security
analysis is not only needed in pre-operational phases but also during
process execution, in order to provide situational security awareness
at runtime.
This cumulative thesis provides three major contributions to modelling
methodology.
Firstly, this thesis provides an approach for model-based analysis and
verification of security and safety properties in order to support
fault prevention and fault removal in system design or redesign.
Furthermore, some construction principles for the design of
well-behaved scalable systems are given.
The second topic is the analysis of the exposure of vulnerabilities
in the software components of networked systems to exploitation by
internal or external threats. This kind of fault forecasting allows
the security assessment of alternative system configurations and
security policies. Validation and deployment of security policies
that minimise the attack surface can now improve fault tolerance and
mitigate the impact of successful attacks.
Thirdly, the approach is extended to runtime applicability. An
observing system monitors an event stream from the observed system
with the aim of detecting faults - deviations from the specified
behaviour or security compliance violations - at runtime.
Furthermore, knowledge about the expected behaviour given by an
operational model is used to predict faults in the near
future. Building on this, a holistic security management strategy is
proposed. The architecture of the observing system is described and
the applicability of model-based security analysis at runtime is
demonstrated utilising processes from several industrial scenarios.
The results of this cumulative thesis are provided by 19 selected
peer-reviewed papers.
Introductory Computer Forensics
INTERPOL (International Police) built cybercrime programs to keep up with emerging cyber threats, and aims to coordinate and assist international operations for fighting crimes involving computers. Although significant international efforts are being made in dealing with cybercrime and cyber-terrorism, finding effective, cooperative, and collaborative ways to deal with complicated cases that span multiple jurisdictions has proven difficult in practice.