Symbolic Search in Planning and General Game Playing
Search is an important topic in many areas of AI. Search problems often result in an immense number of states. This work addresses this by using a special data structure, BDDs, which can represent large sets of states efficiently, often saving space compared to explicit representations. The first part is concerned with an analysis of the complexity of BDDs for some search problems, resulting in lower or upper bounds on BDD sizes for them. The second part is concerned with action planning, an area where the programmer does not know in advance what the search problem will look like. This part presents symbolic algorithms for finding optimal solutions in two different settings, classical and net-benefit planning, as well as several improvements to these algorithms. The resulting planner won the International Planning Competition IPC 2008. The third part is concerned with general game playing, which is similar to planning in that the programmer does not know in advance what game will be played. This work proposes algorithms for instantiating the input and solving games symbolically. For playing, a hybrid player based on UCT and the solver is presented.
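The set-of-states idea behind BDDs can be sketched in a few lines: a reduced, ordered BDD with a hash-consing table, where the union of two state sets is the Boolean OR of their BDDs. This is a generic illustration of the data structure, not the thesis's implementation; all names are invented.

```python
# Minimal sketch of a reduced, ordered BDD (ROBDD) over Boolean state
# variables, showing how sets of states are stored compactly via node
# sharing. Illustrative only.

class BDD:
    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}           # hash-consing table: (var, lo, hi) -> node id
        self.nodes = [None, None]  # ids 0 and 1 are the false/true terminals

    def mk(self, var, lo, hi):
        if lo == hi:               # reduction rule: drop a redundant test
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def from_state(self, bits):
        """Build the BDD representing a single state (tuple of 0/1)."""
        node = 1
        for var in reversed(range(self.num_vars)):
            node = self.mk(var, 0, node) if bits[var] else self.mk(var, node, 0)
        return node

    def union(self, a, b, memo=None):
        """Symbolic set union = Boolean OR of two BDDs."""
        memo = {} if memo is None else memo
        if a == 1 or b == 1: return 1
        if a == 0: return b
        if b == 0: return a
        if (a, b) in memo: return memo[(a, b)]
        va, la, ha = self.nodes[a]
        vb, lb, hb = self.nodes[b]
        v = min(va, vb)            # split on the topmost variable
        a_lo, a_hi = (la, ha) if va == v else (a, a)
        b_lo, b_hi = (lb, hb) if vb == v else (b, b)
        res = self.mk(v, self.union(a_lo, b_lo, memo),
                         self.union(a_hi, b_hi, memo))
        memo[(a, b)] = res
        return res

    def count(self, node, var=0):
        """Number of states (satisfying assignments) in the set."""
        if node in (0, 1):
            return node * (1 << (self.num_vars - var))
        v, lo, hi = self.nodes[node]
        skip = 1 << (v - var)      # untested variables each double the count
        return skip * (self.count(lo, v + 1) + self.count(hi, v + 1))
```

The union of all 2^n single-state BDDs reduces to the `true` terminal, which is the space saving in miniature: the full state set costs one node.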
Quantum-enhanced reinforcement learning
Master's dissertation in Engineering Physics
The field of Artificial Intelligence has lately witnessed extraordinary results. The ability to
design a system capable of beating the world champion of Go, an ancient Chinese game
known as the holy grail of AI, caused a spark worldwide, making people believe that something
revolutionary is about to happen. A different flavor of learning called Reinforcement
Learning is at the core of this revolution. In parallel, we are witnessing the emergence of a
new field, that of Quantum Machine Learning, which has already shown promising results in
supervised/unsupervised learning. In this dissertation, we reach for the interplay between
Quantum Computing and Reinforcement Learning.
This learning by interaction was made possible in the quantum setting using the concept
of oraculization of task environments suggested by Dunjko in 2015. In this dissertation,
we extended the oracular instances previously suggested to work in more general stochastic
environments. On top of this quantum agent-environment paradigm we developed a novel
quantum algorithm for near-optimal decision-making based on the Reinforcement Learning
paradigm known as Sparse Sampling, obtaining a quantum speedup compared to the
classical counterpart. The result is a quantum algorithm whose complexity is
independent of the number of states of the environment. This independence guarantees its
suitability for dealing with large state spaces where planning may otherwise be inapplicable.
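For orientation, classical Sparse Sampling (Kearns, Mansour & Ng), which the quantum algorithm builds on, already has a running time independent of the number of states: it estimates action values from C sampled successors per action, recursively, to horizon H. The sketch below is a generic illustration of that classical algorithm, not the quantum method; the toy chain environment and all parameter values are invented.

```python
# Sketch of classical Sparse Sampling: cost depends on the sampling
# width C and horizon H, never on the size of the state space.
import random

def sparse_sample_q(env, state, action, depth, C, H, gamma):
    """Estimate Q(state, action) from C sampled next states."""
    total = 0.0
    for _ in range(C):
        next_state, reward = env(state, action)
        total += reward + gamma * sparse_sample_v(env, next_state,
                                                  depth + 1, C, H, gamma)
    return total / C

def sparse_sample_v(env, state, depth, C, H, gamma, actions=(0, 1)):
    if depth == H:                       # horizon reached: stop sampling
        return 0.0
    return max(sparse_sample_q(env, state, a, depth, C, H, gamma)
               for a in actions)

def chain_env(state, action):
    """Toy stochastic chain: action 1 moves right w.p. 0.9; reward past state 3."""
    step = 1 if (action == 1 and random.random() < 0.9) else -1
    nxt = max(0, state + step)
    return nxt, (1.0 if nxt >= 3 else 0.0)
```

At decision time one computes `sparse_sample_q` for each action and acts greedily; the tree it samples has size (C x |A|)^H regardless of how many states the environment has.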
The most important open questions remain whether it is possible to improve the oracular
instances of task environments to deal with even more general environments, especially
the ability to represent negative rewards as a natural mechanism for negative feedback
instead of some normalization of the reward, and the extension of the algorithm to perform
an informed tree-based search instead of the uninformed search proposed. Improvements
on this result would allow comparison between the algorithm and more recent classical
Reinforcement Learning algorithms.
Intelligent Agents for Active Malware Analysis
The main contribution of this thesis is to give a novel perspective on Active Malware Analysis, modeled as a decision-making process between intelligent agents. We propose solutions aimed at extracting the behaviors of malware agents with advanced Artificial Intelligence techniques. In particular, we devise novel action selection strategies for the analyzer agents that analyze malware by selecting sequences of triggering actions aimed at maximizing the information acquired. The goal is to create informative models representing the behaviors of the malware agents observed while interacting with them during the analysis process. Such models can then be used to effectively compare a malware sample against others and to correctly identify the malware family.
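Selecting triggering actions to maximize information acquired can be sketched as greedy expected-information-gain action selection: keep a belief over candidate families and pick the action whose predicted observation most reduces entropy. This is a hedged, generic illustration of that idea, not the thesis's strategy; the families, actions, and observation model below are invented.

```python
# Sketch of information-gain-driven action selection: the analyzer
# chooses the triggering action with the largest expected entropy
# reduction of its belief over malware families. Illustrative only.
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_info_gain(belief, obs_model, action):
    """obs_model[family][action] -> dict of P(observation | family, action)."""
    prior_h = entropy(belief)
    # marginal P(obs | action) under the current belief
    p_obs = {}
    for fam, p_fam in belief.items():
        for obs, p in obs_model[fam][action].items():
            p_obs[obs] = p_obs.get(obs, 0.0) + p_fam * p
    gain = prior_h
    for obs, p in p_obs.items():
        # posterior belief after observing obs
        post = {f: belief[f] * obs_model[f][action].get(obs, 0.0) / p
                for f in belief}
        gain -= p * entropy(post)     # subtract expected posterior entropy
    return gain

def select_action(belief, obs_model, actions):
    return max(actions, key=lambda a: expected_info_gain(belief, obs_model, a))
```

An action whose observation is the same for every family has zero gain and is never chosen over a discriminating one.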
Approximate inference in graphical models
Probability theory provides a mathematically rigorous yet conceptually flexible calculus of uncertainty, allowing the construction of complex hierarchical models for real-world inference tasks. Unfortunately, exact inference in probabilistic models is often computationally expensive or even intractable. A close inspection of such situations often reveals that computational bottlenecks are confined to certain aspects of the model, which can be circumvented by approximations without having to sacrifice the model's interesting aspects. The conceptual framework of graphical models provides an elegant means of representing probabilistic models and deriving both exact and approximate inference algorithms in terms of local computations. This makes graphical models an ideal aid in the development of generalizable approximations. This thesis contains a brief introduction to approximate inference in graphical models (Chapter 2), followed by three extensive case studies in which approximate inference algorithms are developed for challenging applied inference problems. Chapter 3 derives the first probabilistic game tree search algorithm. Chapter 4 provides a novel expressive model for inference in psychometric questionnaires. Chapter 5 develops a model for the topics of large corpora of text documents, conditional on document metadata, with a focus on computational speed. In each case, graphical models help in two important ways: they first provide important structural insight into the problem, and then suggest practical approximations to the exact probabilistic solution. This work was supported by a scholarship from Microsoft Research, Ltd.
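The "local computations" idea can be made concrete with the smallest possible example: exact marginal inference on a chain-structured model by variable elimination, the primitive from which both exact and approximate message-passing algorithms are built. The factors below are invented for illustration.

```python
# Variable elimination on a chain x0 - x1 - ... - xn: each step sums
# out one variable locally, passing a "message" to its neighbour.

def eliminate_chain(unaries, pairwise):
    """Return the normalised marginal of the last variable in the chain."""
    msg = unaries[0]
    for unary, pair in zip(unaries[1:], pairwise):
        # sum out the previous variable using only local factors
        msg = [unary[j] * sum(msg[i] * pair[i][j] for i in range(len(msg)))
               for j in range(len(unary))]
    z = sum(msg)                       # normalisation constant
    return [m / z for m in msg]
```

The cost is linear in chain length instead of exponential in the number of variables, which is the point of exploiting graph structure.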
Principled control of approximate programs
In conventional computing, most programs are treated as implementations of mathematical functions for which there is an exact output that must be computed from a given input. However, in many problem domains, it is sufficient to produce some approximation of this output. For example, when rendering a scene in graphics, it is acceptable to take computational short-cuts if human beings cannot tell the difference in the rendered scene. In other problem domains like machine learning, programs are often implementations of heuristic approaches to solving problems and therefore already compute approximate solutions to the original problem.
This is the key insight for the new research area, approximate computing, which attempts to trade off such approximations against the cost of computational resources such as program execution time, energy consumption, and memory usage. We believe that approximate computing is an important step towards a more fundamental and comprehensive goal that we call information-efficiency. Current applications compute more information (bits) than is needed to produce their outputs, and since producing and transporting bits of information inside a computer requires energy, computation time, and memory, information-inefficient computing leads directly to resource inefficiency.
Although there is now a fairly large literature on approximate computing, system researchers have focused mostly on what we can call the forward problem; that is, they have explored different ways, in both hardware and software, to introduce approximations into a program and have demonstrated that these approximations can enable significant execution speedups and energy savings at the cost of some quality degradation of the result. However, these efforts do not provide any guarantee on the amount of quality degradation. Since the acceptable amount of degradation usually depends on the scenario in which the application is deployed, it is very important to be able to control the degree of approximation. In this dissertation, we refer to this problem as the inverse problem. Relatively little is known about how to solve the inverse problem in a disciplined way.
This dissertation makes two contributions towards solving the inverse problem. First, we investigate a large set of approximate algorithms from a variety of domains in order to understand how approximation is used in real-world applications. From this investigation, we determine that many approximate programs are tunable approximate programs. Tunable approximate programs have one or more parameters called knobs that can be changed to vary the quality of the output of the approximate computation as well as the corresponding cost. For example, an iterative linear equation solver can vary the number of iterations to trade quality of the solution versus the execution time, a Monte Carlo path tracer can change the number of sampling light paths to trade the quality of the resulting image against execution time, etc. Tunable approximate programs provide many opportunities for trading accuracy versus cost. By carefully analyzing these algorithms, we have found a set of patterns for how approximation is applied in tunable programs. Our classification can be used to identify new approximation opportunities in programs.
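The iterative-solver example above can be made concrete with a minimal tunable approximate program: a Jacobi solver whose single knob, the iteration count, trades residual quality against execution cost. This is a generic illustration, not code from the dissertation; the 2x2 system is invented.

```python
# A "tunable approximate program" in miniature: the `iterations` knob
# controls the quality/cost trade-off of an iterative linear solver.

def jacobi(A, b, iterations):
    """Approximately solve A x = b; `iterations` is the quality knob."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # each sweep uses only the previous iterate (classic Jacobi)
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def residual(A, b, x):
    """Quality metric: max |A x - b| over all equations."""
    return max(abs(sum(A[i][j] * x[j] for j in range(len(x))) - b[i])
               for i in range(len(b)))
```

More iterations cost more time but shrink the residual, exactly the knob behavior the classification above describes.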
A second contribution of this dissertation is an approach to solving the inverse problem for tunable approximate programs. Concretely, the problem is to determine knob settings to minimize the cost while keeping the quality degradation within a given bound. There are four challenges: i) for real-world applications, the quality and cost are usually complex non-linear functions of the knobs and these functions are usually hard to express analytically; ii) the quality and the cost for an application vary greatly for different inputs; iii) when an acceptable quality degradation bound is presented, determining the knob setting has to be very efficient so that the extra overhead incurred by the identification will not exceed the cost saved by the approximation; and iv) the approach should be general so that it can be applied to many applications.
To meet these requirements, we formulate the inverse problem as a constrained optimization problem and solve it using a machine learning based approach. We build a system which uses machine learning techniques to learn cost and quality models for the program by profiling the program with a set of representative inputs. Then, when a quality degradation bound is presented, the system searches these cost and quality models to identify the knob settings which achieve the best cost savings while simultaneously guaranteeing the quality degradation bound statistically. We evaluate the system with a set of real-world applications, including a social network graph partitioner, an image search engine, a 2-D graph layout engine, a 3-D game physics engine, an SVM solver and a radar signal processing engine. The experiments showed substantial execution-time and energy savings for a variety of quality bounds.
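The offline-profile-then-online-search workflow can be sketched as follows. This is a deliberately simplified sketch of the general idea, not the system described above: the learned models are reduced to per-knob averages, and the toy program, its cost, and its error are all invented.

```python
# Two-phase sketch of the inverse problem: profile offline to learn
# cost/error per knob setting, then pick the cheapest setting whose
# predicted error meets the requested bound.

def profile(program, knob_values, inputs):
    """Offline phase: average cost and error for each knob setting."""
    models = {}
    for k in knob_values:
        costs, errors = zip(*(program(x, k) for x in inputs))
        models[k] = (sum(costs) / len(costs), sum(errors) / len(errors))
    return models

def cheapest_knob(models, error_bound):
    """Online phase: minimise predicted cost subject to the error bound."""
    feasible = [(cost, k) for k, (cost, err) in models.items()
                if err <= error_bound]
    if not feasible:
        raise ValueError("no knob setting meets the bound")
    return min(feasible)[1]
```

A real system would fit nonlinear, input-dependent models and give statistical rather than exact guarantees, but the constrained-optimization shape is the same.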
Hybrid optimizer for expeditious modeling of virtual urban environments
Master's thesis. Informatics Engineering. Faculdade de Engenharia, Universidade do Porto. 200
Low-resource learning in complex games
This project is concerned with learning to take decisions in complex domains, in games
in particular. Previous work assumes that massive data resources are available for
training, but aside from a few very popular games, this is generally not the case, and the
state of the art in such circumstances is to rely extensively on hand-crafted heuristics.
On the other hand, human players are able to quickly learn from only a handful of
examples, exploiting specific characteristics of the learning problem to accelerate their
learning process. Designing algorithms that function in a similar way is an open area
of research and has many applications in today’s complex decision problems.
One solution presented in this work is to design learning algorithms that exploit the
inherent structure of the game. Specifically, we take into account how the action space
can be clustered into sets called types and exploit this characteristic to improve planning
at decision time. Action types can also be leveraged to extract high-level strategies
from a sparse corpus of human play, and this generates more realistic trajectories
during planning, further improving performance.
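The type-based abstraction can be sketched as two-stage action sampling during rollouts: first draw an action type (with weights that could be estimated from a human corpus), then a concrete action of that type. This is an illustrative sketch of the general idea, not the thesis's algorithm; the action names, types, and weights are invented.

```python
# Two-stage rollout sampling over a typed action space: type first,
# then a concrete legal action within the chosen type.
import random

def sample_action(legal_actions, type_of, type_weights, rng=random):
    """Sample a type by weight, then a uniform action of that type."""
    by_type = {}
    for a in legal_actions:
        by_type.setdefault(type_of(a), []).append(a)
    types = list(by_type)
    weights = [type_weights.get(t, 1.0) for t in types]
    chosen_type = rng.choices(types, weights=weights)[0]
    return rng.choice(by_type[chosen_type])
```

Biasing the type distribution toward strategies observed in human play makes rollout trajectories more realistic than uniform action sampling, which is the effect described above.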
Another approach that proved successful is using an accurate model of the environment
to reduce the complexity of the learning problem. Similar to how human players
have an internal model of the world that allows them to focus on the relevant parts of
the problem, we decouple learning to win from learning the rules of the game, thereby
making supervised learning more data efficient.
Finally, in order to handle partial observability that is usually encountered in complex
games, we propose an extension to Monte Carlo Tree Search that plans in the
Belief Markov Decision Process. We found that this algorithm does not outperform
the state-of-the-art models on our chosen domain. Our error analysis indicates that the
method struggles to handle the high uncertainty of the conditions required for the game
to end. Furthermore, our relaxed belief model can cause rollouts in the belief space to
be inaccurate, especially in complex games.
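One common way to maintain the belief such an extension plans over is a particle filter: a set of sampled hidden states reweighted by each observation, from which rollout start states can be drawn. The sketch below illustrates that generic machinery only, not the thesis's belief model; the hidden-value domain is invented.

```python
# Particle-filter belief update: reweight particles by the observation
# likelihood, then resample to keep the particle count fixed.
import random

def update_belief(particles, obs_likelihood, rng=random):
    """Return a new particle set conditioned on the latest observation."""
    weights = [obs_likelihood(s) for s in particles]
    if sum(weights) == 0:
        return particles  # observation impossible under belief: keep prior
    return rng.choices(particles, weights=weights, k=len(particles))
```

If the belief model is too relaxed, particles drift away from states consistent with the true game, which is one source of the inaccurate belief-space rollouts noted above.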
We assess the proposed methods in an agent playing the highly complex board
game Settlers of Catan. Building on previous research, our strongest agent combines
planning at decision time with prior knowledge extracted from an available corpus of
general human play; but unlike this prior work, our human corpus consists of only
60 games, as opposed to many thousands. Our agent defeats the current state-of-the-art
agent by a large margin, showing that the proposed modifications aid in exploiting
general human play in highly complex games.
Optimizing quantum circuit layouts
One of the challenges in quantum computing is the problem of optimizing quantum circuit compilation. The compilation process involves two main stages: synthesizing the circuit to be executed in terms of the quantum gates supported by the processor, and adapting the circuit to the connectivity limitations imposed by the processor. In this work, I have addressed the second of these problems, known as Quantum Circuit Layout (QCL). To tackle this problem, I have attempted to use Reinforcement Learning (RL) techniques, which require modeling the problem as a Markov Decision Process (MDP). Specifically, I describe two finite MDPs whose solution provides a solution to a part of the QCL problem.
The main problem is to design a method that effectively solves these MDPs, even if only approximately. In this thesis, two approaches to the problem are discussed. The first uses a variant of the algorithm used in AlphaZero, designed to train a machine to learn how to play Chess, Shogi, and Go. The second uses a more standard approach known as Deep Q-Learning (DQL).
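The state/action structure of such a QCL MDP can be illustrated without any learning: the state is a logical-to-physical qubit mapping, the actions are SWAPs along the processor's coupling graph, and a step is good when it brings the next two-qubit gate's operands adjacent. The greedy router below stands in for the learned policy and is purely illustrative, as is the linear coupling map.

```python
# Toy QCL setting: mapping = {logical qubit: physical qubit}; actions
# are SWAPs on coupled physical pairs. A greedy stand-in policy picks
# the SWAP that most reduces the next gate's qubit distance.

def distance(mapping, coupling_dist, q1, q2):
    return coupling_dist[mapping[q1]][mapping[q2]]

def swapped_dist(mapping, coupling_dist, edge, q1, q2):
    """Distance between q1 and q2 after applying the SWAP `edge`."""
    m = dict(mapping)
    a, b = edge
    inv = {v: k for k, v in m.items()}     # physical -> logical
    m[inv[a]], m[inv[b]] = b, a
    return coupling_dist[m[q1]][m[q2]]

def greedy_route(gate, mapping, edges, coupling_dist):
    """Apply SWAPs (in place) until the gate's qubits are adjacent."""
    q1, q2 = gate
    swaps = []
    while distance(mapping, coupling_dist, q1, q2) > 1:
        best = min(edges, key=lambda e: swapped_dist(mapping, coupling_dist,
                                                     e, q1, q2))
        a, b = best
        inv = {v: k for k, v in mapping.items()}
        mapping[inv[a]], mapping[inv[b]] = b, a
        swaps.append(best)
    return swaps
```

An RL formulation replaces the greedy choice with a learned policy and rewards shorter SWAP sequences, but the underlying MDP transitions are exactly these mapping updates.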