76 research outputs found

    Quantum bayesian reinforcement learning

    Master's dissertation in Engineering Physics. Reinforcement learning has seen many recent achievements and is becoming increasingly relevant in the scientific community. This work uses quantum computing to find potential advantages over classical reinforcement learning algorithms, using Bayesian networks to model the decision-making environments under consideration. For this purpose, it employs quantum rejection sampling, a quantum approximate-inference algorithm for Bayesian networks proposed by Low et al. [2014] with a quadratic speedup over its classical counterpart for sparse networks. It is shown that this algorithm can only provide quantum speedups for partially observable environments, and a quantum-classical hybrid lookahead algorithm is presented to solve these kinds of problems. Moreover, this work includes both sample- and computational-complexity analyses of the quantum lookahead algorithm and its classical alternative. While the sample complexity is shown to be identical for both algorithms, the quantum approach provides up to a quadratic speedup in computational complexity. Finally, the potential advantages of this new algorithm are tested in several small-scale experiments. The results show that this speedup can be leveraged either to improve the rational decision-making skills of agents or to reduce their decision-making time, owing to the reduction in computational complexity.
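
    As context for the quadratic speedup claimed above, the classical baseline is ordinary rejection sampling for Bayesian-network inference: draw full samples from the prior and keep only those consistent with the evidence. Below is a minimal sketch of that baseline; the two-node network, variable names, and probabilities are illustrative assumptions, not taken from the thesis.

    ```python
    import random

    def sample_rain_network():
        """Ancestral sampling from a toy two-node Bayesian network:
        Rain -> WetGrass. All probabilities are made up for illustration."""
        rain = random.random() < 0.2
        p_wet = 0.9 if rain else 0.1
        wet = random.random() < p_wet
        return {"Rain": rain, "WetGrass": wet}

    def rejection_sample(query_var, evidence, n=100_000):
        """Estimate P(query_var = True | evidence) by discarding samples
        that contradict the evidence. Cost grows as 1/P(evidence), the
        factor that quantum rejection sampling improves quadratically."""
        kept = accepted = 0
        for _ in range(n):
            s = sample_rain_network()
            if all(s[k] == v for k, v in evidence.items()):
                kept += 1
                accepted += s[query_var]
        return accepted / kept if kept else float("nan")

    print(rejection_sample("Rain", {"WetGrass": True}))
    ```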

    Deep learning that scales: leveraging compute and data

    Deep learning has revolutionized the field of artificial intelligence in the past decade. Although the development of these techniques spans several years, the recent advent of deep learning is explained by an increased availability of data and compute that has unlocked the potential of deep neural networks. They have become ubiquitous in domains such as natural language processing, computer vision, speech processing, and control, wherever enough training data is available. Recent years have seen continuous progress driven by ever-growing neural networks that benefit from large amounts of data and computing power. This thesis is motivated by the observation that scale is one of the key factors driving progress in deep learning research, and it aims at devising deep learning methods that scale gracefully with the available data and compute. We narrow this scope down into two main research directions. The first is concerned with designing hardware-aware methods that can make the most of the computing resources in current high-performance computing facilities. The second studies bottlenecks that prevent existing methods from scaling up as more data becomes available, providing solutions that contribute towards enabling the training of more complex models. This dissertation studies these research questions for two different learning paradigms, each with its own algorithmic and computational characteristics. The first part of the thesis studies the paradigm in which a model learns from a collection of examples, extracting as much information as possible from the given data. The second part is concerned with training agents that learn by interacting with a simulated environment, which introduces unique challenges such as efficient exploration and simulation.

    Reinforcement learning from internal, partially correct, and multiple demonstrations

    Typically, a reinforcement learning agent interacts with the environment and learns to select actions that maximize cumulative reward over a trajectory of a task. However, classic reinforcement learning emphasises knowledge-free learning: the agent learns only from state-action-reward-next-state samples. This learning process suffers from sample inefficiency and needs a huge number of interactions to converge on an optimal policy. One solution to this challenge is to use records of human behaviour on the same task as demonstrations to speed up the agent's learning. Demonstrations, however, do not come from the optimal policy and may conflict in many states, especially when they come from multiple sources. Meanwhile, the agent's own behaviour during learning can itself be used as demonstration data. To address these research gaps, three novel techniques are proposed in this thesis: introspective reinforcement learning, two-level Q-learning, and the radius-restrained weighted vote. Introspective reinforcement learning uses a priority queue as a filter to select qualified agent behaviours during the learning process as demonstrations; it applies reward shaping to give the agent an extra reward when it behaves similarly to the demonstrations in the filter. Two-level Q-learning deals with conflicting demonstrations: two Q-tables (or Q-networks under function approximation) are proposed, storing state-expert values and state-action values respectively, which allows the agent not only to learn a strategy from selected actions but also to learn to distribute credit to experts through trial and error. The radius-restrained weighted vote derives a guidance policy from demonstrations that satisfy a restriction given by a hyper-parameter radius: Gaussian functions of the distances between the current state and the demonstrations serve as vote weights, and a softmax over the total weighted votes from all candidate demonstrations yields the guidance policy.
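
    To make the voting scheme concrete, here is a minimal sketch of a radius-restrained weighted vote as described above. The state representation, the kernel bandwidth sigma, and the function names are illustrative assumptions rather than the thesis's actual implementation.

    ```python
    import math
    from collections import defaultdict

    def guidance_policy(state, demonstrations, radius=1.0, sigma=0.5):
        """Derive action probabilities from demonstrations near `state`.

        demonstrations: list of (demo_state, demo_action) pairs, where
        states are tuples of floats. Only demonstrations within `radius`
        of the current state may vote; each vote is weighted by a Gaussian
        of its distance, and a softmax over summed votes gives the policy.
        """
        votes = defaultdict(float)
        for demo_state, demo_action in demonstrations:
            dist = math.dist(state, demo_state)
            if dist <= radius:
                votes[demo_action] += math.exp(-dist**2 / (2 * sigma**2))
        if not votes:
            return {}  # no demonstration close enough to offer guidance
        z = sum(math.exp(v) for v in votes.values())
        return {a: math.exp(v) / z for a, v in votes.items()}

    demos = [((0.0, 0.0), "left"), ((0.1, 0.0), "left"), ((0.9, 0.9), "right")]
    print(guidance_policy((0.05, 0.0), demos))
    ```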

    New Frameworks for Structured Policy Learning

    Sequential decision making applications are playing an increasingly important role in everyday life. Research interest in machine learning approaches to sequential decision making has surged thanks to recent empirical successes of reinforcement learning and imitation learning techniques, partly fueled by recent advances in deep learning-based function approximation. However, in many real-world sequential decision making applications, relying purely on black-box policy learning is often insufficient, due to practical requirements of data efficiency, interpretability, safety guarantees, and the like. These challenges collectively make it difficult for many existing policy learning methods to find success in realistic applications. In this dissertation, we present recent advances in structured policy learning: new machine learning frameworks that integrate policy learning with principled notions of domain knowledge, spanning value-based, policy-based, and model-based structures. Our frameworks take flexible reduction-style approaches that can integrate structure with reinforcement learning, imitation learning, and robust control techniques. In addition to methodological advances, we demonstrate several successful applications of the new policy learning frameworks.

    Pessimistic Bayesianism for conservative optimization and imitation

    Subject to several assumptions, sufficiently advanced reinforcement learners would likely face an incentive, and likely have the ability, to intervene in the provision of their reward, with catastrophic consequences. In this thesis, I develop a theory of pessimism and show how it can produce safe advanced artificial agents. Not only do I demonstrate that the assumptions mentioned above can be avoided; I prove theorems which demonstrate safety. First, I develop an idealized pessimistic reinforcement learner. For any given novel event that a mentor would never cause, a sufficiently pessimistic reinforcement learner trained with the help of that mentor would probably avoid causing it. This result is without precedent in the literature. Next, on similar principles, I develop an idealized pessimistic imitation learner. If the probability of an event when the demonstrator acts can be bounded above, then the probability can be bounded above when the imitator acts instead; this kind of result is unprecedented when the imitator learns online and the environment never resets. In an environment that never resets, no one has previously demonstrated, to my knowledge, that an imitation learner even exists. Finally, both of the agents above demand more efficient algorithms for high-quality uncertainty quantification, so I have developed a new kernel for Gaussian process modelling that allows log-linear time complexity and linear space complexity, instead of naïve cubic time complexity and quadratic space complexity. This is not the first Gaussian process with this time complexity (inducing-point methods have linear complexity), but ours significantly outperforms such methods on regression benchmarks, as one might expect given the much higher dimensionality of our kernel. This thesis shows the viability of pessimism with respect to well-quantified epistemic uncertainty as a path to safe artificial agency.
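
    One common way to realize pessimism with respect to epistemic uncertainty is to act on the worst-case value across an ensemble of models sampled from the agent's posterior. The sketch below is a generic rendering of that idea, not the thesis's agent; the ensemble, the environment interface, and all names are assumptions.

    ```python
    def pessimistic_value(action, posterior_models, value_fn):
        """Score an action by its worst-case value over sampled world models.

        posterior_models: models sampled from the agent's posterior belief.
        value_fn(model, action): estimated return of `action` under `model`.
        Acting greedily on this lower bound avoids actions whose good
        outcomes rely on beliefs the agent is not confident in.
        """
        return min(value_fn(m, action) for m in posterior_models)

    def choose_action(actions, posterior_models, value_fn):
        return max(actions,
                   key=lambda a: pessimistic_value(a, posterior_models, value_fn))

    # Toy usage: three hypotheses about a bandit's payout for two arms.
    models = [{"safe": 1.0, "risky": 0.0},
              {"safe": 1.0, "risky": 5.0},
              {"safe": 1.0, "risky": -10.0}]
    print(choose_action(["safe", "risky"], models, lambda m, a: m[a]))  # -> safe
    ```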

    Machine learning in the real world with multiple objectives

    Machine learning (ML) is ubiquitous in many real-world applications. Existing ML systems are based on optimizing a single quality metric such as prediction accuracy. These metrics typically do not fully align with real-world design constraints, such as computation, latency, fairness, and acquisition costs, that we encounter in practice. In this thesis, we develop ML methods that optimize prediction accuracy while accounting for such real-world constraints. In particular, we introduce multi-objective learning in two different setups: resource-efficient prediction and algorithmic fairness in language models. First, we focus on decreasing the test-time computational costs of prediction systems. Budget constraints arise in many machine learning problems: computational costs limit the use of many models on small devices such as IoT sensors or mobile phones and increase energy consumption in cloud computing. We design systems that allow on-the-fly modification of the prediction model for each input sample. These sample-adaptive systems let us exploit the wide variability in sample complexity: we learn policies that select cheap models for low-complexity instances and reserve more expressive models for complex ones. We take a multi-objective approach in which one minimizes the system cost while preserving predictive accuracy, and we demonstrate significant speed-ups in computer vision, structured prediction, natural language processing, and deep learning. In the context of fairness, we first demonstrate that a naive application of ML methods runs the risk of amplifying social biases present in the data. This danger is particularly acute for methods based on word embeddings, which are gaining importance in many natural language processing applications of ML. We show that word embeddings trained on Google News articles exhibit female/male gender stereotypes, and that, geometrically, gender bias is captured by particular directions in the word-embedding vector space. To remove bias, we formulate an empirical risk objective with fairness constraints that removes stereotypes from embeddings while maintaining desired associations. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving useful properties such as the ability to cluster related concepts.
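
    The geometric claim above can be illustrated with a few lines of vector arithmetic: estimate a gender direction from a definitional word pair and measure how strongly other words project onto it. The word vectors below are tiny made-up 3-d stand-ins; real experiments would load pretrained embeddings such as the Google News word2vec vectors, and the hard-debiasing step shown is one standard technique, not necessarily the constrained-objective method of this thesis.

    ```python
    import numpy as np

    # Toy 3-d embeddings standing in for real pretrained vectors.
    emb = {
        "he":       np.array([ 1.0, 0.1, 0.0]),
        "she":      np.array([-1.0, 0.1, 0.0]),
        "engineer": np.array([ 0.4, 0.9, 0.2]),
        "nurse":    np.array([-0.5, 0.8, 0.1]),
    }

    # Estimate the gender direction from a definitional pair.
    g = emb["he"] - emb["she"]
    g /= np.linalg.norm(g)

    def gender_projection(word):
        """Signed length of the word vector along the gender direction."""
        return float(emb[word] @ g)

    def debias(word):
        """Hard-debiasing step: remove the component along g."""
        v = emb[word]
        return v - (v @ g) * g

    print(gender_projection("engineer"), gender_projection("nurse"))
    print(debias("nurse") @ g)  # ~0 after projecting the bias component out
    ```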

    Learning in the Real World: Constraints on Cost, Space, and Privacy

    The sheer demand for machine learning in fields as varied as healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others frequently outpaces the intended use case of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor and/or design new state-of-the-art models for the setting at hand. However, we can generalize a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) covers problems that need to balance the accuracy of a machine learning model against the cost of evaluating it. These include problems in web search, where results need to be delivered to a user in under a second while being as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small enough to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses of such data can be exploited to reveal information individuals would prefer to keep private; thus, nearly any algorithm that runs on patient or economic data falls under this set of problems. We devise solutions for each of these problem categories, including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest-neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data.
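
    As one concrete instance of the space constraint, a k-nearest-neighbor classifier can be compressed by keeping only the training points needed for correct classification. The greedy condensing sketch below follows the spirit of Hart's condensed nearest neighbor rule and is not the thesis's specific compression method; the data and names are illustrative.

    ```python
    import numpy as np

    def condense_knn(X, y):
        """Greedy 1-NN condensing: keep a point only if the prototypes
        kept so far would misclassify it. Returns indices of the
        retained prototypes, which can replace the full training set."""
        keep = [0]  # seed with the first point
        for i in range(1, len(X)):
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = keep[int(np.argmin(dists))]
            if y[nearest] != y[i]:   # current prototypes get it wrong,
                keep.append(i)       # so this point carries information
        return keep

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [4, 4]], 100, axis=0)
    y = np.repeat([0, 1], 100)
    proto = condense_knn(X, y)
    print(f"kept {len(proto)} of {len(X)} points")
    ```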