225 research outputs found

    my Human Brain Project (mHBP)

    How can we make an agent that thinks like us humans? An agent that can have proprioception and intrinsic motivation, identify deception, use small amounts of energy, transfer knowledge between tasks, and evolve? This is the problem on which this thesis focuses. Creating a piece of software that can perform tasks like a human being is a goal that, if achieved, will allow us to extend our own capabilities to a very high level and have more tasks performed in a predictable fashion. This is one of the motivations for this thesis. To address this problem, we propose a modular architecture for Reinforcement Learning (RL) computation and develop an implementation to exercise this architecture. This software, which we call mHBP, is written in Python, using Webots as the environment for the agent and Neo4J, a graph database, as memory. mHBP takes sensory data or other inputs and produces, based on the body parts or tools that the agent has available, an output consisting of actions to perform. This thesis involves experimental design with several iterations, exploring a theoretical approach to RL based on graph databases. We conclude that it is possible to represent episodic data in a graph, and that it is also possible to interconnect Webots, Python and Neo4J to support a stable architecture for Reinforcement Learning. We also find a way to search for policies using the Neo4J query language, Cypher. Another key conclusion is that state representation needs further research to find a state definition that enables policy search to produce more useful policies. The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)” on ResearchGate, with DOI 10.13140/RG.2.2.30323.76327, is an outcome of this thesis.
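
The idea of storing episodic data as a graph and searching it for policies can be illustrated with a minimal sketch. This is not the mHBP implementation: plain Python dictionaries stand in for Neo4J and Cypher, and the class name `EpisodicGraph`, its methods, and the example states and actions are all hypothetical. Each recorded transition becomes an edge from a state to its successor, and a "policy" here is simply the recorded action sequence with the highest total reward between two states.

```python
from collections import defaultdict


class EpisodicGraph:
    """Hypothetical in-memory stand-in for a graph database of episodes."""

    def __init__(self):
        # state -> list of (action, reward, next_state) edges
        self.edges = defaultdict(list)

    def record(self, state, action, reward, next_state):
        """Store one observed transition as a directed edge."""
        self.edges[state].append((action, reward, next_state))

    def best_policy(self, start, goal, max_depth=10):
        """Return (total_reward, actions) for the best recorded path, or None."""
        best = None
        # Depth-first search over recorded transitions, never revisiting a state.
        stack = [(start, 0.0, [], {start})]
        while stack:
            state, total, actions, seen = stack.pop()
            if state == goal:
                if best is None or total > best[0]:
                    best = (total, actions)
                continue
            if len(actions) >= max_depth:
                continue
            for action, reward, nxt in self.edges.get(state, []):
                if nxt not in seen:
                    stack.append((nxt, total + reward, actions + [action], seen | {nxt}))
        return best


# Illustrative episode data (assumed, not taken from the thesis).
memory = EpisodicGraph()
memory.record("start", "forward", -1.0, "corridor")
memory.record("corridor", "forward", -1.0, "goal")
memory.record("start", "left", -1.0, "dead_end")
memory.record("start", "jump", -5.0, "goal")

policy = memory.best_policy("start", "goal")
```

In a graph database the same search would be expressed as a path query rather than a hand-written traversal; how useful the result is depends on how states are defined, which is the open question the abstract highlights.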

    A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

    The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, each making a variety of implicit assumptions, which makes it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.
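
The difference between the "ignore" and "forget" categories can be illustrated with a small sketch. It is not taken from the survey: the function names, the step size of 0.2, and the reward sequence are assumptions chosen for illustration. A plain sample average treats all past rewards equally, while a constant step-size update weights recent rewards more heavily and so tracks an opponent that abruptly changes strategy.

```python
def sample_average(rewards):
    """Estimate that ignores non-stationarity: every reward counts equally."""
    return sum(rewards) / len(rewards)


def forgetting_estimate(rewards, step_size=0.2, initial=0.0):
    """Recency-weighted estimate: old rewards decay geometrically."""
    estimate = initial
    for reward in rewards:
        estimate += step_size * (reward - estimate)
    return estimate


# Assumed scenario: an action pays 1.0 until the opponent adapts, then pays 0.0.
rewards = [1.0] * 50 + [0.0] * 50

ignoring = sample_average(rewards)        # still reflects the outdated payoff
forgetting = forgetting_estimate(rewards)  # has moved close to the new payoff
```

After the switch, the averaged estimate stays halfway between the old and new payoffs, whereas the recency-weighted estimate has almost fully moved to the new one.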

    Equity on the Extended Continental Shelf? How an Obscure Provision in UNCLOS Provides New Challenges for Ocean Governance

    One of the major novelties of the United Nations Convention on the Law of the Sea, 1982, is the legitimizing of coastal State claims to large areas of continental margins in all oceans by virtue of Article 76. In addition to exclusive economic zones (EEZs) of 200 nautical miles, coastal States whose continental margins extend beyond the EEZ limit are able to further claim the seabed and subsoil beyond the EEZ limit to 350 nautical miles from the baselines of the territorial sea or 100 nautical miles from the 2,500 metre isobath. The UN Convention established a procedure for this purpose, commencing with scientific and technical submissions to the Commission on the Limits of the Continental Shelf established in the treaty. To date, the Commission has received 65 submissions and a further 45 communications containing preliminary information.

    Robot Learning From Randomized Simulations: A Review

    The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. Unfortunately, it is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation, where data generation is fast as well as inexpensive, and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the “reality gap.” We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named “domain randomization,” which is a method for learning from randomized simulations.
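
The general idea of domain randomization can be sketched as follows. This is a toy illustration rather than a method from the review: the 1-D point-mass simulator, the parameter ranges, the proportional controller, and all function names are assumptions. Instead of tuning a controller against one fixed simulator, the controller parameter is chosen to perform well on average across many simulators whose physical parameters are drawn at random.

```python
import random

# Hypothetical parameter ranges for a toy 1-D point-mass simulator.
MASS_RANGE = (0.5, 2.0)
FRICTION_RANGE = (0.0, 0.5)


def sample_domain(rng):
    """Draw one randomized set of simulator parameters."""
    return {
        "mass": rng.uniform(*MASS_RANGE),
        "friction": rng.uniform(*FRICTION_RANGE),
    }


def tracking_cost(gain, domain, steps=200, dt=0.05):
    """Mean absolute distance to the target position 1.0 under a proportional controller."""
    pos, vel, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        force = gain * (1.0 - pos) - domain["friction"] * vel
        vel += force / domain["mass"] * dt
        pos += vel * dt
        total += abs(1.0 - pos)
    return total / steps


def randomized_cost(gain, domains):
    """Average cost of one controller gain over all sampled domains."""
    return sum(tracking_cost(gain, d) for d in domains) / len(domains)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
domains = [sample_domain(rng) for _ in range(50)]
candidate_gains = [0.5, 1.0, 2.0, 4.0, 8.0]
best_gain = min(candidate_gains, key=lambda g: randomized_cost(g, domains))
```

A grid search over one gain stands in for policy learning here; the point is only that the objective is averaged over sampled simulator parameters rather than evaluated on a single nominal model.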

    Exploring the Boundaries of Patent Commercialization Models via Litigation

    This thesis explores direct patent commercialization via patent assertion, particularly patent infringement litigation, a complex nonmarket activity whose successful undertaking requires knowledge, creativity, and financial resources, as well as a colorable infringement case. Despite these complexities, firms have increasingly employed patents as competitive tools via patent assertions, particularly in the United States. This thesis explores the business models that have been created to facilitate the direct monetization of patents. Since secrecy underpins the patent assertion strategies studied, the thesis is based on rich and enhanced secondary data. In particular, a data chaining technique has been developed to assemble relevant but disparate data into a larger coherent data set that is amenable to combination and pairing with other forms of relevant public data. This research has discovered that one particularly successful business model that employs a leveraging strategy, known as the non-practicing entity (“NPE”), has itself spawned at least two other business models, the highly capitalized “patent mass aggregator” and the “patent privateer.” The patent privateer, newly discovered in this research, is particularly interesting because it provides a way for firms to employ patents to attack competitors by forming specialized NPEs in a manner that essentially expands the boundaries of the firm. This research has also examined plaintiff firm management processes during litigations brought under leveraging and proprietary strategies, the two patent litigation strategies in which firms affirmatively initiate infringement litigations. In particular, this research investigates the commercial contexts that drive patent assertion strategies to explore the effective limits of the patent right in a litigation context. 
The investigation concludes that a variety of robust business models and management processes may be quite successful in extracting value from patents in the US.