271 research outputs found

    On a problem by Beidar concerning the central closure

    Get PDF
    Abstract: We give an example of a prime ring with zero center whose central closure is a simple ring with an identity element. This solves a problem posed by Beidar.

    A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

    Full text link
    Reward engineering is an important aspect of reinforcement learning. Whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated into a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy satisfying the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
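
    The core idea above — scoring a TL formula with a real-valued satisfaction measure over a trajectory — can be illustrated with a hand-coded example. The following is a minimal sketch (our own illustration, not the paper's implementation) of the quantitative semantics of the formula "eventually the state is within eps of the goal": the robustness is positive exactly when some state satisfies the predicate, and larger values mean stronger satisfaction.

```python
def eventually_near_goal(trajectory, goal, eps):
    """Robustness of F(|x - goal| < eps) over a 1-D state trajectory.

    Predicate robustness at each step is eps - |x - goal|;
    the "eventually" operator takes the max over time.
    """
    return max(eps - abs(x - goal) for x in trajectory)

traj_good = [0.0, 0.5, 1.0, 1.4]   # passes near the goal at 1.5
traj_bad = [0.0, 0.1, 0.2, 0.3]    # never gets close

print(eventually_near_goal(traj_good, goal=1.5, eps=0.2) > 0)  # True
print(eventually_near_goal(traj_bad, goal=1.5, eps=0.2) > 0)   # False
```

    Because this robustness is an ordinary real-valued function of the trajectory, a policy-search method can optimize it directly instead of a hand-tuned reward.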

    A Wide Distribution of a New VRN-B1c Allele of Wheat Triticum aestivum L. in Russia, Ukraine and Adjacent Regions: A Link with Heading Time and Adaptive Potential

    Get PDF
    The adaptation of common wheat (T. aestivum L.) to diverse environmental conditions is largely controlled by genes involved in determining the vernalization response (Vrn-1 genes). It has been found that variation in common wheat heading time is affected not only by the combination of Vrn-1 homoeoalleles but also by multiple alleles at a single Vrn-1 locus. Previously, we described the Vrn-B1c allele from T. aestivum cv. 'Saratovskaya 29' and found significant differences in the structure of the first intron of this allele compared to the other highly abundant Vrn-B1a allele, specifically a deletion of 0.8 kb coupled with a duplication of 0.4 kb. We suggested that these changes in intron 1 of the Vrn-B1c allele cause earlier ear emergence in the near-isogenic line and in cultivars carrying this allele. In this study we investigate the distribution of the Vrn-B1c allele in a wide set of spring wheat cultivars from Russia, Ukraine and adjacent regions. The analysis revealed that 40% of Russian and 53% of Ukrainian spring wheat cultivars contain the Vrn-B1c allele. The high frequency of the Vrn-B1c allele can be explained by the frequent use of 'Saratovskaya 29' in breeding programmes within the studied area. On the other hand, the predominance of the Vrn-B1c allele among cultivars grown in West Siberia and Kazakhstan may be due to the selective advantage of this allele in a region with a high risk of early fall frosts.

    Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets

    Full text link
    Imitation learning has traditionally been applied to learn a single task from demonstrations thereof. The requirement of structured and isolated demonstrations limits the scalability of imitation learning approaches, as they are difficult to apply to real-world scenarios where robots have to be able to execute a multitude of tasks. In this paper, we propose a multi-modal imitation learning framework that is able to segment and imitate skills from unlabelled and unstructured demonstrations by jointly learning skill segmentation and imitation. Extensive simulation results indicate that our method can efficiently separate the demonstrations into individual skills and learn to imitate them using a single multi-modal policy. The video of our experiments is available at http://sites.google.com/view/nips17intentiongan. Comment: Paper accepted to NIPS 2017.
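
    The phrase "a single multi-modal policy" can be made concrete with a toy sketch (our own illustration, not the paper's GAN-based method): one policy is conditioned on a categorical latent intention variable z, and each value of z selects a different skill. In the paper this latent code is inferred jointly with the imitation objective; here the skill-specific parameters are simply hand-set.

```python
import numpy as np

def multimodal_policy(state, z, params):
    # One policy conditioned on a categorical latent intention z;
    # each z selects different (here: linear) skill parameters.
    W = params[z]
    return W @ state

# Two hypothetical skills: skill 0 reacts to the first state dimension,
# skill 1 reacts (negatively) to the second.
params = {0: np.array([[1.0, 0.0]]), 1: np.array([[0.0, -1.0]])}
state = np.array([0.5, 2.0])

# Sampling z at episode start yields different behaviors from one policy.
print(multimodal_policy(state, 0, params))  # acts on first dimension
print(multimodal_policy(state, 1, params))  # acts on second dimension
```

    Segmentation of unstructured demonstrations then amounts to inferring which z best explains each demonstrated segment.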

    Ітеративність як тип предикатної множинності (на матеріалі Гомерової «Іліади») (Iterativity as a Type of Predicate Plurality (in the Books of Homer's «Iliad»))

    Get PDF
    The article presents a structural analysis of the category of iterativity as one component of the functional-semantic field of predicate plurality, whose main components are iterativity, distributivity and multiplicativity. It identifies the means of expressing repetitive semantics at the lexical, grammatical and syntactic levels, analyses the frequency of their use, and investigates the peculiarities of how the iterative meaning is expressed in Ancient Greek. The most widespread means are adverbial modifiers of cyclicity, interval and usitativity, which belong to the lexical level. At the grammatical level, the iterative meaning is carried by certain past-tense verb forms and by suffixation. At the syntactic level, it is expressed by subordinate clauses of time and condition and by infinitive constructions.

    Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

    Full text link
    In principle, reinforcement learning and policy search methods can enable robots to learn highly complex and general skills that may allow them to function amid the complexity and diversity of the real world. However, training a policy that generalizes well across a wide range of real-world conditions requires far greater quantity and diversity of experience than is practical to collect with a single robot. Fortunately, it is possible for multiple robots to share their experience with one another and thereby learn a policy collectively. In this work, we explore distributed and asynchronous policy learning as a means to achieve generalization and improved training times on challenging, real-world manipulation tasks. We propose a distributed and asynchronous version of Guided Policy Search and use it to demonstrate collective policy learning on a vision-based door opening task using four robots. We show that it achieves better generalization, utilization, and training times than the single-robot alternative. Comment: Submitted to the IEEE International Conference on Robotics and Automation 2017.
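
    The asynchronous-sharing idea can be sketched as a parameter server that each robot pulls from and pushes to whenever its own rollout finishes, with no synchronization barrier. This is a minimal toy illustration (class and objective are our own, not the paper's system): four "robots" with slightly different local conditions repeatedly push updates toward their own targets, and the shared parameters settle near the common solution.

```python
class ParameterServer:
    """Shared policy parameters; workers pull and push asynchronously."""

    def __init__(self, theta):
        self.theta = list(theta)

    def pull(self):
        return list(self.theta)

    def push(self, grad, lr=0.5):
        # Apply a worker's update as soon as it arrives.
        self.theta = [t - lr * g for t, g in zip(self.theta, grad)]

def local_gradient(theta, target):
    # Toy per-robot objective: 0.5 * ||theta - target||^2
    return [t - g for t, g in zip(theta, target)]

server = ParameterServer([0.0, 0.0])
# Four robots see slightly different conditions (different local optima).
targets = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [1.0, 1.0]]
for _ in range(20):
    for tgt in targets:
        server.push(local_gradient(server.pull(), tgt))
print(server.theta)  # settles near the shared solution [1.0, 1.0]
```

    In the real system the "gradient" comes from each robot's local Guided Policy Search update, and pushes arrive out of order rather than round-robin.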

    Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

    Full text link
    Reinforcement learning (RL) algorithms for real-world robotic applications need a data-efficient learning process and the ability to handle complex, unknown dynamical systems. These requirements are handled well by model-based and model-free RL approaches, respectively. In this work, we aim to combine the advantages of these two types of methods in a principled manner. By focusing on time-varying linear-Gaussian policies, we enable a model-based algorithm based on the linear quadratic regulator (LQR) that can be integrated into the model-free framework of path integral policy improvement (PI2). We can further combine our method with guided policy search (GPS) to train arbitrary parameterized policies such as deep neural networks. Our simulation and real-world experiments demonstrate that this method can solve challenging manipulation tasks with comparable or better performance than model-free methods while maintaining the sample efficiency of model-based methods. A video presenting our results is available at https://sites.google.com/site/icml17pilqr. Comment: Paper accepted to the International Conference on Machine Learning (ICML) 2017.
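
    For readers unfamiliar with PI2, its model-free core is a derivative-free update: sample parameter perturbations, evaluate their costs, and average the perturbations with softmax weights favoring low cost. The following is a minimal sketch on a toy quadratic cost (our own illustration; the paper's method additionally folds in LQR-based model-based updates, which are not shown here).

```python
import numpy as np

def pi2_update(theta, cost_fn, rng, n_samples=64, sigma=0.1):
    # Core PI2 step: cost-weighted averaging of sampled perturbations.
    eps = rng.normal(0.0, sigma, size=(n_samples, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / (costs.std() + 1e-8))
    w /= w.sum()
    return theta + w @ eps

def cost(theta):
    # Toy quadratic cost with minimum at [1, 1] (stand-in for rollout cost).
    return np.sum((theta - 1.0) ** 2)

theta = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(50):
    theta = pi2_update(theta, cost, rng)
print(theta)  # should end up near [1.0, 1.0]
```

    The appeal of this update for the paper's setting is that it needs only cost evaluations of rollouts, so model-based (LQR) and model-free (PI2) improvement steps can share the same time-varying linear-Gaussian policy class.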

    Learning Latent Space Dynamics for Tactile Servoing

    Full text link
    To achieve dexterous robotic manipulation, we need to endow our robot with tactile feedback capability, i.e. the ability to drive actions based on tactile sensing. In this paper, we specifically address the challenge of tactile servoing: given the current tactile sensing and a target/goal tactile sensing --memorized from a successful task execution in the past-- what is the action that will bring the current tactile sensing closer to the target tactile sensing at the next time step? We develop a data-driven approach that acquires a dynamics model for tactile servoing by learning from demonstration. Moreover, our method represents the tactile sensing information as lying on a surface --or a 2D manifold-- and performs manifold learning, making it applicable to any tactile skin geometry. We evaluate our method on a contact point tracking task using a robot equipped with a tactile finger. A video demonstrating our approach is available at https://youtu.be/0QK0-Vx7WkI. Comment: Accepted for publication at the International Conference on Robotics and Automation (ICRA) 2019. The final version for publication at ICRA 2019 is 7 pages (6 pages of technical content, including text, figures, tables, acknowledgements, etc., plus 1 page of references), while this arXiv version is 8 pages (with an added appendix and some extra details).
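
    The servoing step described above -- use a learned dynamics model to pick the action that moves the current (latent) tactile state toward the goal -- can be sketched with a known linear latent model (a stand-in for the learned one; matrices and dimensions here are purely illustrative, not the paper's). With a model z' = A z + B a, the one-step-greedy action is the least-squares solution of B a = z_goal - A z.

```python
import numpy as np

def servo_action(A, B, z, z_goal):
    # Choose a minimizing ||A z + B a - z_goal|| via least squares.
    return np.linalg.lstsq(B, z_goal - A @ z, rcond=None)[0]

# Hypothetical 2-D latent tactile dynamics (stand-in for a learned model).
A = 0.9 * np.eye(2)
B = np.array([[1.0, 0.0],
              [0.5, 1.0]])

z = np.array([0.0, 0.0])
z_goal = np.array([1.0, -1.0])
for _ in range(3):
    z = A @ z + B @ servo_action(A, B, z, z_goal)
print(np.allclose(z, z_goal))  # True (B is invertible here)
```

    In the paper's setting the latent state comes from manifold learning on the tactile skin readings and the dynamics model is learned from demonstration, but the action-selection step has this same shape.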