273 research outputs found
On a problem by Beidar concerning the central closure
We give an example of a prime ring with zero center whose central closure is a simple ring with an identity element. This solves a problem posed by Beidar.
A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks
Reward engineering is an important aspect of reinforcement learning. Whether
or not the user's intentions can be correctly encapsulated in the reward
function can significantly impact the learning outcome. Current methods rely on
manually crafted reward functions that often require parameter tuning to obtain
the desired behavior. This operation can be expensive when exploration requires
systems to interact with the physical world. In this paper, we explore the use
of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula
can be translated to a real-valued function that measures its level of
satisfaction against a trajectory. We take advantage of this function and
propose temporal logic policy search (TLPS), a model-free learning technique
that finds a policy that satisfies the TL specification. A set of simulated
experiments are conducted to evaluate the proposed approach.
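The abstract above says a TL formula can be turned into a real-valued measure of how well a trajectory satisfies it. The block below is a minimal sketch of that idea, not the paper's method: it assumes the common min/max quantitative semantics, and the function names, predicates, and toy trajectory are all hypothetical.

```python
# Hypothetical sketch of a real-valued satisfaction measure for temporal operators.
# Assumption: min/max quantitative semantics; none of these names come from the paper.
from typing import Callable, Sequence

State = Sequence[float]
Predicate = Callable[[State], float]  # a positive value means the predicate holds


def eventually(pred: Predicate, trajectory: Sequence[State]) -> float:
    """'pred holds at some step': take the best value along the trajectory."""
    return max(pred(s) for s in trajectory)


def always(pred: Predicate, trajectory: Sequence[State]) -> float:
    """'pred holds at every step': take the worst value along the trajectory."""
    return min(pred(s) for s in trajectory)


def near_goal(state: State) -> float:
    # Hypothetical predicate: the first coordinate is within 0.5 of 3.0.
    return 0.5 - abs(state[0] - 3.0)


def below_limit(state: State) -> float:
    # Hypothetical predicate: the first coordinate stays below 4.0.
    return 4.0 - state[0]


# Toy one-dimensional trajectory moving from 0.0 to 3.0.
trajectory = [(0.0,), (1.0,), (2.0,), (3.0,)]

reach = eventually(near_goal, trajectory)
safe = always(below_limit, trajectory)
# Conjunction of the two sub-formulas is scored as the minimum of their values.
score = min(reach, safe)
```

Under these assumptions a positive `score` means the trajectory satisfies "eventually reach the goal and always stay below the limit", and a policy search method could use such a score as the quantity to maximize.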
A WIDE DISTRIBUTION OF A NEW VRN-B1c ALLELE OF WHEAT TRITICUM AESTIVUM L. IN RUSSIA, UKRAINE AND ADJACENT REGIONS: A LINK WITH THE HEADING TIME AND ADAPTIVE POTENTIAL
The adaptation of common wheat (T. aestivum L.) to diverse environmental conditions is largely controlled by genes that determine the vernalization response (Vrn-1 genes). Variation in common wheat heading time was found to be affected not only by the combination of Vrn-1 homoeoalleles but also by multiple alleles at a separate Vrn-1 locus. Previously, we described the Vrn-B1c allele from T. aestivum cv. 'Saratovskaya 29' and found significant differences in the structure of the first intron of this allele compared with another highly abundant allele, Vrn-B1a: specifically, a deletion of 0.8 kb coupled with a duplication of 0.4 kb. We suggested that the changes in intron 1 of the Vrn-B1c allele caused earlier ear emergence in the near-isogenic line and in cultivars carrying this allele. In this study we investigate the distribution of the Vrn-B1c allele in a wide set of spring wheat cultivars from Russia, Ukraine and adjacent regions. The analysis revealed that 40% of Russian and 53% of Ukrainian spring wheat cultivars contain the Vrn-B1c allele. The wide distribution of the Vrn-B1c allele can be explained by the frequent use of 'Saratovskaya 29' in breeding within the studied area. On the other hand, the predominance of the Vrn-B1c allele among cultivars grown in West Siberia and Kazakhstan may be due to the selective advantage of this allele in a region with a high risk of early fall frosts.
Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
Imitation learning has traditionally been applied to learn a single task from
demonstrations thereof. The requirement of structured and isolated
demonstrations limits the scalability of imitation learning approaches as they
are difficult to apply to real-world scenarios, where robots have to be able to
execute a multitude of tasks. In this paper, we propose a multi-modal imitation
learning framework that is able to segment and imitate skills from unlabelled
and unstructured demonstrations by learning skill segmentation and imitation
learning jointly. The extensive simulation results indicate that our method can
efficiently separate the demonstrations into individual skills and learn to
imitate them using a single multi-modal policy. The video of our experiments is
available at http://sites.google.com/view/nips17intentiongan
Comment: Paper accepted to NIPS 201
Iterativity as a type of predicate plurality (based on Homer's «Iliad»)
The article presents a structural analysis of the category of iterativity as one of the varieties of the functional-semantic field of predicate plurality. It identifies the means of expressing repetitive semantics at the lexical, grammatical, and syntactic levels, analyzes how frequently they are used, and examines the peculiarities of rendering iterative meaning.
(This study carries out a structural analysis of the category of iterativity as one part of the functional-semantic field of predicate plurality. The main components of predicate plurality are iterativity, distributivity and multiplicativity. The article shows ways of expressing iterative semantics at different linguistic levels (lexical, grammatical, syntactic) and analyzes how frequently they are used. The most widespread are adverbial modifiers of cyclicity, interval and usitativity, which represent the lexical level. At the grammatical level, the meaning is carried by certain tense forms of the verb, especially past tenses, and by suffixes. At the syntactic level, subordinate clauses of time and condition and constructions with the infinitive express the iterative meaning. The article also investigates the peculiarities of expressing iterative meaning in Ancient Greek.)
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another, and thereby, learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single robot alternative.
Comment: Submitted to the IEEE International Conference on Robotics and Automation 201
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Reinforcement learning (RL) algorithms for real-world robotic applications
need a data-efficient learning process and the ability to handle complex,
unknown dynamical systems. These requirements are handled well by model-based
and model-free RL approaches, respectively. In this work, we aim to combine the
advantages of these two types of methods in a principled manner. By focusing on
time-varying linear-Gaussian policies, we enable a model-based algorithm based
on the linear quadratic regulator (LQR) that can be integrated into the
model-free framework of path integral policy improvement (PI2). We can further
combine our method with guided policy search (GPS) to train arbitrary
parameterized policies such as deep neural networks. Our simulation and
real-world experiments demonstrate that this method can solve challenging
manipulation tasks with comparable or better performance than model-free
methods while maintaining the sample efficiency of model-based methods. A video
presenting our results is available at
https://sites.google.com/site/icml17pilqr
Comment: Paper accepted to the International Conference on Machine Learning (ICML) 201
Learning Latent Space Dynamics for Tactile Servoing
To achieve dexterous robotic manipulation, we need to endow our robot with
tactile feedback capability, i.e. the ability to drive action based on tactile
sensing. In this paper, we specifically address the challenge of tactile
servoing, i.e. given the current tactile sensing and a target/goal tactile
sensing --memorized from a successful task execution in the past-- what is the
action that will move the current tactile sensing closer to the
target tactile sensing at the next time step. We develop a data-driven approach
to acquire a dynamics model for tactile servoing by learning from
demonstration. Moreover, our method represents the tactile sensing information
as lying on a surface (a 2D manifold) and performs manifold learning,
making it applicable to any tactile skin geometry. We evaluate our method on a
contact point tracking task using a robot equipped with a tactile finger. A
video demonstrating our approach can be seen at https://youtu.be/0QK0-Vx7WkI
Comment: Accepted to be published at the International Conference on Robotics
and Automation (ICRA) 2019. The final version for publication at ICRA 2019 is
7 pages (i.e. 6 pages of technical content (including text, figures, tables,
acknowledgement, etc.) and 1 page of the Bibliography/References), while this
arXiv version is 8 pages (added Appendix and some extra details).