Towards Informed Exploration for Deep Reinforcement Learning

Author: Tang Haoran
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact
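
The counting idea in the second category can be illustrated with a minimal sketch. It is not taken from the thesis: the random-projection hash, the `beta / sqrt(count)` bonus form, and every name below (`HashCounter`, `n_bits`, `beta`) are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


class HashCounter:
    """Hypothetical helper: counts visits per hash code and returns a bonus."""

    def __init__(self, state_dim, n_bits=8, beta=1.0, seed=0):
        rng = random.Random(seed)
        # Assumption: one random hyperplane per hash bit (a SimHash-style code).
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(n_bits)
        ]
        self.beta = beta
        self.counts = Counter()

    def code(self, state):
        # The sign of each projection gives one bit of the hash code.
        return tuple(
            sum(w * x for w, x in zip(plane, state)) >= 0.0
            for plane in self.planes
        )

    def bonus(self, state):
        # Increment the visit count of this code, then return a bonus that
        # shrinks as the code is seen more often.
        key = self.code(state)
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


counter = HashCounter(state_dim=3)
first = counter.bonus([1.0, 0.0, 0.5])
second = counter.bonus([1.0, 0.0, 0.5])
```

Here a repeated state receives a smaller bonus on its second visit, which is the "less-encountered states get more" behaviour the abstract describes; how the bonus is combined with the task reward is left open.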

eScholarship - University of California

Planning through Automatic Portfolio Configuration: The PbP Approach

Authors: Gerevini Alfonso Emilio; Saetti Alessandro; Vallati Mauro
Publication venue: AI Access Foundation
Publication date: 01/01/2014

In the field of domain-independent planning, several powerful planners implementing different techniques have been developed. However, none of these systems outperforms all others in every known benchmark domain. In this work, we propose a multi-planner approach that automatically configures a portfolio of planning techniques for each given domain. The configuration process for a given domain uses a set of training instances to: (i) compute and analyze some alternative sets of macro-actions for each planner in the portfolio, identifying a (possibly empty) useful set, (ii) select a cluster of planners, each one with the identified useful set of macro-actions, that is expected to perform best, and (iii) derive some additional information for configuring the execution scheduling of the selected planners at planning time. The resulting planning system, called PbP (Portfolio-based Planner), has two variants focusing on speed and plan quality. Different versions of PbP entered and won the learning track of the sixth and seventh International Planning Competitions. In this paper, we experimentally analyze PbP considering planning speed and plan quality in depth. We provide a collection of results that help to understand PbP's behavior, and demonstrate the effectiveness of our approach to configuring a portfolio of planners with macro-actions

Crossref

Archivio istituzionale della ricerca - Università di Brescia

University of Huddersfield Repository

Huddersfield Research Portal

On the Online Generation of Effective Macro-operators

Authors: Chrpa Lukáš; McCluskey T.L.; Vallati Mauro
Publication venue: AAAI Press
Publication date: 01/07/2015

Macro-operator (“macro”, for short) generation is a well-known technique that is used to speed up the planning process. Most published work on using macros in automated planning relies on an offline learning phase where training plans, that is, solutions of simple problems, are used to generate the macros. However, there might not always be a place to accommodate training. In this paper we propose OMA, an efficient method for generating useful macros without an offline learning phase, by utilising lessons learnt from existing macro learning techniques. Empirical evaluation with IPC benchmarks demonstrates performance improvement in a range of state-of-the-art planning engines, and provides insights into what macros can be generated without training

University of Huddersfield Repository

Huddersfield Research Portal

Water Window Ptychographic Imaging with Characterized Coherent X-rays

Authors: Dzhigaev Dmitry; Gorniak Thomas; Gorobtsov Oleg; Rose Max; Rosenhahn Axel; Senkbeil Tobias; Shabalin Anatoly; Skopintsev Petr; Vartanyants Ivan; Viefhaus Jens; von Gundlach Andreas
Publication venue: International Union of Crystallography (IUCr)
Publication date: 01/01/2015

We report on a ptychographical coherent diffractive imaging experiment in the water window with focused soft X-rays at $500~\mathrm{eV}$. An X-ray beam with high degree of coherence was selected for ptychography at the P04 beamline of the PETRA III synchrotron radiation source. We measured the beam coherence with the newly developed non-redundant array method. A pinhole $2.6~\mu\mathrm{m}$ in size selected the coherent part of the beam and was used for ptychographic measurements of a lithographically manufactured test sample and fossil diatom. The achieved resolution was $53~\mathrm{nm}$ for the test sample and only limited by the size of the detector. The diatom was imaged at a resolution better than $90~\mathrm{nm}$.

Comment: 22 pages, 7 figures

arXiv.org e-Print Archive

DESY Publication Database

PubMed Central

DESY

Learning Useful Macro-actions for Planning with N-Grams

Authors: Dulac Adrien; Fiorino Humbert; Janiszek David; Pellier Damien
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/11/2013

Automated planning has achieved significant breakthroughs in recent years. Nonetheless, attempts to improve search algorithm efficiency remain the primary focus of most research. However, it is also possible to build on previous searches and learn from previously found solutions. Our approach consists of learning macro-actions and adding them to the planner's domain. A macro-action is an action sequence selected for application at search time and applied as a single indivisible action. Carefully chosen macros can drastically improve planning performance by reducing the search space depth. However, macros also increase the branching factor. Therefore, the use of macros entails a utility problem: a trade-off has to be addressed between the benefit of adding macros to speed up the goal search and the overhead caused by increasing the branching factor in the search space. In this paper, we propose an online domain- and planner-independent approach to learn 'useful' macros, i.e. macros that address the utility problem. These useful macros are obtained by statistical and heuristic filtering of a domain-specific macro library. The library is created from the most frequent action sequences derived from an n-gram analysis on successful plans previously computed by the planner. The relevance of this approach is proven by experiments on International Planning Competition domains
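
The n-gram step described above can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the function name `frequent_ngrams`, the `min_count` threshold, and the toy action names are all hypothetical, and the plain frequency cut-off stands in for the statistical and heuristic filtering that the abstract mentions but does not specify.

```python
from collections import Counter


def frequent_ngrams(plans, n=2, min_count=2):
    """Count contiguous action sequences of length n across solved plans."""
    counts = Counter()
    for plan in plans:
        # Slide a window of length n over the plan's action sequence.
        for i in range(len(plan) - n + 1):
            counts[tuple(plan[i:i + n])] += 1
    # Keep only sequences seen at least min_count times, most frequent first.
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Toy plans with made-up action names, standing in for previously found solutions.
plans = [
    ["pick", "move", "drop", "move"],
    ["pick", "move", "drop"],
    ["move", "pick", "move", "drop"],
]
candidates = frequent_ngrams(plans, n=2, min_count=2)
```

Each surviving sequence is a candidate macro-action. Deciding which candidates are worth the extra branching factor is the utility problem discussed in the abstract and is not addressed by this sketch.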

Crossref

Hal - Université Grenoble Alpes

HAL Descartes

Configuration and Learning Techniques for Efficient Automated Planning System

Author: Vallati Mauro

University of Huddersfield Repository

Remote information management of an automated manufacturing system

Author: Pretorius Linda
Publication venue: Bloemfontein : Central University of Technology, Free State
Publication date: 01/01/2007

Thesis (M. Tech.) -- Central University of Technology, Free State, 2007

With technology advancing, more and more people turn to the World Wide Web to conduct business. This may include buying and selling on the Web, advertising and monitoring of business activities. There is a great need for software and systems that enable remote monitoring and controlling of business activities. The Mechatronics Research Group of the Faculty of Engineering, Information and Communication Technology at the Central University of Technology, Free State, has identified a similar need. This research group has created an Automated Manufacturing System around which research topics revolve. They want to monitor this Automated Manufacturing System from remote locations like their offices or, if possible, from home. The Remote Information Management (RIM) System was developed using the Rapid Application Development (RAD) Methodology. This methodology was used because it is the best to use in a changing environment, when the system needs to be developed very quickly and when most of the data is already available. This is a good description of the Automated Manufacturing System’s environment. The RAD methodology consists of four stages: Requirements Planning, User Design, Rapid Construction and Transition. Project Management is used throughout these stages to ensure that the project goes according to plan. Development of the RIM system went through all four stages and project management was applied. The final system consisted of a Web Page with Web Camera views of the Automated Manufacturing System. The application, which was developed using National Instruments LabVIEW, Microsoft Visual C++, and Microsoft Excel, is embedded in this Web Page. This application is called a Virtual Instrument (VI). The VI shows real-time data from the Automated Manufacturing System. Control over the VI can be granted and will allow the remote user to create reports on how many different products were produced and on system downtimes. A system like the RIM System has advantages in the business world. It can enable telecommuting and will allow employees and managers to monitor (and even control) manufacturing systems, or any system connected to a PLC, from remote locations

Central University Of Technology Free State - LibraryCUT, South Africa