93 research outputs found

Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man

Author: Lorincz A
Szita I
Publication date: 01/01/2007

In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, (ii) the search is biased towards low-complexity policies and therefore, solutions with a compact description can be found quickly if they exist
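The rule-selection step described in this abstract can be illustrated with a minimal sketch of the cross-entropy method over a pool of candidate rules. It is not the authors' implementation: the Bernoulli inclusion probabilities, the smoothing step, the function names, and the toy scoring function (which stands in for playing the game) are all assumptions made for illustration.

```python
import random

def cross_entropy_select(n_rules, score, n_samples=50, elite_frac=0.2,
                         step=0.7, n_iters=30, seed=0):
    """Learn an inclusion probability for each rule in a pool (hypothetical setup)."""
    rng = random.Random(seed)
    probs = [0.5] * n_rules                      # start with every rule equally likely
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidate policies: each is a mask saying which rules are included.
        samples = [[rng.random() < p for p in probs] for _ in range(n_samples)]
        # Keep the best-scoring fraction of the samples (the elite).
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        # Move each probability towards the rule's frequency among the elite.
        for i in range(n_rules):
            freq = sum(s[i] for s in elite) / n_elite
            probs[i] = (1 - step) * probs[i] + step * freq
    return probs

def toy_score(mask):
    """Stand-in for a game score: rules 0 and 3 help, every other rule costs a little."""
    useful = {0, 3}
    return sum(2.0 if i in useful else -1.0 for i, on in enumerate(mask) if on)

probs = cross_entropy_select(n_rules=6, score=toy_score)
print([round(p, 2) for p in probs])
```

Penalising each unhelpful rule in the toy score mirrors the bias towards low-complexity policies mentioned in the abstract; in a real setting the score would come from evaluating the decision list in the game.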

CiteSeerX

ELTE Digital Institutional Repository (EDIT)

Application of synthetic solid culture medium to improve the detection of antimicrobial drug residues in foodstuffs

Author: Bernáth S.
Erdősi O.
Hullár I.
Szili Zs.
Szita G.
Szita J.
Publication venue: 'Akademiai Kiado Zrt.'
Publication date: 01/01/2014

A selective synthetic solid minimal medium (BS agar) was developed to detect antimicrobial drug residues in foodstuffs using a Bacillus subtilis indicator culture. This medium contains an ammonium salt as the nitrogen source and either glucose or sodium pyruvate as the carbon source. Its selectivity is based on the fact that Bacillus subtilis can still grow when the minimal medium contains only simple inorganic substances as nitrogen sources and glucose or pyruvate as the carbon supply. When these new synthetic media were used in microbiological assays of certain antimicrobials, the diameters of the inhibition zones were 1.4–4 times wider than on Mueller-Hinton agar. The advantages of the BS agars are their standard composition, the absence of inhibitors, their reproducible quality and their low cost.

Crossref

Repository of the Academy's Library

A Human-Interactive Course of Action Planner for Aircraft Carrier Deck Operations

Author: Ratliff Nathan
Rochlin Gene I
Syed Umar
Szita István
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 01/03/2011

Aircraft carrier deck operations present a complex and uncertain environment in which time-critical scheduling and planning must be done, and to date all course of action planning is done solely by human operators who rely on experience and training to safely negotiate off-nominal situations. A computer decision support system could provide the operator with both a vital resource in emergency scenarios as well as suggestions to improve efficiency during normal operations. Such a decision support system would generate a schedule of coordinated deck operations for all active aircraft (taxi, refuel, take off, queue in Marshal stack, land, etc.) that is optimized for efficiency, amenable to the operator, and robust to the many types of uncertainty inherent in the aircraft carrier deck environment. This paper describes the design, implementation, and testing of a human-interactive aircraft carrier deck course of action planner. The planning problem is cast in the MDP framework such that a wide range of current literature can be used to find an optimal policy. It is designed such that human operators can specify priority aircraft and suggest scheduling orders. Inverse reinforcement learning techniques are applied that allow the planner to learn from recorded expert demonstrations. Results are presented that compare various types of human and learned policies, and show qualitative and quantitative matching between expert demonstrations and learned policies. United States. Office of Naval Research (Science of Autonomy Program

CiteSeerX

DSpace@MIT

Crossref

Electromagnetic Actuated stirring in Microbioreactors enabling easier multiplexing & flexible device design

Author: 4th Micro and Nano Flows Conference (MNF2014)
Davies MJ
Munro I
Szita N
Tan CKL
Tracey MC
Publication venue: Brunel University London
Publication date: 01/01/2014

This paper was presented at the 4th Micro and Nano Flows Conference (MNF2014), which was held at University College, London, UK. The conference was organised by Brunel University and supported by the Italian Union of Thermofluiddynamics, IPEM, the Process Intensification Network, the Institution of Mechanical Engineers, the Heat Transfer Society, HEXAG - the Heat Exchange Action Group, and the Energy Institute, ASME Press, LCN London Centre for Nanotechnology, UCL University College London, UCL Engineering, the International NanoScience Community, www.nanopaprika.eu. The development of a novel electromagnetically (EM) actuated stirring method for use in microbioreactors is reported. Mixing in microbioreactors is critical to ensure even distribution of nutrients to microorganisms and cells. Magnetically driven stirrer bars and peristaltic mixing are the most commonly used mixing methods in completely liquid-filled microbioreactors. However, the circular reactor shape required for mixing with a stirrer bar, and frequently used for peristaltically mixed microbioreactors, presents difficulties for bubble-free priming in a microfluidic bioreactor. Moreover, the circular shape and the hardware required for both types of mixing reduce the potential packing density of multiplexed reactors. We present a new method of mixing and show its design flexibility by demonstrating mixing in circular and diamond-shaped reactors and in a duplex diamond reactor, as well as fermentation of the gram-positive bacterium S. carnosus in a diamond-shaped microbioreactor system. The results of optimising this mixing method for performing fermentations are presented alongside both batch and continuous culture fermentations.

Brunel University Research Archive

A geographical study on Pseudaulacaspis pentagona and its parasitoids in Hungarian highway margins using pheromone traps and molecular markers

Author: Bayoumy M. H.
Fetyko K.
Konczné Benedicty Z.
Kozár F.
Szita É.
Tobias I.
Publication venue: 'National Documentation Centre (EKT)'
Publication date: 08/01/2011

A study was conducted to monitor the geographical spread of the white peach scale (WPS) Pseudaulacaspis pentagona (Targioni Tozzetti) (Hemiptera: Diaspididae) and its parasitoid populations at 32 stops along the Hungarian highways (M0, M1, M3, M5 and M7) using pheromone traps during 2009 and 2010. In addition to the data collected in the current study, previous data were used to investigate the population trend of this pest from 2007 to 2010. The number of males recorded in traps placed on highways was much lower than at the sites close to urban areas (M0). Our data support the results of previous studies, which suggest that white peach scale is spread by vehicles (“transport vector”). The significant decrease in WPS male catches from 2007 to 2010 might indicate a lowering of the population levels of this pest in the study area. Eight hymenopterous parasitoid species were captured in pheromone traps. Coccophagus sp. was the predominant species in the WPS pheromone traps on the M7; however, it may be associated with another coccid species. The identity of the scale males and of some parasitoids was confirmed with molecular markers.

National Documentation Centre - EKT journals

Online Meta-learning by Parallel Algorithm Competition

Author: Baker James E.
Bertsekas D. P.
Downey Carlton
Gabillon V.
Goodfellow Ian
Mnih Volodymyr
Snoek Jasper
Springenberg Jost T.
Sutton S.
Szita I.
Unemi T.
Wu Jian
Publication date: 24/02/2017

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches of appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, $10 \times 10$, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks. Comment: 15 pages, 10 figures. arXiv admin note: text overlap with arXiv:1702.0311
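The loop described in this abstract (parallel instances, periodic selection, Gaussian perturbation of the meta-parameters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the OMPAC implementation: the truncation-style selection, the parameter names, and the toy `evaluate` function, which replaces actual reinforcement learning with a fixed score, are all hypothetical.

```python
import random

def ompac_sketch(evaluate, init_params, n_instances=8, n_generations=20,
                 noise_prob=0.5, noise_std=0.1, seed=0):
    """Evolve meta-parameters across parallel instances (illustrative only)."""
    rng = random.Random(seed)
    # Instances start with small differences in their meta-parameter values.
    population = [
        {k: v * (1 + rng.gauss(0, 0.05)) for k, v in init_params.items()}
        for _ in range(n_instances)
    ]
    for _ in range(n_generations):
        # In the real method each instance would learn for a fixed number of
        # episodes here; the sketch just scores its meta-parameters directly.
        scores = [evaluate(p) for p in population]
        ranked = [p for _, p in sorted(zip(scores, population),
                                       key=lambda t: t[0], reverse=True)]
        # Assumed selection rule: copy instances drawn from the better half.
        survivors = ranked[: max(1, n_instances // 2)]
        population = [dict(rng.choice(survivors)) for _ in range(n_instances)]
        # With a predefined probability, add Gaussian noise to each meta-parameter.
        for p in population:
            for k in p:
                if rng.random() < noise_prob:
                    p[k] += rng.gauss(0, noise_std)
    return population

def toy_evaluate(params):
    """Hypothetical performance measure that peaks at alpha = 0.3."""
    return -(params["alpha"] - 0.3) ** 2

population = ompac_sketch(toy_evaluate, {"alpha": 1.0})
mean_alpha = sum(p["alpha"] for p in population) / len(population)
print(round(mean_alpha, 3))
```

In an actual agent, a copied instance would carry over its learned weights as well as its meta-parameters, so that learning continues rather than restarting after each selection step.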

arXiv.org e-Print Archive

Crossref

Bayesian Reward Filtering

Author: I. Szita
M.A. Carreira-Perpinan
R.S. Sutton
V.N. Vapnik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Crossref

Eco-friendly Production of Chemicals 1. Improvement of Enzymatic Production of Acetophenone by Direct Extraction

Author: Blaga A-C
Cascaval D
Galaction A-I
Kloetzer L
Petrila-Cocuz I-B
Szita N
Publication venue: GH ASACHI TECHNICAL UNIV IASI
Publication date: 01/08/2016

Acetophenone can be produced enzymatically by conversion of methylbenzylamine using transaminase. The enzymatic process is strongly affected by product inhibition, so acetophenone must be removed from the medium during its synthesis. For this purpose, the individual and selective extraction of acetophenone and methylbenzylamine with the biocompatible solvent n-heptane containing 1-octanol, D2EHPA or TOA was analyzed at three pH values (5, 7, and 9). Regardless of the solvent and pH value, the highest efficiency was reached for the extraction of acetophenone, and the difference between the extraction yields of acetophenone and methylbenzylamine was amplified when these compounds were separated from their mixture. On the basis of the experimental selectivity factors, and taking into consideration both the possible loss of substrate from the medium and the pH required for the enzymatic reaction (pH = 7), it was concluded that the optimum solvent combination is the mixture of n-heptane and 1-octanol. This solvent mixture gave a high selectivity factor of 315, corresponding to extraction yields of 94.5 % for acetophenone and only 0.3 % for methylbenzylamine.

UCL Discovery