
    Skinner operant conditioning model and robot bionic self-learning control

    A Fuzzy Skinner Operant Conditioning Automaton (FSOCA) is constructed based on the operant conditioning mechanism combined with fuzzy set theory. The main characteristics of the FSOCA automaton are: state values fuzzified by Gaussian membership functions serve as the fuzzy state sets, and the fuzzy mapping rules of the fuzzy conditioning operation replace the stochastic "conditioning-operant" mapping sets. The FSOCA automaton can therefore be used to describe, simulate, and design various self-organizing actions of a fuzzy uncertain system. The automaton first adopts an online clustering algorithm to partition the input space, and uses the excitation intensity of the mapping rules to decide whether a new mapping rule needs to be generated, keeping the number of mapping rules economical. The designed FSOCA automaton is applied to the balancing control of a two-wheeled robot. As learning proceeds, the selection probability of the optimal consequent fuzzy operant gradually increases, the fuzzy operant action entropy gradually decreases, and fuzzy mapping rules are automatically generated and deleted. After about seventeen rounds of training, the selection probabilities of the optimal consequent fuzzy operants tend toward one, the fuzzy operant action entropy approaches its minimum, and the number of fuzzy mapping rules becomes optimal. The robot thus gradually learns the motion-balancing skill.
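    To make the mechanism concrete, here is a minimal Python sketch of such an automaton: Gaussian excitation intensities over rule centers, online rule generation when no existing rule fires strongly enough, stochastic operant selection per rule, and a probability update whose entropy decays under reinforcement. Class and parameter names (`FSOCA`, `gen_threshold`, `sigma`, `lr`) are illustrative assumptions, not the paper's implementation.

    ```python
    import numpy as np

    class FSOCA:
        """Sketch of a fuzzy Skinner operant conditioning automaton.

        Each fuzzy mapping rule has a Gaussian receptive field over the
        state space and a probability distribution over the operant set.
        All hyperparameters here are illustrative, not from the paper.
        """

        def __init__(self, n_operants, sigma=0.5, gen_threshold=0.3, lr=0.1):
            self.n_operants = n_operants
            self.sigma = sigma                  # Gaussian width
            self.gen_threshold = gen_threshold  # rule-generation cutoff
            self.lr = lr                        # conditioning update rate
            self.centers = []                   # rule centers (online clustering)
            self.probs = []                     # per-rule operant probabilities

        def _firing(self, x):
            # Gaussian excitation intensity of each rule for state x.
            return np.array([np.exp(-np.sum((x - c) ** 2) / (2 * self.sigma ** 2))
                             for c in self.centers])

        def observe(self, x):
            """Online clustering: generate a new rule when no rule fires
            strongly, keeping the rule count economical."""
            x = np.asarray(x, dtype=float)
            f = self._firing(x)
            if len(self.centers) == 0 or f.max() < self.gen_threshold:
                self.centers.append(x)
                self.probs.append(np.full(self.n_operants, 1.0 / self.n_operants))
                f = self._firing(x)
            return f

        def select_operant(self, x, rng):
            """Pick an operant stochastically from the most excited rule."""
            f = self.observe(x)
            k = int(np.argmax(f))
            a = rng.choice(self.n_operants, p=self.probs[k])
            return k, a

        def reinforce(self, k, a, reward):
            """Conditioning step: shift probability mass toward a rewarded
            operant (away from a punished one), then renormalize."""
            p = self.probs[k]
            p[a] += self.lr * reward * (1.0 - p[a])
            p = np.clip(p, 1e-6, None)
            self.probs[k] = p / p.sum()

        def action_entropy(self, k):
            """Entropy of rule k's operant distribution; it decays toward
            zero as the optimal operant's probability tends to one."""
            p = self.probs[k]
            return float(-np.sum(p * np.log(p)))

    # Illustrative usage on a toy 2-D state:
    rng = np.random.default_rng(0)
    auto = FSOCA(n_operants=3)
    rule, act = auto.select_operant([0.1, -0.2], rng)
    auto.reinforce(rule, act, reward=1.0)
    ```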

    Combination of online clustering and Q-value based GA for reinforcement fuzzy system design

    This paper proposes a combination of online clustering and Q-value based genetic algorithm (GA) learning for fuzzy system design (CQGAF) with reinforcements. CQGAF performs GA-based fuzzy system design in a reinforcement learning environment where only weak reinforcement signals, such as "success" and "failure", are available. In CQGAF there are no fuzzy rules initially; they are generated automatically. The precondition part of the fuzzy system is constructed online by an aligned clustering-based approach, which achieves a flexible partition. The consequent part is then designed by Q-value based genetic reinforcement learning. Each individual in the GA population encodes the consequent-part parameters of a fuzzy system and is associated with a Q-value, which estimates the discounted cumulative reinforcement earned by that individual and serves as its fitness value for GA evolution. At each time step, an individual is selected according to the Q-values, the corresponding fuzzy system is built and applied to the environment, and a critic signal is received. With this critic, Q-learning with an eligibility trace is executed. After each trial, the GA searches for better consequent parameters based on the learned Q-values. Thus, in CQGAF, evolution is performed immediately after the end of each trial, in contrast to a general GA, where many trials are performed before evolution. The feasibility of CQGAF is demonstrated through simulations of cart-pole balancing, magnetic levitation, and chaotic system control with only binary reinforcement signals.
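    The Q-value based GA half of the scheme can be sketched as follows, under stated assumptions: each individual carries a Q-value used both for selection at each time step and as GA fitness, Q-learning with an eligibility trace runs when the critic arrives, and evolution fires right after each trial. The selection strategy (epsilon-greedy), elitist uniform crossover, Gaussian mutation, and all parameter names are illustrative choices, not the paper's exact design.

    ```python
    import numpy as np

    class QGAPopulation:
        """Sketch of the Q-value based GA component of CQGAF.

        Each individual encodes consequent parameters of a fuzzy system
        and carries a Q-value that doubles as its GA fitness.
        """

        def __init__(self, pop_size, n_params, rng,
                     alpha=0.1, gamma=0.95, lam=0.8, mut_std=0.05):
            self.rng = rng
            self.alpha, self.gamma, self.lam = alpha, gamma, lam
            self.mut_std = mut_std
            self.pop = rng.normal(size=(pop_size, n_params))  # consequent params
            self.q = np.zeros(pop_size)                       # Q-value per individual
            self.trace = np.zeros(pop_size)                   # eligibility traces

        def select(self, eps=0.1):
            """Epsilon-greedy pick over Q-values (one illustrative choice)."""
            if self.rng.random() < eps:
                return int(self.rng.integers(len(self.pop)))
            return int(np.argmax(self.q))

        def q_update(self, i, reward, next_best_q):
            """Q-learning with eligibility traces, run after the critic arrives."""
            self.trace *= self.gamma * self.lam
            self.trace[i] += 1.0
            td_error = reward + self.gamma * next_best_q - self.q[i]
            self.q += self.alpha * td_error * self.trace

        def evolve(self, elite_frac=0.5):
            """After each trial: keep the fittest (highest-Q) individuals and
            refill the population with mutated crossovers of elite parents."""
            n = len(self.pop)
            order = np.argsort(self.q)[::-1]
            n_elite = max(2, int(elite_frac * n))
            elite, elite_q = self.pop[order[:n_elite]], self.q[order[:n_elite]]
            children = []
            for _ in range(n - n_elite):
                pa, pb = self.rng.choice(n_elite, size=2, replace=False)
                mask = self.rng.random(self.pop.shape[1]) < 0.5  # uniform crossover
                child = np.where(mask, elite[pa], elite[pb])
                child = child + self.rng.normal(scale=self.mut_std, size=child.shape)
                children.append(child)
            self.pop = np.vstack([elite, np.array(children)])
            # Initializing children's Q at the elite mean is an assumption.
            self.q = np.concatenate([elite_q, np.full(n - n_elite, elite_q.mean())])
            self.trace = np.zeros(n)
    ```

    Because the Q-values persist across generations as fitness, the GA can evolve immediately after a single trial, which is the contrast with a conventional GA drawn in the abstract.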