24 research outputs found

    A gene regulatory network model for control

    The activity of a biological cell is regulated by interactions between genes and proteins. In artificial intelligence, this has led to the creation of developmental gene regulatory network (GRN) models which aim to exploit these mechanisms to algorithmically build complex designs. The emerging field of GRNs for control aims instead to exploit these natural mechanisms, and their ability to encode a large variety of behaviours within a single evolvable genetic program, for the solution of control problems. This work aims to extend the application domain of GRN models to previously unsolved control problems; the focus here is on reinforcement learning problems, in which the dynamics of the controlled system are hidden from the controller and only sparse feedback is given to it. This category of problems closely matches the challenges faced by natural evolution in generating biological GRNs. Starting with an existing GRN model, the fractal GRN (FGRN) model, a successful application to a standard control problem is presented, followed by multiple improvements to the FGRN model and its associated genetic algorithm, resulting in better performance in terms of both reliability and speed. Limitations are identified in the FGRN model, leading to the introduction of the Input-Merge-Regulate-Output (IMRO) architecture for GRN models, an implementation of which shows both quantitative and qualitative improvements over the FGRN model, solving harder control problems. The resulting model also displays useful features which should facilitate further extension and real-world use of the system.
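
    As a purely illustrative aside, the sketch below shows one way an Input-Merge-Regulate-Output style controller could be structured in code. The class, weight matrices, and saturating update rule are assumptions made for illustration; the thesis's FGRN and IMRO models define these mechanisms differently, and in an evolutionary setting the weights would be encoded in the evolvable genome rather than sampled randomly.

```python
import numpy as np

class ToyIMROController:
    """Toy Input-Merge-Regulate-Output controller; names and dynamics are illustrative only."""
    def __init__(self, n_inputs, n_proteins, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(size=(n_proteins, n_inputs))     # Input: sensors -> regulatory proteins
        self.W_reg = rng.normal(size=(n_proteins, n_proteins))  # Regulate: gene-gene interaction weights
        self.W_out = rng.normal(size=(n_outputs, n_proteins))   # Output: read actions off protein levels
        self.p = np.zeros(n_proteins)                            # protein concentrations (internal state)

    def step(self, sensors):
        merged = self.p + self.W_in @ sensors                    # Merge: fold inputs into the current state
        self.p = 1.0 / (1.0 + np.exp(-self.W_reg @ merged))      # Regulate: saturating regulatory update
        return self.W_out @ self.p                               # Output: actuator commands
```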

    An echo state model of non-Markovian reinforcement learning

    Department Head: Dale H. Grit. 2008 Spring. Includes bibliographical references (pages 137-142).
    There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically, the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to domains where the state-action space is only approximately Markovian, which covers the overwhelming majority of real-world domains. These domains, termed non-Markovian reinforcement learning domains, raise a unique set of practical challenges. The reconstruction dimension required to approximate a Markovian state-space is unknown a priori and can potentially be large. Further, the spatial complexity of local function approximation of the reinforcement learning domain grows exponentially with the reconstruction dimension. Parameterized dynamic systems alleviate both embedding length and state-space dimensionality concerns by reconstructing an approximate Markovian state-space via a compact, recurrent representation. Yet this representation exacts a cost: modeling reinforcement learning domains via adaptive, parameterized dynamic systems is characterized by instability, slow convergence, and high computational or spatial training complexity. The objectives of this research are to demonstrate a stable, convergent, accurate, and scalable model of non-Markovian reinforcement learning domains. These objectives are fulfilled via fixed point analysis of the dynamics underlying the reinforcement learning domain and the Echo State Network, a class of parameterized dynamic system. Understanding models of non-Markovian reinforcement learning domains requires understanding the interactions between learning domains and their models. Fixed point analysis of the Mountain Car Problem reinforcement learning domain, for both local and nonlocal function approximations, suggests a close relationship between the locality of the approximation and the number and severity of bifurcations of the fixed point structure. This research suggests the likely cause of this relationship: reinforcement learning domains exist within a dynamic feature space in which trajectories are analogous to states. The fixed point structure maps dynamic space onto state-space. This explanation suggests two testable hypotheses. First, reinforcement learning is sensitive to state-space locality because states cluster as trajectories in time rather than space. Second, models using trajectory-based features should exhibit good modeling performance and few changes in fixed point structure. Analysis of the performance of a lookup table, a feedforward neural network, and an Echo State Network (ESN) on the Mountain Car Problem reinforcement learning domain confirms these hypotheses. The ESN is a large, sparse, randomly generated, unadapted recurrent neural network, which adapts a linear projection of the target domain onto the hidden layer. ESN modeling results on reinforcement learning domains show that it achieves performance comparable to lookup table and neural network architectures on the Mountain Car Problem with minimal changes to the fixed point structure. Also, the ESN achieves lookup-table-caliber performance when modeling Acrobot, a four-dimensional control problem, but is less successful modeling the lower-dimensional Modified Mountain Car Problem.
    These performance discrepancies are attributed to the ESN’s excellent ability to represent complex short-term dynamics and its inability to consolidate long temporal dependencies into a static memory. Without memory consolidation, reinforcement learning domains exhibiting attractors with multiple dynamic scales are unlikely to be well modeled via an ESN. To mitigate this problem, a simple ESN memory consolidation method is presented and tested on stationary dynamic systems. These results indicate the potential to improve modeling performance in reinforcement learning domains via memory consolidation.
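
    The following is a minimal echo state network sketch consistent with the description above (a large, sparse, randomly generated, unadapted reservoir with a trained linear readout). Reservoir size, sparsity, spectral radius, and the ridge-regression readout are illustrative assumptions, not the configuration used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 2, 200                                  # e.g. a 2-D observation such as Mountain Car's position/velocity
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))          # fixed, unadapted input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= rng.random((n_res, n_res)) < 0.05                # sparse, randomly generated reservoir
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))       # spectral radius < 1 for the echo state property

def run_reservoir(inputs):
    """Drive the unadapted reservoir; its state carries a fading memory of the input trajectory."""
    x, states = np.zeros(n_res), []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Only the linear readout is trained (ridge regression), e.g. onto value estimates."""
    return np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ targets)
```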

    Energy-Economical Heuristically Based Control of Compass Gait Walking on Stochastically Varying Terrain

    This investigation uses simulation to explore the inherent tradeoffs of controlling high-speed and highly robust walking robots while minimizing energy consumption. Using a novel controller which optimizes robustness, energy economy, and speed of a simulated robot on rough terrain, the user can adjust their priorities between these three outcome measures and systematically generate a performance curve assessing the tradeoffs associated with these metrics.
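
    A rough sketch of how such a priority-driven performance curve could be generated is shown below: sweep priority weights over speed, robustness, and energy economy, evaluate each setting in simulation, and collect the resulting points. The simulate callable and the weighted-priority sweep are assumptions for illustration, not the heuristic controller studied in the work.

```python
import itertools

def tradeoff_curve(simulate, steps=5):
    """Sweep priority weights over speed, robustness, and energy economy.

    `simulate(weights)` is a hypothetical rough-terrain rollout returning
    (speed, robustness, energy_cost) for a controller tuned with those priorities.
    """
    grid = [i / (steps - 1) for i in range(steps)]
    points = []
    for w_speed, w_robust in itertools.product(grid, grid):
        if w_speed + w_robust > 1.0:
            continue                                   # keep the weights on the simplex
        weights = (w_speed, w_robust, 1.0 - w_speed - w_robust)
        speed, robustness, energy = simulate(weights)
        points.append({"weights": weights, "speed": speed,
                       "robustness": robustness, "energy": energy})
    return points
```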

    Intelligent model-based control of complex multi-link mechanisms

    A complex under-actuated multi-link mechanism is a system whose number of control inputs is smaller than the dimension of its configuration space. The ability to control such a system through the manipulation of its natural dynamics would allow for the design of more energy-efficient machines capable of achieving smooth motions similar to those found in the natural world. This research aims to understand the complex nature of the Robogymnast, a triple-link underactuated pendulum built at Cardiff University for studying the behaviour of non-linear systems, and to understand the challenges in developing its control system. A mathematical model of the robot was derived from the Euler-Lagrange equations. The design of the control system was based on the discrete-time linear model around the downward position and a sampling time of 2.5 milliseconds. Firstly, Invasive Weed Optimization (IWO) was used to optimize the swing-up motion of the robot by determining the optimum values of the parameters that control the input signals of the Robogymnast’s two motors. The values obtained from IWO were then applied in both simulation and experiment. The results showed that the swing-up motion of the Robogymnast from the stable downward position to the inverted configuration was successfully achieved. Secondly, due to the complex nature and nonlinearity of the Robogymnast, a novel approach of modelling the Robogymnast using a multi-layered Elman neural network (ENN) was proposed. The ENN model was then tested with various inputs and its outputs were analysed. The results showed that the ENN model was capable of providing a better representation of the actual system than the mathematical model. Thirdly, IWO was used to investigate the optimum Q values of the Linear Quadratic Regulator (LQR) for inverted balance control of the Robogymnast. IWO was used to obtain the optimal Q values required by the LQR to maintain the Robogymnast in an upright configuration. Two fitness criteria were investigated: cost function J and settling time T. A controller was developed using the values obtained from each fitness criterion. The results showed that LQRT settled faster but LQRJ was capable of stabilizing the Robogymnast from larger deflection angles. Finally, fitness criteria J and T were used simultaneously to obtain the optimal Q values for the LQR. For this purpose, two multi-objective optimization methods based on the IWO, namely the Weighted Criteria Method IWO (WCMIWO) and the Fuzzy Logic IWO Hybrid (FLIWOH), were developed. Two LQR controllers were first developed using the parameters obtained from the two optimization methods. The same process was then repeated with disturbance applied to the Robogymnast states to develop another two LQR controllers. The response of the controllers was then tested in different scenarios using simulation and their performance was evaluated. The results showed that all four controllers were able to balance the Robogymnast, with the fastest settling time achieved by WCMIWO with disturbance, followed in ascending order by FLIWOH with disturbance, FLIWOH, and WCMIWO.
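
    The LQR tuning step described above can be pictured with the sketch below: the diagonal of Q is treated as a parameter vector to be searched by an optimiser (IWO in the thesis; any black-box optimiser could stand in here), with the quadratic cost J of a simulated closed-loop response as the fitness. The A and B matrices, horizon, and cost details are placeholders rather than the Robogymnast's actual model.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Discrete-time LQR gain K such that u = -K x."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

def fitness_J(q_diag, A, B, R, x0, horizon=2000):
    """Quadratic cost of the closed-loop response; q_diag is the vector an optimiser (e.g. IWO) searches."""
    Q = np.diag(q_diag)
    K = lqr_gain(A, B, Q, R)
    x, J = np.asarray(x0, dtype=float), 0.0
    for _ in range(horizon):                          # e.g. 2000 steps at 2.5 ms is roughly 5 s
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u                             # propagate the linearised discrete-time model
    return J
```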

    Sample efficiency, transfer learning and interpretability for deep reinforcement learning

    Deep learning has revolutionised artificial intelligence, where the application of increased compute to train neural networks on large datasets has resulted in improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors prohibiting further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms, resulting in the application of online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy for improving the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting under the continual learning setting. While the use of multiple heads worked well within a simple environment, it was of limited use within a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents both with and without visual domain randomisation (DR), a popular technique to achieve simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without requiring complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced through ablations on the recurrent state.
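
    For reference, the mellowmax operator mentioned above can be written in a few lines; the sketch below shows the operator and a simple Boltzmann-style policy built from action values. The maximum entropy mellowmax policy used in the thesis additionally solves for the inverse temperature that makes the policy's expected value match the mellowmax value; that root-finding step is omitted here, and the omega and beta values are illustrative.

```python
import numpy as np

def mellowmax(q, omega=5.0):
    """Mellowmax operator: (1/omega) * log(mean(exp(omega * q))), computed stably."""
    q = np.asarray(q, dtype=float)
    z = omega * (q - q.max())
    return q.max() + np.log(np.mean(np.exp(z))) / omega

def boltzmann_policy(q, beta=5.0):
    """Softmax over action values; the full method tunes beta so the expected value matches mellowmax."""
    q = np.asarray(q, dtype=float)
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()
```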

    Advanced Strategies for Robot Manipulators

    Amongst robotic systems, robot manipulators have proven to be of increasing importance and are widely adopted to substitute for humans in repetitive and/or hazardous tasks. Modern manipulators have complex designs and need to perform more precise, crucial and critical tasks. Simple traditional control methods therefore cannot be efficient, and advanced control strategies that take special constraints into account need to be established. Although groundbreaking research has been carried out in this realm, there are still many novel aspects which have to be explored.

    Reinforcement Learning in the Real World: Strategies for Computing Resource Allocation and Simulation to Reality Conversion

    Recent advances in machine learning and robotics are automating several processes in the real world. For instance, robots are now able to solve complicated tasks that until recently only humans were capable of doing. A specific branch of machine learning called reinforcement learning (RL) has shown remarkable results on learning tasks by merely allowing a controller to interact with the environment while being provided with positive and negative reinforcement signals. Such methods, however, come with a high cost: the amount of data required to train such behaviours can be prohibitive. One possible solution is to use simulators to collect the data, but this creates the "reality gap" problem, where control policies initially trained in simulation do not transfer well when deployed to their target environment. In this context, this thesis addresses the problem of using RL in the real world by incorporating prior information into the training process, allowing such methods to make better decisions when presented with real data. As the first contribution, this thesis provides a method to learn energy-efficient policies where the learned behaviour is optimised for both accuracy and energy consumption. The method uses the signal collected in the real environment and decides whether to make decisions using a vision-based or a motion-based sensor. The approach highlights the importance of considering the uncertainty of real-world processes when optimising for a specific resource. For instance, the system battery may have different discharge rates depending on the temperature of the environment. This chapter serves as a motivation for the remainder of the work. The second contribution of this thesis addresses the specific problem of minimising the Sim-to-Real gap. The proposed method incorporates prior information about the real world in order to find the most suitable simulation environment in which to train an RL policy. This is performed using Bayesian Likelihood-Free Inference methods, where the initial prior is refined as it is presented with real-world data. The framework allows for a more structured approach to the aforementioned problem, as it incorporates the uncertainty of the real environment into the controller fine-tuning process. Lastly, this thesis connects simulation parameter inference with policy training. We present a method for simultaneously optimising the policy as the simulator continuously improves its accuracy in representing the real environment. The end-to-end approach significantly reduces the time required to learn a policy that has similar performance in simulation and the real world. The framework highlights the importance of treating simulator parameter inference and controller optimisation as a unified problem, where both parts are equally important for the overall performance of the system.
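
    The unified loop described in the final contribution can be sketched at a high level as alternating posterior refinement and policy training. All callables and the posterior object below are hypothetical placeholders standing in for the thesis's components, shown only to make the structure of the approach concrete.

```python
def sim_to_real_loop(policy, posterior, collect_real_rollouts, lfi_update,
                     train_in_sim, n_iters=10):
    """Alternate posterior refinement and policy training; every callable is a hypothetical placeholder."""
    for _ in range(n_iters):
        real_trajs = collect_real_rollouts(policy)            # small batch of real-world trajectories
        posterior = lfi_update(posterior, real_trajs)         # likelihood-free inference update of the simulator-parameter belief
        sim_params = [posterior.sample() for _ in range(16)]  # plausible simulator configurations
        policy = train_in_sim(policy, sim_params)             # continue RL training across the sampled simulators
    return policy, posterior
```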