
    Distinctive properties of biological neural networks and recent advances in bottom-up approaches toward a better biologically plausible neural network

    Although it may appear infeasible and impractical, building artificial intelligence (AI) using a bottom-up approach based on the understanding of neuroscience is straightforward. The lack of a generalized governing principle for biological neural networks (BNNs) forces us to address this problem by converting piecemeal information on the diverse features of neurons, synapses, and neural circuits into AI. In this review, we describe recent attempts to build a biologically plausible neural network, either by following neural network optimization strategies similar to those observed in neuroscience or by implanting the outcomes of such optimization, such as the properties of single computational units and the characteristics of the network architecture. In addition, we propose a formalism for the relationship between the set of objectives that neural networks attempt to achieve and neural network classes categorized by how closely their architectural features resemble those of BNNs. This formalism is expected to define the potential roles of top-down and bottom-up approaches in building a biologically plausible neural network, and to offer a map that helps navigate the gap between neuroscience and AI engineering.

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying file.
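The trade-off between exploration and exploitation and learning from delayed reinforcement, both central issues in the survey, can be illustrated with tabular Q-learning. The sketch below is a generic illustration, not code from the paper; the `step(s, a)` environment interface returning `(next_state, reward, done)` is an assumption.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` is a hypothetical environment interface returning
    (next_state, reward, done); episodes start in state 0.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # trade off exploration against exploitation
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2, r, done = step(s, a)
            # temporal-difference update: propagate the delayed reward
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) * (not done) - Q[s][a])
            s = s2
    return Q
```

On a small chain MDP the learned values decay geometrically with distance from the reward, reflecting the discount factor gamma.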

    A Framework for Aggregation of Multiple Reinforcement Learning Algorithms

    Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). The quality of an SDM depends on long-term rewards rather than instant rewards. RL methods are often adopted to deal with SDM problems. Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performance. There is no universal rule to guide the choice of algorithm or the setting of parameters. To handle this difficulty, a new multiple-RL system, the Aggregated Multiple Reinforcement Learning System (AMRLS), is developed. In AMRLS, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs and provides a final decision. Then, all learners take the action and update their policies individually. The two processes are performed alternately. AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated in AMRLS to improve learning performance in terms of success rate, robustness, confidence, redundancy, and complementarity. There are two strategies for learning an optimal policy with RL methods. One is based on Value Function Learning (VFL), which learns an optimal policy expressed as a value function; Temporal Difference RL (TDRL) methods are examples of this strategy. The other is based on Direct Policy Search (DPS), which directly searches for the optimal policy in the potential policy space; Genetic Algorithm (GA)-based RL (GARL) methods are instances of this strategy. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine them and improve the learning ability. AMRLS and HGATDRL are tested on several SDM problems, including the maze world problem, the pursuit domain problem, the cart-pole balancing system, the mountain car problem, and a flight control system. Experimental results show that the proposed framework and method can enhance the learning ability and improve the learning performance of a multiple-RL system.
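The abstract does not specify the aggregation rule AMRLS uses, but the idea of an aggregation module combining learners' outputs into one final decision can be sketched as a weighted combination of per-learner action preferences; the weighting scheme below is a hypothetical stand-in, not the thesis's method.

```python
def aggregate_decision(preferences, weights=None):
    """Combine per-learner action preferences into one joint decision.

    preferences[i][a] is learner i's score for action a (e.g. a
    Q-value); weights are optional per-learner reliability weights
    (for instance, derived from recent success rates).
    """
    n_actions = len(preferences[0])
    if weights is None:
        weights = [1.0] * len(preferences)
    combined = [sum(w * p[a] for w, p in zip(weights, preferences))
                for a in range(n_actions)]
    # pick the action with the highest aggregated score
    return max(range(n_actions), key=lambda a: combined[a])
```

Each learner would then execute the aggregated action and update its own policy individually, as the abstract describes.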

    Fuzzy neural network control for mechanical arm based on adaptive friction compensation

    When tracking the trajectory of a mechanical arm in joint space, the system is affected by friction non-linearity, unknown dynamic parameters and external disturbances, which makes it difficult to improve the control accuracy of the mechanical arm. To solve these problems, this paper introduces the LuGre friction model and designs a new joint-space trajectory tracking controller based on an adaptive fuzzy neural network. The controller adaptively adjusts the center and width of the basis functions, can approximate the nonlinear LuGre friction term online, and uses a sliding mode control term to reduce the approximation error. Introducing the LuGre model into the mechanical arm system simulates the friction behavior of the system more faithfully, which is of great significance for high-precision control of the mechanical arm. The Lyapunov method is used to prove the stability of the closed-loop system. The simulation results show that the designed adaptive fuzzy neural network can effectively compensate for the nonlinear terms, including friction, without precise system parameters, and that the controller is strongly robust to load changes, thus realizing high-precision trajectory tracking of the mechanical arm in joint space.
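The LuGre model referred to above describes friction through an internal bristle state z, with dz/dt = v - |v|·z/g(v) and friction force F = σ0·z + σ1·(dz/dt) + σ2·v, where g(v) captures the Stribeck effect. A one-step Euler integration of this standard model might look as follows; the parameter values are illustrative defaults, not identified for any particular arm.

```python
import math

def lugre_friction(v, z, dt, sigma0=1e5, sigma1=300.0, sigma2=0.4,
                   Fc=1.0, Fs=1.5, vs=0.01):
    """One Euler step of the LuGre friction model.

    v: relative velocity, z: internal bristle state.
    Returns (friction force, updated z). Parameter values are
    illustrative assumptions, not identified for a real arm.
    """
    # Stribeck curve: normalized steady-state friction at velocity v
    g = (Fc + (Fs - Fc) * math.exp(-(v / vs) ** 2)) / sigma0
    zdot = v - abs(v) / g * z
    F = sigma0 * z + sigma1 * zdot + sigma2 * v
    return F, z + zdot * dt
```

At constant velocity the bristle state settles so that the force approaches the Coulomb level plus viscous drag, which is the kind of nonlinear behavior the fuzzy neural network must approximate online.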

    Fault Detection, Isolation and Identification of Autonomous Underwater Vehicles Using Dynamic Neural Networks and Genetic Algorithms

    The main objective of this thesis is to propose and develop a fault detection, isolation and identification scheme based on dynamic neural networks (DNNs) and genetic algorithms (GAs) for the thrusters of autonomous underwater vehicles (AUVs), which provide the force for performing formation missions. To achieve the fault detection task, two levels of fault detection are proposed: I) agent-level fault detection (ALFD) and II) formation-level fault detection (FLFD). The proposed agent-level fault detection scheme includes a dynamic neural network that is trained with the absolute measurements and states of each thruster in the AUV. The genetic algorithm is used to train the DNN. The simulation results indicate that although the ALFD scheme can detect high-severity faults, for low-severity faults the accuracy does not satisfy our expectations. Therefore, a formation-level fault detection scheme is developed. In the proposed formation-level scheme, a fault detection unit consisting of two dynamic neural networks corresponding to its adjacent neighbors is employed in each AUV to detect faults in the formation. Each DNN of the fault detection unit is trained with one relative and one absolute measurement. As in the ALFD scheme, these two DNNs are trained with a GA. The simulation results and confusion matrix analysis indicate that the proposed FLFD scheme can detect both low-severity and high-severity faults with a high level of accuracy compared to the ALFD scheme. To indicate the type and severity of the occurred fault, agent-level and formation-level fault isolation and identification schemes are developed and their performances are compared. In the proposed fault isolation and identification schemes, two neural networks are employed: one for isolating the type of fault in the AUV's thruster and one for determining the severity of the occurred fault. In the first step, a multi-layer perceptron (MLP) neural network categorizes the type of fault as thruster blocking, flooded thruster, or loss of effectiveness in the rotor, and in the next step an MLP neural network classifies the severity as low, medium, or high. The neural networks in the fault isolation and identification schemes are trained with a genetic algorithm on various data sets obtained from different faulty operating conditions of the AUV. The simulation results and the confusion matrix analysis indicate that the proposed formation-level fault isolation and identification schemes perform better than the agent-level schemes and are capable of isolating and identifying faults with a high level of accuracy and precision.
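Model-based schemes of this kind typically compare a network's prediction of healthy behavior against the measurement and flag a fault when the residual persists. A minimal sketch of such a residual test follows; it is an illustration of the general idea, and the threshold and window values are assumptions, not numbers from the thesis.

```python
def residual_fault_flag(measured, predicted, threshold, window=5):
    """Flag samples where the prediction residual stays above a
    threshold for `window` consecutive steps; the persistence window
    suppresses isolated noise spikes."""
    residuals = [abs(m - p) for m, p in zip(measured, predicted)]
    flags = []
    for i in range(len(residuals)):
        recent = residuals[max(0, i - window + 1):i + 1]
        flags.append(len(recent) == window and
                     all(r > threshold for r in recent))
    return flags
```

Once a fault is flagged, the thesis's two MLP stages take over to classify its type and severity.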

    Biped dynamic walking using reinforcement learning

    This thesis presents a study of biped dynamic walking using reinforcement learning. A hardware biped robot was built. It uses low-gear-ratio DC motors in order to provide free leg movements. The Self-Scaling Reinforcement Learning algorithm was developed to deal with the problem of reinforcement learning in continuous action domains. A new learning architecture was designed to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows for easy incorporation of modules that represent new knowledge or new requirements for the desired task. Control experiments were carried out using a simulator and the physical biped. The biped learned dynamic walking on flat surfaces without any previous knowledge of its dynamic model.

    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, in this chapter we mainly focus on methods that explicitly update a representation of a value function, a policy or both. We discuss considerations in choosing an appropriate representation for these functions and discuss gradient-based and gradient-free ways to update the parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms and actor-critic methods. We discuss the advantages of the different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.
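A minimal instance of the policy-gradient family the chapter covers is REINFORCE with a Gaussian policy and a running average-reward baseline (the simplest stand-in for a critic), here on a one-step continuous-action task. The task and constants are illustrative only, not from the chapter's experiments.

```python
import random

def policy_gradient_continuous(target=2.0, steps=3000, lr=0.02, sigma=0.5):
    """Maximise r(a) = -(a - target)^2 with a Gaussian policy
    a ~ N(mu, sigma), adapting only the mean mu."""
    mu, baseline = 0.0, 0.0
    for _ in range(steps):
        a = random.gauss(mu, sigma)
        r = -(a - target) ** 2
        advantage = r - baseline
        # grad of log N(a; mu, sigma) w.r.t. mu is (a - mu) / sigma^2
        mu += lr * advantage * (a - mu) / sigma ** 2
        baseline += 0.1 * (r - baseline)  # running-average "critic"
    return mu
```

The policy mean drifts toward the reward-maximising action; replacing the running average with a learned state-value function turns this into an actor-critic method of the kind the chapter compares empirically.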

    Can Control Hierarchies be Developed and Optimised Progressively?

    Hierarchical structures are used in robots to achieve effective results in control problems. They are found in a wide array of AI and robotics applications, making them a key aspect of control. Even though they play an integral part in control, such structures are typically produced heuristically, resulting in inconsistent performance. This means that otherwise effective control tasks or controllers perform poorly because the hierarchy is badly defined, limiting what controllers can do. Complex control problems that require adaptive behaviour or autonomy remain challenging for control theorists, and complex problem domains make the heuristic process of producing complex hierarchies harder. It is evident that the heuristic process must follow some form of procedure that could be turned into a methodology. By formalising or automating this process, control hierarchies can be produced with consistently effective results, without relying on the heuristic judgement of a control engineer, which can easily fail. This thesis proposes an algorithmic approach (inspired by Perceptual Control Theory) known as DOSA. DOSA produces hierarchies automatically using real-world experience and the inputs the system has access to. This thesis shows that DOSA consistently reproduces effective hierarchies that exist in the literature, even when billions of possible hierarchies are available. Furthermore, this thesis investigates the value of using hierarchies in general and their benefits in control problems. The computational complexity of hierarchies is compared, showing that while hierarchies do not have a computational advantage, the parameter optimisation procedure is aided greatly by hierarchical parameter optimisation. The thesis then proceeds to study the hierarchical optimisation of parameters and how hierarchies allow this process to be performed more consistently for better results, concluding that hierarchical parameter optimisation produces more consistent controllers that also transfer better to an unseen problem domain. Parameter optimisation is a challenge that also limits otherwise effective controllers and limits the use of larger structures in control. The research described in this thesis formalises the process of generating hierarchical controllers as well as hierarchically optimising them, providing a comprehensive methodology to automate the production of robust controllers for complex problems.
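Hierarchical parameter optimisation contrasts with searching the joint parameter space; one deliberately simplified version tunes one level of the hierarchy at a time while holding the others fixed. The sketch below illustrates that idea only and is not DOSA itself, whose details the abstract does not give.

```python
def optimise_hierarchy(levels, evaluate, candidates):
    """Tune each level's parameter in turn, holding the rest fixed.

    levels: current per-level parameters; evaluate(params) returns a
    cost for the full hierarchy; candidates(i) yields trial values
    for level i. All three interfaces are caller-supplied assumptions.
    """
    params = list(levels)
    for i in range(len(params)):
        best, best_cost = params[i], evaluate(params)
        for trial in candidates(i):
            params[i] = trial
            cost = evaluate(params)
            if cost < best_cost:
                best, best_cost = trial, cost
        params[i] = best  # keep the best value found for this level
    return params
```

Level-by-level search grows linearly with the number of levels rather than exponentially with the joint parameter space, one reason hierarchical optimisation can behave more consistently than joint search.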

    The design and intelligent control of an autonomous mobile robot

    This thesis presents an investigation into the problems of exploration, map building and collision-free navigation for intelligent autonomous mobile robots. The project began with an extensive review of currently available literature in the field of mobile robot research, which included intelligent control techniques and their application. It became clear that there was scope for further development with regard to map building and exploration in new and unstructured environments. Animals have an innate propensity to exhibit such abilities, and so the analogous use of artificial neural networks instead of actual neural systems was examined as a method of robot mapping. A simulated behaviour-based mobile robot was used in conjunction with a growing cell structure neural network to map out new environments. When this algorithm was applied directly, topological irregularities were observed as the direct result of correlations within the input data stream. A modification to this basic system was shown to correct the problem, but further developments would be required to produce a generic solution. The mapping algorithms obtained through this approach, although more similar to biological systems, are computationally inefficient in comparison to the methods which were subsequently developed. A novel mapping method was proposed, based on the robot creating new location vectors, or nodes, when it exceeded a distance threshold from its mapped area. Network parameters were developed to monitor the state of growth of the network and aid the robot's search process. In simulation, the combination of the novel mapping and search processes was shown to be able to construct maps which could subsequently be used for collision-free navigation. To develop greater insights into the control problem and to validate the simulation work, the control structures were ported to a prototype mobile robot. The mobile robot was of circular construction, with a synchro-drive wheel configuration, and was equipped with eight ultrasonic distance sensors and an odometric positioning system. It was self-sufficient, incorporating all its power and computational resources. The experiments observed the effects of odometric drift and demonstrated methods of re-correction which were shown to be effective. Both the novel mapping method and a new algorithm based on an exhaustive mesh search were shown to be able to explore different environments and subsequently achieve collision-free navigation. This was shown in all cases by monitoring the estimates of positional error, which remained within fixed bounds.
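The novel mapping method's core rule, creating a new location node only when the robot exceeds a distance threshold from every mapped node, can be stated in a few lines. The threshold value is problem-specific and the 2-D position representation is an assumption for illustration.

```python
import math

def update_map(nodes, position, threshold):
    """Append `position` as a new map node if it lies farther than
    `threshold` from every existing node; otherwise leave the map
    unchanged. Positions are (x, y) tuples, e.g. from odometry."""
    if all(math.dist(position, n) > threshold for n in nodes):
        nodes.append(tuple(position))
    return nodes
```

The resulting node set stays sparse by construction, and its growth can serve as the kind of network parameter the thesis describes for monitoring the progress of exploration.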