1,552 research outputs found

    Application of Newton's method to action selection in continuous state- and action-space reinforcement learning

    Get PDF
    An algorithm based on Newton's method is proposed for action selection in continuous state- and action-space reinforcement learning, without a policy network or discretization. The proposed method is validated on two benchmark problems, Cart-Pole and double Cart-Pole, on which it achieves comparable or improved performance with fewer parameters to tune and in fewer training episodes than CACLA, which has previously been shown to outperform many other continuous state- and action-space reinforcement learning algorithms.
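
The core idea can be sketched generically: treat the Q-function as an objective over the continuous action and iterate Newton steps toward a stationary point. This is a minimal sketch of the technique, not the paper's exact algorithm; the toy Q-function, the finite-difference derivatives, and all step parameters are illustrative assumptions.

```python
def newton_action_selection(q, s, a0, iters=10, eps=1e-4, tol=1e-6):
    """Pick a continuous scalar action by applying Newton's method to a -> q(s, a).

    q: a hypothetical Q-function with signature q(state, action) -> float.
    Derivatives are estimated numerically via central differences.
    """
    a = float(a0)
    for _ in range(iters):
        g = (q(s, a + eps) - q(s, a - eps)) / (2 * eps)              # dQ/da
        h = (q(s, a + eps) - 2 * q(s, a) + q(s, a - eps)) / eps**2   # d2Q/da2
        if abs(h) < 1e-12:
            break                       # curvature too flat for a Newton step
        step = g / h
        a -= step                       # Newton step toward a stationary point
        if abs(step) < tol:
            break
    return a

# Toy Q with a single maximum at a = 2; Newton converges in one step here
# because the objective is quadratic in the action.
q_toy = lambda s, a: -(a - 2.0) ** 2
best = newton_action_selection(q_toy, s=None, a0=0.0)
```

On a quadratic objective the Newton step is exact, which is why no policy network or action-space discretization is needed to select the maximizing action.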

    Comparative Evaluation for Effectiveness Analysis of Policy Based Deep Reinforcement Learning Approaches

    Get PDF
    Deep Reinforcement Learning (DRL) has proven to be a very powerful technique, with results in a variety of applications in recent years. The achievements in robotics in particular suggest that much more progress will be made in this field. Undoubtedly, policy choices and parameter settings play an active role in the success of DRL. In this study, the policies used in recent DRL work are analysed and grouped under three headings: value-based, policy-based and actor-critic. In addition, the problem of collaborative agents moving a common target according to Newton's laws of motion is presented. Training is carried out in a frictionless environment with two agents and one object, using four different policies. The agents apply force to the object by colliding with it and try to push it out of the area it occupies. A two-dimensional surface is used during the training phase. The success of each policy is reported separately, and the test results are discussed in Section 5. The policies used in deep reinforcement learning approaches are thus both described and tested in a concrete application.
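
The task described can be sketched as a minimal environment: two agents on a frictionless 2-D surface apply forces to one object, which moves under Newton's second law until it leaves a square region. The class below is an illustrative assumption for exposition, not the paper's actual environment; all names, masses, and reward values are hypothetical.

```python
import numpy as np

class PushEnv:
    """Toy sketch: two agents push one object out of a square area (no friction)."""

    def __init__(self, area_half_width=5.0, obj_mass=2.0, dt=0.1):
        self.half = area_half_width
        self.m = obj_mass
        self.dt = dt
        self.obj_pos = np.zeros(2)
        self.obj_vel = np.zeros(2)

    def step(self, forces):
        # Newton's second law: a = F_total / m; no friction, so velocity persists.
        accel = sum(forces) / self.m
        self.obj_vel = self.obj_vel + accel * self.dt
        self.obj_pos = self.obj_pos + self.obj_vel * self.dt
        done = bool(np.any(np.abs(self.obj_pos) > self.half))  # object left the area
        reward = 1.0 if done else -0.01   # sparse success bonus, small step cost
        return self.obj_pos.copy(), reward, done

env = PushEnv()
done, steps = False, 0
while not done and steps < 1000:
    # Both agents push in the same direction; a learned policy would choose this.
    _, _, done = env.step([np.array([1.0, 0.0]), np.array([1.0, 0.0])])
    steps += 1
```

With both agents cooperating, the object accelerates steadily and exits the region in a few dozen steps; the four policies compared in the study would each have to discover such coordination.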

    Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks

    Full text link
    This paper discusses a system that accelerates reinforcement learning by using transfer from related tasks. Without such transfer, even if two tasks are very similar at some abstract level, an extensive re-learning effort is required. The system achieves much of its power by transferring parts of previously learned solutions rather than a single complete solution. The system exploits strong features in the multi-dimensional function produced by reinforcement learning in solving a particular task. These features are stable and easy to recognize early in the learning process. They generate a partitioning of the state space and thus the function. The partition is represented as a graph. This is used to index and compose functions stored in a case base to form a close approximation to the solution of the new task. Experiments demonstrate that function composition often produces more than an order of magnitude increase in learning rate compared to a basic reinforcement learning algorithm.
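
The index-and-compose step might look like the following sketch: a case base maps partition labels to previously learned sub-value-functions, and a new task's approximate solution is assembled region by region. The case base, the partition rule, and the 1-D "corridor" states are all illustrative assumptions, not the paper's actual data structures.

```python
# Hypothetical case base: partition label -> previously learned value function
# over that region of the state space (toy 1-D corridor).
case_base = {
    "left_of_wall":  lambda s: -abs(s - 2.0),   # learned value near a doorway at 2
    "right_of_wall": lambda s: -abs(s - 8.0),   # learned value near a goal at 8
}

def partition(s):
    """Map a state to the graph node / region it falls in."""
    return "left_of_wall" if s < 5.0 else "right_of_wall"

def composed_value(s):
    # Index the case base with the region label and reuse that sub-solution as
    # a close initial approximation, instead of re-learning from scratch.
    return case_base[partition(s)](s)

v_left = composed_value(1.0)    # served by the left sub-solution
v_right = composed_value(9.0)   # served by the right sub-solution
```

The composed function is only an approximation of the new task's solution, but starting learning from it rather than from zero is what yields the reported order-of-magnitude speedup.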

    Near-Optimal Control of a Quadcopter Using Reinforcement Learning

    Get PDF
    This paper presents a novel control method for quadcopters that achieves near-optimal tracking control for input-affine nonlinear quadcopter dynamics. The method uses a reinforcement learning algorithm called Single Network Adaptive Critics (SNAC), which approximates a solution to the discrete-time Hamilton-Jacobi-Bellman (DT-HJB) equation using a single neural network trained offline. The control method involves two SNAC controllers, with the outer loop controlling the linear position and velocities (position control) and the inner loop controlling the angular position and velocities (attitude control). The resulting quadcopter controller provides optimal feedback control, tracks a trajectory over an infinite horizon, and is compared with commercial optimal control software. Furthermore, the closed-loop controller can control the system from any initial condition within the domain of training. Overall, this research demonstrates the benefits of using SNAC for nonlinear control, showing its ability to achieve near-optimal tracking control while reducing computational complexity. This paper provides insights into a new approach for controlling quadcopters, with potential applications in various fields such as aerial surveillance, delivery, and search and rescue.
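
For input-affine discrete dynamics x_{k+1} = f(x_k) + g(x_k) u_k with quadratic cost, a SNAC critic maps the current state to the next-step costate, and the control follows from stationarity of the Hamiltonian: u_k = -(1/2) R^{-1} g(x_k)^T lambda_{k+1}. The sketch below illustrates only this control computation; the linear map `W` stands in for the trained network, and f, g, Q, R, W are illustrative assumptions, not the paper's quadcopter model.

```python
import numpy as np

Q = np.eye(2)                                    # state-cost weight (illustrative)
R = np.array([[1.0]])                            # control-cost weight (illustrative)
W = np.array([[3.0, 1.0], [1.0, 2.0]])           # stand-in for the trained critic

def critic(x):
    # SNAC critic: state x_k -> approximate next-step costate lambda_{k+1}
    return W @ x

def g(x):
    # Input map of the input-affine dynamics; control enters the second state.
    return np.array([[0.0], [1.0]])

def snac_control(x):
    lam_next = critic(x)
    # Stationarity of the Hamiltonian gives u = -1/2 R^{-1} g(x)^T lambda_{k+1}
    return -0.5 * np.linalg.solve(R, g(x).T @ lam_next)

u = snac_control(np.array([1.0, 0.0]))
```

Because the critic is trained offline, the online controller reduces to one network evaluation and one small linear solve per step, which is the computational advantage the paper emphasizes.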

    Interactive pace approach to learning in physics : method and materials

    Get PDF
    This study has the following general objectives: first, to present in detail the techniques that the author developed and used in designing the course, and the final materials produced, for an introductory physics course in Mechanics; second, to state and examine the important ingredients blended into a new teaching-learning strategy in which an individualized pace is combined with group interaction; third, to analyze a combined structural and operational course in which concept and structure formation and problem solving are emphasized.

    Incremental synthesis of optimal control laws using learning algorithms

    Get PDF
    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1993. Includes bibliographical references (p. 151-154). Stephen C. Atkins. M.S.

    Anomaly detection and dynamic decision making for stochastic systems

    Full text link
    Thesis (Ph.D.)--Boston University. This dissertation focuses on two types of problems, both of which are related to systems with uncertainties. The first problem concerns network system anomaly detection. We present several stochastic and deterministic methods for anomaly detection of networks whose normal behavior is not time-varying. Our methods cover most of the common techniques in the anomaly detection field. We evaluate all methods in a simulated network that consists of nominal data, three flow-level anomalies and one packet-level attack. Through analyzing the results, we summarize the advantages and the disadvantages of each method. As a next step, we propose two robust stochastic anomaly detection methods for networks whose normal behavior is time-varying. We develop a procedure for learning the underlying family of patterns that characterize a time-varying network. This procedure first estimates a large class of patterns from network data and then refines it to select a representative subset. The latter part formulates the refinement problem using ideas from set covering via integer programming. Then we propose two robust methods, one model-free and one model-based, to evaluate whether a sequence of observations is drawn from the learned patterns. Simulation results show that the robust methods have significant advantages over the alternative stationary methods in time-varying networks. The final anomaly detection setting we consider targets the detection of botnets before they launch an attack. Our method analyzes the social graph of the nodes in a network and consists of two stages: (i) network anomaly detection based on large deviations theory and (ii) community detection based on a refined modularity measure. We apply our method on real-world botnet traffic and compare its performance with other methods.
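
The pattern-refinement step described above can be sketched as a covering problem: from a large class of estimated patterns, select a small subset that still covers every observed traffic window. The dissertation formulates this as an integer program; the greedy set-cover heuristic below is a standard stand-in for exposition, and the pattern and window identifiers are illustrative assumptions.

```python
def greedy_set_cover(windows, patterns):
    """patterns: dict name -> set of window ids that pattern explains."""
    uncovered = set(windows)
    chosen = []
    while uncovered:
        # Pick the pattern explaining the most still-uncovered windows.
        best = max(patterns, key=lambda p: len(patterns[p] & uncovered))
        gained = patterns[best] & uncovered
        if not gained:
            raise ValueError("some windows are covered by no pattern")
        chosen.append(best)
        uncovered -= gained
    return chosen

# Toy traffic windows 0..7 and candidate patterns estimated from data.
patterns = {
    "weekday_day":   {0, 1, 2, 3},
    "weekday_night": {4, 5},
    "weekend":       {6, 7},
    "redundant":     {1, 2},     # subsumed by weekday_day; never selected
}
subset = greedy_set_cover(range(8), patterns)
```

The refined subset then serves as the reference family of normal behaviors against which the model-free and model-based detectors evaluate new observation sequences.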
The second problem considered by this dissertation concerns sequential decision making under uncertainty, which can be modeled by Markov Decision Processes (MDPs). We focus on methods with an actor-critic structure, where the critic part estimates the gradient of the overall objective with respect to tunable policy parameters and the actor part optimizes a policy with respect to these parameters. Most existing actor-critic methods use Temporal Difference (TD) learning to estimate the gradient and steepest gradient ascent to update the policies. Our first contribution is to propose an actor-critic method that uses a Least Squares Temporal Difference (LSTD) method, which is known to converge faster than the TD methods. Our second contribution is to develop a new Newton-like actor-critic method that performs better especially for ill-conditioned problems. We evaluate our methods in problems motivated from robot motion control.
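
The critic's LSTD step can be sketched as follows: instead of stochastic TD updates, accumulate the statistics A and b over sampled transitions and solve A w = b for the value-function weights in one shot. The one-hot feature map and the toy two-state chain below are illustrative assumptions, not the dissertation's robot-motion problems.

```python
import numpy as np

gamma = 0.9
phi = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}  # one-hot features

# Toy chain: state 0 steps to state 1 with reward 0; state 1 self-loops with
# reward 1, so the true discounted values are V(1) = 10 and V(0) = 9.
transitions = [(0, 0.0, 1), (1, 1.0, 1)]

# LSTD(0): A = sum phi(s) (phi(s) - gamma*phi(s'))^T,  b = sum r * phi(s)
A = np.zeros((2, 2))
b = np.zeros(2)
for s, r, s_next in transitions:
    A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
    b += r * phi[s]

w = np.linalg.solve(A, b)   # value weights the actor's gradient estimate uses
```

With exact (one-hot) features LSTD recovers the true values from a single batch of transitions, which is the source of its faster convergence relative to incremental TD.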

    Non-conventional control of the flexible pole-cart balancing problem

    Get PDF
    Emerging techniques of intelligent or learning control seem attractive for applications in manufacturing and robotics. It is however important to understand the capabilities of such control systems. In the past the inverted pendulum has been used as a test case. The thesis begins with an examination of whether the inverted pendulum or pole-cart balancing problem is a representative problem for experimentation with learning controllers for complex nonlinear systems. Results of previous research concerning the inverted pendulum problem are presented to show that this problem is not sufficiently testing. This thesis therefore concentrates on the control of the inverted pendulum with an additional degree of freedom as a testing demonstrator problem for learning control system experimentation. A flexible pole is used in place of a rigid one. The transverse displacement of the flexible pole adds a degree of freedom to the system. The dynamics of this new system are more complex, as the system needs additional parameters to be defined due to the pole's elastic deflection. This problem also has many of the significant features associated with flexible robots with lightweight links as applied in manufacturing. Novel neural network and fuzzy control systems are presented that control such a system both in simulation and in real time. A fuzzy-genetic approach is also demonstrated that allows the creation of fuzzy control systems without the use of extensive knowledge.
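
A fuzzy controller of the kind applied to the pole-cart problem can be sketched with triangular membership functions over the pole angle and a small rule base defuzzified by a weighted average. The breakpoints, rule outputs, and single-input design below are illustrative assumptions, not the thesis's tuned controller (which also handles the flexible pole's extra degree of freedom).

```python
def tri(x, left, peak, right):
    """Triangular membership function on [left, right] peaking at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_force(angle):
    # Rule firing strengths: the angle is Negative / Zero / Positive.
    mu_neg = tri(angle, -0.5, -0.25, 0.0)
    mu_zero = tri(angle, -0.25, 0.0, 0.25)
    mu_pos = tri(angle, 0.0, 0.25, 0.5)
    # Rule outputs: push right for negative tilt, left for positive tilt.
    strengths = [mu_neg, mu_zero, mu_pos]
    outputs = [10.0, 0.0, -10.0]
    total = sum(strengths)
    if total == 0.0:
        return -10.0 if angle > 0 else 10.0   # saturate outside the universe
    return sum(m * o for m, o in zip(strengths, outputs)) / total
```

Between breakpoints the output interpolates smoothly, which is what lets a handful of linguistic rules stand in for an analytic control law.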

    Space exploration: The interstellar goal and Titan demonstration

    Get PDF
    Automated interstellar space exploration is reviewed. The Titan demonstration mission is discussed. Remote sensing and automated modeling are considered. Nuclear electric propulsion, a main orbiting spacecraft, a lander/rover, subsatellites, atmospheric probes, powered air vehicles, and a surface science network comprise the mission component concepts. Machine intelligence in space exploration is discussed.

    The alignment of the National Senior Certificate Examinations (November 2014 - March 2018) and the Curriculum and Assessment Policy Statement Grade 12 Physical Sciences : Physics (P1) in South Africa

    Get PDF
    The Department of Basic Education (DBE) has attributed the poor pass rate in the National Senior Certificate (NSC) Grade 12 Physical Sciences examinations to learners' lack of practical skills and their inability to solve problems by integrating knowledge from the different topics in Physical Sciences. The CAPS (Curriculum and Assessment Policy Statement) is central to the planning, organising and teaching of Physical Sciences. Even though more than a third of the learners achieved below 30% in the NSC Grade 12 Physical Sciences: Physics (P1) November 2017 examination, little reference was made to the CAPS to rationalise the poor performance. A disjointed alignment between the CAPS and the P1 is a possible cause of the poor performance. Since no previous studies have investigated the alignment between the CAPS and the P1, this study aims to fill that gap. This study used a positivist research paradigm and a case study research strategy. A purposive sampling procedure selected the CAPS Grades 10 – 12 Physical Sciences document, the Physical Sciences Examination Guidelines Grade 12 documents, and the final and supplementary P1 examinations from November 2014 to March 2018 as the documents for analysis. A summative content analysis research technique was conducted using the Surveys of Enacted Curriculum (SEC) research method. The SEC method employed the four topics of Grade 12 Physics and the four non-hierarchical levels of cognitive demand described in the modified version of Bloom's taxonomy. The physics topics included mechanics; waves, sound and light; electricity and magnetism; and optical phenomena. The cognitive demand levels included recall; comprehension; application and analysis; and synthesis and evaluation.
This study found a 100 percent categorical coherence, a 67.3 percent balance of representation, a 79.4 percent cognitive complexity and an average Porter's alignment index of 0.77 between the CAPS and the P1. The overall Cohen's kappa for all the documents analysed was 0.88. The findings of this study indicate that the mechanics topic was under-emphasised whilst the application and analysis cognitive demand was over-emphasised in the P1. The CAPS and the P1 did not utilise the highest cognitive demand, synthesis and evaluation, which may be interpreted as an environment that fosters lower-order thinking. To change this environment of lower-order thinking and simultaneously increase the alignment between the CAPS and the P1, this study recommends that firstly, the CAPS decrease the recall-based content of the mechanics topic. Secondly, the CAPS and the P1 should increase the synthesis and evaluation cognitive demand-based content at the expense of the recall cognitive demand-based content. Thirdly, the CAPS should include the content of the school-based physics practical assessments while decreasing the focus on physics definitions. The ultimate aim is an improvement in the pass rates of the NSC Grade 12 Physical Sciences examinations. Science and Technology Education. M.Sc. (Mathematics, Science and Technology Education (Physics Education)).
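
The Porter alignment index reported above has a simple closed form: each document is coded into a topics-by-cognitive-demand matrix of cell proportions, and the index is P = 1 - (sum of absolute cell differences)/2, so identical matrices score 1 and disjoint ones score 0. The sketch below computes it on illustrative matrices, not the study's actual SEC data.

```python
def porter_index(x, y):
    """x, y: same-shape grids of cell proportions, each summing to 1."""
    diff = sum(abs(a - b)
               for row_x, row_y in zip(x, y)
               for a, b in zip(row_x, row_y))
    return 1.0 - diff / 2.0

# Rows: four physics topics (mechanics; waves, sound and light; electricity
# and magnetism; optical phenomena). Columns: four cognitive-demand levels.
# All proportions are hypothetical.
caps = [[0.20, 0.10, 0.10, 0.00],
        [0.05, 0.10, 0.05, 0.00],
        [0.10, 0.10, 0.10, 0.00],
        [0.05, 0.05, 0.00, 0.00]]
exam = [[0.15, 0.10, 0.15, 0.00],
        [0.05, 0.10, 0.05, 0.00],
        [0.10, 0.10, 0.10, 0.00],
        [0.05, 0.05, 0.00, 0.00]]
p = porter_index(caps, exam)
```

An index of 0.77, as found between the CAPS and the P1, therefore means that 23 percent of the proportional emphasis would have to shift cells for the two documents to match exactly.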