452 research outputs found

    Designing Decentralized controllers for distributed-air-jet MEMS-based micromanipulators by reinforcement learning.

    Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts at high velocities and without any friction problems. The control of such distributed systems is very challenging, and the usual approaches for contact arrayed systems do not produce satisfactory results. In this paper, we investigate reinforcement learning control approaches in order to position and convey an object. Reinforcement learning is a popular approach to find controllers that are tailored exactly to the system without any prior model. We show how to apply reinforcement learning from a decentralized perspective in order to address the global-local trade-off. The simulation results demonstrate that the reinforcement learning method is a promising way to design control laws for such distributed systems.

    Multi-Agent Learning for Security and Sustainability

    This thesis studies the application of multi-agent learning in complex domains where safety and sustainability are crucial. We target some of the main obstacles in the deployment of multi-agent learning techniques in such domains. These obstacles consist of modelling complex environments with multi-agent interaction, designing robust learning processes and modelling adversarial agents. The main goal of using modern multi-agent learning methods is to improve the effectiveness of behaviour in such domains, and hence increase sustainability and security. This thesis investigates three complex real-world domains: space debris removal, critical domains with risky states and spatial security domains such as illegal rhino poaching. We first tackle the challenge of modelling a complex multi-agent environment. The focus is on the space debris removal problem, which poses a major threat to the sustainability of earth orbit. We develop a high-fidelity space debris simulator that allows us to simulate the future evolution of the space debris environment. Using the data from the simulator we propose a surrogate model, which enables fast evaluation of different strategies chosen by the space actors. We then analyse the dynamics of strategic decision making among multiple space actors, comparing different models of agent interaction: static vs. dynamic and centralised vs. decentralised. The outcome of our work can help future decision makers to design debris removal strategies, and consequently mitigate the threat of space debris. Next, we study how we can design a robust learning process in critical domains with risky states, where destabilisation of local components can lead to severe impact on the whole network. We propose a novel robust operator κ which can be combined with reinforcement learning methods, leading to learning safe policies, mitigating the threat of external attack, or failure in the system. 
    Finally, we investigate the challenge of learning an effective behaviour while facing adversarial attackers in spatial security domains such as illegal rhino poaching. We assume that such attackers can be occasionally observed. Our approach consists of combining Bayesian inference with temporal difference learning in order to build a model of the attacker behaviour. Our method can effectively use the partial observability of the attacker’s location and approximate the performance of a full observability case. This thesis therefore presents novel methods and tackles several important obstacles in deploying multi-agent learning algorithms in the real world, which further narrows the reality gap between theoretical models and real-world applications.

    DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

    Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite its progress, current algorithms are still unsatisfactory in virtually every aspect of performance, such as sample efficiency, asymptotic performance, and their robustness to the choice of random seeds. In this paper, we identify a major shortcoming in existing visual RL methods, namely that agents often exhibit sustained inactivity during early training, thereby limiting their ability to explore effectively. Expanding upon this crucial observation, we additionally unveil a significant correlation between the agents' inclination towards motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt dormant ratio as a metric to measure inactivity in the RL agent's network. Empirically, we also recognize that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging the aforementioned insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmark environments, including DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains from the DeepMind Control Suite as well as three dexterous hand manipulation tasks without demonstrations in Adroit, all based on pixel observations.
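
    The abstract above names the dormant ratio but does not give its formula. The sketch below assumes one definition commonly used in the dormant-neuron literature: a neuron counts as dormant when its mean absolute activation, divided by the average of that quantity over its layer, is at most a threshold. The function name `dormant_ratio`, the threshold `tau`, and its default value are illustrative assumptions, not taken from the DrM paper.

```python
import numpy as np


def dormant_ratio(layer_activations, tau=0.025):
    """Fraction of neurons whose normalized activation score is <= tau.

    layer_activations: list of arrays, one per layer, each shaped
    (batch_size, num_neurons). The score definition and the default
    tau are assumptions made for illustration.
    """
    dormant = 0
    total = 0
    for acts in layer_activations:
        acts = np.asarray(acts, dtype=float)
        # Mean absolute activation of each neuron over the batch.
        mean_abs = np.abs(acts).mean(axis=0)
        # Normalize by the layer-wide average so scores are scale-free.
        layer_mean = mean_abs.mean()
        if layer_mean == 0.0:
            # A completely silent layer: every neuron is dormant.
            scores = np.zeros_like(mean_abs)
        else:
            scores = mean_abs / layer_mean
        dormant += int((scores <= tau).sum())
        total += scores.size
    return dormant / total


# Hypothetical activations: one layer where half the neurons never fire.
half_silent = [np.array([[1.0, 0.0, 1.0, 0.0],
                         [1.0, 0.0, 1.0, 0.0]])]
ratio = dormant_ratio(half_silent)
```

    Under this assumed definition the ratio depends only on network activations, which is consistent with the abstract's remark that it can indicate activity level regardless of the received reward signals.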

    Reinforcement Learning Applied to Trading Systems: A Survey

    Financial domain tasks, such as trading in market exchanges, are challenging and have long attracted researchers. The recent achievements and the consequent notoriety of Reinforcement Learning (RL) have also increased its adoption in trading tasks. RL uses a framework with well-established formal concepts, which raises its attractiveness in learning profitable trading strategies. However, using RL in the financial area without due attention can keep new researchers from following standards or cause them to overlook relevant conceptual guidelines. In this work, we embrace the seminal RL technical fundamentals, concepts, and recommendations to perform a unified, theoretically-grounded examination and comparison of previous research that could serve as a structuring guide for the field of study. A selection of twenty-nine articles was reviewed under our classification that considers RL's most common formulations and design patterns from a large volume of available studies. This classification allowed for precise inspection of the most relevant aspects regarding data input, preprocessing, state and action composition, adopted RL techniques, evaluation setups, and overall results. Our analysis approach, organized around fundamental RL concepts, allowed for a clear identification of current system design best practices, gaps that require further investigation, and promising research opportunities. Finally, this review attempts to promote the development of this field of study by facilitating researchers' commitment to standards adherence and helping them to avoid straying away from the RL constructs' firm ground.

    Location-Enabled IoT (LE-IoT): A Survey of Positioning Techniques, Error Sources, and Mitigation

    The Internet of Things (IoT) has started to empower the future of many industrial and mass-market applications. Localization techniques are becoming key to adding location context to IoT data without human perception and intervention. Meanwhile, the newly-emerged Low-Power Wide-Area Network (LPWAN) technologies have advantages such as long range, low power consumption, low cost, massive connections, and the capability for communication in both indoor and outdoor areas. These features make LPWAN signals strong candidates for mass-market localization applications. However, there are various error sources that limit localization performance when using such IoT signals. This paper reviews the IoT localization system through the following sequence: IoT localization system review -- localization data sources -- localization algorithms -- localization error sources and mitigation -- localization performance evaluation. Compared to the related surveys, this paper has a more comprehensive and state-of-the-art review on IoT localization methods, an original review on IoT localization error sources and mitigation, an original review on IoT localization performance evaluation, and a more comprehensive review of IoT localization applications, opportunities, and challenges. Thus, this survey provides comprehensive guidance for peers who are interested in enabling localization ability in the existing IoT systems, using IoT systems for localization, or integrating IoT signals with the existing localization sensors.

    Control oriented modelling of an integrated attitude and vibration suppression architecture for large space structures

    This thesis is divided into two parts. The main focus of the research, namely active vibration control for large flexible spacecraft, is presented in Part I and, in parallel, the topic of machine learning techniques for modern space applications is described in Part II. In particular, this thesis aims at proposing an end-to-end general architecture for an integrated attitude-vibration control system, starting from the design of structural models to the synthesis of the control laws. To this purpose, large space structures based on realistic missions are investigated as study cases, in accordance with the tendency of increasing the size of scientific instruments to improve their sensitivity, the drawback being an increase in their overall flexibility. An active control method is therefore investigated to guarantee satisfactory pointing and maximum deformation while avoiding classical stiffening methods. Accordingly, the instrument is designed to be supported by an active deployable frame hosting an optimal minimum set of collocated smart actuators and sensors. Different spatial configurations for the placement of the distributed network of active devices are investigated, both at closed-loop and open-loop levels. Concerning closed-loop techniques, a method to optimally place the poles of the system via a Direct Velocity Feedback (DVF) controller is proposed to identify simultaneously the location and number of active devices for vibration control with an in-cascade optimization technique. Then, two general and computationally efficient open-loop placement techniques, namely Gramian and Modal Strain Energy (MSE)-based methods, are adopted as opposed to heuristic algorithms, which imply high computational costs and are generally not suitable for high-dimensional systems, to propose a placement architecture for generically shaped tridimensional space structures.
    Then, an integrated robust control architecture for the spacecraft is presented, composed of both an attitude control scheme and a vibration control system. To conclude the study, attitude manoeuvres are performed to excite main flexible modes and prove the efficacy of both attitude and vibration control architectures. Moreover, Part II is dedicated to addressing the problem of improving the autonomy and self-awareness of modern spacecraft by using machine-learning-based techniques to carry out Failure Identification for large space structures and to improve the pointing performance of spacecraft (both flexible satellites with sloshing models and small rigid platforms) when performing repetitive Earth Observation manoeuvres.

    Adoption and Diffusion of At-Home Medical Tests

    The purpose of this study is to understand the at-home medical test market, including the medical and regulatory requirements to create at-home medical tests, as well as the market factors that influence consumer adoption in the context of the COVID-19 pandemic. To address shortages of COVID-19 tests, companies created at-home tests which were rapidly approved by the FDA, bringing at-home testing to the forefront. The history of at-home health testing is reviewed, along with the medical requirements for creating such tests and how the pandemic has affected such testing. Tables are also included to demonstrate currently available tests and potential future tests. The research draws attention to two categories of at-home tests: 1) collection kits and 2) testing kits, both presenting opportunities for test developers. Companies interested in bringing at-home medical tests to the market must decide if they will utilize a preexisting laboratory test or develop a new test, and if the tests will be physician ordered or sold directly to the consumer. Our investigation focuses on the effect COVID-19 has had on the at-home testing market, which has been explored through traditional marketing concepts, Rogers' (2003) adoption and diffusion of innovations framework, and critical success factors.

    Enhancing Exploration and Safety in Deep Reinforcement Learning

    A Deep Reinforcement Learning (DRL) agent tries to learn a policy maximizing a long-term objective by trial and error in large state spaces. However, this learning paradigm requires a non-trivial amount of interactions in the environment to achieve good performance. Moreover, critical applications, such as robotics, typically involve safety criteria to consider while designing novel DRL solutions. Hence, devising safe learning approaches with efficient exploration is crucial to avoid getting stuck in local optima, failing to learn properly, or causing damage to the surrounding environment. This thesis focuses on developing Deep Reinforcement Learning algorithms to foster efficient exploration and safer behaviors in simulation and real domains of interest, ranging from robotics to multi-agent systems. To this end, we rely both on standard benchmarks, such as SafetyGym, and robotic tasks widely adopted in the literature (e.g., manipulation, navigation). This variety of problems is crucial to assess the statistical significance of our empirical studies and the generalization skills of our approaches. We initially benchmark the sample efficiency versus performance trade-off between value-based and policy-gradient algorithms. This part highlights the benefits of using non-standard simulation environments (i.e., Unity), which also facilitates the development of further optimization for DRL. We also discuss the limitations of standard evaluation metrics (e.g., return) in characterizing the actual behaviors of a policy, proposing the use of Formal Verification (FV) as a practical methodology to evaluate behaviors over desired specifications. The second part introduces Evolutionary Algorithms (EAs) as a gradient-free complementary optimization strategy. In detail, we combine population-based and gradient-based DRL to diversify exploration and improve performance both in single and multi-agent applications.
    For the latter, we discuss how prior Multi-Agent (Deep) Reinforcement Learning (MARL) approaches hinder exploration, proposing an architecture that favors cooperation without affecting exploration.