
    DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

    Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite this progress, current algorithms remain unsatisfactory in virtually every aspect of performance, including sample efficiency, asymptotic performance, and robustness to the choice of random seeds. In this paper, we identify a major shortcoming of existing visual RL methods: agents often exhibit sustained inactivity during early training, limiting their ability to explore effectively. Expanding upon this crucial observation, we additionally unveil a significant correlation between the agents' inclination towards motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt the dormant ratio as a metric of inactivity in the RL agent's network. Empirically, we also find that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging these insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmarks: the DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains of the DeepMind Control Suite, as well as three dexterous hand manipulation tasks in Adroit without demonstrations, all from pixel observations.
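The dormant-ratio metric mentioned above can be made concrete with a short sketch. The normalized-threshold rule below (a neuron counts as dormant when its mean absolute activation, relative to the layer average, falls below a cutoff `tau`) is an assumed formulation for illustration, not code taken from the paper:

```python
import numpy as np

def dormant_ratio(activations, tau=0.1):
    """Fraction of 'dormant' neurons in one layer.

    `activations` is a (batch, neurons) array of post-activation values.
    A neuron is treated as dormant when its mean absolute activation,
    normalized by the layer-wide average, falls below `tau` -- the
    thresholding scheme assumed here for illustration.
    """
    mean_act = np.abs(activations).mean(axis=0)  # per-neuron activity
    layer_avg = mean_act.mean()                  # layer-wide average
    if layer_avg == 0.0:
        return 1.0                               # fully inactive layer
    score = mean_act / layer_avg                 # normalized activity score
    return float((score <= tau).mean())          # fraction dormant
```

A policy network whose layers all report a high dormant ratio is, in the paper's terms, a network likely to produce motorically inactive exploration.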

    Adaptive Game Input Using Knowledge of Player Capability: Designing for Individuals with Different Abilities

    The application of video games has been shown to be valuable in medical interventions, such as the use of Active Video Games (AVGs) in physical therapy. Because patients requiring physical therapy present with both highly variable physical capabilities and unique therapeutic goals, developers of rehabilitation intervention games face the challenge of creating flexible games that can be individualized to each player’s particular needs. This thesis proposes an approach to this problem by identifying and addressing two issues in the design of AVGs for therapy. First, regarding the difficulties of individualizing software, a particular complication in the development of AVGs for therapy is the increased complexity of writing input routines based on human body motion, which provides a much larger and more complex input domain than traditional, discrete-input game controllers. Second, the primary difficulty in individualizing a therapy game experience is that developers must program software with static routines that cannot be modified once compiled and released. Overcoming this aspect of software development is a prime concern that adaptive games research aims to address. The System for Unified Kinematic Input (SUKI) is a software library that addresses both of these concerns. SUKI enables games to adapt to players’ specific therapeutic goals by mapping arbitrary human body movement input to game mechanics at runtime, allowing user-defined body motions to drive gameplay without requiring any change to the software. Additionally, the SUKI library implements a dynamic profile system that alters the game’s configuration based on the known physical capabilities of the player and updates this profile based on the player’s demonstrated ability during play.
Within the context of the study of adaptive games, the following research presents the details of this approach and demonstrates the versatility and extensibility that it can provide in adapting AVGs to meet individual player needs.
https://doi.org/10.17918/D8R94V
M.S., Digital Media -- Drexel University, 201
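The runtime-mapping and dynamic-profile ideas described in the abstract can be sketched in a few lines. The class and method names below are hypothetical, invented for illustration, and do not reflect SUKI's actual API:

```python
# Minimal sketch of runtime motion-to-input mapping in the spirit of SUKI.
# All names (MotionMapping, evaluate, recalibrate) are hypothetical,
# not SUKI's real API.

class MotionMapping:
    """Maps a scalar body metric (e.g. hand height in metres) onto a
    normalized [0, 1] game input, using per-player capability limits."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # player-specific capability range

    def evaluate(self, raw):
        """Normalize a raw sensor reading into the game's input range."""
        span = self.hi - self.lo
        x = (raw - self.lo) / span if span else 0.0
        return min(1.0, max(0.0, x))     # clamp to [0, 1]

    def recalibrate(self, observed):
        """Dynamic-profile update: widen the range to cover the motion
        the player has actually demonstrated during play."""
        self.lo = min(self.lo, min(observed))
        self.hi = max(self.hi, max(observed))
```

Because the mapping object is data, not compiled code, a therapist could swap the tracked metric or adjust the range per patient without any change to the game software, which is the core point of the abstract.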

    Visual system identification: learning physical parameters and latent spaces from pixels

    In this thesis, we develop machine learning systems that are able to leverage knowledge of equations of motion (scene-specific or scene-agnostic) to perform object discovery, physical parameter estimation, position and velocity estimation, camera pose estimation, and to learn structured latent spaces that satisfy physical dynamics rules. These systems are unsupervised, learning from unlabelled videos, and use as inductive biases the general equations of motion followed by objects of interest in the scene. This is an important task because in many complex real-world environments ground-truth states are not available, although there is physical knowledge of the underlying system. Our goals with this approach, i.e. the integration of physics knowledge with unsupervised learning models, are to improve vision-based prediction, enable new forms of control, increase data-efficiency and provide model interpretability, all of which are key areas of interest in machine learning. With these goals in mind, we start by asking the following question: given a scene in which the objects’ motions are known up to some physical parameters (e.g. a ball bouncing off the floor with unknown restitution coefficient), how do we build a model that uses such knowledge to discover the objects in the scene and estimate these physical parameters? Our first model, PAIG (Physics-as-Inverse-Graphics), approaches this problem from a vision-as-inverse-graphics perspective, describing the visual scene as a composition of objects defined by their location and appearance, which are rendered onto the frame in a graphics manner. This is a known approach in the unsupervised learning literature, where the fundamental problem then becomes that of derendering, that is, inferring and discovering these locations and appearances for each object.
In PAIG we introduce a key rendering component, the Coordinate-Consistent Decoder, which enables the integration of the known equations of motion with an inverse-graphics autoencoder architecture (trainable end-to-end), to perform simultaneous object discovery and physical parameter estimation. Although trained on simple simulated 2D scenes, we show that knowledge of the physical equations of motion of the objects in the scene can be used to greatly improve future prediction and provide physical scene interpretability. Our second model, V-SysId, tackles the limitations of the PAIG architecture, namely the training difficulty, the restriction to simulated 2D scenes, and the need for noiseless scenes without distractors. Here, we approach the problem from first principles by asking: are neural networks a necessary component to solve this problem? Can we use simpler ideas from classical computer vision instead? With V-SysId, we approach the problem of object discovery and physical parameter estimation from a keypoint extraction, tracking and selection perspective, composed of 3 separate stages: proposal keypoint extraction and tracking, 3D equation fitting and camera pose estimation from 2D trajectories, and entropy-based trajectory selection. Since all the stages use lightweight algorithms and optimisers, V-SysId is able to perform joint object discovery, physical parameter and camera pose estimation from even a single video, drastically improving data-efficiency. Additionally, because it does not use a rendering/derendering approach, it can be used in real 3D scenes with many distractor objects. We show that this approach enables a number of interesting applications, such as vision-based robot end-effector localisation and remote breath rate measurement.
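The entropy-based trajectory selection stage can be illustrated with a small sketch. The scoring rule below (rank candidate keypoint tracks by the Shannon entropy of their equation-fit residuals and keep the most concentrated one) is an assumed instantiation of the idea, not V-SysId's actual implementation:

```python
import numpy as np

def residual_entropy(residuals, bins=16):
    """Shannon entropy of a trajectory's fit residuals. A track that
    genuinely follows the fitted equation of motion leaves tightly
    concentrated, low-entropy residuals; a distractor does not.
    (Hypothetical scoring rule, for illustration only.)"""
    hist, _ = np.histogram(residuals, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before log
    return float(-(p * np.log(p)).sum())

def select_trajectory(candidate_residuals):
    """Pick the candidate keypoint track with the lowest residual entropy."""
    scores = [residual_entropy(r) for r in candidate_residuals]
    return int(np.argmin(scores))
```

Because the score only needs each candidate's residuals, this selection runs after lightweight equation fitting and requires no learned components, consistent with the abstract's emphasis on simple classical tools.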
Finally, we move into the area of structured recurrent variational models from vision, where we are motivated by the following observation: in existing models, applying a force in the direction from a start point to an end point (in latent space) does not result in movement from the start point towards the end point, even in the simplest unconstrained environments. This means that the latent space learned by these models does not follow Newton’s law, where the acceleration vector has the same direction as the force vector (in point-mass systems), and prevents the use of PID controllers, which are the simplest and best understood type of controller. We solve this problem by building inductive biases from Newtonian physics into the latent variable model, which we call NewtonianVAE. Crucially, Newtonian correctness in the latent space brings about the ability to perform proportional (or PID) control, as opposed to the more computationally expensive model predictive control (MPC). PID controllers are ubiquitous in industrial applications, but had thus far lacked integration with unsupervised vision models. We show that the NewtonianVAE learns physically correct latent spaces in simulated 2D and 3D control systems, which can be used to perform goal-based discovery and control in imitation learning, and path following via Dynamic Motion Primitives.
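The point about Newtonian correctness can be demonstrated on its simplest case: when latent acceleration equals the applied force (unit point mass), a plain proportional-derivative law already reaches the goal, with no planner or MPC required. The gains, time step, and integrator below are illustrative assumptions, not values from the thesis:

```python
import numpy as np

# Point-mass latent dynamics of the kind NewtonianVAE enforces:
# acceleration points along the applied force, so a simple PD law
# in latent space converges to the goal.

def step(x, v, u, dt=0.05):
    """Semi-implicit Euler step of unit-mass Newtonian dynamics."""
    v = v + dt * u        # Newton: acceleration = force / mass (mass = 1)
    x = x + dt * v
    return x, v

def pd_control(x0, goal, kp=4.0, kd=2.0, steps=400):
    """Drive the latent state from x0 to goal with a PD law."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        u = kp * (goal - x) - kd * v   # force along the goal direction
        x, v = step(x, v, u)
    return x
```

In a latent space that violates Newton's law, the same controller would push in a direction unrelated to the goal, which is exactly the failure mode the abstract describes for existing models.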

    Enabling Audiovisual User Interfaces

    “Enabling Audiovisual User Interfaces” was a 2-year project, supported by a Marie Curie EU fellowship, conducted at EAVI, Goldsmiths. During the project, I investigated how human-computer interactions can be audiovisualized in order to improve user experience and usability. To address this issue, a new UI paradigm was proposed – AVUI (AudioVisual User Interface). AVUI links interaction, sound and image, building upon the concept of the Graphical User Interface (GUI) by adding interconnected sound and image. The research hypothesis was: the introduction of AVUI, integrating interrelated sonic and visual feedback reacting to user interactions, would lead to more usable, accessible, playful and engaging UIs, as compared to a traditional GUI – particularly in use cases where accessibility and/or engagement were determinant. I applied AVUIs to case studies, which were the objects of user testing. After drawing conclusions from these, I proposed an AVUI framework, including software modules and a set of best practices. Dissemination activities were also carried out.

    Descriptive and explanatory tools for human movement and state estimation in humanoid robotics

    The substantive subject of this thesis is the motion of anthropomorphic systems, and more particularly the bipedal locomotion of humans and humanoid robots. To characterize and understand bipedal locomotion, it is instructive to study its motor causes and its resulting physical consequences, namely the motion itself and the interactions with the environment. Concerning the causes, for instance, what are the principles that govern the organization of motor orders in humans when elaborating a specific displacement strategy? And which physical quantities can we compute to best describe the motion resulting from these motor orders? These questions are addressed in part through a mathematical extension of the Uncontrolled Manifold approach to the motor control of dynamic tasks, and through the presentation of a new descriptor of anthropomorphic locomotion. In connection with this analytical work comes the problem of state estimation in anthropomorphic systems. The difficulty of this problem comes from the fact that the measurements carry noise that is not always separable from the informative data, and that the state of the system is not necessarily observable. To get rid of the noise, classical filtering techniques can be employed, but they are likely to distort the signals of interest.
To cope with this issue, we present a recursive method, based on complementary filtering, to estimate the position of the center of mass and the angular momentum variation of the human body, two central quantities of human locomotion. Another way to get rid of the measurement noise is to acknowledge that it results in an unrealistic estimate of the motion dynamics. By exploiting the equations of motion, which dictate the temporal dynamics of the system, and by estimating a trajectory rather than a single point, we then present a maximum likelihood estimator based on the differential dynamic programming algorithm, which performs optimal centroidal state estimation for systems in contact. Finally, a multidisciplinary reflection on the functional and computational role played by the head in animals is presented, and the relevance of this solution for mobile robotics is discussed, particularly for state estimation and multisensory perception.
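The complementary-filtering idea above can be sketched for the center-of-mass position: blend a kinematic estimate (reliable at low frequency but noisy frame-to-frame) with the propagation of a force-derived acceleration (clean at high frequency but drifting). The first-order structure, gain, and signal names are illustrative assumptions; the thesis' estimator is richer than this:

```python
import numpy as np

# First-order complementary filter fusing two CoM position sources:
#  - x_kin: per-sample kinematic CoM estimates (low-frequency reference)
#  - a_force: CoM acceleration from contact forces, a = F/m - g
#    (integrates well at high frequency but drifts over time)

def complementary_com(x_kin, a_force, dt=0.005, alpha=0.98):
    x = x_kin[0]       # initialize from the kinematic estimate
    v = 0.0
    out = []
    for xk, a in zip(x_kin, a_force):
        v += a * dt                            # integrate force-based accel
        x_pred = x + v * dt                    # high-frequency propagation
        x = alpha * x_pred + (1 - alpha) * xk  # low-frequency correction
        out.append(x)
    return np.array(out)
```

The gain `alpha` sets the crossover frequency: propagation dominates above it, the kinematic reference dominates below it, so the drift of the integrated acceleration is continuously pulled back toward the kinematic estimate.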