
    High Level Learning Using the Temporal Features of Human Demonstrated Sequential Tasks

    Modelling human-led demonstrations of high-level sequential tasks is fundamental to a number of practical inference applications, including vision-based policy learning and activity recognition. Demonstrations of these tasks are captured as long videos with similar spatial content. Learning from this data is challenging because inference cannot rely solely on the presence of spatial features and must instead consider how those features play out across time. To be successful, these temporal representations must generalize to variations in the duration of activities and capture relationships between events expressed across the scale of an entire video. Contemporary deep learning architectures that represent time (convolution-based and Recurrent Neural Networks) do not address these concerns. Representations learned by these models describe temporal features in terms of fixed durations such as minutes, seconds, and frames. They are also developed sequentially and must use unreasonably large models to capture temporal features expressed at scale. Probabilistic temporal models have been successful in representing the temporal information of videos in a duration-invariant manner that is robust to scale; however, this has only been accomplished through the use of user-defined spatial features. Such abstractions make unrealistic assumptions about the content expressed in these videos and the quality of the perception model, and they limit the potential applications of trained models. To that end, I present D-ITR-L, a temporal wrapper that extends the spatial features extracted from a typical CNN architecture and transforms them into temporal features. D-ITR-L-derived temporal features are duration invariant and can identify temporal relationships between events at the scale of a full video. This claim is validated in various vision-based policy learning and action recognition settings. Additionally, these studies show that challenging visual domains such as human-led demonstrations of high-level sequential tasks can be effectively represented when using a D-ITR-L-based model.

    Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems

    Recent successes combine reinforcement learning algorithms with deep neural networks, yet reinforcement learning is still not widely applied to robotics and real-world scenarios. This can be attributed to the fact that current state-of-the-art, end-to-end reinforcement learning approaches still require thousands or millions of data samples to converge to a satisfactory policy and are subject to catastrophic failures during training. Conversely, in real-world scenarios and after just a few data samples, humans are able to provide demonstrations of the task, intervene to prevent catastrophic actions, or simply evaluate whether the policy is performing correctly. This research investigates how to integrate these human interaction modalities into the reinforcement learning loop, increasing sample efficiency and enabling real-time reinforcement learning in robotics and real-world scenarios. This novel theoretical foundation is called Cycle-of-Learning, a reference to how different human interaction modalities, namely task demonstration, intervention, and evaluation, are cycled and combined with reinforcement learning algorithms. Results presented in this work show that a reward signal learned from human interaction accelerates the rate of learning of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample efficient than traditional supervised learning algorithms. Finally, Cycle-of-Learning develops an effective transition from policies learned using human demonstrations and interventions to reinforcement learning. The theoretical foundation developed by this research opens new research paths to human-agent teaming scenarios where autonomous agents are able to learn from human teammates and adapt to mission performance metrics in real time and in real-world scenarios. Comment: PhD thesis, Aerospace Engineering, Texas A&M (2020). For more information, see https://vggoecks.com

    Context-aware learning for robot-assisted endovascular catheterization

    Endovascular intervention has become a mainstream treatment for cardiovascular diseases. However, multiple challenges remain, such as unwanted radiation exposure, limited two-dimensional image guidance, and insufficient force perception and haptic cues. Fast-evolving robot-assisted platforms improve the stability and accuracy of instrument manipulation. The master-slave system also removes the operator's radiation exposure. However, the integration of robotic systems into the current surgical workflow is still debatable, since there is little value in executing repetitive, easy tasks through robotic teleoperation. Current systems offer very low autonomy; potential autonomous features could bring more benefits, such as reduced cognitive workload and human error, safer and more consistent instrument manipulation, and the ability to incorporate various medical imaging and sensing modalities. This research proposes frameworks for automated catheterisation with different machine learning-based algorithms, including Learning-from-Demonstration, Reinforcement Learning, and Imitation Learning. These frameworks focus on integrating task context into the process of skill learning, hence achieving better adaptation to different situations and safer tool-tissue interactions. Furthermore, the autonomous feature was applied to a next-generation, MR-safe robotic catheterisation platform. The results provide important insights into improving catheter navigation in the form of autonomous task planning and self-optimization with clinically relevant factors, and motivate the design of intelligent, intuitive, and collaborative robots under non-ionizing image modalities. Open Access

    Towards Target-Driven Visual Navigation in Indoor Scenes via Generative Imitation Learning

    We present a target-driven navigation system to improve mapless visual navigation in indoor scenes. Our method takes a multi-view observation of a robot and a target as inputs at each time step to provide a sequence of actions that move the robot to the target without relying on odometry or GPS at runtime. The system is learned by optimizing a combined objective encompassing three key designs. First, we propose that an agent conceives the next observation before making an action decision. This is achieved by learning a variational generative module from expert demonstrations. We then propose predicting static collisions in advance, as an auxiliary task to improve safety during navigation. Moreover, to alleviate the training data imbalance problem of termination action prediction, we introduce a separate target checking module rather than augmenting the navigation policy with a termination action. The three proposed designs all contribute to improved training data efficiency, static collision avoidance, and navigation generalization performance, resulting in a novel target-driven mapless navigation system. Through experiments on a TurtleBot, we provide evidence that our model can be integrated into a robotic system and navigate in the real world. Videos and models can be found in the supplementary material. Comment: 11 pages, accepted by IEEE Robotics and Automation Letters

    Human-Robot Collaborations in Industrial Automation

    Technology is changing the manufacturing world. For example, sensors are being used to track inventories from the manufacturing floor up to a retail shelf or a customer’s door. These types of interconnected systems have been called the fourth industrial revolution, also known as Industry 4.0, and are projected to lower manufacturing costs. As industry moves toward these integrated technologies and lower costs, engineers will need to connect these systems via the Internet of Things (IoT). These engineers will also need to design how these connected systems interact with humans. The focus of this Special Issue is the smart sensors used in these human–robot collaborations.

    Toward Data-Driven Digital Therapeutics Analytics: Literature Review and Research Directions

    With the advent of Digital Therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To acquire a deeper understanding of DTx engagement and behavioral adherence, beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment would be required for analysis. In this work, the overall flow of data-driven DTx analytics is reviewed to help researchers and practitioners explore DTx datasets, investigate contextual patterns associated with DTx usage, and establish the (causal) relationship between DTx engagement and behavioral adherence. This review of the key components of data-driven analytics provides novel research directions in the analysis of mobile sensor and interaction datasets, which helps to iteratively improve the receptivity of existing DTx. Comment: This paper has been accepted by the IEEE/CAA Journal of Automatica Sinica