
2,189 research outputs found

Active and Physics-Based Human Pose Reconstruction

Author: Gärtner Erik
Publication venue: Department of Computer Science, Lund University
Publication date: 01/01/2023

Perceiving humans is an important and complex problem within computer vision. Its significance is derived from its numerous applications, such as human-robot interaction, virtual reality, markerless motion capture, and human tracking for autonomous driving. The difficulty lies in the variability in human appearance, physique, and plausible body poses. In real-world scenes, this is further exacerbated by difficult lighting conditions, partial occlusions, and the depth ambiguity stemming from the loss of information during the 3D to 2D projection. Despite these challenges, significant progress has been made in recent years, primarily due to the expressive power of deep neural networks trained on large datasets. However, creating large-scale datasets with 3D annotations is expensive, and capturing the vast diversity of the real world is demanding. Traditionally, 3D ground truth is captured using motion capture laboratories that require large investments. Furthermore, many laboratories cannot easily accommodate athletic and dynamic motions. This thesis studies three approaches to improving visual perception, with emphasis on human pose estimation, that can complement improvements to the underlying predictor or training data. The first two papers present active human pose estimation, where a reinforcement learning agent is tasked with selecting informative viewpoints to reconstruct subjects efficiently. The papers discard the common assumption that the input is given and instead allow the agent to move to observe subjects from desirable viewpoints, e.g., those which avoid occlusions and for which the underlying pose estimator has a low prediction error. The third paper introduces the task of embodied visual active learning, which goes further and assumes that the perceptual model is not pre-trained. Instead, the agent is tasked with exploring its environment and requesting annotations to refine its visual model.
Learning to explore novel scenarios and efficiently request annotation for new data is a step towards life-long learning, where models can evolve beyond what they learned during the initial training phase. We study the problem for segmentation, though the idea is applicable to other perception tasks. Lastly, the final two papers propose improving human pose estimation by integrating physical constraints. These regularize the reconstructed motions to be physically plausible and serve as a complement to current kinematic approaches. Whether a motion has been observed in the training data or not, the predictions should obey the laws of physics. Through integration with a physical simulator, we demonstrate that we can reduce reconstruction artifacts and enforce, e.g., contact constraints.

Lund University Publications

Computing fast search heuristics for physics-based mobile robot motion planning

Author: Ferri Federico
Publication date: 07/09/2018

Mobile robots are increasingly being employed to assist responders in search and rescue missions. Robots have to navigate in dangerous areas such as collapsed buildings and hazardous sites, which can be inaccessible to humans. Tele-operating the robots can be stressful for the human operators, who are also overloaded with mission tasks and coordination overhead, so it is important to provide the robot with some degree of autonomy, both to lighten the task for the human operator and to ensure robot safety. Moving robots around requires reasoning, including interpretation of the environment, spatial reasoning, planning of actions (motion), and execution. This is particularly challenging when the environment is unstructured and the terrain is harsh, i.e. not flat and cluttered with obstacles. Approaches reducing the problem to a 2D path planning problem fall short, and many of those that reason about the problem in 3D do not do so in a complete and exhaustive manner. The approach proposed in this thesis is to use rigid body simulation to obtain a more truthful model of reality, i.e. of the interaction between the robot and the environment. Such a simulation obeys the laws of physics and takes into account the geometry of the environment, the geometry of the robot, and any dynamic constraints that may be in place. The physics-based motion planning approach by itself is also highly intractable due to the computational load required to perform state propagation combined with the exponential blowup of planning; additionally, there are more technical limitations that prevent us from using techniques such as state sampling or state steering, which are known to be effective in solving the problem in simpler domains. The proposed solution to this problem is to compute heuristics that can bias the search towards the goal, so as to quickly converge towards the solution.
With such a model, the search space is a rich space, which contains only states that are physically reachable by the robot and also provides enough information about the safety of the robot itself. The overall result is that by using this framework the robot engineer has a simpler job of encoding the domain knowledge, which now consists only of providing the robot geometric model plus any constraints.

Archivio della ricerca- Università di Roma La Sapienza

Sim2real transfer learning for 3D human pose estimation: motion to the rescue

Author: Doersch Carl
Zisserman Andrew
Publication date: 14/11/2019

Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset. Comment: Accepted at NeurIPS 201

arXiv.org e-Print Archive

Oxford University Research Archive

Vision-based legged robot navigation: localisation, local planning, learning

Author: Mattamala Aravena Matias
Publication date: 07/02/2024

Recent advances in legged locomotion control have made legged robots walk up staircases, go deep into underground caves, and walk in the forest. Nevertheless, autonomously achieving this task is still a challenge. Navigating and accomplishing missions in the wild relies not only on robust low-level controllers but also on higher-level representations and perceptual systems that are aware of the robot's capabilities. This thesis addresses the navigation problem for legged robots. The contributions are four systems designed to exploit unique characteristics of these platforms, from the sensing setup to their advanced mobility skills over different terrain. The systems address localisation, scene understanding, and local planning, and advance the capabilities of legged robots in challenging environments. The first contribution tackles localisation with multi-camera setups available on legged platforms. It proposes a strategy to actively switch between the cameras and stay localised while operating in a visual teach and repeat context, in spite of transient changes in the environment. The second contribution focuses on local planning, effectively adding a safety layer for robot navigation. The approach uses a local map built on-the-fly to generate efficient vector field representations that enable fast and reactive navigation. The third contribution demonstrates how to improve local planning in natural environments by learning robot-specific traversability from demonstrations. The approach leverages classical and learning-based methods to enable online, onboard traversability learning. These systems are demonstrated via different robot deployments on industrial facilities, underground mines, and parklands. The thesis concludes by presenting a real-world application: an autonomous forest inventory system with legged robots.
This last contribution presents a mission planning system for autonomous surveying as well as a data analysis pipeline to extract forestry attributes. The approach was experimentally validated in a field campaign in Finland, evidencing the potential that legged platforms offer for future applications in the wild.

Oxford University Research Archive

Octopus-inspired multi-arm robotic swimming

Author: Kazakidi A
Sfakiotakis M
Tsakiris D P
Publication venue: IOP Publishing
Publication date: 13/05/2015

The outstanding locomotor and manipulation characteristics of the octopus have recently inspired the development, by our group, of multi-functional robotic swimmers, featuring both manipulation and locomotion capabilities, which could be of significant engineering interest in underwater applications. During its little-studied arm-swimming behavior, as opposed to the better known jetting via the siphon, the animal appears to generate considerable propulsive thrust and rapid acceleration, predominantly employing movements of its arms. In this work, we capture the fundamental characteristics of the corresponding complex pattern of arm motion by a sculling profile, involving a fast power stroke and a slow recovery stroke. We investigate the propulsive capabilities of a multi-arm robotic system under various swimming gaits, namely patterns of arm coordination, which achieve the generation of forward, as well as backward, propulsion and turning. A lumped-element model of the robotic swimmer, which considers arm compliance and the interaction with the aquatic environment, was used to study the characteristics of these gaits, the effect of various kinematic parameters on propulsion, and the generation of complex trajectories. This investigation focuses on relatively high-stiffness arms. Experiments employing a compliant-body robotic prototype swimmer with eight compliant arms, all made of polyurethane, inside a water tank, successfully demonstrated this novel mode of underwater propulsion. Speeds of up to 0.26 body lengths per second (approximately 100 mm/s), and propulsive forces of up to 3.5 N were achieved, with a non-dimensional cost of transport of 1.42 with all eight arms and of 0.9 with only two active arms. The experiments confirmed the computational results and verified the multi-arm maneuverability and simultaneous object grasping capability of such systems.

Crossref

University of Strathclyde Institutional Repository

Aberystwyth Research Portal

Vision-based Manipulation In-the-Wild

Author: Chi Cheng
Publication date: 01/01/2024

Deploying robots in real-world environments involves immense engineering complexity, potentially surpassing the resources required for autonomous vehicles due to the increased dimensionality and task variety. To maximize the chances of successful real-world deployment, finding a simple solution that minimizes engineering complexity at every level, from hardware to algorithm to operations, is crucial. In this dissertation, we consider a vision-based manipulation system that can be deployed in-the-wild when trained to imitate sufficient quantity and diversity of human demonstration data on the desired task. At deployment time, the robot is driven by a single diffusion-based visuomotor policy, with raw RGB images as input and robot end-effector pose as output. Compared to existing policy representations, Diffusion Policy handles multimodal action distributions gracefully, being scalable to high-dimensional action spaces and exhibiting impressive training stability. These properties allow a single software system to be used for multiple tasks, with data collected by multiple demonstrators, deployed to multiple robot embodiments, and without significant hyper-parameter tuning. We developed a Universal Manipulation Interface (UMI), a portable, low-cost, and information-rich data collection system to enable direct manipulation skill learning from in-the-wild human demonstrations. UMI provides an intuitive interface for non-expert users by using hand-held grippers with mounted GoPro cameras. Compared to existing robotic data collection systems, UMI enables robotic data collection without needing a robot, drastically reducing the engineering and operational complexity. Trained with UMI data, the resulting diffusion policies can be deployed across multiple robot platforms in unseen environments for novel objects and to complete dynamic, bimanual, precise, and long-horizon tasks. 
The Diffusion Policy and UMI combination provides a simple full-stack solution to many manipulation problems. The turn-around time of building a single-task manipulation system (such as object tossing and cloth folding) can be reduced from a few months to a few days.

Columbia University Academic Commons

Learning vision-based agile flight: From simulation to the real world

Author: Kaufmann Elia
Publication date: 01/01/2022

ZORA

HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

Author: Busam Benjamin
Garattoni Lorenzo
Jung HyunJun
Meier Sven
Navab Nassir
Rizzoli Giulia
Roth Daniel
Ruhkamp Patrick
Schieber Hannah
Wang Pengyuan
Wu Shun-Cheng
Zhai Guangyao
Zhao Hongcheng
Publication date: 26/04/2023

Estimating the 6D pose of objects is a major 3D computer vision problem. Following the promising outcomes of instance-level approaches, research is also moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets fall short in annotation quality and in the number of poses provided. We propose the new category-level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB and Depth (RGBD+P), 2) 194 highly diverse objects from 10 household object categories, including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps. Furthermore, we also provide benchmark results of state-of-the-art category-level pose estimation networks.

arXiv.org e-Print Archive

Learning-based methods for planning and control of humanoid robots

Author: Viceconte Paolo Maria
Publication date: 19/05/2023

Humans and robots are increasingly likely to coexist. The anthropomorphic nature of humanoid robots facilitates physical human-robot interaction, and makes social human-robot interaction more natural. Moreover, it makes humanoids ideal candidates for many applications related to tasks and environments designed for humans. No matter the application, a ubiquitous requirement for the humanoid is to possess proper locomotion skills. Despite long-lasting research, humanoid locomotion is still far from being a trivial task. A common approach to humanoid locomotion consists in decomposing its complexity by means of a model-based hierarchical control architecture. To cope with computational constraints, simplified models of the humanoid are employed in some of the architectural layers. At the same time, the redundancy of the humanoid with respect to the locomotion task, as well as the closeness of such a task to human locomotion, suggests a data-driven approach to learn it directly from experience. This thesis investigates the application of learning-based techniques to planning and control of humanoid locomotion. In particular, both deep reinforcement learning and deep supervised learning are considered to address humanoid locomotion tasks of increasing complexity. First, we employ deep reinforcement learning to study the spontaneous emergence of balancing and push recovery strategies for the humanoid, which represent essential prerequisites for more complex locomotion tasks. Then, by making use of motion capture data collected from human subjects, we employ deep supervised learning to shape the robot walking trajectories towards an improved human-likeness. The proposed approaches are validated on real and simulated humanoid robots, specifically on two versions of the iCub humanoid: iCub v2.7 and iCub v3.

Archivio della ricerca- Università di Roma La Sapienza