2,189 research outputs found

    Active and Physics-Based Human Pose Reconstruction

    Get PDF
    Perceiving humans is an important and complex problem within computervision. Its significance is derived from its numerous applications, suchas human-robot interaction, virtual reality, markerless motion capture,and human tracking for autonomous driving. The difficulty lies in thevariability in human appearance, physique, and plausible body poses. Inreal-world scenes, this is further exacerbated by difficult lightingconditions, partial occlusions, and the depth ambiguity stemming fromthe loss of information during the 3d to 2d projection. Despite thesechallenges, significant progress has been made in recent years,primarily due to the expressive power of deep neural networks trained onlarge datasets. However, creating large-scale datasets with 3dannotations is expensive, and capturing the vast diversity of the realworld is demanding. Traditionally, 3d ground truth is captured usingmotion capture laboratories that require large investments. Furthermore,many laboratories cannot easily accommodate athletic and dynamicmotions. This thesis studies three approaches to improving visualperception, with emphasis on human pose estimation, that can complementimprovements to the underlying predictor or training data.The first two papers present active human pose estimation, where areinforcement learning agent is tasked with selecting informativeviewpoints to reconstruct subjects efficiently. The papers discard thecommon assumption that the input is given and instead allow the agent tomove to observe subjects from desirable viewpoints, e.g., those whichavoid occlusions and for which the underlying pose estimator has a lowprediction error.The third paper introduces the task of embodied visual active learning,which goes further and assumes that the perceptual model is notpre-trained. Instead, the agent is tasked with exploring its environmentand requesting annotations to refine its visual model. Learning toexplore novel scenarios and efficiently request annotation for new datais a step towards life-long learning, where models can evolve beyondwhat they learned during the initial training phase. We study theproblem for segmentation, though the idea is applicable to otherperception tasks.Lastly, the final two papers propose improving human pose estimation byintegrating physical constraints. These regularize the reconstructedmotions to be physically plausible and serve as a complement to currentkinematic approaches. Whether a motion has been observed in the trainingdata or not, the predictions should obey the laws of physics. Throughintegration with a physical simulator, we demonstrate that we can reducereconstruction artifacts and enforce, e.g., contact constraints

    Computing fast search heuristics for physics-based mobile robot motion planning

    Get PDF
    Mobile robots are increasingly being employed to assist responders in search and rescue missions. Robots have to navigate in dangerous areas such as collapsed buildings and hazardous sites, which can be inaccessible to humans. Tele-operating the robots can be stressing for the human operators, which are also overloaded with mission tasks and coordination overhead, so it is important to provide the robot with some degree of autonomy, to lighten up the task for the human operator and also to ensure robot safety. Moving robots around requires reasoning, including interpretation of the environment, spatial reasoning, planning of actions (motion), and execution. This is particularly challenging when the environment is unstructured, and the terrain is \textit{harsh}, i.e. not flat and cluttered with obstacles. Approaches reducing the problem to a 2D path planning problem fall short, and many of those who reason about the problem in 3D don't do it in a complete and exhaustive manner. The approach proposed in this thesis is to use rigid body simulation to obtain a more truthful model of the reality, i.e. of the interaction between the robot and the environment. Such a simulation obeys the laws of physics, takes into account the geometry of the environment, the geometry of the robot, and any dynamic constraints that may be in place. The physics-based motion planning approach by itself is also highly intractable due to the computational load required to perform state propagation combined with the exponential blowup of planning; additionally, there are more technical limitations that disallow us to use things such as state sampling or state steering, which are known to be effective in solving the problem in simpler domains. The proposed solution to this problem is to compute heuristics that can bias the search towards the goal, so as to quickly converge towards the solution. With such a model, the search space is a rich space, which can only contain states which are physically reachable by the robot, and also tells us enough information about the safety of the robot itself. The overall result is that by using this framework the robot engineer has a simpler job of encoding the \textit{domain knowledge} which now consists only of providing the robot geometric model plus any constraints

    Sim2real transfer learning for 3D human pose estimation: motion to the rescue

    Full text link
    Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset.Comment: Accepted at NeurIPS 201

    Vision-based legged robot navigation: localisation, local planning, learning

    Get PDF
    The recent advances in legged locomotion control have made legged robots walk up staircases, go deep into underground caves, and walk in the forest. Nevertheless, autonomously achieving this task is still a challenge. Navigating and acomplishing missions in the wild relies not only on robust low-level controllers but also higher-level representations and perceptual systems that are aware of the robot's capabilities. This thesis addresses the navigation problem for legged robots. The contributions are four systems designed to exploit unique characteristics of these platforms, from the sensing setup to their advanced mobility skills over different terrain. The systems address localisation, scene understanding, and local planning, and advance the capabilities of legged robots in challenging environments. The first contribution tackles localisation with multi-camera setups available on legged platforms. It proposes a strategy to actively switch between the cameras and stay localised while operating in a visual teach and repeat context---in spite of transient changes in the environment. The second contribution focuses on local planning, effectively adding a safety layer for robot navigation. The approach uses a local map built on-the-fly to generate efficient vector field representations that enable fast and reactive navigation. The third contribution demonstrates how to improve local planning in natural environments by learning robot-specific traversability from demonstrations. The approach leverages classical and learning-based methods to enable online, onboard traversability learning. These systems are demonstrated via different robot deployments on industrial facilities, underground mines, and parklands. The thesis concludes by presenting a real-world application: an autonomous forest inventory system with legged robots. This last contribution presents a mission planning system for autonomous surveying as well as a data analysis pipeline to extract forestry attributes. The approach was experimentally validated in a field campaign in Finland, evidencing the potential that legged platforms offer for future applications in the wild

    Octopus-inspired multi-arm robotic swimming

    Get PDF
    The outstanding locomotor and manipulation characteristics of the octopus have recently inspired the development, by our group, of multi-functional robotic swimmers, featuring both manipulation and locomotion capabilities, which could be of significant engineering interest in underwater applications. During its little-studied arm-swimming behavior, as opposed to the better known jetting via the siphon, the animal appears to generate considerable propulsive thrust and rapid acceleration, predominantly employing movements of its arms. In this work, we capture the fundamental characteristics of the corresponding complex pattern of arm motion by a sculling profile, involving a fast power stroke and a slow recovery stroke. We investigate the propulsive capabilities of a multi-arm robotic system under various swimming gaits, namely patterns of arm coordination, which achieve the generation of forward, as well as backward, propulsion and turning. A lumped-element model of the robotic swimmer, which considers arm compliance and the interaction with the aquatic environment, was used to study the characteristics of these gaits, the effect of various kinematic parameters on propulsion, and the generation of complex trajectories. This investigation focuses on relatively high-stiffness arms. Experiments employing a compliant-body robotic prototype swimmer with eight compliant arms, all made of polyurethane, inside a water tank, successfully demonstrated this novel mode of underwater propulsion. Speeds of up to 0.26 body lengths per second (approximately 100 mm s(-1)), and propulsive forces of up to 3.5 N were achieved, with a non-dimensional cost of transport of 1.42 with all eight arms and of 0.9 with only two active arms. The experiments confirmed the computational results and verified the multi-arm maneuverability and simultaneous object grasping capability of such systems

    Learning vision-based agile flight: From simulation to the real world

    Get PDF

    HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

    Full text link
    Estimating the 6D pose of objects is a major 3D computer vision problem. Since the promising outcomes from instance-level approaches, research heads also move towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category-level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB and Depth (RGBD+P), 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps. Furthermore, we also provide benchmark results of state-of-the-art category-level pose estimation networks

    Learning-based methods for planning and control of humanoid robots

    Get PDF
    Nowadays, humans and robots are more and more likely to coexist as time goes by. The anthropomorphic nature of humanoid robots facilitates physical human-robot interaction, and makes social human-robot interaction more natural. Moreover, it makes humanoids ideal candidates for many applications related to tasks and environments designed for humans. No matter the application, an ubiquitous requirement for the humanoid is to possess proper locomotion skills. Despite long-lasting research, humanoid locomotion is still far from being a trivial task. A common approach to address humanoid locomotion consists in decomposing its complexity by means of a model-based hierarchical control architecture. To cope with computational constraints, simplified models for the humanoid are employed in some of the architectural layers. At the same time, the redundancy of the humanoid with respect to the locomotion task as well as the closeness of such a task to human locomotion suggest a data-driven approach to learn it directly from experience. This thesis investigates the application of learning-based techniques to planning and control of humanoid locomotion. In particular, both deep reinforcement learning and deep supervised learning are considered to address humanoid locomotion tasks in a crescendo of complexity. First, we employ deep reinforcement learning to study the spontaneous emergence of balancing and push recovery strategies for the humanoid, which represent essential prerequisites for more complex locomotion tasks. Then, by making use of motion capture data collected from human subjects, we employ deep supervised learning to shape the robot walking trajectories towards an improved human-likeness. The proposed approaches are validated on real and simulated humanoid robots. Specifically, on two versions of the iCub humanoid: iCub v2.7 and iCub v3
    • …
    corecore