
    Keyframe-based visual–inertial odometry using nonlinear optimization

    Combining visual and inertial measurements has become popular in mobile robotics, since the two sensing modalities offer complementary characteristics that make them the ideal choice for accurate visual–inertial odometry or simultaneous localization and mapping (SLAM). While historically the problem has been addressed with filtering, advancements in visual estimation suggest that nonlinear optimization offers superior accuracy while remaining tractable in complexity thanks to the sparsity of the underlying problem. Taking inspiration from these findings, we formulate a rigorously probabilistic cost function that combines reprojection errors of landmarks and inertial terms. The problem is kept tractable, thus ensuring real-time operation, by limiting the optimization to a bounded window of keyframes through marginalization. Keyframes may be spaced in time by arbitrary intervals while still being related by linearized inertial terms. We present evaluation results on complementary datasets recorded with our custom-built stereo visual–inertial hardware that accurately synchronizes accelerometer and gyroscope measurements with imagery. A comparison of both stereo and monocular versions of our algorithm, with and without online extrinsics estimation, is shown with respect to ground truth. Furthermore, we compare the performance to an implementation of a state-of-the-art stochastic cloning sliding-window filter. This competitive reference implementation performs tightly coupled filtering-based visual–inertial odometry. While our approach admittedly demands more computation, we show its superior performance in terms of accuracy.
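
    As a sketch of the kind of objective described above (the notation here is assumed for illustration, not quoted from the paper), a visual–inertial cost sums weighted landmark reprojection errors over cameras i, landmarks j, and keyframes k, plus weighted inertial error terms between consecutive keyframes:

    \[
    J(\mathbf{x}) \;=\; \sum_{k}\sum_{i}\sum_{j}
      \mathbf{e}_{r}^{i,j,k\,\top}\,\mathbf{W}_{r}^{i,j,k}\,\mathbf{e}_{r}^{i,j,k}
      \;+\; \sum_{k} \mathbf{e}_{s}^{k\,\top}\,\mathbf{W}_{s}^{k}\,\mathbf{e}_{s}^{k},
    \]

    where \(\mathbf{e}_{r}^{i,j,k}\) is the reprojection error of landmark j in camera i at keyframe k, \(\mathbf{e}_{s}^{k}\) is the linearized inertial error between keyframes k and k+1, and the \(\mathbf{W}\) are information matrices (inverse covariances) that make the cost rigorously probabilistic. Marginalizing out states that leave the bounded keyframe window keeps the problem size constant over time.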

    Learning Human Pose Estimation Features with Convolutional Networks

    This paper introduces a new architecture for human pose estimation using a multi-layer convolutional network architecture and a modified learning technique that learns low-level features and higher-level weak spatial models. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning schema show significant improvement over the current state-of-the-art results. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to outperform all existing traditional architectures on this task. The paper also discusses several lessons learned while researching alternatives, most notably that it is possible to learn strong low-level feature detectors on features that might cover only a few pixels in the image. Higher-level spatial models improve the overall result somewhat, but to a much lesser extent than expected. Many researchers previously argued that kinematic structure and top-down information are crucial for this domain, but with our purely bottom-up approach and weak spatial model we could improve upon more complicated architectures that currently produce the best results. This mirrors what researchers in speech recognition, object recognition, and other domains have experienced.
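
    To make the bottom-up idea concrete, here is a minimal PyTorch-style sketch of a multi-layer convolutional network producing one heatmap per body joint; the channel counts, kernel sizes, and joint count are illustrative assumptions, not the paper's actual configuration:

    # Hypothetical sketch: a small convnet whose early layers act as
    # low-level feature detectors with small receptive fields, followed
    # by a per-joint heatmap head. A weak spatial model could then be
    # applied on top of the heatmaps as a separate stage.
    import torch
    import torch.nn as nn

    class PoseHeatmapNet(nn.Module):
        def __init__(self, num_joints: int = 14):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            )
            # One output channel per joint: each channel is a spatial
            # confidence map for that joint's location.
            self.head = nn.Conv2d(128, num_joints, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(x))

    net = PoseHeatmapNet()
    heatmaps = net(torch.randn(1, 3, 128, 128))  # -> shape (1, 14, 32, 32)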

    The Three-Dimensional Shapes of Galaxy Clusters

    While clusters of galaxies are considered one of the most important cosmological probes, the standard spherical modelling of the dark matter and the intracluster medium is only a rough approximation. Indeed, it is well established both theoretically and observationally that galaxy clusters are much better approximated as triaxial objects. However, investigating the asphericity of galaxy clusters is still in its infancy. We review here this topic, which is currently gathering growing interest from the cluster community. We begin by introducing the triaxial geometry. Then we discuss the topic of deprojection and demonstrate the need for combining different probes of the cluster's potential. We discuss the different works that have addressed these issues. We present a general parametric framework intended to simultaneously fit complementary data sets (X-ray, Sunyaev–Zel'dovich, and lensing data). We discuss in detail the case of Abell 1689 to show how different models and data sets lead to different halo parameters. We present the results obtained from fitting a 3D NFW model to X-ray, SZ, and lensing data for 4 strong-lensing clusters. We argue that a triaxial model generally lowers the inferred value of the concentration parameter compared to a spherical analysis. This may alleviate tensions regarding, e.g., the over-concentration problem. However, we stress that predictions from numerical simulations rely on a spherical analysis of triaxial halos. Given that triaxial analysis will have growing importance on the observational side, we advocate the need for simulations to be analysed in the very same way, allowing reliable and meaningful comparisons. Besides, methods intended to derive the three-dimensional shape of galaxy clusters should be extensively tested on simulated multi-wavelength observations.
    Comment: (Biased) review paper. Comments welcome. Accepted for publication in Space Science Reviews. This is a product of the work done by an international team at the International Space Science Institute (ISSI) in Bern on "Astrophysics and Cosmology with Galaxy Clusters: the X-ray and lensing view".
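
    As a reminder of the kind of parametrization involved (standard notation, assumed here rather than quoted from the review), one common form of a triaxial NFW halo replaces the spherical radius with an ellipsoidal one:

    \[
    \zeta^{2} = \frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}} + \frac{z^{2}}{c^{2}}
    \quad (a \le b \le c),
    \qquad
    \rho(\zeta) = \frac{\rho_{s}}{(\zeta/r_{s})\,(1 + \zeta/r_{s})^{2}},
    \]

    with scale density \(\rho_{s}\), scale radius \(r_{s}\), and concentration defined as usual from an overdensity radius. Allowing the iso-density ellipsoids to be elongated along the line of sight is what permits the fit to reproduce the projected data with a lower concentration than a spherical analysis would infer.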

    Towards Object-Centric Scene Understanding

    Visual perception for autonomous agents continues to attract community attention due to the disruptive technologies and the wide applicability of such solutions. Autonomous Driving (AD), a major application in this domain, promises to revolutionize our approach to mobility while bringing critical advantages in limiting accident fatalities. Fueled by recent advances in Deep Learning (DL), more computer vision tasks are being addressed using a learning paradigm. Deep Neural Networks (DNNs) have consistently succeeded in pushing performance to unprecedented levels, demonstrating the ability of such approaches to generalize to an increasing number of difficult problems, such as 3D vision tasks. In this thesis, we address two main challenges arising from current approaches: the computational complexity of multi-task pipelines, and the increasing need for manual annotations. On the one hand, AD systems need to perceive the surrounding environment at different levels of detail and, subsequently, take timely actions. This multitasking further limits the time available for each perception task. On the other hand, the need for such systems to generalize to massively diverse situations requires the use of large-scale datasets covering long-tailed cases. Such a requirement renders traditional supervised approaches, despite the data readily available in the AD domain, unsustainable in terms of annotation costs, especially for 3D tasks. Driven by the nature of the AD environment, whose complexity (unlike that of indoor scenes) is dominated by the presence of other scene elements (mainly cars and pedestrians), we focus on the above-mentioned challenges in object-centric tasks, as sketched below. We then situate our contributions appropriately in a fast-paced literature, while supporting our claims with extensive experimental analysis leveraging up-to-date state-of-the-art results and community-adopted benchmarks.
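
    A minimal sketch of the shared-computation idea behind multi-task perception pipelines, assuming a generic shared backbone with lightweight task heads (module names, channel sizes, and head outputs are illustrative assumptions, not the thesis's actual design):

    # Hypothetical illustration: one backbone runs once per frame and its
    # cost is amortized across several perception tasks, easing the time
    # budget each individual task would otherwise consume.
    import torch
    import torch.nn as nn

    class SharedBackboneMultiTask(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            # Shared feature extractor (the expensive part).
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Cheap per-task heads reusing the same features.
            self.detect_head = nn.Conv2d(128, num_classes + 4, 1)  # class scores + box offsets
            self.depth_head = nn.Conv2d(128, 1, 1)                 # dense depth estimate

        def forward(self, x: torch.Tensor):
            feats = self.backbone(x)
            return self.detect_head(feats), self.depth_head(feats)

    det, depth = SharedBackboneMultiTask()(torch.randn(1, 3, 256, 256))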