Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization
In this study, we address multi-robot localization issues, with a specific
focus on cooperative localization and observability analysis of relative pose
estimation. Cooperative localization involves enhancing each robot's
information through a communication network and message passing. If odometry
data from a target robot can be transmitted to the ego robot, observability of
their relative pose estimation can be achieved through range-only or
bearing-only measurements, provided both robots have non-zero linear
velocities. In cases where odometry data from a target robot are not directly
transmitted but estimated by the ego robot, both range and bearing measurements
are necessary to ensure observability of relative pose estimation. For
ROS/Gazebo simulations, we explore four sensing and communication structures.
We compare extended Kalman filtering (EKF) and pose graph optimization (PGO)
estimation using different robust loss functions (filtering and smoothing with
varying batch sizes of sliding windows) in terms of estimation accuracy. In
hardware experiments, two TurtleBot3 robots equipped with UWB modules are used
for real-world inter-robot relative pose estimation, applying both EKF and PGO
and comparing their performance.
Comment: 20 pages, 21 figures
Learning to Rank: Online Learning, Statistical Theory and Applications.
Learning to rank is a supervised machine learning problem, where the output space is the special structured space of permutations. Learning to rank has diverse application areas, spanning information retrieval, recommendation systems, computational biology and others.
In this dissertation, we make contributions to some of the exciting directions of research in learning to rank. In the first part, we extend the classic, online perceptron algorithm for classification to learning to rank, giving a loss bound which is reminiscent of Novikoff's famous convergence theorem for classification. In the second part, we give strategies for learning ranking functions in an online setting, with a novel feedback model, where feedback is restricted to labels of top ranked items. The second part of our work is divided into two sub-parts: one without side information and one with side information. In the third part, we provide novel generalization error bounds for algorithms applied to various Lipschitz and/or smooth ranking surrogates. In the last part, we apply ranking losses to learn policies for personalized advertisement recommendations, partially overcoming the problem of click sparsity. We conduct experiments on various simulated and commercial datasets, comparing our strategies with baseline strategies for online learning to rank and personalized advertisement recommendation.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133334/1/sougata_1.pd
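The perceptron extension mentioned in the first part can be hinted at with a schematic pairwise update; this is an illustrative simplification under assumed names, not the dissertation's exact algorithm. Whenever a preferred item is scored below a less-preferred one, the weight vector moves toward their feature difference, exactly as the classification perceptron moves toward a misclassified example.

```python
import numpy as np

def perceptron_rank_update(w, X, y):
    """One online pass over all item pairs: if item i is preferred over item j
    (y[i] > y[j]) but scored no higher, apply the update w += x_i - x_j."""
    scores = X @ w
    mistakes = 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j] and scores[i] <= scores[j]:  # preference violated
                w = w + (X[i] - X[j])
                scores = X @ w  # rescore after the update
                mistakes += 1
    return w, mistakes

X = np.array([[1.0, 0.0], [0.0, 1.0]])  # two items' feature vectors
y = np.array([2, 1])                    # item 0 is preferred over item 1
w, m = perceptron_rank_update(np.zeros(2), X, y)
```

A Novikoff-style analysis then bounds the number of such pairwise mistakes in terms of the margin of a separating weight vector, which is the flavor of loss bound the first part establishes for ranking.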
Consistent Right-Invariant Fixed-Lag Smoother with Application to Visual Inertial SLAM
State estimation problems without absolute position measurements routinely
arise in navigation of unmanned aerial vehicles, autonomous ground vehicles,
etc., whose proper operation relies on accurate state estimates and reliable
covariances. Unaware of absolute positions, these problems have inherent
unobservable directions. Traditional causal estimators, however, usually gain
spurious information on the unobservable directions, leading to over-confident
covariance inconsistent with actual estimator errors. The consistency problem
of fixed-lag smoothers (FLSs) has so far been attacked only by the first
estimate Jacobian (FEJ) technique, because of the complexity of analyzing their
observability properties. But the FEJ technique has several drawbacks hampering
its wide adoption. To ensure the consistency of an FLS, this paper introduces
the right invariant error formulation into the FLS framework. To our knowledge,
we are the first to analyze the observability of an FLS with the right
invariant error.
Our main contributions are twofold. As the first novelty, to bypass the
complexity of analysis with the classic observability matrix, we show that
observability analysis of FLSs can be done equivalently on the linearized
system. Second, we prove that the inconsistency issue in the traditional FLS
can be elegantly solved by the right invariant error formulation without
artificially correcting Jacobians. By applying the proposed FLS to the
monocular visual inertial simultaneous localization and mapping (SLAM) problem,
we confirm in simulation that the method estimates covariance consistently,
similarly to a batch smoother, and that it achieves accuracy comparable to
traditional FLSs on real data.
Comment: 13 pages, 4 figures, AAAI 2021 Conference
Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: a Geometric and Probabilistic Approach
In this thesis we give new means for a machine to understand complex and dynamic visual scenes in real time. In particular, we solve the problem of simultaneously reconstructing a representation of the world's geometry, the observer's trajectory, and the moving objects' structures and trajectories, with the aid of vision exteroceptive sensors. We proceed by dividing the problem into three main steps: First, we give a solution to the Simultaneous Localization And Mapping (SLAM) problem for monocular vision that is able to perform adequately in the most ill-conditioned situations: those where the observer approaches the scene in a straight line. Second, we incorporate full 3D instantaneous observability by duplicating vision hardware with monocular algorithms. This permits us to avoid some of the inherent drawbacks of classic stereo systems, notably their limited range of 3D observability and the necessity of frequent mechanical calibration. Third, we add detection and tracking of moving objects by making use of this full 3D observability, whose necessity we judge almost inevitable. We choose a sparse, punctual representation of both the world and the moving objects in order to alleviate the computational payload of the image-processing algorithms, which are required to extract the necessary geometrical information out of the images. This alleviation is additionally supported by active feature detection and search mechanisms which focus the attention on those image regions with the highest interest. This focusing is achieved by an extensive exploitation of the current knowledge available on the system (all the mapped information), something that we finally highlight to be the ultimate key to success.
Learned Monocular Depth Priors in Visual-Inertial Initialization
Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR
and autonomous robotic systems today, in both academia and industry. However,
these systems are highly sensitive to the initialization of key parameters such
as sensor biases, gravity direction, and metric scale. In practical scenarios
where high-parallax or variable acceleration assumptions are rarely met (e.g.
hovering aerial robot, smartphone AR user not gesticulating with phone),
classical visual-inertial initialization formulations often become
ill-conditioned and/or fail to meaningfully converge. In this paper we target
visual-inertial initialization specifically for these low-excitation scenarios
critical to in-the-wild usage. We propose to circumvent the limitations of
classical visual-inertial structure-from-motion (SfM) initialization by
incorporating a new learning-based measurement as a higher-level input. We
leverage learned monocular depth images (mono-depth) to constrain the relative
depth of features, and upgrade the mono-depth to metric scale by jointly
optimizing for its scale and shift. Our experiments show a significant
improvement in problem conditioning compared to a classical formulation for
visual-inertial initialization, and demonstrate significant accuracy and
robustness improvements relative to the state-of-the-art on public benchmarks,
particularly under motion-restricted scenarios. We further integrate the
improved initialization into an existing odometry system to illustrate its
impact on the resulting tracking trajectories.
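In its simplest form, the scale-and-shift upgrade of mono-depth described above reduces to a linear least-squares fit at sparse feature points; the sketch below uses synthetic data and names chosen here for illustration, not the paper's formulation (which optimizes scale and shift jointly with the other initialization variables).

```python
import numpy as np

def fit_scale_shift(mono_depth, metric_depth):
    """Least-squares fit of (scale, shift) so that
    scale * mono_depth + shift approximates metric_depth at sparse points."""
    A = np.stack([mono_depth, np.ones_like(mono_depth)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, metric_depth, rcond=None)
    return scale, shift

mono = np.array([0.5, 1.0, 1.5, 2.0])  # relative (unitless) learned depths
metric = 2.0 * mono + 0.3              # synthetic metric depths at the same points
s, b = fit_scale_shift(mono, metric)
```

Constraining all features in a frame through one shared (scale, shift) pair is what improves the conditioning: two unknowns couple every depth observation, instead of each feature depth being free.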
Egocentric Planning for Scalable Embodied Task Achievement
Embodied agents face significant challenges when tasked with performing
actions in diverse environments, particularly in generalizing across object
types and executing suitable actions to accomplish tasks. Furthermore, agents
should exhibit robustness, minimizing the execution of illegal actions. In this
work, we present Egocentric Planning, an innovative approach that combines
symbolic planning and Object-oriented POMDPs to solve tasks in complex
environments, harnessing existing models for visual perception and natural
language processing. We evaluated our approach in ALFRED, a simulated
environment designed for domestic tasks, and demonstrated its high scalability,
achieving an impressive 36.07% unseen success rate in the ALFRED benchmark and
winning the ALFRED challenge at CVPR Embodied AI workshop. Our method requires
reliable perception and the specification or learning of a symbolic description
of the preconditions and effects of the agent's actions, as well as what object
types reveal information about others. It is capable of naturally scaling to
solve new tasks beyond ALFRED, as long as they can be solved using the
available skills. This work offers a solid baseline for studying end-to-end and
hybrid methods that aim to generalize to new tasks, including recent approaches
relying on LLMs, which often struggle to scale to long sequences of actions or
to produce robust plans for novel tasks.