Improving the matching of deformable objects by learning to detect keypoints
We propose a novel learned keypoint detection method to increase the number
of correct matches for the task of non-rigid image correspondence. By
leveraging true correspondences acquired by matching annotated image pairs with
a specified descriptor extractor, we train an end-to-end convolutional neural
network (CNN) to find keypoint locations that are more appropriate to the
considered descriptor. For that, we apply geometric and photometric warpings to
images to generate a supervisory signal, allowing the optimization of the
detector. Experiments demonstrate that our method enhances the Mean Matching
Accuracy of numerous descriptors when used in conjunction with our detection
method, while outperforming the state-of-the-art keypoint detectors on real
images of non-rigid objects by 20 percentage points. We also apply our method
to the complex real-world task of object retrieval, where our detector performs
on par with the best keypoint detectors currently available for this task. The
source code
and trained models are publicly available at
https://github.com/verlab/LearningToDetect_PRL_2023
Comment: This is the accepted version of the paper, to appear at Pattern Recognition Letters (PRL). The final journal version will be available at https://doi.org/10.1016/j.patrec.2023.08.01
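The warp-based supervision described above can be sketched in a few lines: applying a random homography to an image yields ground-truth keypoint correspondences by construction, since every detected point's location in the warped image is known exactly. The corner-perturbation warp family below is an illustrative assumption, not the paper's exact augmentation scheme:

```python
import numpy as np

def random_homography(jitter=0.1, rng=None):
    # Synthesize a geometric warp by perturbing the four corners of the
    # unit square and solving the 8x8 DLT system for H (h33 fixed to 1).
    rng = np.random.default_rng(rng)
    src = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
    dst = src + rng.uniform(-jitter, jitter, src.shape)
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A), np.array(b))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(H, pts):
    # Map Nx2 points through H in homogeneous coordinates. A keypoint
    # detected at p in the original image should be re-detected at
    # warp_points(H, p) in the warped image, which gives a free
    # ground-truth correspondence for training the detector.
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

Photometric warpings (brightness, contrast, or color jitter) would be applied on top of the geometric warp and leave the correspondence geometry unchanged.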
Policy-Based Planning for Robust Robot Navigation
This thesis proposes techniques for constructing and implementing an extensible navigation framework suitable for operating alongside or in place of traditional navigation systems. Robot navigation is only possible when many subsystems work in tandem, such as localization and mapping, motion planning, control, and object tracking. Errors in any one of these subsystems can cause the robot to fail to accomplish its task, often requiring human interventions that diminish the benefits theoretically provided by autonomous robotic systems.
Our first contribution is Direction Approximation through Random Trials (DART), a method for generating human-followable navigation instructions optimized for followability instead of traditional metrics such as path length. We show how this strategy can be extended to robot navigation planning, allowing the robot to compute the sequence of control policies and switching conditions maximizing the likelihood with which the robot will reach its goal. This technique allows robots to select plans based on reliability in addition to efficiency, avoiding error-prone actions or areas of the environment. We also show how DART can be used to build compact, topological maps of its environments, offering opportunities to scale to larger environments.
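The reliability-maximizing planning idea above can be illustrated as a shortest-path search: maximizing the product of per-policy success probabilities is equivalent to minimizing the sum of their negative logarithms. The graph, policy names, and probabilities below are hypothetical, and DART's full formulation, including its switching conditions, is richer than this sketch:

```python
import heapq, math

def most_reliable_plan(edges, start, goal):
    # edges: iterable of (u, v, policy, p_success). Returns (policies, p)
    # maximizing the product of success probabilities, computed as a
    # Dijkstra shortest path over weights -log(p_success).
    adj = {}
    for u, v, policy, p in edges:
        adj.setdefault(u, []).append((v, policy, -math.log(p)))
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, policy, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = (u, policy)
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None, 0.0
    plan, node = [], goal
    while node != start:
        node, policy = prev[node][0], prev[node][1]
        plan.append(policy)
    return plan[::-1], math.exp(-dist[goal])
```

Under this objective a longer chain of reliable behaviors beats a shorter but failure-prone one, which is exactly the trade-off described above.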
DART depends on the existence of a set of behaviors and switching conditions describing ways the robot can move through an environment. In the remainder of this thesis, we present methods for learning these behaviors and conditions in indoor environments. To support landmark-based navigation, we show how to train a Convolutional Neural Network (CNN) to distinguish between semantically labeled 2D
occupancy grids generated from LIDAR data. By providing the robot the ability to recognize specific classes of places based on human labels, not only do we support transitioning between control laws, but also provide hooks for human-aided instruction and direction.
Additionally, we suggest a subset of behaviors that provide DART with a sufficient set of actions to navigate in most indoor environments and introduce a method to learn these behaviors from teleoperated demonstrations. Our method learns a cost function suitable for integration into gradient-based control schemes. This enables the robot to execute behaviors in the absence of global knowledge. We present results demonstrating these behaviors working in several environments with varied structure, indicating that they generalize well to new environments.
This work was motivated by the weaknesses and brittleness of many state-of-the-art navigation systems. Reliable navigation is the foundation of any mobile robotic system. It provides access to larger work spaces and enables a wide variety of tasks. Even though navigation systems have continued to improve, catastrophic failures can still occur (e.g. due to an incorrect loop closure) that limit their reliability. Furthermore, as work areas approach the
scale of kilometers, constructing and operating on precise localization maps becomes expensive. These limitations prevent large scale deployments of robots outside of controlled settings and laboratory environments.
The work presented in this thesis is intended to augment or replace traditional navigation systems, mitigating concerns about scalability and reliability by considering the effects of navigation failures for particular actions. By accounting for these effects when evaluating which actions to take, our framework can adapt navigation strategies to best exploit the capabilities of the robot in a given environment. A natural output of our framework is a topological network of actions and switching conditions, providing compact representations of work areas suitable for fast, scalable planning.
PhD
Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/144073/1/rgoeddel_1.pd
Detector-Free Structure from Motion
We propose a new structure-from-motion framework to recover accurate camera
poses and point clouds from unordered images. Traditional SfM systems typically
rely on the successful detection of repeatable keypoints across multiple views
as the first step, which is difficult for texture-poor scenes, and poor
keypoint detection may break down the whole SfM system. We propose a new
detector-free SfM framework to draw benefits from the recent success of
detector-free matchers to avoid the early determination of keypoints, while
solving the multi-view inconsistency issue of detector-free matchers.
Specifically, our framework first reconstructs a coarse SfM model from
quantized detector-free matches. Then, it refines the model by a novel
iterative refinement pipeline, which iterates between an attention-based
multi-view matching module to refine feature tracks and a geometry refinement
module to improve the reconstruction accuracy. Experiments demonstrate that the
proposed framework outperforms existing detector-based SfM systems on common
benchmark datasets. We also collect a texture-poor SfM dataset to demonstrate
the capability of our framework to reconstruct texture-poor scenes. Based on
this framework, we take first place in Image Matching Challenge 2023.
Comment: Project page: https://zju3dv.github.io/DetectorFreeSfM
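The role of quantization in the coarse stage can be illustrated simply: snapping the endpoints of semi-dense, detector-free matches to a coarse grid lets endpoints from different image pairs merge into shared nodes, yielding multi-view-consistent feature tracks for SfM. This is a simplified stand-in for the paper's actual procedure; the cell size and the union-find track construction are assumptions of this sketch:

```python
from collections import defaultdict

def quantize_matches(matches, cell=8.0):
    # matches: list of (img_i, xy_i, img_j, xy_j) semi-dense matches.
    # Endpoints in the same image that fall into the same grid cell are
    # treated as one node; connected nodes form a multi-view track.
    def key(img, xy):
        return (img, int(xy[0] // cell), int(xy[1] // cell))

    parent = {}
    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)

    for i, xi, j, xj in matches:
        union(key(i, xi), key(j, xj))

    tracks = defaultdict(list)
    for node in list(parent):
        tracks[find(node)].append(node)
    return [sorted(t) for t in tracks.values()]
```

The refinement stage then restores the sub-pixel accuracy that this quantization deliberately gives up.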
Large scale evaluation of local image feature detectors on homography datasets
We present a large scale benchmark for the evaluation of local feature
detectors. Our key innovation is the introduction of a new evaluation protocol
which extends and improves the standard detection repeatability measure. The
new protocol is better for assessment on a large number of images and reduces
the dependency of the results on unwanted distractors such as the number of
detected features and the feature magnification factor. Additionally, our
protocol provides a comprehensive assessment of the expected performance of
detectors under several practical scenarios. Using images from the
recently-introduced HPatches dataset, we evaluate a range of state-of-the-art
local feature detectors on two main tasks: viewpoint and illumination invariant
detection. Contrary to previous detector evaluations, our study contains an
order of magnitude more image sequences, resulting in a quantitative evaluation
significantly more robust to over-fitting. We also show that traditional
detectors are still very competitive when compared to recent deep-learning
alternatives.
Comment: Accepted to BMVC 201
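The standard repeatability measure that the protocol extends can be written compactly: project the reference keypoints through the ground-truth homography and count the fraction that land within a small radius of some detection in the second image. The threshold below is a common choice, not the benchmark's exact setting, and the extended protocol's normalization for detector count and magnification is omitted:

```python
import numpy as np

def repeatability(kps_a, kps_b, H, eps=3.0):
    # kps_a, kps_b: Nx2 and Mx2 keypoint arrays; H: 3x3 ground-truth
    # homography mapping image A to image B. Returns the fraction of
    # A's keypoints re-detected within eps pixels in B.
    p = np.hstack([kps_a, np.ones((len(kps_a), 1))]) @ H.T
    proj = p[:, :2] / p[:, 2:3]
    # Pairwise distances between projected A-keypoints and B-keypoints.
    d = np.linalg.norm(proj[:, None, :] - kps_b[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= eps))
```

Note how the raw score rewards detectors that simply fire more often, which is one of the "unwanted distractors" the new protocol is designed to control for.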
Faster and better: a machine learning approach to corner detection
The repeatability and efficiency of a corner detector determines how likely
it is to be useful in a real-world application. The repeatability is important
because the same scene viewed from different positions should yield features
which correspond to the same real-world 3D locations [Schmid et al 2000]. The
efficiency is important because this determines whether the detector combined
with further processing can operate at frame rate.
Three advances are described in this paper. First, we present a new heuristic
for feature detection, and using machine learning we derive a feature detector
from this which can fully process live PAL video using less than 5% of the
available processing time. By comparison, most other detectors cannot even
operate at frame rate (Harris detector 115%, SIFT 195%). Second, we generalize
the detector, allowing it to be optimized for repeatability, with little loss
of efficiency. Third, we carry out a rigorous comparison of corner detectors
based on the above repeatability criterion applied to 3D scenes. We show that
despite being principally constructed for speed, on these stringent tests, our
heuristic detector significantly outperforms existing feature detectors.
Finally, the comparison demonstrates that using machine learning produces
significant improvements in repeatability, yielding a detector that is both
very fast and very high quality.
Comment: 35 pages, 11 figures
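The segment test underlying this detector can be stated directly: a pixel is a corner if a sufficiently long contiguous arc of the 16-pixel Bresenham circle around it is uniformly brighter or darker than the center by a threshold. The sketch below implements the plain test only; the paper's speedup comes from a learned decision tree that short-circuits most of these comparisons, which is not shown here:

```python
import numpy as np

# Offsets of the 16-pixel Bresenham circle of radius 3, in circular order.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def segment_test(img, x, y, t=20, n=12):
    # (x, y) is a corner if at least n contiguous circle pixels are all
    # brighter than img[y, x] + t or all darker than img[y, x] - t.
    p = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    for sign in (1, -1):  # brighter arc, then darker arc
        flags = [sign * (q - p) > t for q in ring]
        run = 0
        for f in flags + flags[:n]:  # duplicate a prefix to wrap around
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False
```

An isolated bright pixel passes the test (all 16 circle pixels are darker), while a straight intensity edge does not (its bright arc is too short), which is the behavior that distinguishes corners from edges.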