Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping
This paper presents a novel real-time method for tracking salient closed
boundaries from video image sequences. This method operates on a set of
straight line segments that are produced by line detection. The tracking scheme
is coherently integrated into a perceptual grouping framework in which the
visual tracking problem is tackled by identifying a subset of these line
segments and connecting them sequentially to form a closed boundary with the
largest saliency and a certain similarity to the previous one. Specifically, we
define a new tracking criterion which combines a grouping cost and an area
similarity constraint. The proposed criterion makes the resulting boundary
tracking more robust to local minima. To achieve real-time tracking
performance, we use Delaunay Triangulation to build a graph model with the
detected line segments and then reduce the tracking problem to finding the
optimal cycle in this graph. This is solved by our newly proposed closed
boundary candidates searching algorithm called "Bidirectional Shortest Path
(BDSP)". The efficiency and robustness of the proposed method are tested on
real video sequences as well as during a robot arm pouring experiment.
Comment: 7 pages, 8 figures, The 2017 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2017) submission ID 103
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery.
Coding local and global binary visual features extracted from video sequences
Binary local features represent an effective alternative to real-valued
descriptors, leading to comparable results for many visual analysis tasks,
while being characterized by significantly lower computational complexity and
memory requirements. When dealing with large collections, a more compact
representation based on global features is often preferred, which can be
obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW)
model. Several applications, including for example visual sensor networks and
mobile augmented reality, require visual features to be transmitted over a
bandwidth-limited network, thus calling for coding techniques that aim at
reducing the required bit budget, while attaining a target level of efficiency.
In this paper we investigate a coding scheme tailored to both local and global
binary features, which aims at exploiting both spatial and temporal redundancy
by means of intra- and inter-frame coding. In this respect, the proposed coding
scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC)
paradigm. That is, visual features are extracted from the acquired content,
encoded at remote nodes, and finally transmitted to a central controller that
performs visual analysis. This is in contrast with the traditional approach, in
which visual content is acquired at a node, compressed and then sent to a
central unit for further processing, according to the Compress-Then-Analyze
(CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of
rate-efficiency curves in the context of two different visual analysis tasks:
homography estimation and content-based retrieval. Our results show that the
novel ATC paradigm based on the proposed coding primitives can be competitive
with CTA, especially in bandwidth-limited scenarios.
Comment: submitted to IEEE Transactions on Image Processing
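The inter-frame idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's coding scheme: it only assumes that binary descriptors are stored as Python integers, that the reference descriptor is the nearest one (in Hamming distance) from the previous frame, and that the residual is a plain XOR. The function names `hamming`, `inter_frame_residual` and `reconstruct` are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def inter_frame_residual(current, previous):
    """Predict `current` from the closest descriptor of the previous frame.

    Returns (index of the chosen reference, XOR residual). When the scene
    changes little between frames, the residual has few set bits and is
    therefore cheaper to entropy-code than the raw descriptor.
    """
    idx = min(range(len(previous)), key=lambda i: hamming(current, previous[i]))
    return idx, current ^ previous[idx]


def reconstruct(idx, residual, previous):
    """Decoder side: XOR the residual back onto the reference descriptor."""
    return previous[idx] ^ residual
```

A decoder that holds the previous frame's descriptors recovers the current descriptor exactly from the pair (index, residual); an actual codec would additionally entropy-code both values.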
View point robust visual search technique
In this thesis, we have explored visual search techniques for images taken from different view
points and have tried to enhance the matching capability under view point changes. We have proposed
homography based back-projection as a post-processing stage of Compact Descriptors for
Visual Search (CDVS), the new MPEG standard; moreover, we have defined the affine adapted
scale space based affine detection, which steers the Gaussian scale space to capture the features
from affine transformed images; we have also developed the corresponding gradient based affine
descriptor. Using these proposed techniques, the image retrieval robustness to affine transformations
has been significantly improved.
The first chapter of this thesis introduces the background on visual search.
In the second chapter, we propose a homography based back-projection used as the post-processing
stage of CDVS to improve the resilience to view point changes. The theory behind
this proposal is that each perspective projection of the image of a 2D object can be simulated as an
affine transformation. Each pair of affine transformations is mathematically related by a homography
matrix. Given that matrix, the image can be back-projected to simulate the image from another
view point. In this way, truly matching images can be declared as matching because the perspective
distortion has been reduced by the back-projection. An accurate homography estimation
from images of different view points requires at least 4 correspondences, which can be offered
by the CDVS pipeline. In this way, the homography based back-projection can be used to scrutinize
images with not enough matched keypoints. If they contain some homography relations,
the perspective distortion can then be reduced by exploiting the few provided correspondences. In the
experiments, this technique has proved to be quite effective, especially for 2D object images.
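The statement that a homography needs at least 4 correspondences can be made concrete with a small sketch. It is not the thesis's or CDVS's implementation: it assumes exactly four correspondences in general position, fixes the bottom-right entry of the matrix to 1 (which fails for homographies where that entry is zero), and solves the resulting 8x8 linear system with plain Gaussian elimination. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Estimate H (3x3, H[2][2] = 1) mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, eight unknowns in total.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v
```

Back-projection in the sense described above would apply the inverse of the estimated matrix to the query image before matching again; with more than four noisy correspondences, a least-squares or RANSAC estimate would normally replace this exact solve.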
The third chapter introduces the scale space, which is also the kernel of feature detection
for scale invariant visual search techniques. Scale space, which is made of a series of Gaussian
blurred images, represents the image structures at different levels of detail. The Gaussian smoothed
images in the scale space result in feature detection not being invariant to affine transformations.
That is the reason why scale invariant visual search techniques are sensitive to affine transformations.
Thus, in this chapter, we propose an affine adapted scale space, which employs affine
steered Gaussian filters to smooth the images. This scale space is flexible to different affine transformations
and represents well the image structures from different view points. With the help of
this structure, the features from different view points can be well captured.
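One common way to build an affine steered Gaussian filter is sketched below. This is an assumption drawn from general affine scale-space theory, not necessarily the construction used in the thesis: an isotropic Gaussian of standard deviation sigma, seen through a 2x2 affine map A, becomes an anisotropic Gaussian with covariance sigma^2 * A * A^T. The function name and parameters are hypothetical.

```python
import math


def affine_gaussian_kernel(a, sigma, radius):
    """Sampled, normalized Gaussian with covariance sigma^2 * A * A^T.

    `a` is a 2x2 affine matrix ((a11, a12), (a21, a22)); the kernel is a
    (2*radius+1) x (2*radius+1) list of rows indexed as kernel[y][x].
    """
    (a11, a12), (a21, a22) = a
    # Covariance entries of sigma^2 * A * A^T.
    s11 = sigma ** 2 * (a11 * a11 + a12 * a12)
    s12 = sigma ** 2 * (a11 * a21 + a12 * a22)
    s22 = sigma ** 2 * (a21 * a21 + a22 * a22)
    det = s11 * s22 - s12 * s12
    if det <= 0:
        raise ValueError("affine matrix must be invertible")
    # Entries of the inverse covariance matrix.
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            q = i11 * x * x + 2 * i12 * x * y + i22 * y * y
            row.append(math.exp(-0.5 * q))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]
```

With the identity matrix this reduces to the usual isotropic Gaussian; a matrix such as ((2, 0), (0, 1)) stretches the kernel along the x axis, which is the kind of steering the paragraph above refers to.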
In practice, scale invariant visual search techniques employ a pyramid structure
to speed up the construction. By employing the affine Gaussian scale space principles, we also
propose two structures to build the affine Gaussian scale space. The structure of the affine Gaussian
scale space is similar to the pyramid structure because of the similar sampling and cascading
properties. Conversely, the affine Laplacian of Gaussian (LoG) structure is completely different.
The Laplacian operator is hard to deform under an affine transformation. Unlike the general
LoG construction, which applies a simple Laplacian operation to the scale space, the affine
LoG can only be obtained by affine LoG convolution and cascade implementations on the affine
scale space. Using our proposed structures, both the affine Gaussian scale space and the affine LoG can
be constructed.
We have also explored the affine scale space implementation in the frequency domain. In the second
chapter, we will also explore the spectrum of Gaussian image smoothing under the affine transformation,
and propose two structures. Generally speaking, the implementation in the frequency domain is
more robust to affine transformations at the expense of a higher computational complexity.
It makes sense to adopt an affine descriptor for affine invariant visual search. In the fourth
chapter, we will propose an affine invariant feature descriptor based on the affine gradient. Currently,
the state of the art feature descriptors, including SIFT and Gradient Location and Orientation Histogram
(GLOH), are based on the histogram of the image gradient around the detected features. If
the image gradient is calculated as the difference of adjacent pixels, it will not be affine invariant.
Thus in that chapter, we first propose an affine gradient which will contribute the affine
invariance to the descriptor. This affine gradient will be calculated directly from the derivative of the
affine Gaussian blurred images. To simplify the processing, we will also create the corresponding
affine Gaussian derivative filters for different detected scales to quickly generate the affine gradient.
With this affine gradient, we can apply the same scheme as the SIFT descriptor to generate the
gradient histogram. By normalizing the histogram, the affine descriptor can then be formed. This
affine descriptor is not only affine invariant but also rotation invariant, because the direction of the
area used to form the histogram is determined by the main direction of the gradient around the features.
In practice, this affine descriptor is fully affine invariant and its performance for image matching is
extremely good.
In the conclusions chapter, we draw some conclusions and describe some future work.
Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations
In this paper, we present two fast and interpretable decomposition methods
for 2D homography, which are named Similarity-Kernel-Similarity (SKS) and
Affine-Core-Affine (ACA) transformations respectively. Under the minimal
-point configuration, the first and the last similarity transformations in
SKS are computed by two anchor points on target and source planes,
respectively. Then, the other two point correspondences can be exploited to
compute the middle kernel transformation with only four parameters.
Furthermore, ACA uses three anchor points to compute the first and the last
affine transformations, followed by computation of the middle core
transformation utilizing the other one point correspondence. ACA can compute a
homography up to a scale with only floating-point operations (FLOPs),
without even any division operations. Therefore, as a plug-in module, ACA
facilitates the traditional feature-based Random Sample Consensus (RANSAC)
pipeline, as well as deep homography pipelines estimating -point offsets. In
addition to the advantages of geometric parameterization and computational
efficiency, SKS and ACA can express each element of homography by a polynomial
of input coordinates (th degree to th degree), extend the existing
essential Similarity-Affine-Projective (SAP) decomposition and calculate 2D
affine transformations in a unified way. Source code is released at
https://github.com/cscvlab/SKS-Homography
A practical multirobot localization system
We present fast and precise vision-based software intended for multiple robot localization. The core component of the software is a novel and efficient algorithm for black and white pattern detection. The method is robust to variable lighting conditions, achieves sub-pixel precision, and its computational complexity is independent of the processed image size. With off-the-shelf computational equipment and low-cost cameras, the core algorithm is able to process hundreds of images per second while tracking hundreds of objects with millimeter precision. In addition, we present the method's mathematical model, which allows estimating the expected localization precision, area of coverage, and processing speed from the camera's intrinsic parameters and the hardware's processing capacity. The correctness of the presented model and the performance of the algorithm in real-world conditions are verified in several experiments. Apart from the method description, we also make its source code public at http://purl.org/robotics/whycon, so it can be used as an enabling technology for various mobile robotic problems.