Rotation-invariant features for multi-oriented text detection in natural images.
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations in natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.
RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and the difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e.
translation and rotation, of texture-less rigid objects. The dataset features
thirty industry-relevant objects with no significant texture and no
discriminative color or reflectance properties. The objects exhibit symmetries
and mutual similarities in shape and/or size. Compared to other datasets, a
unique property is that some of the objects are parts of others. The dataset
includes training and test images that were captured with three synchronized
sensors, specifically a structured-light and a time-of-flight RGB-D sensor and
a high-resolution RGB camera. There are approximately 39K training and 10K test
images from each sensor. Additionally, two types of 3D models are provided for
each object, i.e. a manually created CAD model and a semi-automatically
reconstructed one. Training images depict individual objects against a black
background. Test images originate from twenty test scenes having varying
complexity, which increases from simple scenes with several isolated objects to
very challenging ones with multiple instances of several objects and with a
high amount of clutter and occlusion. The images were captured from a
systematically sampled view sphere around the object/scene, and are annotated
with accurate ground truth 6D poses of all modeled objects. Initial evaluation
results indicate that the state of the art in 6D object pose estimation has
ample room for improvement, especially in difficult cases with significant
occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.
Comment: WACV 201
MOMA: Visual Mobile Marker Odometry
In this paper, we present a cooperative odometry scheme based on the
detection of mobile markers in line with the idea of cooperative positioning
for multiple robots [1]. To this end, we introduce a simple optimization scheme
that realizes visual mobile marker odometry via accurate fixed marker-based
camera positioning and analyse the characteristics of errors inherent to the
method compared to classical fixed marker-based navigation and visual odometry.
In addition, we provide a specific UAV-UGV configuration that allows for
continuous movements of the UAV without stopping and a minimal
caterpillar-like configuration that works with one UGV alone. Finally, we
present a real-world implementation and evaluation for the proposed UAV-UGV
configuration.