EAST: An Efficient and Accurate Scene Text Detector
Previous approaches for scene text detection have already achieved promising
performances across various benchmarks. However, they usually fall short when
dealing with challenging scenarios, even when equipped with deep neural network
models, because the overall performance is determined by the interplay of
multiple stages and components in the pipelines. In this work, we propose a
simple yet powerful pipeline that yields fast and accurate text detection in
natural scenes. The pipeline directly predicts words or text lines of arbitrary
orientations and quadrilateral shapes in full images, eliminating unnecessary
intermediate steps (e.g., candidate aggregation and word partitioning), with a
single neural network. The simplicity of our pipeline allows concentrating
efforts on designing loss functions and neural network architecture.
Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500
demonstrate that the proposed algorithm significantly outperforms
state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR
2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2fps
at 720p resolution.
Comment: Accepted to CVPR 2017, fix equation (3)
Solving Bongard Problems with a Visual Language and Pragmatic Reasoning
More than 50 years ago, Bongard introduced 100 visual concept learning
problems as a testbed for intelligent vision systems. These problems are now
known as Bongard problems. Although they are well known in the cognitive
science and AI communities, only moderate progress has been made towards
building systems that can solve a substantial subset of them. In the system
presented here, visual features are extracted through image processing and then
translated into a symbolic visual vocabulary. We introduce a formal language
that allows representing complex visual concepts based on this vocabulary.
Using this language and Bayesian inference, complex visual concepts can be
induced from the examples that are provided in each Bongard problem. Contrary
to other concept learning problems, the examples from which concepts are induced
are not random in Bongard problems; instead, they are carefully chosen to
communicate the concept, hence requiring pragmatic reasoning. Taking pragmatic
reasoning into account, we find good agreement between the concepts with high
posterior probability and the solutions formulated by Bongard himself. While
this approach is far from solving all Bongard problems, it solves the biggest
fraction yet.
Vision Transformer with Quadrangle Attention
Window-based attention has become a popular choice in vision transformers due
to its superior performance, lower computational complexity, and smaller memory
footprint. However, the design of hand-crafted windows, which is data-agnostic,
constrains the flexibility of transformers to adapt to objects of varying
sizes, shapes, and orientations. To address this issue, we propose a novel
quadrangle attention (QA) method that extends the window-based attention to a
general quadrangle formulation. Our method employs an end-to-end learnable
quadrangle regression module that predicts a transformation matrix to transform
default windows into target quadrangles for token sampling and attention
calculation, enabling the network to model various targets with different
shapes and orientations and capture rich context information. We integrate QA
into plain and hierarchical vision transformers to create a new architecture
named QFormer, which requires only minor code modifications and negligible extra
computational cost. Extensive experiments on public benchmarks demonstrate that
QFormer outperforms existing representative vision transformers on various
vision tasks, including classification, object detection, semantic
segmentation, and pose estimation. The code will be made publicly available at
https://github.com/ViTAE-Transformer/QFormer.
Comment: 15 pages, the extension of the ECCV 2022 paper (VSA: Learning
Varied-Size Window Attention in Vision Transformers)
Cup products on polyhedral approximations of 3D digital images
Let I be a 3D digital image, and let Q(I) be the associated cubical complex. In this paper we show how to simplify the combinatorial structure of Q(I) and obtain a homeomorphic cellular complex P(I) with fewer cells. We introduce formulas for a diagonal approximation on a general polygon and use it to compute cup products on the cohomology H*(P(I)). The cup product encodes important geometrical information not captured by the cohomology groups. Consequently, the ring structure of H*(P(I)) is a finer topological invariant. The algorithm proposed here can be applied to compute cup products on any polyhedral approximation of an object embedded in 3-space.
Combined object recognition approaches for mobile robotics
There are numerous solutions to simple object recognition problems when the machine is operating under strict environmental conditions (such as lighting). Object recognition in real-world environments poses greater difficulty, however. Ideally, mobile robots will function in real-world environments without the aid of fiducial identifiers. More robust methods are therefore needed to perform object recognition reliably. A combined approach of multiple techniques improves recognition results. Active vision and peripheral-foveal vision, systems that are designed to improve the information gathered for the purposes of object recognition, are examined. In addition to active vision and peripheral-foveal vision, five object recognition methods that either make use of some form of active vision or could leverage active vision and/or peripheral-foveal vision systems are also investigated: affine-invariant image patches, perceptual organization, 3D morphable models (3DMMs), active viewpoint, and adaptive color segmentation. The current state of the art in these areas of vision research and observations on areas of future research are presented. Examples of state-of-the-art methods employed in other vision applications that have not been used for object recognition are also mentioned. Lastly, the future direction of the research field is hypothesized.