EAST: An Efficient and Accurate Scene Text Detector

Author: He Weiran
Liang Jiajun
Wang Yuzhi
Wen He
Yao Cong
Zhou Shuchang
Zhou Xinyu
Publication date: 10/07/2017

Previous approaches for scene text detection have already achieved promising performances across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2 fps at 720p resolution.

Comment: Accepted to CVPR 2017, fix equation (3

arXiv.org e-Print Archive

Solving Bongard Problems with a Visual Language and Pragmatic Reasoning

Author: Depeweg Stefan
Jäkel Frank
Rothkopf Constantin A.
Publication date: 01/01/2018

More than 50 years ago, Bongard introduced 100 visual concept learning problems as a testbed for intelligent vision systems. These problems are now known as Bongard problems. Although they are well known in the cognitive science and AI communities, only moderate progress has been made towards building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing complex visual concepts based on this vocabulary. Using this language and Bayesian inference, complex visual concepts can be induced from the examples that are provided in each Bongard problem. Unlike in other concept learning problems, the examples from which concepts are induced in Bongard problems are not random; instead, they are carefully chosen to communicate the concept, and hence require pragmatic reasoning. Taking pragmatic reasoning into account, we find good agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself. While this approach is far from solving all Bongard problems, it solves the largest fraction yet.

arXiv.org e-Print Archive

Vision Transformer with Quadrangle Attention

Author: Tao Dacheng
Xu Yufei
Zhang Jing
Zhang Qiming
Publication date: 27/03/2023

Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint. However, the design of hand-crafted windows, which is data-agnostic, constrains the flexibility of transformers to adapt to objects of varying sizes, shapes, and orientations. To address this issue, we propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation. Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles for token sampling and attention calculation, enabling the network to model various targets with different shapes and orientations and capture rich context information. We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and incurs negligible extra computational cost. Extensive experiments on public benchmarks demonstrate that QFormer outperforms existing representative vision transformers on various vision tasks, including classification, object detection, semantic segmentation, and pose estimation. The code will be made publicly available at https://github.com/ViTAE-Transformer/QFormer.

Comment: 15 pages, the extension of the ECCV 2022 paper (VSA: Learning Varied-Size Window Attention in Vision Transformers

arXiv.org e-Print Archive

Cup products on polyhedral approximations of 3D digital images

Author: A. Hatcher
G. Das
H. Brönnimann
H. Schulz
J.P. Serre
J.R. Munkres
L.J. Latecki
P.K. Argawal
R. Gonzalez-Diaz
S. Peltier
V.A. Kovalevsky
Publication date: 01/01/2011

Let I be a 3D digital image, and let Q(I) be the associated cubical complex. In this paper we show how to simplify the combinatorial structure of Q(I) and obtain a homeomorphic cellular complex P(I) with fewer cells. We introduce formulas for a diagonal approximation on a general polygon and use it to compute cup products on the cohomology H*(P(I)). The cup product encodes important geometrical information not captured by the cohomology groups. Consequently, the ring structure of H*(P(I)) is a finer topological invariant. The algorithm proposed here can be applied to compute cup products on any polyhedral approximation of an object embedded in 3-space.

Combined object recognition approaches for mobile robotics

Author: Gerard Rusty
Publication venue: Western CEDAR
Publication date: 01/01/2008

There are numerous solutions to simple object recognition problems when the machine is operating under strict environmental conditions (such as lighting). Object recognition in real-world environments, however, poses greater difficulty. Ideally, mobile robots will function in real-world environments without the aid of fiduciary identifiers. More robust methods are therefore needed to perform object recognition reliably. A combined approach of multiple techniques improves recognition results. Active vision and peripheral-foveal vision, systems that are designed to improve the information gathered for the purposes of object recognition, are examined. In addition to active vision and peripheral-foveal vision, five object recognition methods that either make use of some form of active vision or could leverage active vision and/or peripheral-foveal vision systems are also investigated: affine-invariant image patches, perceptual organization, 3D morphable models (3DMMs), active viewpoint, and adaptive color segmentation. The current state of the art in these areas of vision research and observations on areas of future research are presented. Examples of state-of-the-art methods employed in other vision applications that have not been used for object recognition are also mentioned. Lastly, the future direction of the research field is hypothesized.