Robust Table Detection and Structure Recognition from Heterogeneous Document Images
We introduce a new table detection and structure recognition approach named
RobusTabNet to detect the boundaries of tables and reconstruct the cellular
structure of each table from heterogeneous document images. For table
detection, we propose CornerNet as a new region proposal network to generate
higher-quality table proposals for Faster R-CNN, which significantly improves
the localization accuracy of Faster R-CNN for table detection. Consequently,
our table detection approach achieves state-of-the-art
performance on three public table detection benchmarks, namely cTDaR TrackA,
PubLayNet and IIIT-AR-13K, by only using a lightweight ResNet-18 backbone
network. Furthermore, we propose a new split-and-merge based table structure
recognition approach, in which a novel spatial CNN based separation line
prediction module is proposed to split each detected table into a grid of
cells, and a Grid CNN based cell merging module is applied to recover the
spanning cells. As the spatial CNN module can effectively propagate contextual
information across the whole table image, our table structure recognizer can
robustly recognize tables with large blank spaces and geometrically distorted
(even curved) tables. Thanks to these two techniques, our table structure
recognition approach achieves state-of-the-art performance on three public
benchmarks, including SciTSR, PubTabNet and cTDaR TrackB2-Modern. Moreover, we
have further demonstrated the advantages of our approach in recognizing tables
with complex structures, large blank spaces, as well as geometrically distorted
or even curved shapes on a more challenging in-house dataset.
Comment: Accepted by Pattern Recognition on 27 Aug. 202
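The split-and-merge pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the separator coordinates (from the spatial CNN module) and the merge decisions (from the Grid CNN module) are already predicted, and all function names are invented for this sketch.

```python
# Illustrative sketch of split-and-merge table structure recognition.
# Inputs that the paper predicts with neural modules (separator lines,
# merge decisions) are assumed to be given here.

def split_into_cells(row_seps, col_seps):
    """Split a table region into a grid of cells.

    row_seps / col_seps: sorted separator coordinates, including the
    table borders. Returns {(row, col): (top, bottom, left, right)}.
    """
    cells = {}
    for i in range(len(row_seps) - 1):
        for j in range(len(col_seps) - 1):
            cells[(i, j)] = (row_seps[i], row_seps[i + 1],
                             col_seps[j], col_seps[j + 1])
    return cells

def merge_spanning_cells(cells, merge_pairs):
    """Recover spanning cells by merging grid cells flagged as one.

    merge_pairs: pairs of grid indices predicted to belong together.
    Merged groups are replaced by their bounding box.
    """
    # Union-find over grid indices.
    parent = {idx: idx for idx in cells}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in merge_pairs:
        parent[find(a)] = find(b)
    # Group cells by root and take each group's bounding box.
    groups = {}
    for idx in cells:
        groups.setdefault(find(idx), []).append(cells[idx])
    merged = []
    for boxes in groups.values():
        tops, bots, lefts, rights = zip(*boxes)
        merged.append((min(tops), max(bots), min(lefts), max(rights)))
    return merged
```

For example, merging the two top cells of a 2x2 grid yields three final cells, the top one spanning both columns, which is how a header cell spanning the table width would be recovered.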
Redemption from Range-view for Accurate 3D Object Detection
Most recent approaches for 3D object detection predominantly rely on
point-view or bird's-eye view representations, with limited exploration of
range-view-based methods. The range-view representation suffers from scale
variation and surface texture deficiency, both of which pose significant
limitations for developing corresponding methods. Notably, the surface texture
loss problem has been largely overlooked by existing methods, despite its
significant impact on the accuracy of range-view-based 3D object detection. In
this study, we propose Redemption from Range-view R-CNN (R2 R-CNN), a novel and
accurate approach that comprehensively explores the range-view representation.
Our proposed method addresses scale variation through the HD Meta Kernel, which
captures range-view geometry information in multiple scales. Additionally, we
introduce Feature Points Redemption (FPR) to recover the lost 3D surface
texture information from the range view, and Synchronous-Grid RoI Pooling
(S-Grid RoI Pooling), a multi-scale approach with multiple receptive fields
for accurate box refinement. Our R2 R-CNN outperforms existing range-view-based
methods, achieving state-of-the-art performance on both the KITTI benchmark and
the Waymo Open Dataset. Our study highlights the critical importance of
addressing the surface texture loss problem for accurate 3D object detection in
range-view-based methods. Code will be made publicly available.
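The range-view representation this abstract builds on is the standard spherical projection of a LiDAR point cloud into a 2D range image, sketched below. The image size and vertical field of view are illustrative defaults, not the paper's configuration.

```python
import math

# Sketch of the range-view representation: each LiDAR point (x, y, z)
# is mapped to a pixel of a 2D range image by its azimuth (horizontal
# angle) and inclination (vertical angle). Resolution and field-of-view
# values are illustrative, not taken from the paper.

def point_to_range_pixel(x, y, z, width=512, height=64,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one 3D point to (row, col, range) in a range image."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
    inclination = math.asin(z / r)  # vertical angle
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    # Horizontal: the full circle of azimuths spans `width` columns.
    col = int((0.5 * (1.0 - azimuth / math.pi)) * width) % width
    # Vertical: [fov_down, fov_up] spans `height` rows, top row = fov_up.
    row = int((fov_up - inclination) / (fov_up - fov_down) * height)
    row = min(max(row, 0), height - 1)
    return row, col, r
```

Because nearby objects cover many pixels and distant ones very few, this projection is exactly where the scale-variation problem the abstract mentions comes from.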
Grid Loss: Detecting Occluded Faces
Detection of partially occluded objects is a challenging computer vision
problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of
the detection window are occluded, since not every sub-part of the window is
discriminative on its own. To address this issue, we propose a novel loss layer
for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a
convolution layer independently rather than over the whole feature map. This
results in parts being more discriminative on their own, enabling the detector
to recover if the detection window is partially occluded. By mapping our loss
layer back to a regular fully connected layer, no additional computational cost
is incurred at runtime compared to standard CNNs. We demonstrate our method for
face detection on several public face detection benchmarks and show that our
method outperforms regular CNNs, is suitable for real-time applications and
achieves state-of-the-art performance.
Comment: accepted to ECCV 201
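The core idea of grid loss, applying a loss to sub-blocks of a feature map independently rather than to the whole map, can be sketched as below. The per-block scoring (mean-pooled linear score with weights shared across blocks) is a simplification invented for this sketch, not the paper's layer.

```python
import numpy as np

# Sketch of the grid-loss idea: split a feature map into non-overlapping
# spatial blocks and apply a logistic loss to each block's own score, so
# every part must be discriminative on its own. The block classifier
# here (shared per-channel weights, mean pooling) is illustrative only.

def grid_loss(feature_map, weights, bias, label, grid=2):
    """Sum of per-block logistic losses.

    feature_map: (C, H, W) activations for one detection window.
    weights: (C,) weights shared across blocks (illustrative).
    label: +1 for a face window, -1 for a non-face window.
    """
    c, h, w = feature_map.shape
    bh, bw = h // grid, w // grid
    total = 0.0
    for i in range(grid):
        for j in range(grid):
            block = feature_map[:, i * bh:(i + 1) * bh,
                                j * bw:(j + 1) * bw]
            # Score this block independently of all others.
            score = float(weights @ block.mean(axis=(1, 2)) + bias)
            # Logistic loss penalizes blocks that misclassify alone.
            total += np.log1p(np.exp(-label * score))
    return total
```

Training with a per-block loss pushes each sub-block classifier to detect the face on its own, which is what lets the detector recover when part of the window is occluded; as the abstract notes, the block classifiers can then be mapped back to a single fully connected layer so inference costs nothing extra.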