Polygonal Building Segmentation by Frame Field Learning
While state of the art image segmentation models typically output
segmentations in raster format, applications in geographic information systems
often require vector polygons. To help bridge the gap between deep network
output and the format used in downstream tasks, we add a frame field output to
a deep segmentation model for extracting buildings from remote sensing images.
We train a deep neural network that aligns a predicted frame field to ground
truth contours. This additional objective improves segmentation quality by
leveraging multi-task learning and provides structural information that later
facilitates polygonization; we also introduce a polygonization algorithm that
utilizes the frame field along with the raster segmentation. Our code is
available at https://github.com/Lydorn/Polygonization-by-Frame-Field-Learning.
Comment: CVPR 2021 - IEEE Conference on Computer Vision and Pattern Recognition, Jun 2021, Pittsburgh / Virtual, United States
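The alignment objective can be illustrated with a minimal NumPy sketch. The complex-polynomial encoding below, in which a frame field's two directions are the roots of f(z) = z^4 + c2·z^2 + c0 and the loss penalizes ground-truth contour tangents that are not roots, follows the paper's description, but the function name and the simple averaging are my own simplification, not the released code:

```python
import numpy as np

def frame_field_align_loss(c0, c2, tangents):
    # A frame field at a pixel is encoded by two complex coefficients
    # (c0, c2) of f(z) = z^4 + c2*z^2 + c0, whose four roots are the
    # two signed field directions. The alignment loss penalizes
    # ground-truth tangent directions z (unit complex numbers) that
    # are not roots of f.
    residual = tangents**4 + c2 * tangents**2 + c0
    return np.mean(np.abs(residual) ** 2)
```

For a field whose directions are u and v, setting c2 = -(u² + v²) and c0 = u²v² makes f factor as (z² - u²)(z² - v²), so tangents along u or v incur zero loss.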
Weakly Supervised Volumetric Image Segmentation with Deformed Templates
There are many approaches that use weak supervision to train networks to
segment 2D images. By contrast, existing 3D approaches rely on full supervision
of a subset of 2D slices of the 3D image volume. In this paper, we propose an
approach that is truly weakly supervised in the sense that we only need to
provide a sparse set of 3D points on the surface of the target objects, an easy
task that can be done quickly. We use the 3D points to deform a 3D template so
that it roughly matches the target object outlines, and we introduce an
architecture that exploits the supervision provided by the coarse template to
train a network to find accurate boundaries.
We evaluate the performance of our approach on Computed Tomography (CT),
Magnetic Resonance Imagery (MRI) and Electron Microscopy (EM) image datasets.
We show that it outperforms a more traditional approach to weak supervision
in 3D at a reduced supervision cost.
Comment: 13 pages
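As a rough illustration of how a few annotated 3D points can pin down a template, the sketch below coarsely aligns a template point cloud to the sparse points with a similarity transform (translation plus isotropic scale). This is a deliberate simplification of the paper's deformation model, and the function name is hypothetical:

```python
import numpy as np

def fit_template(template_pts, sparse_pts):
    # Coarsely align a 3D template (N x 3 points) to sparse annotated
    # surface points (M x 3) by matching centroids and overall spread.
    # This only produces the rough outline match used as weak
    # supervision; it is not the paper's full deformation.
    t_c = template_pts.mean(axis=0)
    s_c = sparse_pts.mean(axis=0)
    centered = template_pts - t_c
    scale = np.linalg.norm(sparse_pts - s_c) / np.linalg.norm(centered)
    return centered * scale + s_c
```

A true per-vertex deformation would then refine this coarse fit toward the annotated points.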
Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid
In modern computer vision, images are typically represented as a fixed
uniform grid with some stride and processed via a deep convolutional neural
network. We argue that deforming the grid to better align with the
high-frequency image content is a more effective strategy. We introduce
\emph{Deformable Grid} (DefGrid), a learnable neural network module that predicts
location offsets of vertices of a 2-dimensional triangular grid, such that the
edges of the deformed grid align with image boundaries. We showcase our DefGrid
in a variety of use cases, i.e., by inserting it as a module at various levels
of processing. We utilize DefGrid as an end-to-end \emph{learnable geometric
downsampling} layer that replaces standard pooling methods for reducing feature
resolution when feeding images into a deep CNN. We show significantly improved
results at the same grid resolution compared to using CNNs on uniform grids for
the task of semantic segmentation. We also utilize DefGrid at the output layers
for the task of object mask annotation, and show that reasoning about object
boundaries on our predicted polygonal grid leads to more accurate results over
existing pixel-wise and curve-based approaches. We finally showcase DefGrid as
a standalone module for unsupervised image partitioning, showing superior
performance over existing approaches. Project website:
http://www.cs.toronto.edu/~jungao/def-grid
Comment: ECCV 2020
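The core operation, offsetting the vertices of a uniform grid, can be sketched as follows. The clamp threshold and function signature are illustrative assumptions on my part; the actual module predicts the offsets with a neural network and operates on a triangular grid topology:

```python
import numpy as np

def deform_grid(H, W, offsets, max_frac=0.45):
    # Start from a uniform H x W grid of vertex positions and add
    # predicted per-vertex (dy, dx) offsets. Clamping offsets to under
    # half a cell keeps neighboring vertices ordered, so grid cells
    # cannot flip or self-intersect.
    ys, xs = np.meshgrid(np.arange(H, dtype=float),
                         np.arange(W, dtype=float), indexing="ij")
    grid = np.stack([ys, xs], axis=-1)          # shape (H, W, 2)
    offsets = np.clip(offsets, -max_frac, max_frac)
    return grid + offsets
```

With zero offsets the grid stays uniform; a learned network would output offsets that pull grid edges onto image boundaries.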
Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection
Scene text detection is a challenging computer vision task due to the high
variation in text shapes and ratios. In this work, we propose a scene text
detector named Deformable Kernel Expansion (DKE), which incorporates the merits
of both segmentation and contour-based detectors. DKE employs a segmentation
module to segment the shrunken text region as the text kernel, then expands the
text kernel contour to obtain text boundary by regressing the vertex-wise
offsets. Generating the text kernel by segmentation enables DKE to inherit the
arbitrary-shaped text region modeling capability of segmentation-based
detectors. Regressing the kernel contour with some sampled vertices enables DKE
to avoid complicated pixel-level post-processing and to better learn contour
deformation, as contour-based detectors do. Moreover, we propose an Optimal
Bipartite Graph Matching Loss (OBGML) that measures the matching error between
the predicted contour and the ground truth, which efficiently minimizes the
global contour matching distance. Extensive experiments on CTW1500, Total-Text,
MSRA-TD500, and ICDAR2015 demonstrate that DKE achieves a good tradeoff between
accuracy and efficiency in scene text detection.
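Two of the ingredients above can be sketched in a few lines: expanding sampled kernel vertices outward by regressed offsets, and scoring a predicted contour against the ground truth under an optimal bipartite matching. Both functions are illustrative simplifications, not the DKE implementation; in particular, "outward" is approximated here by the direction from the contour centroid, and the matching is brute force:

```python
import numpy as np
from itertools import permutations

def expand_kernel_contour(kernel_vertices, offsets):
    # Push each sampled kernel vertex (N x 2) outward by its regressed
    # scalar offset. Simplification: the outward direction is taken
    # from the contour centroid rather than a true local normal.
    center = kernel_vertices.mean(axis=0)
    dirs = kernel_vertices - center
    norms = np.linalg.norm(dirs, axis=1, keepdims=True)
    normals = dirs / np.maximum(norms, 1e-8)
    return kernel_vertices + offsets[:, None] * normals

def matching_loss(pred, gt):
    # Toy version of the optimal bipartite matching idea behind OBGML:
    # the assignment of predicted to ground-truth vertices minimizing
    # total L2 distance (brute force, viable only for tiny contours).
    n = len(pred)
    return min(
        sum(np.linalg.norm(pred[i] - gt[p[i]]) for i in range(n))
        for p in permutations(range(n))
    ) / n
```

Because the loss matches vertices optimally rather than by index, it does not penalize a prediction merely for enumerating the same contour from a different starting vertex.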