X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction
Segmentation of planar regions from a single RGB image is a particularly
important task in the perception of complex scenes. To utilize both visual and
geometric properties in images, recent approaches often formulate the problem
as a joint estimation of planar instances and dense depth through feature
fusion mechanisms and geometric constraint losses. Despite promising results,
these methods do not consider cross-task feature distillation and perform
poorly in boundary regions. To overcome these limitations, we propose X-PDNet,
a framework for the multitask learning of plane instance segmentation and depth
estimation with improvements in two aspects. First, we introduce a cross-task
distillation design that promotes early information sharing between the two
tasks so that each task benefits from the other. Second, we
highlight the current limitations of using the ground truth boundary to develop
boundary regression loss, and propose a novel method that exploits depth
information to support precise boundary region segmentation. Finally, we
manually annotate more than 3000 images from the Stanford 2D-3D-Semantics
dataset and make them available for evaluating plane instance segmentation. In
our experiments, the proposed methods outperform the baseline by large margins
on the ScanNet and Stanford 2D-3D-S datasets, demonstrating the effectiveness
of our proposals.
Comment: Accepted to BMVC 202
Gramian Attention Heads are Strong yet Efficient Vision Learners
We introduce a novel architecture design that enhances expressiveness by
incorporating multiple head classifiers (i.e., classification heads) instead of
relying on channel expansion or additional building blocks. Our approach
employs attention-based aggregation, utilizing pairwise feature similarity to
enhance multiple lightweight heads with minimal resource overhead. We compute
the Gramian matrices to reinforce class tokens in an attention layer for each
head. This enables the heads to learn more discriminative representations,
enhancing their aggregation capabilities. Furthermore, we propose a learning
algorithm that encourages heads to complement each other by reducing
correlation for aggregation. Our models eventually surpass state-of-the-art
CNNs and ViTs regarding the accuracy-throughput trade-off on ImageNet-1K and
deliver remarkable performance across various downstream tasks, such as COCO
object instance segmentation, ADE20k semantic segmentation, and fine-grained
visual classification datasets. The effectiveness of our framework is
substantiated by practical experimental results and further underpinned by a
generalization error bound. We release the code publicly at:
https://github.com/Lab-LVM/imagenet-models
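As a rough illustration of the "pairwise feature similarity" that a Gramian matrix captures, the sketch below computes the Gram matrix of a small set of feature vectors. It is only a generic definition in plain Python; the function name `gram_matrix` and the toy features are assumptions made here, and the code does not reproduce the paper's attention layer or its head-decorrelation training algorithm.

```python
def gram_matrix(features):
    """Return G with G[i][j] = <features[i], features[j]> (inner products).

    `features` is a list of equal-length vectors, e.g. one vector per token.
    This is the textbook Gram matrix, not the paper's attention layer.
    """
    return [
        [sum(a * b for a, b in zip(u, v)) for v in features]
        for u in features
    ]


# Toy example (hypothetical values): three 2-D feature vectors.
tokens = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
G = gram_matrix(tokens)
```

Each entry of `G` measures how similar two feature vectors are, and the matrix is symmetric by construction.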
Beyond pairwise clustering
We consider the problem of clustering in domains where the affinity relations are not dyadic (pairwise), but rather triadic, tetradic or higher. The problem is an instance of the hypergraph partitioning problem. We propose a two-step algorithm for solving this problem. In the first step we use a novel scheme to approximate the hypergraph using a weighted graph. In the second step a spectral partitioning algorithm is used to partition the vertices of this graph. The algorithm is capable of handling hyperedges of all orders including order two, thus incorporating information of all orders simultaneously. We present a theoretical analysis that relates our algorithm to an existing hypergraph partitioning algorithm and explains the reasons for its superior performance. We report the performance of our algorithm on a variety of computer vision problems and compare it to several existing hypergraph partitioning algorithms.
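The two-step idea (approximate the hypergraph by a weighted graph, then partition that graph spectrally) can be sketched as follows. This is a minimal illustration under stated assumptions: step one uses plain clique expansion, a standard approximation that is swapped in here and is not claimed to be the abstract's "novel scheme"; step two splits the vertices by the sign of the Fiedler vector of the unnormalized Laplacian, found with a simple power iteration. The function names and the toy hypergraph are hypothetical.

```python
import math
from itertools import combinations


def clique_expansion(num_vertices, hyperedges):
    """Step 1 (assumed scheme): add each hyperedge's weight to every vertex pair in it."""
    W = [[0.0] * num_vertices for _ in range(num_vertices)]
    for vertices, weight in hyperedges:
        for i, j in combinations(vertices, 2):
            W[i][j] += weight
            W[j][i] += weight
    return W


def spectral_bipartition(W, iters=500):
    """Step 2: split vertices by the sign of the Fiedler vector of L = D - W.

    Power iteration on (c*I - L), re-centred each step to stay orthogonal to
    the constant vector. Assumes the graph has at least one edge.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # upper bound on the largest eigenvalue of L
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iters):
        # u = (c*I - D + W) v
        u = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(u) / n
        u = [x - mean for x in u]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return [0 if x < 0 else 1 for x in v]


# Toy hypergraph: two triadic hyperedges joined by one weak pairwise edge.
hyperedges = [((0, 1, 2), 1.0), ((3, 4, 5), 1.0), ((2, 3), 0.1)]
W = clique_expansion(6, hyperedges)
labels = spectral_bipartition(W)
```

Because order-two hyperedges are expanded the same way as higher-order ones, pairwise and higher-order affinities end up in a single weight matrix, which mirrors the abstract's point about handling all orders at once.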
Physical Biology of the Materials-Microorganism Interface.
Future solar-to-chemical production will rely upon a deep understanding of the material-microorganism interface. Hybrid technologies, which combine inorganic semiconductor light harvesters with biological catalysis to transform light, air, and water into chemicals, already demonstrate a wide product scope and energy efficiencies surpassing that of natural photosynthesis. But optimization to economic competitiveness and fundamental curiosity beg for answers to two basic questions: (1) how do materials transfer energy and charge to microorganisms, and (2) how do we design for bio- and chemocompatibility between these seemingly unnatural partners? This Perspective highlights the state of the art and outlines future research paths to inform the cadre of spectroscopists, electrochemists, bioinorganic chemists, material scientists, and biologists who will ultimately solve these mysteries.
TSDF-Sampling: Efficient Sampling for Neural Surface Field using Truncated Signed Distance Field
Multi-view neural surface reconstruction has exhibited impressive results.
However, a notable limitation is the prohibitively slow inference time
compared to traditional techniques, primarily attributed to the dense sampling
required to maintain the rendering quality. This paper introduces a novel
approach that substantially reduces the number of samples by incorporating
the Truncated Signed Distance Field (TSDF) of the scene. While prior works have
proposed importance sampling, their dependence on initial uniform samples over
the entire space makes them unable to avoid performance degradation when using
fewer samples. In contrast, our method leverages the TSDF volume generated
only from the training views, which proves to provide a reasonable bound on
the sampling for upcoming novel views. As a result, we
achieve high rendering quality by fully exploiting the continuous neural SDF
estimation within the bounds given by the TSDF volume. Notably, our method is
the first approach that can be robustly plugged into a diverse array of
neural surface field models, as long as they use the volume rendering
technique. Our empirical results show an 11-fold increase in inference speed
without compromising performance. The result videos are available at our
project page: https://tsdf-sampling.github.io
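The idea of bounding ray samples with a TSDF can be sketched as below: a coarse march along the ray finds the first interval where the TSDF is not saturated (its absolute value is below the truncation distance), and the fine samples are placed only inside that interval. This is a simplified illustration, not the paper's implementation; the function names, the callable `tsdf_along_ray`, the one-step padding, and the toy scene (a surface at depth 2.0) are all assumptions.

```python
def first_band_bounds(tsdf_along_ray, near, far, tau, coarse_steps=80):
    """Coarsely march the ray and return (t_min, t_max) around the first
    region where |TSDF| < tau, padded by one coarse step; None if no hit."""
    step = (far - near) / coarse_steps
    lo = hi = None
    for k in range(coarse_steps + 1):
        t = near + k * step
        if abs(tsdf_along_ray(t)) < tau:
            if lo is None:
                lo = t
            hi = t
        elif lo is not None:
            break  # left the first near-surface band
    if lo is None:
        return None
    return max(near, lo - step), min(far, hi + step)


def sample_in_bounds(bounds, n):
    """Place n samples at the midpoints of n equal bins inside the bounds."""
    lo, hi = bounds
    return [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]


# Toy scene (hypothetical): a surface at depth 2.0, truncation distance 0.2.
tau = 0.2
toy_tsdf = lambda t: max(-tau, min(tau, 2.0 - t))
bounds = first_band_bounds(toy_tsdf, near=0.0, far=4.0, tau=tau)
samples = sample_in_bounds(bounds, 8)
empty = first_band_bounds(lambda t: tau, near=0.0, far=4.0, tau=tau)
```

In this toy case the eight samples cover only a narrow band around the surface instead of the whole interval from 0 to 4, which is the sense in which a TSDF bound lets a renderer use fewer samples per ray.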