UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
Recent advances in deep neural networks have achieved significant progress in
detecting individual objects from an image. However, object detection is not
sufficient to fully understand a visual scene. For a deeper visual
understanding, the interactions between objects, especially between humans and
objects, are essential. Most prior works have obtained this information with a bottom-up
approach, where the objects are first detected and the interactions are
predicted sequentially by pairing the objects. This is a major bottleneck in
HOI detection inference time. To tackle this problem, we propose UnionDet, a
one-stage meta-architecture for HOI detection powered by a novel union-level
detector that eliminates this additional inference stage by directly capturing
the region of interaction. Our one-stage detector for human-object interaction
reduces interaction prediction time by 4x to 14x while
outperforming state-of-the-art methods on two public datasets: V-COCO and
HICO-DET.
Comment: ECCV 202
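As a rough illustration of what a union-level region could look like, the sketch below computes the smallest axis-aligned box that encloses a human box and an object box. The abstract does not define the "region of interaction" this way, so the definition, the `Box` alias, and the function name `union_box` are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def union_box(human: Box, obj: Box) -> Box:
    """Return the smallest axis-aligned box enclosing both input boxes.

    Assumption: the union region is the enclosing box of the pair.
    """
    return (
        min(human[0], obj[0]),  # leftmost edge
        min(human[1], obj[1]),  # topmost edge
        max(human[2], obj[2]),  # rightmost edge
        max(human[3], obj[3]),  # bottommost edge
    )


if __name__ == "__main__":
    person = (10.0, 20.0, 50.0, 80.0)
    bicycle = (40.0, 60.0, 120.0, 100.0)
    print(union_box(person, bicycle))
```

Predicting such a region directly, rather than enumerating every human-object pair after detection, is the kind of shortcut that would remove the extra pairing stage the abstract describes.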
Advancing Bayesian Optimization via Learning Correlated Latent Space
Bayesian optimization is a powerful method for optimizing black-box functions
with limited function evaluations. Recent works have shown that optimization in
a latent space through deep generative models such as variational autoencoders
leads to effective and efficient Bayesian optimization for structured or
discrete data. However, as the optimization does not take place in the input
space, it leads to an inherent gap that results in potentially suboptimal
solutions. To alleviate the discrepancy, we propose Correlated latent space
Bayesian Optimization (CoBO), which focuses on learning correlated latent
spaces characterized by a strong correlation between the distances in the
latent space and the distances within the objective function. Specifically, our
method introduces Lipschitz regularization, loss weighting, and trust region
recoordination to minimize the inherent gap around the promising areas. We
demonstrate the effectiveness of our approach on several optimization tasks in
discrete data, such as molecule design and arithmetic expression fitting, and
achieve high performance within a small budget.
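To make the idea of Lipschitz regularization concrete, here is a minimal sketch of one common form of such a penalty: it penalizes pairs of latent points whose objective values differ by more than a chosen bound times their latent distance. The abstract gives no formula, so the function name `lipschitz_penalty`, the hinge form, and the `lipschitz_bound` parameter are assumptions, not the loss used in CoBO.

```python
import math
from itertools import combinations


def lipschitz_penalty(latents, values, lipschitz_bound, eps=1e-12):
    """Mean hinge penalty on pairwise slopes that exceed lipschitz_bound.

    latents: list of latent vectors (sequences of floats)
    values:  list of objective values, one per latent vector
    Assumption: a slope is |y_i - y_j| / ||z_i - z_j|| (Euclidean distance).
    """
    total = 0.0
    pairs = 0
    for (z_i, y_i), (z_j, y_j) in combinations(zip(latents, values), 2):
        distance = math.dist(z_i, z_j)
        slope = abs(y_i - y_j) / (distance + eps)  # eps guards against zero distance
        total += max(0.0, slope - lipschitz_bound)  # only violations contribute
        pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    z = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    y = [0.0, 1.0, 0.5]
    print(lipschitz_penalty(z, y, lipschitz_bound=0.5))
```

Driving this penalty toward zero forces nearby latent points to have similar objective values, which is one way to obtain the correlation between latent distances and objective differences that the abstract describes.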
Self-positioning Point-based Transformer for Point Cloud Understanding
Transformers have shown superior performance on various computer vision tasks
with their capabilities to capture long-range dependencies. Despite the
success, it is challenging to directly apply Transformers to point clouds due
to their quadratic cost in the number of points. In this paper, we present a
Self-Positioning point-based Transformer (SPoTr), which is designed to capture
both local and global shape contexts with reduced complexity. Specifically,
this architecture consists of local self-attention and self-positioning
point-based global cross-attention. The self-positioning points, adaptively
located based on the input shape, consider both spatial and semantic
information with disentangled attention to improve expressive power. With the
self-positioning points, we propose a novel global cross-attention mechanism
for point clouds, which improves the scalability of global self-attention by
allowing the attention module to compute attention weights with only a small
set of self-positioning points. Experiments show the effectiveness of SPoTr on
three point cloud tasks: shape classification, part segmentation, and
scene segmentation. In particular, our proposed model achieves an accuracy gain
of 2.6% over the previous best models on shape classification with
ScanObjectNN. We also provide qualitative analyses to demonstrate the
interpretability of self-positioning points. The code of SPoTr is available at
https://github.com/mlvlab/SPoTr.
Comment: Accepted paper at CVPR 202
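The scalability argument can be illustrated with plain scaled dot-product cross-attention in which every input point attends to a small set of anchor points instead of to every other point. This is a simplified sketch, not the SPoTr module: the anchors here are fixed inputs rather than adaptively positioned, there is no disentangled spatial and semantic attention, and the function name `cross_attention` is hypothetical.

```python
import math


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: N vectors (one per input point)
    keys, values: M vectors each (one per anchor point), with M << N
    Computes N * M attention scores rather than N * N.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    anchor_keys = [[0.0, 0.0], [0.0, 0.0]]
    anchor_values = [[1.0, 0.0], [3.0, 2.0]]
    print(cross_attention(points, anchor_keys, anchor_values))
```

With 1,024 input points and 32 anchors (illustrative figures, not taken from the paper), this computes 32,768 scores per pass instead of the 1,048,576 needed for full self-attention.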
NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA
Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves
retrieving nodes from a knowledge graph (KG) to answer natural language
questions. Recent GNN-based approaches formulate this task as a KG path
searching problem, where messages are sequentially propagated from the seed
node towards the answer nodes. However, these messages are past-oriented, and
they do not consider the full KG context. To make matters worse, KG nodes often
represent proper noun entities and are sometimes encrypted, being uninformative
in selecting between paths. To address these problems, we propose Neural Tree
Search (NuTrea), a tree search-based GNN model that incorporates the broader KG
context. Our model adopts a message-passing scheme that probes the unreached
subtree regions to boost the past-oriented embeddings. In addition, we
introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node
embedding that considers the global KG context to better characterize ambiguous
KG nodes. The general effectiveness of our approach is demonstrated through
experiments on three major multi-hop KGQA benchmark datasets, and our extensive
analyses further validate its expressiveness and robustness. Overall, NuTrea
provides a powerful means to query the KG with complex natural language
questions. Code is available at https://github.com/mlvlab/NuTrea.
Comment: Neural Information Processing Systems (NeurIPS) 202
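The name RF-IEF suggests an analogy with TF-IDF, with relations playing the role of terms and entities the role of documents. The abstract gives no formula, so the sketch below is only a guess at that analogy: the function name `rf_ief`, the choice to count a relation for both the head and the tail of a triple, and the logarithmic inverse frequency are all assumptions, not the definition used in NuTrea.

```python
import math
from collections import Counter, defaultdict


def rf_ief(triples):
    """TF-IDF-style weights over the relations incident to each entity.

    triples: iterable of (head, relation, tail)
    Assumed definitions:
      RF(e, r) = number of edges with relation r touching entity e
      IEF(r)   = log(num_entities / number of entities touched by r)
    Returns {entity: {relation: RF * IEF}}.
    """
    relation_counts = defaultdict(Counter)
    for head, relation, tail in triples:
        relation_counts[head][relation] += 1
        relation_counts[tail][relation] += 1

    num_entities = len(relation_counts)
    entity_frequency = Counter()
    for counts in relation_counts.values():
        entity_frequency.update(counts.keys())  # each entity counted once per relation

    return {
        entity: {
            relation: count * math.log(num_entities / entity_frequency[relation])
            for relation, count in counts.items()
        }
        for entity, counts in relation_counts.items()
    }


if __name__ == "__main__":
    kg = [("a", "born_in", "x"), ("b", "born_in", "x"), ("a", "likes", "b")]
    print(rf_ief(kg))
```

Under these assumptions, a relation that touches every entity receives zero weight, while rarer relations dominate, so an entity is characterized by its surrounding relations rather than by its own, possibly uninformative, name.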