Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction
Distant supervision leverages knowledge bases to automatically label instances, allowing us to train relation extractors without human annotations. However, the generated training data typically contain massive noise and may result in poor performance under vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (CSA), which enables noise-robust training of distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again via a selective attention mechanism. Experiments with two types of relation extractors demonstrate the superiority of the proposed approach over the state-of-the-art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.

Comment: AAAI 201
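For intuition, the sentence-level selective attention that CSA builds on can be sketched as softmax-weighted pooling over the sentence embeddings of a bag. The sketch below is a minimal illustration with assumed names and shapes; it is not the paper's code, and it omits the cross-relation and cross-bag extensions:

```python
# Minimal sketch of sentence-level selective attention over one bag.
# All names, shapes, and the toy data are illustrative assumptions.
import torch
import torch.nn.functional as F

def selective_attention(sent_embs: torch.Tensor, rel_query: torch.Tensor) -> torch.Tensor:
    """Aggregate a bag of sentence embeddings into one bag representation.

    sent_embs: (num_sentences, dim) embeddings of the sentences in the bag.
    rel_query: (dim,) learned query vector for the candidate relation.
    """
    scores = sent_embs @ rel_query      # (num_sentences,) relevance of each sentence
    weights = F.softmax(scores, dim=0)  # noisy / mismatched sentences get low weight
    return weights @ sent_embs          # (dim,) attention-weighted bag embedding

# Toy usage: a bag of 5 sentences with 64-dim embeddings.
bag = torch.randn(5, 64)
query = torch.randn(64)
bag_repr = selective_attention(bag, query)
```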
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Scene Graph Generation (SGG) aims to extract
relationships in images for vision understanding. Although recent works have
made steady progress on SGG, they still suffer from long-tail distribution issues: tail predicates are more costly to train and harder to distinguish because they have far less annotated data than frequent predicates. Existing re-balancing strategies try to handle this via prior rules but remain confined to pre-defined conditions that do not scale across models and datasets. In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, in which a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. The proposed CaCao can be applied in a plug-and-play fashion to automatically strengthen existing SGG models against the long-tail problem. Based on that, we further introduce a novel
Entangled cross-modal prompt approach for open-world predicate scene graph
generation (Epic), where models can generalize to unseen predicates in a
zero-shot manner. Comprehensive experiments on three benchmark datasets show
that CaCao consistently boosts the performance of multiple scene graph
generation models in a model-agnostic way. Moreover, our Epic achieves
competitive performance on open-world predicate prediction. The data and code
for this paper are publicly available.

Comment: Accepted by ICCV 202
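As a rough illustration of visual prompting, one common recipe is to project image-region features into a language model's token-embedding space and prepend them as soft prompt tokens. The sketch below follows that generic recipe; all names and dimensions are assumptions, not CaCao's released implementation:

```python
# Minimal sketch of visual prompting: map a visual feature to a few soft
# prompt tokens and prepend them to the text token embeddings before the
# language model. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class VisualPrompter(nn.Module):
    def __init__(self, visual_dim: int = 2048, lm_dim: int = 768, num_prompts: int = 4):
        super().__init__()
        # Map one visual feature to `num_prompts` soft tokens in LM space.
        self.proj = nn.Linear(visual_dim, num_prompts * lm_dim)
        self.num_prompts, self.lm_dim = num_prompts, lm_dim

    def forward(self, visual_feat: torch.Tensor, token_embs: torch.Tensor) -> torch.Tensor:
        """visual_feat: (batch, visual_dim); token_embs: (batch, seq, lm_dim)."""
        prompts = self.proj(visual_feat).view(-1, self.num_prompts, self.lm_dim)
        # The language model then attends over [visual prompts; text tokens].
        return torch.cat([prompts, token_embs], dim=1)

# Toy usage: a region feature plus a 16-token text template.
prompter = VisualPrompter()
fused = prompter(torch.randn(2, 2048), torch.randn(2, 16, 768))  # (2, 20, 768)
```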
Microstructural and Electron-Emission Characteristics of Nb-Si-N Films in Surface-Conduction Electron-Emitter Display
We propose ternary nitride Nb-Si-N films as a promising surface-conduction electron emitter (SCE) for surface-conduction electron-emitter displays (SEDs). The Nb-Si-N films consist of a continuous NbN polycrystalline phase with an amorphous (Si3-xNb4x)N4 phase in the NbN grain boundaries. After electroforming, serrated nanogaps were observed in the Nb-Si-N SCE strips. The emission current of an Nb-Si-N SCE array of 1×18 cells was 6.50 μA at an anode voltage of 1.5 kV and a device voltage of 22 V, indicating promising potential for display applications compared with NbN SCEs. © 2009 Published by Elsevier B.V.
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Temporal grounding is the task of locating a specific segment from an
untrimmed video according to a query sentence. This task has gained significant momentum in the computer vision community, as it enables activity
grounding beyond pre-defined activity classes by utilizing the semantic
diversity of natural language descriptions. The semantic diversity is rooted in
the principle of compositionality in linguistics, where novel semantics can be
systematically described by combining known words in novel ways (compositional
generalization). However, existing temporal grounding datasets are not
carefully designed to evaluate compositional generalizability. To
systematically benchmark the compositional generalizability of temporal
grounding models, we introduce a new Compositional Temporal Grounding task and
construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG. When
evaluating the state-of-the-art methods on our new dataset splits, we
empirically find that they fail to generalize to queries with novel
combinations of seen words. We argue that the inherent structured semantics within videos and language is the crucial factor in achieving compositional generalization. Based on this insight, we propose a variational cross-graph
reasoning framework that explicitly decomposes video and language into their respective hierarchical semantic graphs and learns fine-grained semantic
correspondence between the two graphs. Furthermore, we introduce a novel
adaptive structured semantics learning approach to derive the
structure-informed and domain-generalizable graph representations, which
facilitate the fine-grained semantic correspondence reasoning between the two
graphs. Extensive experiments validate the superior compositional
generalizability of our approach.

Comment: arXiv admin note: substantial text overlap with arXiv:2203.1304
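As a rough illustration of fine-grained cross-graph correspondence, the sketch below scores every language-graph node against every video-graph node and treats the softmax as a soft alignment. It is a generic attention-based stand-in, not the paper's variational reasoning framework or graph construction:

```python
# Minimal sketch of soft alignment between two sets of graph-node
# embeddings. Node counts, dimensions, and scaling are assumptions.
import torch
import torch.nn.functional as F

def cross_graph_alignment(video_nodes: torch.Tensor,
                          lang_nodes: torch.Tensor) -> torch.Tensor:
    """video_nodes: (Nv, dim); lang_nodes: (Nl, dim).

    Returns an (Nl, Nv) matrix whose row i is a distribution over the
    video-graph nodes aligned to language-graph node i.
    """
    sims = lang_nodes @ video_nodes.T / video_nodes.shape[-1] ** 0.5
    return F.softmax(sims, dim=-1)

# Toy usage: 10 video-event nodes vs. 6 phrase nodes, 128-dim each.
align = cross_graph_alignment(torch.randn(10, 128), torch.randn(6, 128))
```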