136 research outputs found
Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
A capsule is a group of neurons whose activity vector represents the
instantiation parameters of a specific type of entity. In this paper, we
explore capsule networks for relation extraction in a multi-instance
multi-label learning framework and propose a novel neural approach based on
capsule networks with attention mechanisms. We evaluate our method on
different benchmarks and demonstrate that it improves the precision of the
predicted relations. In particular, we show that capsule networks improve
relation extraction for multiple entity pairs.
Comment: To be published in EMNLP 201
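The routing-by-agreement procedure at the heart of capsule networks can be sketched in a few lines. The following is a minimal NumPy illustration of dynamic routing between capsule layers, not the authors' attention-augmented implementation; the function names and sizes are hypothetical:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Squashing non-linearity: keeps direction, maps the norm into [0, 1).
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (n_in, n_out, d) predictions from lower-level capsules.
    Returns (n_out, d) output capsule vectors.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
        s = np.einsum("io,iod->od", c, u_hat)                 # weighted sum per output capsule
        v = squash(s)                                         # output capsules
        b = b + np.einsum("iod,od->io", u_hat, v)             # agreement update
    return v

rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(8, 3, 4)))               # 8 input, 3 output capsules
print(v.shape)  # (3, 4)
```

The agreement update increases the coupling between a lower capsule and an output capsule whose vectors point the same way, which is what lets capsule activations act as votes for relation classes.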
Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks
We propose a distantly supervised relation extraction approach for
long-tailed, imbalanced data, which is prevalent in real-world settings. Here,
the challenge is to learn accurate "few-shot" models for classes at the tail
of the class distribution, for which little data is available.
Inspired by the rich semantic correlations between classes at the long tail and
those at the head, we take advantage of the knowledge from data-rich classes at
the head of the distribution to boost the performance of the data-poor classes
at the tail. First, we propose to leverage implicit relational knowledge among
class labels from knowledge graph embeddings and to learn explicit relational
knowledge using graph convolution networks. Second, we integrate this
relational knowledge into the relation extraction model via a coarse-to-fine
knowledge-aware attention mechanism. Results on a large-scale benchmark dataset
show that our approach significantly outperforms other baselines, especially
for long-tail relations.
Comment: To be published in NAACL 201
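The coarse-to-fine idea can be sketched as two attention passes over a bag of sentence encodings: one queried by the embedding of the coarse (head) relation class, one by the fine-grained (tail) class. This is a simplified NumPy sketch under assumed shapes, not the paper's model; all names are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_aware_attention(bag, coarse_q, fine_q):
    """Two-level attention over a bag of sentence encodings.

    bag:      (n_sent, d) sentence representations for one entity pair
    coarse_q: (d,) embedding of the coarse (head) relation class
    fine_q:   (d,) embedding of the fine-grained (tail) relation class
    """
    a1 = softmax(bag @ coarse_q)          # coarse: which sentences fit the head class
    coarse_rep = a1 @ bag
    a2 = softmax(bag @ fine_q)            # fine: refine toward the tail class
    fine_rep = a2 @ bag
    return 0.5 * (coarse_rep + fine_rep)  # simple fusion of the two levels

rng = np.random.default_rng(0)
rep = knowledge_aware_attention(rng.normal(size=(5, 16)),
                                rng.normal(size=16),
                                rng.normal(size=16))
print(rep.shape)  # (16,)
```

Because the class-query embeddings come from the knowledge graph, head-class signal can shape the attention weights even for tail classes with few instances.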
Towards A Unified View of Answer Calibration for Multi-Step Reasoning
Large Language Models (LLMs) employing Chain-of-Thought (CoT) prompting have
broadened the scope for improving multi-step reasoning capabilities. Answer
calibration strategies, such as step-level or path-level calibration, typically
play a vital role in multi-step reasoning. While effective, there remains a
significant gap in our understanding of the key factors behind their
success. In this paper, we break down the design of recent answer calibration
strategies and present a unified view that establishes connections between
them. We then conduct a thorough evaluation of these strategies from this
unified view, systematically scrutinizing step-level and path-level answer
calibration across multiple paths. Our study holds the potential to illuminate
key insights for optimizing multi-step reasoning with answer calibration.
Comment: Working in Progress
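Path-level calibration, in its simplest self-consistency form, aggregates the final answers of several sampled reasoning paths by majority vote. A minimal sketch (the paths and helper name below are illustrative, not from the paper):

```python
from collections import Counter

def path_level_calibration(paths):
    """Aggregate final answers across sampled reasoning paths by majority vote."""
    answers = [p[-1] for p in paths]          # last step of each path = its answer
    return Counter(answers).most_common(1)[0][0]

# Three sampled CoT paths for the same question; one path made an arithmetic slip.
paths = [
    ["17 + 25 = 42", "42"],
    ["17 + 25 = 41", "41"],
    ["25 + 17 = 42", "42"],
]
print(path_level_calibration(paths))  # 42
```

Step-level calibration would instead score or verify each intermediate step before aggregation; the unified view in the paper relates these two levels.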
SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres
Event-centric structured prediction involves predicting structured outputs of
events. In most NLP cases, event structures are complex with manifold
dependencies, and it is challenging to represent these complicated structured
events effectively. To address these issues, we propose Structured Prediction
with Energy-based Event-Centric Hyperspheres (SPEECH). SPEECH models complex
dependencies among structured event components with energy-based modeling, and
represents event classes with simple but effective hyperspheres. Experiments on
two unified-annotated event datasets indicate that SPEECH performs strongly on
event detection and event-relation extraction tasks.
Comment: Accepted by ACL 2023 Main Conference. Code is released at
\url{https://github.com/zjunlp/SPEECH}
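Representing each event class as a hypersphere means classification reduces to a distance test: an embedding has low energy for a class if it falls inside that class's sphere. A toy NumPy sketch of this geometric idea (centers, radii, and the energy form are illustrative assumptions, not SPEECH's learned model):

```python
import numpy as np

def hypersphere_energy(x, centers, radii):
    """Energy of embedding x w.r.t. each class hypersphere (center, radius).

    Zero energy = x lies inside the sphere; energy grows with distance outside it.
    """
    d = np.linalg.norm(centers - x, axis=1)   # distance of x to each class center
    return np.maximum(d - radii, 0.0)

centers = np.array([[0.0, 0.0], [5.0, 5.0]])  # two event classes
radii = np.array([1.0, 1.0])
e = hypersphere_energy(np.array([0.5, 0.0]), centers, radii)
pred = int(np.argmin(e))                       # predict the lowest-energy class
print(pred)  # 0
```

Energy-based training would push event embeddings inside their class sphere and outside the others, coupling the component-level dependencies through a shared energy function.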
MLBiNet: A Cross-Sentence Collective Event Detection Network
We consider the problem of collectively detecting multiple events,
particularly in cross-sentence settings. The key to this problem is
to encode semantic information and model event inter-dependency at the
document level. In this paper, we reformulate it as a Seq2Seq task and propose
a Multi-Layer Bidirectional Network (MLBiNet) to capture the document-level
association of events and semantic information simultaneously. Specifically, a
bidirectional decoder is first devised to model event inter-dependency within
a sentence when decoding the event tag vector sequence. Second, an
information aggregation module is employed to aggregate sentence-level semantic
and event tag information. Finally, we stack multiple bidirectional decoders
and feed in cross-sentence information, forming a multi-layer bidirectional
tagging architecture that iteratively propagates information across sentences. We
show that our approach provides significant improvements in performance over
the current state-of-the-art results.
Comment: Accepted by ACL 202
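The multi-layer bidirectional propagation can be caricatured as repeated forward and backward passes that mix each sentence's tag vector with decayed information from its neighbours. This NumPy sketch only conveys the information-flow pattern under assumed decay weights; it is not MLBiNet's learned decoder:

```python
import numpy as np

def bidirectional_tag_layer(sent_tags):
    """One layer: mix each sentence's tag vector with decayed left/right context."""
    fwd = np.zeros_like(sent_tags)
    bwd = np.zeros_like(sent_tags)
    acc = np.zeros(sent_tags.shape[1])
    for i in range(len(sent_tags)):            # left-to-right pass
        acc = 0.5 * acc + sent_tags[i]
        fwd[i] = acc
    acc = np.zeros(sent_tags.shape[1])
    for i in reversed(range(len(sent_tags))):  # right-to-left pass
        acc = 0.5 * acc + sent_tags[i]
        bwd[i] = acc
    return (fwd + bwd) / 2.0

def multi_layer(sent_tags, n_layers=3):
    for _ in range(n_layers):                  # stacking layers widens the context
        sent_tags = bidirectional_tag_layer(sent_tags)
    return sent_tags

tags = np.eye(4)                               # 4 sentences, 4 tag dimensions
out = multi_layer(tags)
print(out.shape)  # (4, 4)
```

Each extra layer lets tag evidence from one sentence reach sentences further away, which is the point of the iterative cross-sentence propagation.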
From Sky to the Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal
Learning-based image deraining methods have made great progress. However, the
lack of large-scale, high-quality paired training samples is the main bottleneck
hampering real image deraining (RID). To address this dilemma and advance
RID, we construct a Large-scale High-quality Paired real rain benchmark
(LHP-Rain), comprising 3000 video sequences with 1 million high-resolution
(1920*1080) frame pairs. The advantages of the proposed dataset over the
existing ones are three-fold: rain with higher diversity and larger scale,
images with higher resolution, and higher-quality ground truth. Specifically, the
real rains in LHP-Rain contain not only the classical rain
streak/veiling/occlusion in the sky, but also the \textbf{splashing on the
ground} overlooked by the deraining community. Moreover, we propose a novel robust
low-rank tensor recovery model to generate the ground truth by better separating
the static background from the dynamic rain. In addition, we design a simple
transformer-based single-image deraining baseline, which simultaneously utilizes
self-attention and cross-layer attention within the image and rain layers
with discriminative feature representations. Extensive experiments verify the
superiority of the proposed dataset and deraining method over the state-of-the-art.
Comment: Accepted by ICCV 202
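The intuition behind generating ground truth via low-rank recovery: across the frames of a video, the static background is (approximately) low-rank while dynamic rain is a residual on top of it. A crude NumPy stand-in using truncated SVD, not the paper's robust tensor model:

```python
import numpy as np

def low_rank_background(frames, rank=1):
    """Split a frame stack into a low-rank static background and a residual
    (rain) layer via truncated SVD -- a crude stand-in for robust tensor recovery.

    frames: (n_frames, height, width) grayscale video stack.
    """
    n, h, w = frames.shape
    X = frames.reshape(n, h * w)                # one row per frame
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank part: static background
    S = X - L                                   # residual: dynamic rain layer
    return L.reshape(n, h, w), S.reshape(n, h, w)

rng = np.random.default_rng(0)
frames = rng.random((5, 4, 4))                  # toy 5-frame stack
L, S = low_rank_background(frames)
```

A robust formulation would additionally penalize the residual (e.g. with an L1 term) so outliers like splashing do not contaminate the recovered background.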
Multimodal Analogical Reasoning over Knowledge Graphs
Analogical reasoning is fundamental to human cognition and holds an important
place in various fields. However, previous studies mainly focus on single-modal
analogical reasoning and neglect to take advantage of structured knowledge.
Notably, research in cognitive psychology has demonstrated that information
from multimodal sources brings more powerful cognitive transfer than
single-modality sources. To this end, we introduce the new task of multimodal
analogical reasoning over knowledge graphs, which requires multimodal reasoning
ability with the help of background knowledge. Specifically, we construct a
Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph
MarKG. We evaluate multimodal knowledge graph embedding and pre-trained
Transformer baselines, illustrating the potential challenges of the proposed
task. We further propose a novel model-agnostic Multimodal analogical reasoning
framework with Transformer (MarT), motivated by structure mapping theory,
which obtains better performance. Code and datasets are available at
https://github.com/zjunlp/MKG_Analogy.
Comment: Accepted by ICLR 202
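In embedding space, the classic single-modal formulation of analogy ("a is to b as c is to ?") is solved by nearest-neighbour search around b - a + c. The toy embeddings below are fabricated for illustration only; the MARS task extends this idea to multimodal entities with KG background knowledge:

```python
import numpy as np

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' by cosine nearest neighbour to b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for name, v in emb.items():
        if name in (a, b, c):                  # exclude the query terms themselves
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = name, sim
    return best

emb = {
    "man":   np.array([1.0, 0.0]),
    "king":  np.array([1.0, 1.0]),
    "woman": np.array([0.2, 0.0]),
    "queen": np.array([0.2, 1.0]),
    "apple": np.array([5.0, -3.0]),            # distractor
}
answer = solve_analogy(emb, "man", "king", "woman")
print(answer)  # queen
```

Structure mapping theory, which motivates MarT, argues that analogy transfers relational structure rather than surface attributes; the vector-offset trick is the simplest embedding-level realization of that idea.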