Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Recognizing the layout of unstructured digital documents is crucial when parsing documents into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis typically rely on computer vision models to understand documents while ignoring other information that is vital to capture, such as the context and relations of document components. Our Doc-GCN presents an effective way to harmonize and integrate these heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate the aspect representations and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets.
Comment: Accepted by COLING 202
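To make the per-aspect pipeline concrete, here is a minimal PyTorch sketch of the idea described above: one graph convolution per aspect, integration of the per-node aspect representations (concatenation stands in for the paper's pooling step), and a 2-layer MLP classifier. The layer sizes, aspect names, and exact wiring are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph convolution step: H' = ReLU(norm(A) @ H @ W)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, feats):
        # Add self-loops and row-normalise before mixing neighbour features.
        adj = adj + torch.eye(adj.size(0))
        adj = adj / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.linear(adj @ feats))


class DocGCNSketch(nn.Module):
    ASPECTS = ("syntactic", "semantic", "density", "visual")

    def __init__(self, in_dim=128, hid_dim=64, num_classes=5):
        super().__init__()
        self.gcns = nn.ModuleDict({a: GCNLayer(in_dim, hid_dim) for a in self.ASPECTS})
        # 2-layer MLP over the integrated aspect representations.
        self.mlp = nn.Sequential(
            nn.Linear(hid_dim * len(self.ASPECTS), hid_dim),
            nn.ReLU(),
            nn.Linear(hid_dim, num_classes),
        )

    def forward(self, graphs):
        # graphs: {aspect: (adjacency [N, N], node features [N, in_dim])},
        # one node per document layout component.
        per_aspect = [self.gcns[a](adj, x) for a, (adj, x) in graphs.items()]
        return self.mlp(torch.cat(per_aspect, dim=-1))  # [N, num_classes]
```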
MC-DRE: Multi-Aspect Cross Integration for Drug Event/Entity Extraction
Extracting meaningful drug-related information chunks, such as adverse drug events (ADE), is crucial for preventing morbidity and saving many lives. Most ADEs are reported via unstructured conversations within a medical context, so applying a general entity recognition approach is not sufficient. In this paper, we propose a new multi-aspect cross-integration framework for drug entity/event detection that captures and aligns different context/language/knowledge properties from drug-related documents. We first construct multi-aspect encoders to describe semantic, syntactic, and medical document contextual information by conducting three slot tagging tasks: main drug entity/event detection, part-of-speech tagging, and general medical named entity recognition. Then, each encoder conducts cross-integration with the other contextual information in three ways, the key-value cross, attention cross, and feedforward cross, so that the multiple encoders are integrated in depth. Our model outperforms all state-of-the-art models on two widely used tasks, flat entity detection and discontinuous event extraction.
Comment: Accepted at CIKM 202
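The following is a hedged sketch of how the three cross-integration steps named in the abstract (key-value cross, attention cross, feedforward cross) could combine one aspect encoder's states with another's. The dimensions and residual wiring are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class CrossIntegration(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.kv_proj = nn.Linear(dim, dim * 2)  # key-value cross
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(               # feedforward cross
            nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, own, other):
        # own, other: [batch, seq, dim] states from two aspect encoders.
        # Key-value cross: derive keys and values from the other aspect.
        k, v = self.kv_proj(other).chunk(2, dim=-1)
        # Attention cross: query with this aspect's own states.
        mixed, _ = self.attn(own, k, v)
        own = self.norm1(own + mixed)
        # Feedforward cross: fuse and residually update the representation.
        return self.norm2(own + self.ffn(own))
```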
Understanding Attention for Vision-and-Language Tasks
The attention mechanism has been used as an important component across Vision-and-Language (VL) tasks to bridge the semantic gap between visual and textual features. While attention has been widely used in VL tasks, the capability of different attention alignment calculations to bridge the semantic gap between visual and textual clues has not been examined. In this research, we conduct a comprehensive analysis of the role of attention alignment by looking into the attention score calculation methods and checking how each actually represents the significance of visual regions and textual tokens for the global assessment. We also analyse the conditions under which an attention score calculation mechanism is more (or less) interpretable, and which may impact model performance on three different VL tasks: visual question answering, text-to-image generation, and text-and-image matching (both sentence and image retrieval). Our analysis is the first of its kind and provides useful insights into the importance of each attention alignment score calculation when applied in the training phase of VL tasks, which is commonly ignored in attention-based cross-modal models and/or pretrained models. Our code is available at: https://github.com/adlnlp/Attention_VL
Comment: Accepted in COLING 202
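For background, these are the attention alignment score functions commonly compared in analyses of this kind: dot product, scaled dot product, general (bilinear), and additive attention, written here as scores between textual token features T and visual region features V. This is a generic sketch of the standard formulations, not the paper's code.

```python
import torch
import torch.nn as nn


def dot_score(t, v):
    # t: [Nt, d] textual tokens, v: [Nv, d] visual regions -> [Nt, Nv]
    return t @ v.transpose(-1, -2)


def scaled_dot_score(t, v):
    return dot_score(t, v) / t.size(-1) ** 0.5


class GeneralScore(nn.Module):
    """Bilinear alignment: score(t, v) = t W v^T."""

    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim, bias=False)

    def forward(self, t, v):
        return self.w(t) @ v.transpose(-1, -2)


class AdditiveScore(nn.Module):
    """Additive (Bahdanau-style) alignment: v_a^T tanh(W [t; v])."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)
        self.va = nn.Linear(dim, 1, bias=False)

    def forward(self, t, v):
        nt, nv = t.size(0), v.size(0)
        pairs = torch.cat(
            [t.unsqueeze(1).expand(nt, nv, -1), v.unsqueeze(0).expand(nt, nv, -1)],
            dim=-1,
        )
        return self.va(torch.tanh(self.proj(pairs))).squeeze(-1)  # [Nt, Nv]
```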
Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets
Natural language understanding typically maps a single utterance to a dual-level semantic frame: sentence-level intent and word-level slot labels. The best performing models force explicit interaction between intent detection and slot filling. We present a novel tri-level joint natural language understanding approach that adds a domain level and explicitly exchanges semantic information between all levels. This approach enables the use of multi-turn datasets, which are a more natural conversational environment than single utterances. We evaluate our model on two multi-turn datasets for which we are the first to conduct joint slot filling and intent detection. Our model outperforms state-of-the-art joint models in slot filling and intent detection on multi-turn datasets. We provide an analysis of explicit interaction locations between the layers. We conclude that including domain information improves model performance.
Comment: Accepted at INTERSPEECH 202
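A minimal sketch of the tri-level idea: a shared utterance encoder with domain, intent, and slot heads, where each higher level's evidence is explicitly passed down to the next. The GRU encoder and the simple logit-concatenation exchange are illustrative assumptions; the paper's interaction mechanism may differ.

```python
import torch
import torch.nn as nn


class TriLevelNLU(nn.Module):
    def __init__(self, vocab, dim=128, n_domains=3, n_intents=10, n_slots=20):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.domain_head = nn.Linear(2 * dim, n_domains)
        self.intent_head = nn.Linear(2 * dim + n_domains, n_intents)
        self.slot_head = nn.Linear(2 * dim + n_domains + n_intents, n_slots)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))  # [B, L, 2*dim]
        utter = h.mean(dim=1)                    # utterance-level summary
        domain = self.domain_head(utter)
        # Domain evidence flows into intent detection...
        intent = self.intent_head(torch.cat([utter, domain], dim=-1))
        # ...and domain + intent evidence flows into token-level slot filling.
        ctx = torch.cat([domain, intent], dim=-1)
        ctx = ctx.unsqueeze(1).expand(-1, h.size(1), -1)
        slots = self.slot_head(torch.cat([h, ctx], dim=-1))
        return domain, intent, slots             # three joint training targets
```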
Form-NLU: Dataset for the Form Natural Language Understanding
Compared to general document analysis tasks, form document structure understanding and retrieval are challenging. Form documents are typically made by two types of authors: a form designer, who develops the form structure and keys, and a form user, who fills out form values based on the provided keys. Hence, the form values may not be aligned with the form designer's intention (structure and keys) if a form user gets confused. In this paper, we introduce Form-NLU, the first dataset for form structure understanding and its key and value information extraction; it interprets the form designer's intent and the alignment of user-written values with it. It consists of 857 form images, 6k form keys and values, and 4k table keys and values. Our dataset also includes three form types, digital, printed, and handwritten, which cover diverse form appearances and layouts. We propose a robust positional and logical relation-based form key-value information extraction framework. Using this dataset, Form-NLU, we first examine strong object detection models for form layout understanding, then evaluate the key information extraction task on the dataset, providing fine-grained results for different types of forms and keys. Furthermore, we test it with an off-the-shelf PDF layout extraction tool and demonstrate its feasibility in real-world cases.
Comment: Accepted by SIGIR 202
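As a hedged illustration of positional relation-based key-value extraction, the sketch below scores each candidate (key, value) pair from the two fields' semantic features plus simple relative-position features of their bounding boxes. The feature choices and scorer are assumptions for illustration, not the paper's framework.

```python
import torch
import torch.nn as nn


def rel_position(key_box, val_box):
    # Boxes as (x1, y1, x2, y2): relative centre offset plus value size.
    kx, ky = (key_box[0] + key_box[2]) / 2, (key_box[1] + key_box[3]) / 2
    vx, vy = (val_box[0] + val_box[2]) / 2, (val_box[1] + val_box[3]) / 2
    return torch.tensor([vx - kx, vy - ky,
                         val_box[2] - val_box[0], val_box[3] - val_box[1]])


class PairScorer(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 4, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, key_feat, val_feat, key_box, val_box):
        # key_feat, val_feat: [feat_dim] semantic features of the two fields.
        x = torch.cat([key_feat, val_feat, rel_position(key_box, val_box)])
        return self.mlp(x)  # higher score = more likely key-value match
```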
DDI-MuG: Multi-aspect graphs for drug-drug interaction extraction
Introduction: Drug-drug interaction (DDI) may lead to adverse reactions in patients, so it is important to extract such knowledge from biomedical texts. However, previously proposed approaches typically focus on capturing sentence-aspect information while ignoring valuable knowledge concerning the whole corpus. In this paper, we propose a Multi-aspect Graph-based DDI extraction model, named DDI-MuG.
Methods: We first employ a bio-specific pre-trained language model to obtain contextualized token representations. Then we use two graphs to capture syntactic information from the input instance and word co-occurrence information within the entire corpus, respectively. Finally, we combine the representations of drug entities and verb tokens for the final classification.
Results: To validate the effectiveness of the proposed model, we perform extensive experiments on two widely used DDI extraction datasets, DDIExtraction-2013 and TAC 2018. It is encouraging to see that our model outperforms all twelve state-of-the-art models.
Discussion: In contrast to the majority of earlier models that rely on a black-box approach, our model enables visualization of crucial words and their interrelationships by utilizing edge information from the two graphs. To the best of our knowledge, this is the first model that applies multi-aspect graphs to the DDI extraction task, and we hope it can establish a foundation for more robust multi-aspect work in the future.
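A rough sketch of the pipeline described in the Methods section: contextualized token embeddings refined by two graphs (sentence-level syntax and corpus-level word co-occurrence), with drug-entity and verb-token representations combined for classification. The fusion scheme and layer sizes are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn


def graph_mix(adj, feats):
    # One normalised-adjacency propagation step (self-loops included).
    adj = adj + torch.eye(adj.size(0))
    return torch.relu((adj / adj.sum(dim=1, keepdim=True)) @ feats)


class DDIMuGSketch(nn.Module):
    def __init__(self, dim=768, num_classes=5):
        super().__init__()
        self.syntax_proj = nn.Linear(dim, dim)
        self.cooc_proj = nn.Linear(dim, dim)
        self.classifier = nn.Linear(3 * dim, num_classes)

    def forward(self, token_reps, syntax_adj, cooc_adj, drug_idx, verb_idx):
        # token_reps: [L, dim] contextualized embeddings from a bio-specific
        # pretrained language model (e.g. a BioBERT-style encoder).
        syn = graph_mix(syntax_adj, self.syntax_proj(token_reps))
        coo = graph_mix(cooc_adj, self.cooc_proj(token_reps))
        fused = token_reps + syn + coo
        drug1, drug2 = fused[drug_idx[0]], fused[drug_idx[1]]
        verbs = fused[verb_idx].mean(dim=0)  # pool the verb tokens
        return self.classifier(torch.cat([drug1, drug2, verbs]))
```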
SUPER-Rec: SUrrounding Position-Enhanced Representation for Recommendation
Collaborative filtering problems are commonly solved with matrix completion techniques that recover the missing values of user-item interaction matrices. In such a matrix, a rating's position specifically identifies the user who gave it and the item that was rated. Previous matrix completion techniques tend to neglect the position of each element (user, item, and rating) in the matrix and instead focus mainly on semantic similarity between users and items to predict missing values. This paper proposes a novel position-enhanced user/item representation training model for recommendation, SUPER-Rec. We first capture the rating position in the matrix using relative positional rating encoding and store the position-enhanced rating information and its user-item relationship in a fixed-dimension embedding that is not affected by the matrix size. Then, we apply the trained position-enhanced user and item representations to the simplest traditional machine learning models to highlight the pure novelty of our representation learning model. We contribute the first formal introduction and quantitative analysis of position-enhanced item representation in the recommendation domain and provide a principled discussion of how SUPER-Rec achieves its outperformance on typical collaborative filtering recommendation tasks with both explicit and implicit feedback.
Comment: There was a testing environment issue, so the model evaluation must be re-conducted.
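An illustrative sketch of the core idea: each observed rating is encoded with a positional signal derived from its (row, column) location in the matrix, folded into fixed-size user/item embeddings whose size is independent of the matrix dimensions. The bucketing scheme and sizes below are assumptions, not the paper's exact relative positional rating encoding.

```python
import torch
import torch.nn as nn


class SuperRecSketch(nn.Module):
    def __init__(self, n_users, n_items, dim=32, buckets=16):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        # Relative position of a rating inside the matrix, bucketed so the
        # embedding table does not grow with the matrix size.
        self.pos_emb = nn.Embedding(buckets * buckets, dim)
        self.n_users, self.n_items, self.buckets = n_users, n_items, buckets

    def positional_bucket(self, users, items):
        row = (users.float() / self.n_users * self.buckets).long()
        col = (items.float() / self.n_items * self.buckets).long()
        b = self.buckets - 1
        return row.clamp(max=b) * self.buckets + col.clamp(max=b)

    def forward(self, users, items):
        # users, items: [B] index tensors of observed rating positions.
        pos = self.pos_emb(self.positional_bucket(users, items))
        u = self.user_emb(users) + pos
        i = self.item_emb(items) + pos
        return (u * i).sum(dim=-1)  # predicted rating score per pair
```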