GenKIE: Robust Generative Multimodal Document Key Information Extraction
Key information extraction (KIE) from scanned documents has gained increasing
attention because of its applications in various domains. Although promising
results have been achieved by some recent KIE approaches, they are usually
built based on discriminative models, which lack the ability to handle optical
character recognition (OCR) errors and require laborious token-level labelling.
In this paper, we propose a novel generative end-to-end model, named GenKIE, to
address the KIE task. GenKIE is a sequence-to-sequence multimodal generative
model that utilizes multimodal encoders to embed visual, layout and textual
features and a decoder to generate the desired output. Well-designed prompts
are leveraged to incorporate the label semantics as the weakly supervised
signals and entice the generation of the key information. One notable advantage
of the generative model is that it enables automatic correction of OCR errors.
Besides, token-level granular annotation is not required. Extensive experiments
on multiple public real-world datasets show that GenKIE effectively generalizes
over different types of documents and achieves state-of-the-art results. Our
experiments also validate the model's robustness against OCR errors, making
GenKIE highly applicable in real-world scenarios. Comment: Accepted by EMNLP 2023, Findings paper
Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization
Biomedical named entity recognition is one of the core tasks in biomedical
natural language processing (BioNLP). To tackle this task, numerous
supervised/distantly supervised approaches have been proposed. Despite their
remarkable success, these approaches inescapably demand laborious human effort.
To reduce the need for human effort, dictionary-based approaches have been
proposed to extract named entities simply based on a given dictionary. However,
one downside of existing dictionary-based approaches is that they are
challenged to identify concept synonyms that are not listed in the given
dictionary, which we refer to as the synonym generalization problem. In this
study, we propose a novel Synonym Generalization (SynGen) framework that
recognizes the biomedical concepts contained in the input text using span-based
predictions. In particular, SynGen introduces two regularization terms, namely,
(1) a synonym distance regularizer; and (2) a noise perturbation regularizer,
to minimize the synonym generalization error. To demonstrate the effectiveness
of our approach, we provide a theoretical analysis of the bound of synonym
generalization error. We extensively evaluate our approach on a wide range of
benchmarks and the results verify that SynGen outperforms previous
dictionary-based models by notable margins. Lastly, we provide a detailed
analysis to further reveal the merits and inner workings of our approach.
Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces
Knowledge graph embedding (KGE) is an increasingly popular technique that aims
to represent entities and relations of knowledge graphs into low-dimensional
semantic spaces for a wide spectrum of applications such as link prediction,
knowledge reasoning and knowledge completion. In this paper, we provide a
systematic review of existing KGE techniques based on representation spaces.
Particularly, we build a fine-grained classification to categorise the models
based on three mathematical perspectives of the representation spaces: (1)
Algebraic perspective, (2) Geometric perspective, and (3) Analytical
perspective. We introduce the rigorous definitions of fundamental mathematical
spaces before diving into KGE models and their mathematical properties. We
further discuss different KGE methods over the three categories, as well as
summarise how spatial advantages work over different embedding needs. By
collating the experimental results from downstream tasks, we also explore the
advantages of mathematical space in different scenarios and the reasons behind
them. We further state some promising research directions from a representation
space perspective, with which we hope to inspire researchers to design their
KGE models as well as their related applications with more consideration of
their mathematical space properties. Comment: 32 pages, 6 figures
Variational Bayesian Context-aware Representation for Grocery Recommendation
Grocery recommendation is an important recommendation use-case, which aims to predict which items a user might choose to buy in the future, based on their shopping history. However, existing methods only represent each user and item by single deterministic points in a low-dimensional continuous space. In addition, most of these methods are trained by maximizing the co-occurrence likelihood with a simple Skip-gram-based formulation, which limits the expressive ability of their embeddings and the resulting recommendation performance. In this paper, we propose the Variational Bayesian Context-Aware Representation (VBCAR) model for grocery recommendation, a novel variational Bayesian model that learns the user and item latent vectors by leveraging basket context information from past user-item interactions. We train our VBCAR model based on the Bayesian Skip-gram framework coupled with amortized variational inference, so that it can learn more expressive latent representations that integrate both non-linearity and Bayesian behaviour. Experiments conducted on a large real-world grocery recommendation dataset show that our proposed VBCAR model can significantly outperform existing state-of-the-art grocery recommendation methods.
CLEX: Continuous Length Extrapolation for Large Language Models
Transformer-based Large Language Models (LLMs) are pioneering advances in
many natural language processing tasks; however, their exceptional capabilities
are restricted to the preset context window of the Transformer. Position
Embedding (PE) scaling methods, while effective in extending the context window
to a specific length, either show notable limitations in their
extrapolation abilities or sacrifice some performance within the context
window. Length extrapolation methods, although theoretically capable of
extending the context window beyond the training sequence length, often
underperform in practical long-context applications. To address these
challenges, we propose Continuous Length EXtrapolation (CLEX) for LLMs. We
generalise the PE scaling approaches to model the continuous dynamics by
ordinary differential equations over the length scaling factor, thereby
overcoming the constraints of current PE scaling methods designed for specific
lengths. Moreover, by extending the dynamics to desired context lengths beyond
the training sequence length, CLEX facilitates the length extrapolation with
impressive performance in practical tasks. We demonstrate that CLEX can be
seamlessly incorporated into LLMs equipped with Rotary Position Embedding, such
as LLaMA and GPT-NeoX, with negligible impact on training and inference
latency. Experimental results reveal that CLEX can effectively extend the
context window to over 4x or almost 8x the training length, with no deterioration
in performance. Furthermore, when evaluated on the practical LongBench
benchmark, our model trained on a 4k length exhibits competitive performance
against state-of-the-art open-source models trained on context lengths up to
32k.
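To make the idea of PE scaling for Rotary Position Embedding concrete, the following is a minimal sketch of a fixed linear scaling factor applied to rotary angles. It is not CLEX itself: the abstract describes modelling continuous dynamics over the scaling factor with ordinary differential equations, which is not reproduced here. The function names `rope_angles` and `apply_rope`, the `scale` parameter, and the default base of 10000 are assumptions chosen for illustration.

```python
import math


def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Return the rotary angles for one token position.

    `scale` is a hypothetical fixed length-scaling factor: a value above 1
    compresses positions so that longer sequences map into the angle range
    seen during training.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    scaled_position = position / scale
    return [scaled_position * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs of `vec` by the angles for `position`."""
    angles = rope_angles(position, len(vec), base, scale)
    rotated = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        rotated.extend([x * c - y * s, x * s + y * c])
    return rotated
```

With `scale=2.0`, position 8192 receives the same angles as position 4096 does without scaling, which is the sense in which a fixed factor targets one specific length.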
LaCViT: A Label-aware Contrastive Training Framework for Vision Transformers
Vision Transformers have been incredibly effective when tackling computer
vision tasks due to their ability to model long feature dependencies. By using
large-scale training data and various self-supervised signals (e.g., masked
random patches), vision transformers provide state-of-the-art performance on
several benchmarking datasets, such as ImageNet-1k and CIFAR-10. However, these
vision transformers pretrained over general large-scale image corpora could
only produce an anisotropic representation space, limiting their
generalizability and transferability to the target downstream tasks. In this
paper, we propose a simple and effective Label-aware Contrastive Training
framework LaCViT, which improves the isotropy of the pretrained representation
space for vision transformers, thereby enabling more effective transfer
learning amongst a wide range of image classification tasks. Through
experimentation over five standard image classification datasets, we
demonstrate that LaCViT-trained models outperform the original pretrained
baselines by around 9% absolute Accuracy@1, and consistent improvements can be
observed when applying LaCViT to our three evaluated vision transformers.
Profiling users for question answering communities via flow-based constrained co-embedding model
In this article, we study the task of user profiling in question answering communities (QACs). Previous user profiling algorithms suffer from a number of defects: they regard users and words as atomic units, leading to a mismatch between them; they are designed for other applications rather than for QACs; and some semantic profiling algorithms do not co-embed users and words, which makes it difficult to measure the affinity between them. To improve profiling performance, we propose a neural Flow-based Constrained Co-embedding Model, abbreviated as FCCM. FCCM jointly co-embeds the vector representations of both users and words in QACs such that the affinities between them can be semantically measured. Specifically, FCCM extends the standard variational auto-encoder model to enforce that the inferred embeddings of users and words are subject to the voting constraint, i.e., given a question and the users who answer this question in the community, representations of the users whose answers receive more votes are closer to the representations of the words associated with these answers than representations of those whose answers receive fewer votes. In addition, FCCM integrates normalizing flow into the variational auto-encoder framework to avoid the assumption that the distributions of the embeddings are Gaussian, making the inferred embeddings fit the real distributions of the data better. Experimental results on a Chinese Zhihu question answering dataset demonstrate the effectiveness of our proposed FCCM model for the task of user profiling in QACs.
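The voting constraint described in this abstract can be illustrated with a small sketch that penalises pairs of users whose distances to a word representation contradict their vote counts. This is an assumption-laden illustration rather than the FCCM objective: it uses plain Euclidean distance, a single word vector per question as a simplification, and a hinge-style penalty. The names `voting_penalty`, `euclidean`, and the `margin` parameter are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def voting_penalty(word_vec, answers, margin=0.0):
    """Sum hinge penalties over user pairs that violate the vote ordering.

    `answers` is a list of (user_vec, votes) tuples for one question.
    A pair is penalised when the user with more votes is not closer to
    `word_vec` than the user with fewer votes by at least `margin`.
    """
    penalty = 0.0
    for user_hi, votes_hi in answers:
        for user_lo, votes_lo in answers:
            if votes_hi > votes_lo:
                gap = euclidean(user_hi, word_vec) - euclidean(user_lo, word_vec)
                penalty += max(0.0, gap + margin)
    return penalty
```

A penalty of zero means every higher-voted user is already at least as close to the word vector as every lower-voted user; a positive value measures how far the embeddings are from satisfying the ordering.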