6,594 research outputs found
Concept-Oriented Deep Learning with Large Language Models
Large Language Models (LLMs) have been successfully used in many
natural-language tasks and applications including text generation and AI
chatbots. They also are a promising new technology for concept-oriented deep
learning (CODL). However, the prerequisite is that LLMs understand concepts and
ensure conceptual consistency. We discuss these in this paper, as well as major
uses of LLMs for CODL including concept extraction from text, concept graph
extraction from text, and concept learning. Human knowledge consists of both
symbolic (conceptual) knowledge and embodied (sensory) knowledge. Text-only
LLMs, however, can represent only symbolic (conceptual) knowledge. Multimodal
LLMs, on the other hand, are capable of representing the full range (conceptual
and sensory) of human knowledge. We discuss conceptual understanding in
visual-language LLMs, the most important multimodal LLMs, and major uses of
them for CODL including concept extraction from image, concept graph extraction
from image, and concept learning. While uses of LLMs for CODL are valuable
standalone, they are particularly valuable as part of LLM applications such as
AI chatbots
A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods
Multi-task learning (MTL) has become increasingly popular in natural language
processing (NLP) because it improves the performance of related tasks by
exploiting their commonalities and differences. Nevertheless, it is still not
understood very well how multi-task learning can be implemented based on the
relatedness of training tasks. In this survey, we review recent advances of
multi-task learning methods in NLP, with the aim of summarizing them into two
general multi-task training methods based on their task relatedness: (i) joint
training and (ii) multi-step training. We present examples in various NLP
downstream applications, summarize the task relationships and discuss future
directions of this promising topic.Comment: Accepted to EACL 2023 as regular long pape
Multilingual Event Extraction from Historical Newspaper Adverts
NLP methods can aid historians in analyzing textual materials in greater volumes than manually feasible. Developing such methods poses substantial challenges though. First, acquiring large, annotated historical datasets is difficult, as only domain experts can reliably label them. Second, most available off-the-shelf NLP models are trained on modern language texts, rendering them significantly less effective when applied to historical corpora. This is particularly problematic for less well studied tasks, and for languages other than English. This paper addresses these challenges while focusing on the under-explored task of event extraction from a novel domain of historical texts. We introduce a new multilingual dataset in English, French, and Dutch composed of newspaper ads from the early modern colonial period reporting on enslaved people who liberated themselves from enslavement. We find that: 1) even with scarce annotated data, it is possible to achieve surprisingly good results by formulating the problem as an extractive QA task and leveraging existing datasets and models for modern languages; and 2) cross-lingual low-resource learning for historical languages is highly challenging, and machine translation of the historical datasets to the considered target languages is, in practice, often the best-performing solution
LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training
Natural language generation from structured data mainly focuses on
surface-level descriptions, suffering from uncontrollable content selection and
low fidelity. Previous works leverage logical forms to facilitate logical
knowledge-conditioned text generation. Though achieving remarkable progress,
they are data-hungry, which makes the adoption for real-world applications
challenging with limited data. To this end, this paper proposes a unified
framework for logical knowledge-conditioned text generation in the few-shot
setting. With only a few seeds logical forms (e.g., 20/100 shot), our approach
leverages self-training and samples pseudo logical forms based on content and
structure consistency. Experimental results demonstrate that our approach can
obtain better few-shot performance than baselines.Comment: Work in progres
QueryForm: A Simple Zero-shot Form Entity Query Framework
Zero-shot transfer learning for document understanding is a crucial yet
under-investigated scenario to help reduce the high cost involved in annotating
document entities. We present a novel query-based framework, QueryForm, that
extracts entity values from form-like documents in a zero-shot fashion.
QueryForm contains a dual prompting mechanism that composes both the document
schema and a specific entity type into a query, which is used to prompt a
Transformer model to perform a single entity extraction task. Furthermore, we
propose to leverage large-scale query-entity pairs generated from form-like
webpages with weak HTML annotations to pre-train QueryForm. By unifying
pre-training and fine-tuning into the same query-based framework, QueryForm
enables models to learn from structured documents containing various entities
and layouts, leading to better generalization to target document types without
the need for target-specific training data. QueryForm sets new state-of-the-art
average F1 score on both the XFUND (+4.6%~10.1%) and the Payment (+3.2%~9.5%)
zero-shot benchmark, with a smaller model size and no additional image input.Comment: Accepted to Findings of ACL 202
- …