Object-oriented Neural Programming (OONP) for Document Understanding
We propose Object-oriented Neural Programming (OONP), a framework for
semantically parsing documents in specific domains. OONP reads a document
and parses it into a predesigned object-oriented data structure (referred
to as an ontology in this paper) that reflects the domain-specific
semantics of the document. An OONP parser models semantic parsing as a
decision process: a neural net-based Reader sequentially goes through the
document, building and updating an intermediate ontology that summarizes
its partial understanding of the text covered so far. OONP supports a rich
family of operations (both symbolic and differentiable) for composing the
ontology, and a wide variety of forms (both symbolic and differentiable)
for representing the state and the document. An OONP parser can be trained
with supervision of different forms and strengths, including supervised
learning (SL), reinforcement learning (RL), and a hybrid of the two. Our
experiments on both synthetic and real-world document parsing tasks show
that OONP can learn to handle fairly complicated ontologies with training
data of modest size. Comment: accepted by ACL 201
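The read-and-update decision process could be sketched as follows. This is an illustrative toy, not the paper's method: the hand-written rule table stands in for the neural Reader's learned policy, and the "Person" object type and trigger tokens are made up for the example.

```python
# Minimal sketch of an OONP-style reading loop. The Reader walks the token
# sequence and applies symbolic operations that build up an ontology of
# typed objects; here a lookup table replaces the neural policy.

def parse(tokens, rules):
    """Sequentially read tokens, updating an ontology of typed objects."""
    ontology = []  # list of objects, each a dict of slots
    for tok in tokens:
        action = rules.get(tok)  # stand-in for the neural policy's decision
        if action == "new-person":
            ontology.append({"type": "Person", "name": None})
        elif action == "set-name" and ontology:
            ontology[-1]["name"] = tok  # symbolic update of the last object
    return ontology

# Hypothetical toy rules: "Mr." opens a new Person object, and the
# token "Smith" fills its name slot.
rules = {"Mr.": "new-person", "Smith": "set-name"}
print(parse(["Mr.", "Smith", "arrived"], rules))
```

The point of the sketch is only the control flow: a sequential pass in which each step emits a (possibly symbolic) operation against the partially built ontology.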
Weakly Supervised Reasoning by Neuro-Symbolic Approaches
Deep learning has greatly improved the performance of various natural
language processing (NLP) tasks. However, most deep learning models are
black-box machinery and lack explicit interpretability. In this chapter, we
introduce our recent progress on neuro-symbolic approaches to NLP, which
combine different schools of AI, namely symbolism and connectionism.
Generally, we design a neural system with symbolic latent structures for
an NLP task, and apply reinforcement learning or its relaxation to perform
weakly supervised reasoning in the downstream task. Our framework has been
successfully applied to various tasks, including table query reasoning,
syntactic structure reasoning, information extraction reasoning, and rule
reasoning. For each application, we introduce the background, our
approach, and experimental results. Comment: Compendium of Neurosymbolic Artificial Intelligence, 665--692, 2023,
IOS Press
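The weakly supervised recipe (sample a latent symbolic decision, score it only with the downstream reward, reinforce the sampled choice) can be illustrated with a bare-bones REINFORCE-style update. This is a sketch under simplifying assumptions: real models parameterize the policy with neural networks and use proper gradient estimators, whereas here the "policy" is a bare categorical distribution updated in log-space.

```python
import math
import random

def reinforce_step(probs, reward_fn, lr=0.1):
    """One REINFORCE-style update: sample a latent symbolic choice, reward
    it with only the downstream (weak) signal, and raise its log-probability
    in proportion to the reward."""
    # Sample an action index from the categorical distribution.
    r, cum, a = random.random(), 0.0, 0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            a = i
            break
    reward = reward_fn(a)                 # weak supervision: task-level only
    logits = [math.log(p) for p in probs]
    logits[a] += lr * reward              # ascent on the sampled log-prob
    z = sum(math.exp(l) for l in logits)  # renormalize back to a distribution
    return [math.exp(l) / z for l in logits], a, reward
```

The key property is that no gold latent structure is needed: only the scalar reward from the downstream task shapes the distribution over symbolic choices.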
Vector-Quantized Prompt Learning for Paraphrase Generation
Deep generative modeling of natural language has achieved many successes,
such as producing fluent sentences and translating from one language into
another. However, the development of generative modeling techniques for
paraphrase generation still lags behind, largely due to the challenge of
resolving the conflict between expression diversity and semantic
preservation. This paper proposes to generate diverse and high-quality
paraphrases by exploiting pre-trained models with instance-dependent
prompts. To learn generalizable prompts, we assume that the number of
abstract transforming patterns in paraphrase generation (governed by
prompts) is finite and usually not large. Therefore, we present
vector-quantized prompts as cues to control the generation of pre-trained
models. Extensive experiments demonstrate that the proposed method achieves
new state-of-the-art results on three benchmark datasets: Quora,
Wikianswers, and MSCOCO. We will release all the code upon
acceptance. Comment: EMNLP Findings, 202
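The core vector-quantization step, snapping an instance representation onto a finite codebook of prompt codes, might look like the sketch below. The codebook values, dimensions, and the L2 nearest-neighbor rule are assumptions for illustration; in the paper's setting both the encoder and the codebook would be learned.

```python
# Sketch of vector-quantized prompt selection: an instance encoding is
# replaced by its nearest codebook entry, which then serves as the prompt.

def quantize(vec, codebook):
    """Return the index and entry of the codebook vector closest to `vec`
    under squared L2 distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))
    return idx, codebook[idx]

# Hypothetical 3-entry codebook of 2-d prompt codes: a finite set of
# abstract transforming patterns.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
idx, prompt = quantize([0.9, 1.2], codebook)
print(idx)  # 1 — the nearest entry is [1.0, 1.0]
```

Quantizing to a small discrete codebook is what encodes the assumption that the number of transforming patterns is finite and not large.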
GPT-NAS: Neural Architecture Search with the Generative Pre-Trained Model
Neural Architecture Search (NAS) has emerged as one of the most effective
methods to design optimal neural network architectures automatically.
Although neural architectures have achieved human-level performance in
several tasks, few of them were obtained via NAS. The main reason is the
huge search space of neural architectures, which makes NAS algorithms
inefficient. This work presents a novel architecture search algorithm,
called GPT-NAS, that optimizes neural architectures with a Generative
Pre-Trained (GPT) model. In GPT-NAS, we assume that a generative model
pre-trained on a large-scale corpus can learn the fundamental laws of
building neural architectures. Therefore, GPT-NAS leverages the GPT model
to propose reasonable architecture components given a basic one. Such an
approach largely reduces the search space by introducing prior knowledge
into the search process. Extensive experimental results show that our
GPT-NAS method significantly outperforms seven manually designed neural
architectures and thirteen architectures produced by competing NAS methods.
In addition, our ablation study indicates that the proposed algorithm
improves the performance of fine-tuned neural architectures by up to about
12% compared to those without GPT, further demonstrating its effectiveness
in searching neural architectures.
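The search-space-reduction idea can be caricatured as follows. This toy replaces the GPT model with a bigram count model, and the "corpus" of architectures is invented for the example; it only illustrates how a generative prior over component sequences narrows the candidates a search algorithm must consider.

```python
# Toy sketch of the GPT-NAS idea: a generative model trained on existing
# architectures proposes likely next components for a partial architecture,
# so the search explores a few plausible candidates instead of everything.
from collections import Counter, defaultdict

# Hypothetical "corpus" of known architectures as component sequences.
corpus = [
    ["conv", "relu", "conv", "relu", "pool", "fc"],
    ["conv", "relu", "pool", "conv", "relu", "pool", "fc"],
]

# Fit a bigram model: counts[prev][next] = how often `next` follows `prev`.
counts = defaultdict(Counter)
for arch in corpus:
    for prev, nxt in zip(arch, arch[1:]):
        counts[prev][nxt] += 1

def propose(prefix, k=2):
    """Return up to k most likely next components after the current layer."""
    return [comp for comp, _ in counts[prefix[-1]].most_common(k)]

print(propose(["conv"]))  # candidates to search, rather than all components
```

A real GPT model conditions on the full prefix rather than the last component, but the role in the search loop is the same: prior knowledge prunes the space of next moves.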
MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting
Multivariate time series forecasting poses an ongoing challenge across
various disciplines. Time series data often exhibit diverse intra-series and
inter-series correlations, contributing to intricate and interwoven
dependencies that have been the focus of numerous studies. Nevertheless, a
significant gap remains in understanding how inter-series correlations vary
across different time scales among multiple time series, an area that has
received limited attention in the literature. To bridge this gap, this
</gr-replace>
paper introduces MSGNet, an advanced deep learning model designed to capture
the varying inter-series correlations across multiple time scales using
frequency domain analysis and adaptive graph convolution. By leveraging
frequency domain analysis, MSGNet effectively extracts salient periodic
patterns and decomposes the time series into distinct time scales. The model
incorporates a self-attention mechanism to capture intra-series dependencies,
while introducing an adaptive mixhop graph convolution layer to autonomously
learn diverse inter-series correlations within each time scale. Extensive
experiments are conducted on several real-world datasets to showcase the
effectiveness of MSGNet. Furthermore, MSGNet possesses the ability to
automatically learn explainable multi-scale inter-series correlations,
exhibiting strong generalization capabilities even when applied to
out-of-distribution samples. Comment: 13 pages, 12 figures
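The frequency-domain scale selection can be sketched as follows. This is not MSGNet's exact procedure: it assumes period candidates are read off the top-k bins of the FFT amplitude spectrum, and the synthetic two-period series is made up for the demonstration.

```python
import numpy as np

def dominant_periods(x, k=2):
    """Pick the k strongest periodicities of a 1-d series from the FFT
    amplitude spectrum; each selected frequency bin maps to a time scale."""
    amp = np.abs(np.fft.rfft(x))
    amp[0] = 0.0                          # drop the DC (mean) component
    freqs = np.argsort(amp)[-k:][::-1]    # top-k bins, strongest first
    return [len(x) // int(f) for f in freqs]

# Synthetic series with a daily (24-step) and half-daily (12-step) pattern.
t = np.arange(96)
x = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 12)
print(dominant_periods(x))  # → [24, 12]
```

Once the periods are known, the series can be segmented at each scale and a per-scale graph convolution applied, which is where an adaptive inter-series graph would come in.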
TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss
A time series is a special type of sequence data: a sequence of real-valued
random variables collected at evenly spaced intervals of time. Real-world
multivariate time series come with noise and contain complicated local and
global temporal dynamics, making it difficult to forecast future values
given the historical observations. This work proposes a simple and
effective framework, coined as TimeSQL, which leverages multi-scale patching
and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale
patching transforms the time series into two-dimensional patches with different
length scales, facilitating the perception of both locality and long-term
correlations in time series. SQL is derived from the rational quadratic
kernel and dynamically adjusts its gradients to avoid overfitting to noise
and outliers. Theoretical analysis demonstrates that, under mild
conditions, the effect of noise on a model trained with SQL is always
smaller than on one trained with MSE. Based on these two modules, TimeSQL
achieves new state-of-the-art performance on eight real-world benchmark
datasets. Further ablation studies indicate that the key modules in TimeSQL
also enhance the results of other models for multivariate time series
forecasting, standing as plug-and-play techniques.
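A loss of the kind described can be sketched from the rational quadratic kernel form k(e) = 1 - e²/(e² + c), giving a loss e²/(e² + c). The scale constant c and this exact parameterization are assumptions for illustration; the paper's definition may differ in detail.

```python
# Sketch of a smooth quadratic loss derived from the rational quadratic
# kernel. Near zero error it behaves like a scaled MSE; for large errors it
# saturates toward 1, so outliers contribute bounded loss and gradients.

def sql_loss(pred, target, c=1.0):
    """Rational-quadratic-style loss: e^2 / (e^2 + c), with e the error."""
    e2 = (pred - target) ** 2
    return e2 / (e2 + c)

def mse_loss(pred, target):
    return (pred - target) ** 2

print(sql_loss(1.1, 1.0))   # small error: close to MSE (≈ 0.0099 vs 0.01)
print(sql_loss(11.0, 1.0))  # outlier: saturates below 1 instead of 100
```

The saturation is the mechanism behind the robustness claim: an outlier's contribution to the loss is capped, while well-fit points still receive MSE-like gradients.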
DrugLLM: Open Large Language Model for Few-shot Molecule Generation
Large Language Models (LLMs) have made great strides in areas such as
language processing and computer vision. Despite the emergence of diverse
techniques to improve few-shot learning capacity, current LLMs fall short
in handling the languages of biology and chemistry. For example, they
struggle to capture the relationship between molecular structure and
pharmacochemical properties. Consequently, few-shot learning for
small-molecule drug modification remains limited. In this work, we
introduce DrugLLM, an LLM tailored for drug design. During training, we
employ Group-based Molecular Representation (GMR) to represent molecules,
arranging them in sequences that reflect modifications aimed at enhancing
specific molecular properties. DrugLLM learns how to modify molecules in
drug discovery by predicting the next molecule based on past modifications.
Extensive computational experiments demonstrate that DrugLLM can generate
new molecules with the expected properties from limited examples,
exhibiting a powerful few-shot molecule generation capacity. Comment: 17 pages, 3 figures