Stability analysis of a ratio-dependent predator-prey system with diffusion and stage structure
A two-species predator-prey system with a diffusion term and stage structure is
discussed. Local stability of the system is studied using the linearization
method, and global stability is investigated via strong upper and lower
solutions. The asymptotic behavior of solutions and the negative effect of
stage structure on the permanence of populations are also given.
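For orientation, a typical ratio-dependent reaction-diffusion model with stage structure for the prey has a form like the following. This is an illustrative shape drawn from the standard literature; the paper's exact system, coefficients, and boundary conditions may differ:

```latex
\begin{aligned}
\frac{\partial u_1}{\partial t} - d_1 \Delta u_1 &= a u_2 - r_1 u_1 - b u_1, \\
\frac{\partial u_2}{\partial t} - d_2 \Delta u_2 &= b u_1 - r_2 u_2^2 - \frac{c\, u_2 v}{u_2 + m v}, \\
\frac{\partial v}{\partial t} - d_3 \Delta v &= v\Bigl(-r_3 + \frac{e\, u_2}{u_2 + m v}\Bigr),
\end{aligned}
```

with homogeneous Neumann boundary conditions, where u1 and u2 are the immature and mature prey densities and v is the predator density. The functional response c u2 v / (u2 + m v) is "ratio-dependent" because it depends on the ratio u2/v rather than on prey density alone.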
Structural Bias for Aspect Sentiment Triplet Extraction
Structural bias has recently been exploited for aspect sentiment triplet
extraction (ASTE) and led to improved performance. On the other hand, it is
recognized that explicitly incorporating structural bias would have a negative
impact on efficiency, whereas pretrained language models (PLMs) can already
capture implicit structures. Thus, a natural question arises: Is structural
bias still a necessity in the context of PLMs? To answer the question, we
propose to address the efficiency issues by using an adapter to integrate
structural bias in the PLM and using a cheap-to-compute relative position
structure in place of the syntactic dependency structure. Benchmarking
evaluation is conducted on the SemEval datasets. The results show that our
proposed structural adapter is beneficial to PLMs and achieves state-of-the-art
performance over a range of strong baselines, yet with a light parameter demand
and low latency. Meanwhile, we raise the concern that the current default of
evaluating on small-scale data yields under-confident conclusions. Consequently,
we release a large-scale dataset for ASTE. The results on the new dataset hint
that the structural adapter remains effective and efficient at large scale.
Overall, we draw the conclusion that structural bias shall still be a necessity
even with PLMs.
Comment: 10 pages, 4 figures, 5 tables, accepted to COLING 2022; code is available at https://github.com/GeneZC/StructBia
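As a rough illustration of the idea, the sketch below inserts a small bottleneck adapter whose attention is biased by clipped relative token distance, the cheap-to-compute structure the abstract describes, in place of a parsed dependency tree. The module name, dimensions, and bias form are assumptions for illustration, not the authors' released code:

```python
import torch
import torch.nn as nn

class StructuralAdapter(nn.Module):
    """Bottleneck adapter with a relative-position attention bias.

    Illustrative sketch: injects a cheap structural bias (clipped token
    distance) into a small self-attention layer inserted after a frozen
    PLM layer. All hyperparameters are assumptions.
    """

    def __init__(self, hidden=768, bottleneck=64, max_dist=8):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.q = nn.Linear(bottleneck, bottleneck)
        self.k = nn.Linear(bottleneck, bottleneck)
        # one learned scalar bias per clipped relative distance
        self.rel_bias = nn.Embedding(2 * max_dist + 1, 1)
        self.max_dist = max_dist

    def forward(self, h):                        # h: (batch, seq, hidden)
        x = torch.relu(self.down(h))
        n = x.size(1)
        pos = torch.arange(n, device=h.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        bias = self.rel_bias(rel + self.max_dist).squeeze(-1)   # (seq, seq)
        scores = self.q(x) @ self.k(x).transpose(-1, -2)
        scores = scores / x.size(-1) ** 0.5 + bias
        attn = scores.softmax(dim=-1)
        return h + self.up(attn @ x)             # residual back into the PLM
```

Because only the adapter parameters are trained, the parameter demand stays light and the relative-position bias adds essentially no latency, which matches the trade-off the abstract argues for.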
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
With the booming of pre-trained transformers, remarkable progress has been
made on textual pair modeling to support relevant natural language
applications. Two lines of approaches are developed for text matching:
interaction-based models performing full interactions over the textual pair,
and representation-based models encoding the pair independently with siamese
encoders. The former achieves compelling performance due to its deep
interaction modeling ability, yet with a sacrifice in inference latency. The
latter is efficient and widely adopted for practical use, however, suffers from
severe performance degradation due to the lack of interactions. Though some
prior works attempt to integrate interactive knowledge into
representation-based models, considering the computational cost, they only
perform late interaction or knowledge transferring at the top layers.
Interactive information in the lower layers is still missing, which limits the
performance of representation-based solutions. To remedy this, we propose a
novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep
interaction modeling in representation-based models without actual inference
computations. Concretely, VIRT asks representation-based encoders to conduct
virtual interactions that mimic the behaviors of interaction-based models. In
addition, the knowledge distilled from interaction-based encoders is taken as
a supervision signal to ensure the effectiveness of the virtual interactions.
Since virtual interactions only happen at the training stage, VIRT does not
increase the inference cost. Furthermore, we design a VIRT-adapted late
interaction strategy to fully utilize the learned virtual interactive
knowledge.
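A minimal sketch of what such a training-only objective could look like: the dual encoder's cross-sentence token similarities are pushed, by distillation, toward the cross-attention maps of an interaction-based teacher. The exact loss form in VIRT may differ; everything named here is an assumption:

```python
import torch
import torch.nn.functional as F

def virtual_interaction_loss(h_a, h_b, teacher_attn_ab, teacher_attn_ba):
    """Sketch of a VIRT-style objective (assumed form, not released code).

    h_a, h_b: token representations from the two siamese encoders, with
        shapes (batch, len_a, dim) and (batch, len_b, dim).
    teacher_attn_*: cross-sentence attention distributions distilled from
        an interaction-based (cross) encoder, same shapes as the student's.
    The dual encoder never attends across sentences at inference; here it
    is asked to mimic that attention during training only.
    """
    scores = h_a @ h_b.transpose(-1, -2) / h_a.size(-1) ** 0.5
    student_ab = scores.softmax(dim=-1)                     # a attends to b
    student_ba = scores.transpose(-1, -2).softmax(dim=-1)   # b attends to a
    # KL distillation toward the teacher's cross-attention maps
    loss_ab = F.kl_div(student_ab.clamp_min(1e-9).log(), teacher_attn_ab,
                       reduction="batchmean")
    loss_ba = F.kl_div(student_ba.clamp_min(1e-9).log(), teacher_attn_ba,
                       reduction="batchmean")
    return loss_ab + loss_ba
```

Since the loss touches only the training graph, the deployed siamese encoders keep their independent-encoding inference path and latency.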
A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements
This study presents a physics-informed machine learning-based control method
for nonlinear dynamic systems with highly noisy measurements. Existing
data-driven control methods that use machine learning for system identification
cannot effectively cope with highly noisy measurements, resulting in unstable
control performance. To address this challenge, the present study extends
current physics-informed machine learning capabilities for modeling nonlinear
dynamics with control and integrates them into a model predictive control
framework. To demonstrate the capability of the proposed method, we test and
validate it on two noisy nonlinear dynamic systems: the chaotic Lorenz 3
system and a turning machine tool. Analysis of the results illustrates that
the proposed method outperforms state-of-the-art benchmarks as measured by
both modeling accuracy and control performance for nonlinear dynamic systems
under high-noise conditions.
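The following sketch shows the two ingredients the abstract combines: a neural dynamics model with a control input, trained with a data term on noisy measurements plus a physics-residual term against whatever governing equations are known. The network, system dimensions, and loss weighting are assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class Dynamics(nn.Module):
    """Neural model of x_dot = f(x, u) for a controlled nonlinear system."""
    def __init__(self, n_state=3, n_ctrl=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state + n_ctrl, width), nn.Tanh(),
            nn.Linear(width, n_state))

    def forward(self, x, u):
        return self.net(torch.cat([x, u], dim=-1))

def physics_informed_loss(model, f_phys, x_noisy, u, dt, lam=1.0):
    """Data-fit plus physics-residual loss (illustrative weighting).

    x_noisy: (T, n_state) noisy trajectory; u: (T, n_ctrl) control inputs.
    f_phys(x, u) encodes whatever governing equations are known; penalizing
    deviation from it regularizes the model under heavy measurement noise.
    """
    x_dot_fd = (x_noisy[1:] - x_noisy[:-1]) / dt       # noisy derivatives
    f = model(x_noisy[:-1], u[:-1])
    data_loss = ((f - x_dot_fd) ** 2).mean()
    phys_loss = ((f - f_phys(x_noisy[:-1], u[:-1])) ** 2).mean()
    return data_loss + lam * phys_loss
```

The fitted model would then be rolled out inside a standard model predictive control loop, optimizing a short control sequence over predicted trajectories at each step.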
GNN-encoder: Learning a Dual-encoder Architecture via Graph Neural Networks for Passage Retrieval
Recently, retrieval models based on dense representations are dominant in
passage retrieval tasks, due to their outstanding ability in terms of capturing
semantics of input text compared to the traditional sparse vector space models.
A common practice of dense retrieval models is to exploit a dual-encoder
architecture to represent a query and a passage independently. Though
efficient, such a structure loses interaction between the query-passage pair,
resulting in inferior accuracy. To enhance the performance of dense retrieval
models without loss of efficiency, we propose a GNN-encoder model in which
query (passage) information is fused into passage (query) representations via
graph neural networks that are constructed by queries and their top retrieved
passages. By this means, we maintain a dual-encoder structure, and retain some
interaction information between query-passage pairs in their representations,
which enables us to achieve both efficiency and efficacy in passage retrieval.
Evaluation results indicate that our method significantly outperforms the
existing models on MSMARCO, Natural Questions and TriviaQA datasets, and
achieves the new state-of-the-art on these datasets.
Comment: 11 pages, 6 figures
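As a sketch of the fusion step, one mean-aggregation GNN layer over a graph linking queries to their top retrieved passages might look as follows; the aggregation rule and graph construction here are illustrative assumptions:

```python
import torch
import torch.nn as nn

def gnn_fuse(node_emb, adj, weight):
    """One mean-aggregation GNN layer over a query/passage graph.

    node_emb: (n, dim) dual-encoder embeddings, one node per query or
        passage; adj: (n, n) 0/1 adjacency linking each query to its top
        retrieved passages. Illustrative only; the paper's aggregation
        and graph construction may differ.
    """
    deg = adj.sum(dim=-1, keepdim=True).clamp_min(1.0)
    neigh = adj @ node_emb / deg           # mean of neighbor embeddings
    return torch.relu(weight(torch.cat([node_emb, neigh], dim=-1)))

# Usage sketch: fused embeddings replace the plain dual-encoder ones at
# training time, so query representations absorb passage information (and
# vice versa) while inference still scores with a simple dot product.
n, dim = 8, 128
emb = torch.randn(n, dim)
adj = (torch.rand(n, n) > 0.7).float()     # toy retrieval graph
w = nn.Linear(2 * dim, dim)
fused = gnn_fuse(emb, adj, w)
```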
Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation
Existing controllable dialogue generation work focuses on the
single-attribute control and lacks generalization capability to
out-of-distribution multiple attribute combinations. In this paper, we explore
the compositional generalization for multi-attribute controllable dialogue
generation where a model can learn from seen attribute values and generalize to
unseen combinations. We propose a prompt-based disentangled controllable
dialogue generation model, DCG. It learns attribute concept composition by
generating attribute-oriented prompt vectors and uses a disentanglement loss to
disentangle different attributes for better generalization. Besides, we design
a unified reference-free evaluation framework for multiple attributes with
different levels of granularity. Experimental results on two benchmarks prove
the effectiveness of our method and of the evaluation metric.
Comment: ACL 2023 Main Conference
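A hedged sketch of the two components named in the abstract, attribute-oriented prompt composition and a disentanglement loss, might look as follows; the prompt shapes and the exact loss are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributePrompts(nn.Module):
    """Compose attribute-oriented prompt vectors (illustrative sketch).

    One learned prompt per attribute *value*; a dialogue condition is the
    concatenation of the prompts for its attribute combination, so unseen
    combinations are just new concatenations of seen parts.
    """
    def __init__(self, n_values, prompt_len=4, dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_values, prompt_len, dim) * 0.02)

    def forward(self, value_ids):            # value_ids: (batch, n_attrs)
        picked = self.prompts[value_ids]     # (batch, n_attrs, len, dim)
        return picked.flatten(1, 2)          # concatenate along length

def disentanglement_loss(prompts):
    """Push prompts of different attribute values apart (assumed form)."""
    flat = F.normalize(prompts.flatten(1), dim=-1)   # (n_values, len*dim)
    sim = flat @ flat.t()
    off_diag = sim - torch.eye(sim.size(0))          # diagonal zeroed out
    return off_diag.pow(2).mean()
```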
XPrompt: Exploring the Extreme of Prompt Tuning
Prompt tuning learns soft prompts to condition frozen Pre-trained Language
Models (PLMs) for performing downstream tasks in a parameter-efficient manner.
While prompt tuning has gradually reached the performance level of fine-tuning
as the model scale increases, there is still a large performance gap between
prompt tuning and fine-tuning for models of moderate and small scales
(typically less than 11B parameters). In this paper, we empirically show that
the trained prompt tokens can have a negative impact on a downstream task and
thus degrade its performance. To bridge the gap, we propose a novel Prompt
tuning model with an eXtremely small scale (XPrompt) under the regime of the
lottery ticket hypothesis. Specifically, XPrompt eliminates the negative
prompt tokens at different granularity levels through hierarchical structured
pruning, yielding a more parameter-efficient prompt with competitive
performance. Comprehensive experiments are carried out on SuperGLUE tasks,
and the extensive results indicate that XPrompt is able to close the
performance gap at smaller model scales.
Comment: 15 pages, accepted to EMNLP 2022 main conference
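A rough sketch of the token-level pruning step under a lottery-ticket-style saliency score follows; XPrompt additionally prunes at a finer piece level, and its actual scoring and rewinding schedule may differ:

```python
import torch

def prune_prompt(prompt, grads, keep_ratio=0.5):
    """Structured pruning of soft-prompt tokens (token level only).

    prompt: (len, dim) trained soft prompt; grads: accumulated gradient
    magnitudes of the same shape, used here as a saliency proxy
    (first-order Taylor importance). Only the token-level step of the
    hierarchical scheme is shown.
    """
    token_score = (prompt * grads).abs().sum(dim=-1)   # per-token saliency
    k = max(1, int(keep_ratio * prompt.size(0)))
    keep = token_score.topk(k).indices
    mask = torch.zeros(prompt.size(0), dtype=torch.bool)
    mask[keep] = True
    pruned = prompt.clone()
    pruned[~mask] = 0.0            # negative tokens are masked out,
    return pruned, mask            # then the prompt is rewound and retrained
```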
PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models
While transformer-based pre-trained language models (PLMs) have dominated a
number of NLP applications, these models are heavy to deploy and expensive to
use. Therefore, effectively compressing large-scale PLMs becomes an
increasingly important problem. Quantization, which represents high-precision
tensors in a low-bit fixed-point format, is a viable solution. However, most
existing quantization methods are task-specific, requiring customized training
and quantization with a large number of trainable parameters on each individual
task. Inspired by the observation that the over-parameterization nature of PLMs
makes it possible to freeze most of the parameters during the fine-tuning
stage, in this work, we propose a novel ``quantize before fine-tuning''
framework, PreQuant, that differs from both quantization-aware training and
post-training quantization. PreQuant is compatible with various quantization
strategies, with outlier-aware parameter-efficient fine-tuning incorporated to
correct the induced quantization error. We demonstrate the effectiveness of
PreQuant on the GLUE benchmark using BERT, RoBERTa, and T5. We also provide an
empirical investigation into the workflow of PreQuant, which sheds light on its
efficacy.
Comment: Findings of ACL 2023
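A simplified sketch of the quantize-before-fine-tuning workflow: quantize all weights up front, then keep only the parameters with the largest quantization error trainable so fine-tuning can correct it. The uniform quantizer and the outlier criterion below are assumptions, not the paper's exact procedure:

```python
import torch

def quantize_weight(w, bits=4):
    """Uniform symmetric fixed-point quantization (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp_min(1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def prequant_step(weights, bits=4, outlier_frac=0.001):
    """Quantize-before-fine-tuning sketch (assumed, simplified workflow).

    weights: dict of name -> tensor. All weights are quantized up front;
    the small fraction with the largest quantization error ("outliers")
    is marked trainable so parameter-efficient fine-tuning can correct
    the induced error while the rest stay frozen.
    """
    trainable_masks = {}
    for name, w in weights.items():
        q = quantize_weight(w, bits)
        err = (q - w).abs().flatten()
        k = max(1, int(outlier_frac * err.numel()))
        idx = err.topk(k).indices
        mask = torch.zeros_like(err, dtype=torch.bool)
        mask[idx] = True
        weights[name] = q                        # frozen quantized weights
        trainable_masks[name] = mask.view_as(w)  # only these get gradients
    return weights, trainable_masks
```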
APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection
Detecting out-of-domain (OOD) intents from user queries is essential for a
task-oriented dialogue system. Previous OOD detection studies generally work on
the assumption that plenty of labeled in-domain (IND) intents exist. In this paper, we
focus on a more practical few-shot OOD setting where there are only a few
labeled IND data and massive unlabeled mixed data that may belong to IND or
OOD. The new scenario carries two key challenges: learning discriminative
representations using limited IND data and leveraging unlabeled mixed data.
Therefore, we propose an adaptive prototypical pseudo-labeling (APP) method for
few-shot OOD detection, including a prototypical OOD detection framework
(ProtoOOD) to facilitate low-resource OOD detection using limited IND data, and
an adaptive pseudo-labeling method to produce high-quality pseudo OOD and IND
labels. Extensive experiments and analysis demonstrate the effectiveness of our
method for few-shot OOD detection.
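A minimal sketch of prototypical pseudo-labeling under an assumed thresholding rule: prototypes are averaged from the few labeled IND examples, and unlabeled points are pseudo-labeled as IND or OOD by their similarity to the nearest prototype. The adaptive thresholds here are placeholders for the paper's actual rule:

```python
import torch
import torch.nn.functional as F

def adaptive_pseudo_label(proto, z_unlabeled, margin=0.1):
    """Prototype-based pseudo-labeling sketch (assumed thresholding).

    proto: (n_ind_classes, dim) prototypes averaged from the few labeled
    IND examples; z_unlabeled: (n, dim) encoded unlabeled mixed data.
    Points confidently close to a prototype become pseudo-IND with that
    class; points far from all prototypes become pseudo-OOD; the rest
    stay unlabeled for the next training round.
    """
    sim = F.normalize(z_unlabeled, dim=-1) @ F.normalize(proto, dim=-1).t()
    best, cls = sim.max(dim=-1)
    ind_thr = best.mean() + margin         # adaptive, batch-dependent cuts
    ood_thr = best.mean() - margin
    pseudo = torch.full_like(cls, -1)      # -1 = still unlabeled
    pseudo[best >= ind_thr] = cls[best >= ind_thr]   # pseudo-IND
    pseudo[best <= ood_thr] = proto.size(0)          # extra pseudo-OOD class
    return pseudo
```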