COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
Vision outlookers improve the performance of vision transformers, which implement a self-attention mechanism, by adding outlook attention, a form of local attention.
In natural language processing, as has been the case in computer vision and other domains, transformer-based models constitute the state of the art for most processing tasks. In this domain, too, many authors have argued for and demonstrated the importance of local context.
We present and evaluate an outlook attention mechanism, COOL, for natural language processing. COOL adds, on top of the self-attention layers of a transformer-based model, outlook attention layers that encode local syntactic context, taking word proximity into account and considering more pair-wise constraints than the dynamic convolution operations used by existing approaches.
A comparative empirical evaluation of an implementation of COOL against different transformer-based approaches confirms that it improves over baselines using the neural language models alone on various natural language processing tasks, including question answering. The proposed approach is competitive with state-of-the-art methods.
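The local-attention idea behind outlookers can be illustrated with a minimal sketch: each position attends only to its neighbours inside a small window, using dot-product scores and a softmax. This is a toy illustration of windowed (local) attention in pure Python, not the COOL architecture; the function names and the `window` parameter are assumptions made here for clarity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def local_attention(embeddings, window=1):
    """For each position, attend only to neighbours within `window`
    positions (dot-product scores + softmax), mirroring the
    local-context idea behind outlook attention."""
    n = len(embeddings)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = embeddings[lo:hi]
        # Score each neighbour against the current position.
        scores = [sum(a * b for a, b in zip(embeddings[i], v))
                  for v in neighbours]
        weights = softmax(scores)
        dim = len(embeddings[i])
        # Output is the weighted average of neighbour vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, neighbours))
                    for d in range(dim)])
    return out
```

In a real model the scores would come from learned query/key projections and the window pattern from the outlook operator; the sketch only shows why locality restricts which pair-wise constraints are considered.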
From Static to Dynamic: A Continual Learning Framework for Large Language Models
The vast number of parameters in large language models (LLMs) endows them
with remarkable capabilities, allowing them to excel in a variety of natural
language processing tasks. However, this complexity also presents challenges,
making LLMs difficult to train and inhibiting their ability to continuously
assimilate new knowledge, which may lead to inaccuracies in their outputs. To
mitigate these issues, this paper presents DynaMind, a novel continual learning
framework designed for LLMs. DynaMind incorporates memory mechanisms to
assimilate new knowledge and modular operators to enhance the model inference
process with the newly assimilated knowledge, consequently improving the
accuracy of LLMs' outputs. Benchmark experiments demonstrate DynaMind's
effectiveness in overcoming these challenges. The code and demo of DynaMind are
available on GitHub: https://github.com/Elfsong/DynaMind
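A memory mechanism of the kind the abstract describes can be sketched as a small retrieval store that injects previously assimilated knowledge into the model's input at inference time. Everything below (the class name, the keyword-overlap scoring, the prompt format) is a hypothetical simplification for illustration, not DynaMind's actual API.

```python
class MemoryStore:
    """Toy external memory: stores text snippets and retrieves the
    best-overlapping ones for a query (hypothetical, not DynaMind's
    implementation)."""

    def __init__(self):
        self.snippets = []

    def add(self, text):
        """Assimilate a new piece of knowledge."""
        self.snippets.append(text)

    def retrieve(self, query, k=2):
        """Return the k snippets sharing the most words with the query."""
        q = set(query.lower().split())
        scored = sorted(self.snippets,
                        key=lambda s: len(q & set(s.lower().split())),
                        reverse=True)
        return scored[:k]

def augment_prompt(memory, question):
    """Modular-operator sketch: prepend retrieved knowledge to the
    question before it reaches the (frozen) LLM."""
    context = " ".join(memory.retrieve(question))
    return f"Context: {context}\nQuestion: {question}"
```

The design point this illustrates is that new knowledge lives in the store, not in the model weights, so it can be updated continually without retraining.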
A correlated motif approach for finding short linear motifs from protein interaction networks
BACKGROUND: Short binding motifs are an important class of interaction switches in biological circuits and disease pathways. However, the biological experiments to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally by applying standard motif-extraction algorithms to sets of protein sequences that each interact with either a common protein or a protein group with similar properties. The underlying assumption is that proteins with common interacting partners share some common binding motifs. Although novel binding motifs have been discovered with such an approach, it is not applicable when a protein interacts with very few other proteins or when prior knowledge of the protein group is unavailable or erroneous. Experimental noise in the input interaction data can further degrade the already poor performance of such approaches.
RESULTS: We propose a novel approach for finding correlated short sequence motifs from protein-protein interaction data that effectively circumvents the above-mentioned limitations. Correlated motifs are motifs that consistently co-occur only in pairs of interacting protein sequences and could interact with each other, directly or indirectly, to mediate interactions. We adopt the (l, d)-motif model and formulate finding correlated motifs as an (l, d)-motif pair finding problem. We present both an exact algorithm, D-MOTIF, and its approximation algorithm, D-STAR, to solve this problem. Evaluation on extensive simulated data showed that our approach not only eliminates the need for any prior protein grouping but is also more robust in extracting motifs from noisy interaction data. Application to two biological datasets (the SH3 interaction network and the TGFβ signaling network) demonstrates that the approach can extract correlated motifs that correspond to actual interacting subsequences.
CONCLUSION: The correlated motif approach outlined in this paper is able to find correlated linear motifs from sparse and noisy interaction data. This, in turn, will expedite the discovery of novel linear binding motifs and facilitate the study of the biological pathways they mediate.
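The (l, d)-motif pair formulation can be made concrete with a brute-force sketch: enumerate candidate length-l substrings, then count, over all interacting sequence pairs, how often a motif pair co-occurs with at most d mismatches on each side. This toy enumeration is far less efficient than, and not equivalent to, the published D-MOTIF or D-STAR algorithms; it only illustrates the problem definition.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs(motif, seq, d):
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def correlated_motif_pairs(interactions, l, d):
    """Brute-force (l, d)-motif pair search: for every candidate motif
    pair (m1, m2), count the interacting pairs where m1 occurs in one
    partner and m2 in the other. A toy stand-in for D-MOTIF/D-STAR."""
    candidates = {s[i:i + l]
                  for a, b in interactions for s in (a, b)
                  for i in range(len(s) - l + 1)}
    counts = {}
    for m1, m2 in product(candidates, repeat=2):
        c = sum(1 for a, b in interactions
                if occurs(m1, a, d) and occurs(m2, b, d))
        if c:
            counts[(m1, m2)] = c
    return counts
```

On realistic protein data the candidate space explodes, which is precisely why an exact algorithm with pruning (D-MOTIF) and an approximation (D-STAR) are needed.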
Solving Math Word Problems with Reexamination
Math word problem (MWP) solving aims to understand a descriptive math
problem and calculate its result; previous efforts have mostly been devoted
to upgrading individual technical modules. This paper brings a different
perspective, a \textit{reexamination process} during training, by introducing a
pseudo-dual task to enhance MWP solving. We propose a pseudo-dual (PseDual)
learning scheme to model this process; the scheme is model-agnostic and can
thus be adapted to any existing MWP solver. The pseudo-dual task is
specifically defined as filling the numbers in the expression back into the
original word problem with its numbers masked. To facilitate effective joint
learning of the two tasks, we further design a scheduled fusion strategy for
the number-infilling task, which smoothly switches the input from the
ground-truth math expressions to the predicted ones. Empirical studies show
that our pseudo-dual learning scheme is effective when equipped to several
representative MWP solvers. \textit{The codes and trained models are
available at:} \url{https://github.com/steven640pixel/PsedualMWP}.
Comment: To appear at the NeurIPS 2023 Workshop on MATH-A
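The pseudo-dual task itself, filling an expression's numbers back into the number-masked problem, can be sketched with plain string handling. In PseDual the infilling is performed by a learned model; this regex sketch only illustrates the task definition, and the assumption that the expression's numbers map back to the `[NUM]` slots in order is ours.

```python
import re

NUM_RE = r"\d+(?:\.\d+)?"  # integers and decimals

def mask_numbers(problem):
    """Replace each number in the word problem with [NUM] and return
    the masked text plus the extracted numbers."""
    nums = re.findall(NUM_RE, problem)
    return re.sub(NUM_RE, "[NUM]", problem), nums

def infill_from_expression(masked, expression):
    """Pseudo-dual task sketch: take the numbers appearing in a
    (predicted or ground-truth) expression and fill them back into the
    masked problem, one per [NUM] slot, in order."""
    nums = re.findall(NUM_RE, expression)
    out = masked
    for n in nums:
        out = out.replace("[NUM]", n, 1)  # fill one slot at a time
    return out
```

The scheduled fusion strategy in the paper would gradually swap the `expression` argument from the ground truth to the solver's own predictions over the course of training.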
FGAD: Self-boosted Knowledge Distillation for An Effective Federated Graph Anomaly Detection Framework
Graph anomaly detection (GAD) aims to identify anomalous graphs that
significantly deviate from the others; it has attracted growing attention due
to the broad existence and complexity of graph-structured data in many
real-world scenarios. However, existing GAD methods usually rely on
centralized training, which may pose a privacy-leakage risk in sensitive
cases, thereby impeding collaboration among organizations seeking to
collectively develop robust GAD models. Although federated learning offers a
promising solution, the prevalent non-IID problems and high communication
costs present significant challenges, particularly pronounced in
collaborations with graph data distributed among different participants. To
tackle these challenges, we propose an effective federated graph anomaly
detection framework (FGAD). We first introduce an anomaly generator that
perturbs normal graphs into anomalous ones, and train a powerful anomaly
detector by distinguishing the generated anomalous graphs from normal ones.
Then, we leverage a student model to distill knowledge from the trained
anomaly detector (teacher model), which aims to maintain the personalization
of local models and alleviate the adverse impact of non-IID problems.
Moreover, we design an effective collaborative learning mechanism that
preserves the personalization of local models and significantly reduces
communication costs among clients. Empirical results on GAD tasks with
non-IID graphs, compared with state-of-the-art baselines, demonstrate the
superiority and efficiency of the proposed FGAD method.
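The anomaly-generator idea, perturbing normal graphs to manufacture anomalous training examples, can be sketched with simple edge flips. This is a minimal toy under our own assumptions (random pair flips, binary labels), not FGAD's actual generator or detector.

```python
import random

def perturb_graph(edges, n_nodes, n_flips, seed=0):
    """Toy anomaly generator: flip `n_flips` random node pairs in the
    edge set (add the edge if absent, remove it if present), producing
    a perturbed 'anomalous' graph."""
    rng = random.Random(seed)  # seeded for reproducibility
    edge_set = set(edges)
    for _ in range(n_flips):
        u, v = rng.sample(range(n_nodes), 2)
        e = (min(u, v), max(u, v))  # canonical undirected edge
        if e in edge_set:
            edge_set.discard(e)
        else:
            edge_set.add(e)
    return sorted(edge_set)

def make_training_set(normal_graphs, n_nodes, n_flips=2):
    """Label normal graphs 0 and their perturbed counterparts 1,
    giving a local anomaly detector a binary discrimination task."""
    data = [(g, 0) for g in normal_graphs]
    data += [(perturb_graph(g, n_nodes, n_flips, seed=i), 1)
             for i, g in enumerate(normal_graphs)]
    return data
```

In the federated setting, each client would build such a labeled set locally and train its own detector, with only the distilled student models participating in communication.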
Gotcha! Don't trick me with unanswerable questions! Self-aligning Large Language Models for Responding to Unknown Questions
Despite the remarkable abilities of Large Language Models (LLMs) to answer
questions, they often display considerable overconfidence even when a
question has no definitive answer. To avoid providing hallucinated answers
to such unknown questions, existing studies typically investigate approaches
to refusing to answer them. In this work, we propose a novel and scalable
self-alignment method that uses the LLM itself to enhance its ability to
respond to different types of unknown questions, not only refusing to answer
but also explaining why such questions are unanswerable. Specifically, the
Self-Align method first employs a two-stage, class-aware self-augmentation
approach to generate a large amount of unknown-question response data. We
then conduct disparity-driven self-curation to select qualified data for
fine-tuning the LLM itself, aligning its responses to unknown questions as
desired. Experimental results on two datasets across four types of unknown
questions validate the superiority of the Self-Align method over existing
baselines in terms of three types of task formulation.
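The two pipeline stages, class-aware response generation and score-based curation, can be sketched abstractly. The categories, templates, and scoring function below are hypothetical placeholders; in the actual Self-Align method both the responses and the quality judgments come from the LLM itself.

```python
# Hypothetical unknown-question categories and refusal-with-explanation
# templates; the real method generates responses with the LLM.
TEMPLATES = {
    "false_premise": "The question assumes something untrue: {reason}",
    "future": "This asks about a future event, which cannot be known yet: {reason}",
    "underspecified": "The question lacks information needed to answer: {reason}",
}

def respond_to_unknown(category, reason):
    """Class-aware response sketch: pick a refusal-with-explanation
    template for the detected unknown-question category."""
    template = TEMPLATES.get(category)
    if template is None:
        return "I cannot answer this question."
    return template.format(reason=reason)

def curate(pairs, score_fn, threshold=0.5):
    """Disparity-driven self-curation sketch: keep only generated
    (question, response) pairs whose quality score clears a threshold.
    `score_fn` stands in for the model's own self-evaluation."""
    return [(q, resp) for q, resp in pairs if score_fn(q, resp) >= threshold]
```

The surviving pairs would then be used as fine-tuning data, closing the self-alignment loop without any human-labeled unknown questions.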