Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model
Sentence embedding is one of the most fundamental tasks in Natural Language
Processing and plays an important role in various tasks. The recent
breakthrough in sentence embedding is achieved by pre-trained language models
(PLMs). Despite this success, an embedded vector (Sen2Vec) representing a point
estimate does not naturally express uncertainty in a task-agnostic way. This
paper thereby proposes an efficient framework on probabilistic sentence
embedding (Sen2Pro) from PLMs, and it represents a sentence as a probability
density distribution in an embedding space to reflect both model uncertainty
and data uncertainty (i.e., many-to-one nature) in the sentence representation.
The proposed framework operates in a plug-and-play way without retraining
PLMs, is easy to implement, and can be applied on top of any PLM.
The superiority of Sen2Pro over Sen2Vec has been theoretically verified and
practically illustrated on different NLP tasks.
Comment: Accepted to the ACL 2023 workshop Rep4NL
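As a rough sketch of the idea (not the paper's actual procedure), one way to obtain a probabilistic sentence embedding is to embed the same sentence several times under stochastic perturbation, for example Monte Carlo dropout, and summarize the samples as a diagonal Gaussian; the sampling procedure and the Gaussian form here are illustrative assumptions:

```python
import numpy as np

def gaussian_from_samples(samples: np.ndarray):
    """Summarize stochastic embeddings of one sentence as a diagonal Gaussian.

    samples: (n_samples, dim) array, e.g. embeddings of the same sentence
    produced under different dropout masks (hypothetical sampling step).
    """
    mu = samples.mean(axis=0)          # point estimate (Sen2Vec-like)
    var = samples.var(axis=0, ddof=1)  # per-dimension uncertainty
    return mu, var

rng = np.random.default_rng(0)
# Simulated: 32 stochastic embeddings of one sentence in a 4-d space.
samples = rng.normal(loc=[1.0, -0.5, 0.0, 2.0], scale=0.1, size=(32, 4))
mu, var = gaussian_from_samples(samples)
```

The variance term is what a single Sen2Vec-style point estimate discards: dimensions where the samples disagree carry more uncertainty.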
Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study
Evaluating the quality of generated text is a challenging task in natural
language processing. This difficulty arises from the inherent complexity and
diversity of text. Recently, OpenAI's ChatGPT, a powerful large language model
(LLM), has garnered significant attention due to its impressive performance in
various tasks. Therefore, we present this report to investigate the
effectiveness of LLMs, especially ChatGPT, and explore ways to optimize their
use in assessing text quality. We compared three kinds of reference-free
evaluation methods based on ChatGPT or similar LLMs. The experimental results
show that ChatGPT can evaluate text quality effectively from various
perspectives without a reference and outperforms most
existing automatic metrics. In particular, the Explicit Score, which utilizes
ChatGPT to generate a numeric score measuring text quality, is the most
effective and reliable method among the three exploited approaches. However,
directly comparing the quality of two texts using ChatGPT may lead to
suboptimal results. We hope this report will provide valuable insights into
selecting appropriate methods for evaluating text quality with LLMs such as
ChatGPT.
Comment: Technical Report, 13 pages
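The Explicit Score method as described amounts to prompting the model for a number and then parsing that number out of its reply. A minimal sketch follows; the prompt wording and the 1-to-10 scale are assumptions, not the report's exact setup:

```python
import re

def build_score_prompt(text: str, aspect: str = "fluency") -> str:
    # Hypothetical prompt wording; the report's actual prompts may differ.
    return (f"Rate the {aspect} of the following text on a scale of 1 to 10. "
            f"Reply with a single number.\n\nText: {text}\nScore:")

def parse_explicit_score(reply: str):
    """Pull the first numeric value out of the model's reply, else None."""
    m = re.search(r"\d+(?:\.\d+)?", reply)
    return float(m.group()) if m else None

score = parse_explicit_score("I would rate this text 8.5 out of 10.")
```

Robust parsing matters in practice because LLM replies often wrap the score in extra prose rather than returning a bare number.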
TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design
High-quality instruction-tuning data is critical to improving LLM
capabilities. Existing data collection methods are limited by unrealistic
manual labeling costs or by the hallucination of relying solely on LLM
generation. To address these problems, this paper presents a scalable method
for automatically collecting high-quality instruction-tuning data by training
language models to design tasks based on human-written texts. Intuitively,
grounding in human-written text helps the model attenuate hallucinations
while generating tasks. Unlike instruction back-translation-based
methods that directly take the given text as a response, we require the model
to generate the instruction, input, and output
simultaneously to filter the noise. The results of the automated and manual
evaluation experiments demonstrate the quality of our dataset.
Comment: Work in progress
StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
Most existing chain-of-thought (CoT) prompting methods suffer from the issues
of generalizability and consistency, as they often rely on instance-specific
solutions that may not be applicable to other cases and lack task-level
consistency in their reasoning steps. To address these limitations, we propose
a comprehensive framework, StrategyLLM, harnessing the capabilities of LLMs to
tackle various tasks. The framework improves generalizability by formulating
general problem-solving strategies and enhances consistency by producing
consistent solutions using these strategies. StrategyLLM employs four LLM-based
agents: strategy generator, executor, optimizer, and evaluator, working
together to generate, evaluate, and select promising strategies for a given
task automatically. The experimental results demonstrate that StrategyLLM
outperforms the competitive baseline CoT-SC that requires human-annotated
solutions on 13 datasets across 4 challenging tasks without human involvement,
including math reasoning (39.2% → 43.3%), commonsense reasoning
(70.3% → 72.5%), algorithmic reasoning (51.7% → 62.0%),
and symbolic reasoning (30.0% → 79.2%).
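The four-agent loop described above can be sketched as plain control flow, with each agent stubbed out where the real framework would make an LLM call; the agent internals here are placeholders, not the paper's implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Strategy:
    text: str
    score: float = 0.0

def select_strategy(generate: Callable[[], list[Strategy]],
                    execute: Callable[[Strategy], list[str]],
                    evaluate: Callable[[list[str]], float],
                    optimize: Callable[[Strategy], Strategy],
                    rounds: int = 2,
                    threshold: float = 0.8) -> Strategy:
    """Generate candidate strategies, execute and score them, and keep
    optimizing until one clears the threshold (control flow only)."""
    candidates = generate()
    for _ in range(rounds):
        for s in candidates:
            s.score = evaluate(execute(s))   # executor + evaluator
        best = max(candidates, key=lambda s: s.score)
        if best.score >= threshold:
            return best
        candidates = [optimize(s) for s in candidates]  # optimizer
    return max(candidates, key=lambda s: s.score)

# Stub agents standing in for LLM calls:
best = select_strategy(
    generate=lambda: [Strategy("decompose the problem"), Strategy("guess")],
    execute=lambda s: [f"solution via {s.text}"],
    evaluate=lambda sols: 0.9 if "decompose" in sols[0] else 0.4,
    optimize=lambda s: s,
)
```

Separating the four roles this way is what makes the strategy, rather than any instance-specific solution, the unit that gets evaluated and reused.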
A Benchmark for Text Expansion: Datasets, Metrics, and Baselines
This work presents a new task of Text Expansion (TE), which aims to insert
fine-grained modifiers into proper locations of the plain text to concretize or
vivify human writings. Different from existing insertion-based writing
assistance tasks, TE requires the model to be more flexible in both locating
and generation, and also more cautious in keeping basic semantics. We leverage
four complementary approaches to construct a dataset with 12 million
automatically generated instances and 2K human-annotated references for both
English and Chinese. To facilitate automatic evaluation, we design various
metrics from multiple perspectives. In particular, we propose Info-Gain to
effectively measure the informativeness of expansions, which is an important
quality dimension in TE. On top of a pre-trained text-infilling model, we build
both pipelined and joint Locate&Infill models, which demonstrate the
superiority over the Text2Text baselines, especially in expansion
informativeness. Experiments verify the feasibility of the TE task and point
out potential directions for future research toward better automatic text
expansion.
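The abstract treats informativeness as a key quality dimension but does not define Info-Gain here. As a toy stand-in (an assumption, not the proposed metric), one can measure the fraction of expansion tokens that do not already appear in the plain text:

```python
def novel_token_ratio(plain: str, expanded: str) -> float:
    """Fraction of tokens in the expanded text that are absent from the
    plain text. A crude proxy for informativeness, NOT the paper's
    Info-Gain metric."""
    source_tokens = set(plain.lower().split())
    expanded_tokens = expanded.lower().split()
    if not expanded_tokens:
        return 0.0
    novel = sum(t not in source_tokens for t in expanded_tokens)
    return novel / len(expanded_tokens)

ratio = novel_token_ratio("the cat sat", "the sleepy cat sat quietly")
```

A measure of this kind rewards expansions that add modifiers ("sleepy", "quietly") rather than merely restating the source.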
Zero-Shot Rumor Detection with Propagation Structure via Prompt Learning
The spread of rumors alongside breaking events seriously obscures the truth
in the era of social media. Previous studies reveal that, due to the lack of
annotated resources, rumors presented in minority languages are hard to
detect. Furthermore, unforeseen breaking events not covered in
yesterday's news exacerbate the scarcity of data resources. In this work, we
propose a novel zero-shot framework based on prompt learning to detect rumors
falling in different domains or presented in different languages. More
specifically, we first represent rumors circulated on social media as diverse
propagation threads, then design a hierarchical prompt encoding mechanism to
learn language-agnostic contextual representations for both prompts and rumor
data. To further enhance domain adaptation, we model the domain-invariant
structural features from the propagation threads, to incorporate structural
position representations of influential community response. In addition, a new
virtual response augmentation method is used to improve model training.
Extensive experiments conducted on three real-world datasets demonstrate that
our proposed model achieves much better performance than state-of-the-art
methods and exhibits a superior capacity for detecting rumors at early stages.
Comment: AAAI 202
Threshold Recognition Based on Non-stationarity of Extreme Rainfall in the Middle and Lower Reaches of the Yangtze River Basin
Analyzing the hydrological sequence from its non-stationary characteristics can better reveal how extreme rainfall responds to climate change. Taking the plain area in the middle and lower reaches of the Yangtze River basin (MLRYRB) as the study area, this study adopted a set of extreme rainfall indices and used the Bernaola-Galvan Segmentation Algorithm (BGSA) to test the non-stationarity of extreme rainfall events. The Generalized Pareto Distribution (GPD) was used to fit extreme rainfall and to select its optimal threshold. In addition, the cross-wavelet technique was used to explore the correlations of extreme rainfall with El Niño-Southern Oscillation (ENSO) and Western Pacific Subtropical High (WPSH) events. The results showed that: (1) extreme rainfall under different thresholds had different non-stationary characteristics; (2) the GPD could fit extreme rainfall in the MLRYRB well, and 40–60 mm was considered the optimal threshold by comparing the uncertainty of the return period; and (3) ENSO and WPSH had significant periodic effects on extreme rainfall in the MLRYRB. These findings highlight the significance of non-stationary assumptions in hydrological frequency analysis, which is of great importance for hydrological forecasting and water conservancy project management.
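The peaks-over-threshold GPD fit described above can be sketched with SciPy. The rainfall series here is simulated and the 50 mm threshold is an assumption picked from the paper's reported 40–60 mm range; a real analysis would use station records:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
# Simulated daily rainfall in mm; stands in for observed station data.
rain = rng.exponential(scale=15.0, size=5000)

threshold = 50.0  # assumed, within the paper's reported 40-60 mm range
exceedances = rain[rain > threshold] - threshold

# Peaks-over-threshold: fit a GPD to the exceedances, location fixed at 0.
shape, loc, scale = genpareto.fit(exceedances, floc=0)

def return_level(T: float) -> float:
    """Rainfall amount exceeded on average once every T observations."""
    p_exceed = exceedances.size / rain.size
    return threshold + genpareto.ppf(1.0 - 1.0 / (T * p_exceed),
                                     shape, loc=0, scale=scale)
```

Comparing return levels (and their uncertainty) across candidate thresholds is one common way to judge which threshold is suitable, which is the kind of comparison the abstract describes.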
Real-time visualization of clustering and intracellular transport of gold nanoparticles by correlative imaging.
Mechanistic understanding of the endocytosis and intracellular trafficking of nanoparticles is essential for designing smart theranostic carriers. Physico-chemical properties, including size, clustering and surface chemistry of nanoparticles regulate their cellular uptake and transport. Significantly, even single nanoparticles could cluster intracellularly, yet their clustering state and subsequent trafficking are not well understood. Here, we used DNA-decorated gold (fPlas-gold) nanoparticles as a dually emissive fluorescent and plasmonic probe to examine their clustering states and intracellular transport. Evidence from correlative fluorescence and plasmonic imaging shows that endocytosis of fPlas-gold follows multiple pathways. In the early stages of endocytosis, fPlas-gold nanoparticles appear mostly as single particles and they cluster during the vesicular transport and maturation. The speed of encapsulated fPlas-gold transport was critically dependent on the size of clusters but not on the types of organelle such as endosomes and lysosomes. Our results provide key strategies for engineering theranostic nanocarriers for efficient health management