
    Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model

    Sentence embedding is one of the most fundamental tasks in Natural Language Processing and plays an important role in various downstream tasks. The recent breakthrough in sentence embedding has been achieved by pre-trained language models (PLMs). Despite this success, an embedded vector (Sen2Vec) representing a point estimate does not naturally express uncertainty in a task-agnostic way. This paper therefore proposes an efficient framework for probabilistic sentence embedding (Sen2Pro) from PLMs, which represents a sentence as a probability density distribution in an embedding space to reflect both model uncertainty and data uncertainty (i.e., the many-to-one nature) in the sentence representation. The proposed framework operates in a plug-and-play way without retraining PLMs, is easy to implement, and can be applied on top of any PLM. The superiority of Sen2Pro over Sen2Vec has been theoretically verified and practically illustrated on different NLP tasks. Comment: Accepted to the ACL 2023 workshop Rep4NLP.
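
    The abstract includes no code; the sketch below only illustrates the general idea of turning a point embedding into a distribution by repeated stochastic encoding. The `stochastic_encode` stub, the embedding dimension, and the Gaussian (mean plus diagonal variance) summary are assumptions for illustration, not the paper's actual estimator.

```python
# Minimal sketch, assuming a PLM whose dropout can be kept active at inference
# time: summarize repeated stochastic encodings of one sentence as N(mu, diag(var)).
import numpy as np

rng = np.random.default_rng(0)

def stochastic_encode(sentence: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for one stochastic PLM forward pass (simulated)."""
    base = np.full(dim, float(len(sentence) % 5))   # crude sentence-dependent anchor
    return base + 0.1 * rng.standard_normal(dim)    # dropout-like noise

def sen2pro(sentence: str, n_samples: int = 32):
    """Estimate mean and diagonal variance of the embedding distribution."""
    samples = np.stack([stochastic_encode(sentence) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.var(axis=0)

mu, var = sen2pro("A probabilistic view of sentence embedding.")
print(mu.shape, float(var.mean()))   # larger variance -> more estimated uncertainty
```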

    Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: A Preliminary Empirical Study

    Evaluating the quality of generated text is a challenging task in natural language processing, owing to the inherent complexity and diversity of text. Recently, OpenAI's ChatGPT, a powerful large language model (LLM), has garnered significant attention due to its impressive performance across various tasks. We therefore present this report to investigate the effectiveness of LLMs, especially ChatGPT, and to explore ways to optimize their use in assessing text quality. We compared three kinds of reference-free evaluation methods based on ChatGPT or similar LLMs. The experimental results show that ChatGPT is capable of evaluating text quality effectively from various perspectives without references and outperforms most existing automatic metrics. In particular, the Explicit Score, which prompts ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable of the three approaches. However, directly comparing the quality of two texts using ChatGPT may lead to suboptimal results. We hope this report provides valuable insights into selecting appropriate methods for evaluating text quality with LLMs such as ChatGPT. Comment: Technical Report, 13 pages.
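
    As a rough illustration of the Explicit Score idea (prompt an LLM for a single numeric quality rating and parse it from the reply), here is a minimal sketch. The prompt wording, the 1-100 scale, and the `call_llm` stub are assumptions; the report's exact prompts are not reproduced.

```python
# Minimal sketch of an Explicit-Score-style, reference-free quality rating.
# `call_llm` is a hypothetical stand-in for a chat-completion client; it is
# stubbed here so the script runs end to end.
import re

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client. Stubbed reply."""
    return "Score: 87"

def explicit_score(source: str, generated: str) -> float:
    prompt = (
        "Score the quality of the following generated text from 1 to 100, "
        "considering fluency, coherence, and faithfulness to the source. "
        "Reply with a single number.\n\n"
        f"Source:\n{source}\n\nGenerated text:\n{generated}\n\nScore:"
    )
    reply = call_llm(prompt)
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else float("nan")

print(explicit_score("The cat sat on the mat.", "A cat was sitting on a mat."))
```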

    TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design

    High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collection methods are limited either by unrealistic manual labeling costs or by the hallucination that comes from relying solely on LLM generation. To address these problems, this paper presents a scalable method for automatically collecting high-quality instruction-tuning data by training language models to design tasks grounded in human-written texts. Intuitively, grounding task generation in human-written text helps the model attenuate hallucinations. Unlike instruction back-translation-based methods that directly take the given text as the response, we require the model to generate the instruction, input, and output simultaneously, which filters out noise. Results from both automatic and manual evaluation experiments demonstrate the quality of our dataset. Comment: Work in progress.
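
    A minimal sketch of the text-grounded idea, under assumptions: ask a model to design a task from a human-written passage, returning instruction, input, and output together, and keep only well-formed triples. The `generate` stub, the prompt, and the filtering rule are illustrative stand-ins, not the paper's pipeline.

```python
# Sketch: text-grounded task design with a simple well-formedness filter.
import json
from typing import Optional

def generate(prompt: str) -> str:
    """Hypothetical text-grounded task designer; stubbed JSON reply."""
    return json.dumps({
        "instruction": "Summarize the passage in one sentence.",
        "input": "Photosynthesis converts light energy into chemical energy...",
        "output": "Photosynthesis turns light into chemical energy in plants.",
    })

def design_task(text: str) -> Optional[dict]:
    prompt = (
        "Based only on the passage below, design one task. Return JSON with "
        f"keys 'instruction', 'input', and 'output'.\n\nPassage:\n{text}"
    )
    try:
        triple = json.loads(generate(prompt))
    except json.JSONDecodeError:
        return None
    # Noise filter: all three fields must be present and non-trivial.
    if all(str(triple.get(k, "")).strip() for k in ("instruction", "input", "output")):
        return triple
    return None

print(design_task("Photosynthesis converts light energy into chemical energy..."))
```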

    StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

    Most existing chain-of-thought (CoT) prompting methods suffer from issues of generalizability and consistency: they often rely on instance-specific solutions that may not be applicable to other cases and lack task-level consistency in their reasoning steps. To address these limitations, we propose a comprehensive framework, StrategyLLM, that harnesses the capabilities of LLMs to tackle various tasks. The framework improves generalizability by formulating general problem-solving strategies and enhances consistency by producing consistent solutions using these strategies. StrategyLLM employs four LLM-based agents: a strategy generator, an executor, an optimizer, and an evaluator, which work together to automatically generate, evaluate, and select promising strategies for a given task. The experimental results demonstrate that, without human involvement, StrategyLLM outperforms the competitive baseline CoT-SC, which requires human-annotated solutions, on 13 datasets across 4 challenging tasks, including math reasoning (39.2% → 43.3%), commonsense reasoning (70.3% → 72.5%), algorithmic reasoning (51.7% → 62.0%), and symbolic reasoning (30.0% → 79.2%).
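
    To make the four-role setup concrete, here is a sketch of how a generate-execute-evaluate-optimize loop could be wired. Every function is a hypothetical stub standing in for a prompted LLM call; the control flow is an assumption for illustration, not the authors' implementation.

```python
# Sketch of a StrategyLLM-style loop: propose strategies, score them on
# examples, refine the best one. All LLM calls are stubbed.
def generate_strategies(task: str, n: int = 3) -> list[str]:
    """Strategy generator: propose n candidate task-level strategies."""
    return [f"strategy {i} for {task}" for i in range(n)]

def execute(strategy: str, question: str) -> str:
    """Strategy executor: apply a strategy to one example (stubbed)."""
    return f"answer({question!r} via {strategy!r})"

def evaluate(strategy: str, examples: list[tuple[str, str]]) -> float:
    """Evaluator: accuracy of the executed strategy on held-out examples."""
    return sum(execute(strategy, q) == a for q, a in examples) / len(examples)

def optimize(strategy: str, score: float) -> str:
    """Optimizer: refine a strategy whose score falls below a target."""
    return strategy + " (refined)" if score < 0.8 else strategy

def best_strategy(task: str, examples: list[tuple[str, str]]) -> str:
    scored = [(evaluate(s, examples), s) for s in generate_strategies(task)]
    top_score, top = max(scored)
    return optimize(top, top_score)

print(best_strategy("math reasoning", [("2+2", "4"), ("3*7", "21")]))
```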

    A Benchmark for Text Expansion: Datasets, Metrics, and Baselines

    This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into appropriate locations of plain text to concretize or vivify human writing. Unlike existing insertion-based writing-assistance tasks, TE requires the model to be more flexible in both locating and generating content, and more cautious about preserving basic semantics. We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references for both English and Chinese. To facilitate automatic evaluation, we design various metrics from multiple perspectives. In particular, we propose Info-Gain to effectively measure the informativeness of expansions, an important quality dimension in TE. On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate superiority over the Text2Text baselines, especially in expansion informativeness. Experiments verify the feasibility of the TE task and point out potential directions for future research toward better automatic text expansion.
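
    The paper's Info-Gain metric is not reproduced here; as a loose illustration of scoring expansion informativeness, the sketch below counts how much new content an expansion adds relative to the plain text. The tokenizer, stopword list, and normalization are all assumptions.

```python
# Loose illustration only (not the paper's Info-Gain): ratio of newly added
# content tokens to the total number of added tokens.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "with", "is"}

def tokens(text: str) -> list[str]:
    return [t.strip(".,!?").lower() for t in text.split()]

def added_content_ratio(plain: str, expanded: str) -> float:
    plain_set = set(tokens(plain))
    added = [t for t in tokens(expanded) if t not in plain_set and t not in STOPWORDS]
    expansion_len = max(len(tokens(expanded)) - len(tokens(plain)), 1)
    return len(added) / expansion_len

print(added_content_ratio(
    "The cat sat on the mat.",
    "The sleepy ginger cat sat lazily on the worn-out mat.",
))
```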

    Zero-Shot Rumor Detection with Propagation Structure via Prompt Learning

    The spread of rumors alongside breaking events seriously hinders the truth in the era of social media. Previous studies reveal that, due to the lack of annotated resources, rumors presented in minority languages are hard to detect. Furthermore, unforeseen breaking events that did not appear in yesterday's news exacerbate the scarcity of data resources. In this work, we propose a novel zero-shot framework based on prompt learning to detect rumors falling in different domains or presented in different languages. More specifically, we first represent a rumor circulating on social media as diverse propagation threads, then design a hierarchical prompt encoding mechanism to learn language-agnostic contextual representations for both prompts and rumor data. To further enhance domain adaptation, we model domain-invariant structural features from the propagation threads to incorporate structural position representations of influential community responses. In addition, a new virtual response augmentation method is used to improve model training. Extensive experiments on three real-world datasets demonstrate that our proposed model achieves much better performance than state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages. Comment: AAAI 2023.
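
    As a small data-structure illustration of "diverse propagation threads", the sketch below flattens a reply tree into root-to-leaf threads and records each response's depth as a simple structural position. The `Post` class and the depth feature are assumptions for illustration; the paper's encoder is not reproduced.

```python
# Sketch: turn a claim's reply tree into propagation threads with depth positions.
from dataclasses import dataclass, field

@dataclass
class Post:
    text: str
    replies: list["Post"] = field(default_factory=list)

def propagation_threads(root: Post):
    """Yield (texts, depths) for every root-to-leaf path in the reply tree."""
    def walk(node: Post, texts, depths):
        texts, depths = texts + [node.text], depths + [len(texts)]
        if not node.replies:
            yield texts, depths
        for child in node.replies:
            yield from walk(child, texts, depths)
    yield from walk(root, [], [])

claim = Post("Breaking: X happened!", [
    Post("Source please?", [Post("No official report yet.")]),
    Post("This is fake."),
])
for texts, depths in propagation_threads(claim):
    print(depths, texts)
```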

    Threshold Recognition Based on Non-stationarity of Extreme Rainfall in the Middle and Lower Reaches of the Yangtze River Basin

    Analyzing hydrological sequences from their non-stationary characteristics helps to better understand how extreme rainfall responds to climate change. Taking the plain area in the middle and lower reaches of the Yangtze River basin (MLRYRB) as the study area, this study adopted a set of extreme rainfall indices and used the Bernaola-Galvan Segmentation Algorithm (BGSA) to test the non-stationarity of extreme rainfall events. The Generalized Pareto Distribution (GPD) was used to fit extreme rainfall and to select its optimal threshold. In addition, the cross-wavelet technique was used to explore the correlations of extreme rainfall with El Niño-Southern Oscillation (ENSO) and Western Pacific Subtropical High (WPSH) events. The results showed that: (1) extreme rainfall under different thresholds had different non-stationary characteristics; (2) the GPD fit the extreme rainfall in the MLRYRB well, and 40–60 mm was identified as a suitable optimal threshold range by comparing the uncertainty of the return period; and (3) ENSO and WPSH had significant periodic effects on extreme rainfall in the MLRYRB. These findings highlight the significance of non-stationary assumptions in hydrological frequency analysis, which are of great importance for hydrological forecasting and water conservancy project management.
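
    As a generic peaks-over-threshold illustration (not the study's data or workflow), the sketch below fits a Generalized Pareto Distribution to synthetic daily-rainfall excesses over the candidate thresholds mentioned above and computes a 10-year return level. The synthetic series, the fixed-at-zero location parameter, and the exceedance-count guard are assumptions.

```python
# Sketch: GPD fit to excesses over candidate thresholds plus a 10-year return level.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
rain = rng.gamma(shape=0.6, scale=12.0, size=40 * 365)   # synthetic daily rainfall (mm)

for threshold in (40.0, 50.0, 60.0):                      # candidate thresholds (mm)
    excess = rain[rain > threshold] - threshold
    if len(excess) < 30:                                  # too few exceedances to fit
        continue
    shape, _, scale = genpareto.fit(excess, floc=0.0)     # location fixed at 0
    rate = len(excess) / (len(rain) / 365.0)              # exceedances per year
    # 10-year return level: threshold plus the GPD quantile at 1 - 1/(10 * rate).
    rl10 = threshold + genpareto.ppf(1 - 1 / (10 * rate), shape, loc=0.0, scale=scale)
    print(f"u={threshold:>4} mm  n={len(excess):>4}  xi={shape: .3f}  "
          f"sigma={scale:6.2f}  10-yr level={rl10:6.1f} mm")
```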

    Real-time visualization of clustering and intracellular transport of gold nanoparticles by correlative imaging.

    Mechanistic understanding of the endocytosis and intracellular trafficking of nanoparticles is essential for designing smart theranostic carriers. Physico-chemical properties, including the size, clustering, and surface chemistry of nanoparticles, regulate their cellular uptake and transport. Significantly, even single nanoparticles can cluster intracellularly, yet their clustering state and subsequent trafficking are not well understood. Here, we used DNA-decorated gold (fPlas-gold) nanoparticles as a dually emissive fluorescent and plasmonic probe to examine their clustering states and intracellular transport. Evidence from correlative fluorescence and plasmonic imaging shows that endocytosis of fPlas-gold follows multiple pathways. In the early stages of endocytosis, fPlas-gold nanoparticles appear mostly as single particles and then cluster during vesicular transport and maturation. The speed of encapsulated fPlas-gold transport was critically dependent on the size of the clusters but not on the type of organelle, such as endosomes or lysosomes. Our results provide key strategies for engineering theranostic nanocarriers for efficient health management.