229 research outputs found

    Iterative Forward Tuning Boosts In-context Learning in Language Models

    Large language models (LLMs) have exhibited an emergent in-context learning (ICL) ability. However, ICL models that can solve ordinary cases are hard to extend to more complex tasks when they process the demonstration examples only once. This single-turn ICL is at odds with the human decision-making process of learning from analogy. In this paper, we propose an effective and efficient two-stage framework that boosts ICL in LLMs by exploiting a dual form between Transformer attention and gradient descent-based optimization. Concretely, we divide the ICL process into a "Deep-Thinking" stage and an inference stage. The "Deep-Thinking" stage performs iterative forward optimization of the demonstrations, which is expected to boost the reasoning abilities of LLMs at test time by "thinking over" the demonstrations multiple times. It produces accumulated meta-gradients by manipulating the Key-Value matrices in the self-attention modules of the Transformer. The inference stage then takes only the test query as input, without concatenating demonstrations, and applies the learned meta-gradients through attention for output prediction. In this way, demonstrations are not required at inference time, since they have already been learned and stored in the final meta-gradients, so LLMs can be adapted to downstream tasks both effectively and efficiently. Extensive experiments on ten classification and multiple-choice datasets show that our method achieves substantially better performance than standard ICL in terms of both accuracy and efficiency. Comment: 14 pages, 5 figures
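    The dual form the abstract invokes can be illustrated with linear attention: attending over demonstration key/value pairs acts on a query exactly like an implicit weight update ("meta-gradient"). The sketch below is a toy stand-in under that assumption, not the authors' code; all names and the step size are illustrative.

```python
import numpy as np

# Hedged sketch (not the paper's implementation): in linear attention,
# V.T @ (K @ q) == (V.T @ K) @ q, so V.T @ K plays the role of an implicit
# weight update dW -- the "meta-gradient" accumulated from demonstrations.
rng = np.random.default_rng(0)
d = 8
W0 = rng.standard_normal((d, d))       # a pretrained linear map (assumed)
K_demo = rng.standard_normal((4, d))   # keys of 4 demonstration tokens
V_demo = rng.standard_normal((4, d))   # values of the same tokens

def meta_gradient(K, V):
    # Dual form of linear attention over the demonstrations.
    return V.T @ K

# "Deep-Thinking": several forward passes over the demonstrations, each
# accumulating the meta-gradient (a toy analogue of iterative KV updates).
eta = 0.1                              # assumed step size
dW = np.zeros((d, d))
for _ in range(3):
    dW += eta * meta_gradient(K_demo, V_demo)

# Inference: only the test query is needed; the demonstrations live in dW.
q = rng.standard_normal(d)
out = (W0 + dW) @ q
```

Once the accumulated `dW` is stored, the demonstrations never need to be re-encoded at inference, which is the source of the efficiency gain the abstract reports.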

    Matching-based Data Valuation for Generative Model

    Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting the deep generative models that have recently gained considerable attention. As with discriminative models, there is an urgent need to assess data contributions in deep generative models. However, previous data valuation approaches mainly relied on discriminative performance metrics and required model retraining; consequently, they cannot be applied directly and efficiently in practice to recent deep generative models such as generative adversarial networks and diffusion models. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach applicable to any generative model, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of our knowledge, GMValuator is the first work to offer a training-free, post-hoc data valuation strategy for deep generative models.
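    A similarity-matching valuation of the kind the abstract describes can be sketched in a few lines: each training datum is credited whenever it is the closest match to a generated sample, with no retraining involved. The scoring rule and names below are illustrative assumptions, not GMValuator's actual procedure.

```python
import numpy as np

# Hedged sketch of training-free, post-hoc valuation by similarity matching
# (illustrative, not GMValuator's API): credit each training point for the
# generated samples it matches most closely.
rng = np.random.default_rng(1)
train = rng.standard_normal((50, 16))       # training-data embeddings (assumed)
generated = rng.standard_normal((200, 16))  # generated-sample embeddings

def match_values(train, generated):
    # Pairwise squared distances generated -> train.
    d2 = ((generated[:, None, :] - train[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)             # best-matching training point
    sims = 1.0 / (1.0 + d2[np.arange(len(generated)), nearest])
    values = np.zeros(len(train))
    np.add.at(values, nearest, sims)        # accumulate credit per datum
    return values / values.sum()            # normalise to a value share

values = match_values(train, generated)
```

Because the valuation reads off embeddings of already-generated samples, it is post-hoc by construction, which is the property the abstract emphasizes.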

    ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases

    Large Language Models (LLMs) have shown the potential to revolutionize natural language processing tasks in various domains, sparking great interest in vertical-specific large models. However, unlike proprietary models such as BloombergGPT and FinGPT, which have leveraged their unique data accumulations to make strides in the finance domain, there have not been many similar large language models in the Chinese legal domain to facilitate its digital transformation. In this paper, we propose an open-source legal large language model named ChatLaw. Because data quality is paramount, we carefully designed a legal-domain fine-tuning dataset. Additionally, to overcome the problem of model hallucination in legal data screening during reference-data retrieval, we introduce a method that combines vector database retrieval with keyword retrieval, effectively reducing the inaccuracy of relying solely on vector database retrieval. Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference data, further mitigating model hallucination at the model level and improving the problem-solving capabilities of large models. We also open-source our model and part of the data at https://github.com/PKU-YuanGroup/ChatLaw
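    The hybrid retrieval idea — blending vector similarity with keyword matching so that lexically exact statute hits are not drowned out by embedding noise — can be sketched generically. The blend weight, documents, and stand-in embeddings below are assumptions for illustration, not ChatLaw's actual pipeline.

```python
import numpy as np

# Hedged sketch of hybrid retrieval (generic vector + keyword blend,
# not ChatLaw's exact method).
docs = [
    "labor contract termination compensation rules",
    "criminal liability for traffic accidents",
    "intellectual property licensing dispute",
]
query = "contract termination compensation"

rng = np.random.default_rng(2)
doc_vecs = rng.standard_normal((len(docs), 8))      # stand-in embeddings
q_vec = doc_vecs[0] + 0.1 * rng.standard_normal(8)  # near doc 0 by construction

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_score(query, doc):
    # Fraction of query terms that appear verbatim in the document.
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q)

alpha = 0.5  # assumed blend weight between vector and keyword scores
scores = [alpha * cosine(q_vec, v) + (1 - alpha) * keyword_score(query, d)
          for d, v in zip(docs, doc_vecs)]
best = int(np.argmax(scores))
```

The keyword term anchors retrieval on exact legal terminology, which is one plausible way a blend like this reduces the hallucinations that purely vector-based retrieval can induce.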

    Research and Application on Spark Clustering Algorithm in Campus Big Data Analysis

    Big data analysis has penetrated all fields of society and brought about profound changes. However, there is relatively little research on using big data to support student management in colleges and universities. Taking student campus-card records as the research sample and scholarship evaluation as an example, this work applies Spark big data mining technology and the K-Means clustering algorithm to analyze the data. The analysis covers students' daily behavior from multiple dimensions and can prevent unreasonable scholarship evaluation caused by unfair factors such as plagiarism or soliciting votes from teachers and students. At the same time, students' absenteeism, physical health, and psychological status can be predicted in advance, making student management work more proactive, accurate, and effective.
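    The clustering step can be illustrated with a toy K-Means run over card-usage features. This is plain NumPy rather than Spark's MLlib KMeans, and the feature names (daily canteen spend, library check-ins) and synthetic data are assumptions for illustration only.

```python
import numpy as np

# Hedged toy version of the clustering step (NumPy stand-in for Spark MLlib
# KMeans; features and data are illustrative): group students by card usage.
rng = np.random.default_rng(3)
frugal = rng.normal([10.0, 2.0], 1.0, size=(30, 2))  # low spend, more visits
lavish = rng.normal([40.0, 0.5], 1.0, size=(30, 2))  # high spend, few visits
X = np.vstack([frugal, lavish])  # columns: daily spend, library check-ins

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each student to the nearest centroid.
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # Recompute centroids, keeping the old one if a cluster empties.
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)
```

In the paper's setting, the resulting behavior clusters, rather than votes or self-reports, would feed the scholarship-evaluation signal.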

    Effective thermal conductivity of wire-woven bulk Kagome sandwich panels

    Thermal transport in a highly porous metallic wire-woven bulk Kagome (WBK) is numerically and analytically modeled. Based on topology similarity and upon introducing an elongation parameter in thermal tortuosity, an idealized Kagome with non-twisted struts is employed. Special focus is placed upon quantifying the effect of topological anisotropy of WBK upon its effective conductivity. It is demonstrated that the effective conductivity reduces linearly as the porosity increases, and the extent of the reduction is significantly dependent on the orientation of WBK. The governing physical mechanism of anisotropic thermal transport in WBK is found to be the anisotropic thermal tortuosity caused by the intrinsic anisotropic topology of WBK.
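    The two reported trends — linear decrease with porosity and orientation dependence through tortuosity — are consistent with a relation of the following form. The symbols are assumed for illustration and are not taken from the paper:

```latex
% Hedged sketch of an effective-conductivity relation consistent with the
% abstract: k_s = solid-phase conductivity, \varepsilon = porosity,
% \tau(\theta) = orientation-dependent thermal tortuosity.
k_{\mathrm{eff}}(\theta) = \frac{k_s \,(1 - \varepsilon)}{\tau(\theta)}
```

Here the factor $(1-\varepsilon)$ gives the linear reduction with porosity, while the anisotropy enters solely through $\tau(\theta)$, matching the stated governing mechanism.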

    Federated Learning Incentive Mechanism under Buyers' Auction Market

    Auction-based Federated Learning (AFL) enables open collaboration among self-interested data consumers and data owners. Existing AFL approaches commonly assume a sellers' market, in which the service clients, as sellers, are treated as scarce resources, so the aggregation servers, as buyers, must compete with their bids. Yet as the technology progresses, an increasing number of qualified clients are now capable of performing federated learning tasks, leading to a shift from a sellers' market to a buyers' market. In this paper, we change the perspective by adapting the procurement auction framework, aiming to explain pricing behavior under a buyers' market. Our modeling starts with a basic setting under complete information, then moves to the scenario where the sellers' information is not fully observable. To select clients with high reliability and data quality, and to guard against external attacks, we utilize a blockchain-based reputation mechanism. The experimental results validate the effectiveness of our approach.
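    A procurement (reverse) auction with reputation weighting can be sketched as the buyer scoring each seller's ask against its tracked reputation. The scoring rule, seller data, and procurement count below are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch of a reputation-weighted procurement auction in a buyers'
# market (illustrative scoring rule, not the paper's mechanism).
sellers = [
    # (client id, asking price, blockchain-tracked reputation in (0, 1])
    ("A", 10.0, 0.9),
    ("B", 7.0, 0.4),   # cheapest ask, but unreliable
    ("C", 9.0, 0.8),
]

def score(price, reputation):
    # Lower is better: effective cost per unit of expected reliability.
    return price / reputation

budget_k = 2  # the buyer procures the two best-scoring clients (assumed)
selected = sorted(sellers, key=lambda s: score(s[1], s[2]))[:budget_k]
winners = [s[0] for s in selected]
# Client B is excluded despite the lowest ask: its reputation-adjusted
# cost (7.0 / 0.4 = 17.5) exceeds A's (~11.1) and C's (11.25).
```

Dividing price by reputation is one simple way to encode the paper's goal of selecting reliable, high-quality clients rather than merely the cheapest bidders.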

    A matter of time: Using dynamics and theory to uncover mechanisms of transcriptional bursting

    Eukaryotic transcription generally occurs in bursts of activity lasting minutes to hours; however, state-of-the-art measurements have revealed that many of the molecular processes that underlie bursting, such as transcription factor binding to DNA, unfold on timescales of seconds. This temporal disconnect lies at the heart of a broader challenge in physical biology of predicting transcriptional outcomes and cellular decision-making from the dynamics of underlying molecular processes. Here, we review how new dynamical information about the processes underlying transcriptional control can be combined with theoretical models that predict not only averaged transcriptional dynamics, but also their variability, to formulate testable hypotheses about the molecular mechanisms underlying transcriptional bursting and control. Comment: 41 pages, 4 figures, review article
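    The class of theoretical model the review describes — predicting both the mean and the variability of transcription from promoter switching rates — is typified by the two-state ("telegraph") model. The rates below are illustrative, and the Fano-factor expression is the standard steady-state result for this model, hedged here as a sketch rather than a formula from the review itself.

```python
# Hedged sketch: the two-state ("telegraph") model connects molecular
# switching rates to bursting statistics (rate values are illustrative).
def telegraph_stats(k_on, k_off, r, gamma):
    """Steady-state mean and Fano factor of mRNA copy number.

    k_on, k_off: promoter ON/OFF switching rates, r: transcription rate
    while ON, gamma: mRNA degradation rate (all in 1/min).
    """
    p_on = k_on / (k_on + k_off)        # fraction of time the promoter is ON
    mean = r * p_on / gamma
    # Super-Poissonian noise generated by slow promoter switching:
    fano = 1.0 + r * k_off / ((k_on + k_off) * (k_on + k_off + gamma))
    return mean, fano

# Slow, mostly-OFF promoter: strongly bursty (Fano factor well above 1).
mean, fano = telegraph_stats(k_on=0.1, k_off=0.9, r=10.0, gamma=0.5)
```

The Fano factor's excess over 1 is exactly the kind of variability signature that, per the review, lets dynamical measurements discriminate between candidate bursting mechanisms.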