6 research outputs found

    CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

    Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representation and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh significantly boosts performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without the full model through the lens of the loss landscape. Our findings demonstrate a linear connectivity among these optima, which fall within the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh. Comment: Accepted to EMNLP 2023 (Main Conference)

    Large Language Models are Zero Shot Hypothesis Proposers

    Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of scientific discovery. However, the potential of LLMs for scientific discovery has not been formally explored. In this paper, we start by investigating whether LLMs can propose scientific hypotheses. To this end, we construct a dataset consisting of background knowledge and hypothesis pairs from biomedical literature. The dataset is divided into training, seen, and unseen test sets based on the publication date to control visibility. We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs. Additionally, we introduce an LLM-based multi-agent cooperative framework with different role designs and external tools to enhance the capabilities related to generating hypotheses. We also design four metrics through a comprehensive review to evaluate the generated hypotheses for both ChatGPT-based and human evaluations. Through experiments and analyses, we arrive at the following findings: 1) LLMs surprisingly generate untrained yet validated hypotheses from testing literature. 2) Increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities. These findings strongly support the potential of LLMs as catalysts for new scientific discoveries and guide further exploration. Comment: Instruction Workshop @ NeurIPS 202

    Off-Grid Compressive Channel Estimation for mm-Wave Massive MIMO With Hybrid Precoding


    DOA Estimation under GNSS Spoofing Attacks Using a Coprime Array: From a Sparse Reconstruction Viewpoint

    The antispoofing method using the direction-of-arrival (DOA) feature can effectively improve the application security of global navigation satellite system (GNSS) receivers. In this paper, a sparse reconstruction approach based on a coprime array of antennas is proposed to provide reliable DOA estimation under a GNSS spoofing attack. Specifically, the self-coherence property of genuine satellite signals and spoofing signals was fully exploited to construct a denoised covariance matrix that enables DOA estimation before receiver despreading. Based on this, an equivalent uniform linear array (ULA) was generated from the constructed covariance matrix via virtual array interpolation. By applying the idea of sparse reconstruction to the equivalent ULA signal, preliminary DOA estimates could be obtained without needing to know the number of signals. Considering that the sparse estimation technique suffers from basis mismatch effects, we designed an optimization problem with respect to the off-grid error to compensate for the initial DOA estimates such that the performance loss of DOA estimation could be reduced. Numerical examples demonstrated the advantages of the proposed method in terms of degrees-of-freedom (DOFs), resolution, and accuracy.
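The link between a coprime array and the equivalent ULA mentioned above can be illustrated with a small sketch. It is not the paper's implementation: it assumes one common coprime layout (two uniform subarrays with spacings of 3 and 5 half-wavelength units, sharing the sensor at the origin), and the function names are hypothetical. It lists the lags of the difference coarray and the missing lags ("holes") that virtual array interpolation would need to fill.

```python
def coprime_positions(m, n):
    """Sensor positions (in units of the base spacing d) for an assumed
    basic coprime layout: n sensors spaced m apart plus m sensors spaced
    n apart, sharing the sensor at position 0."""
    sub1 = {m * k for k in range(n)}
    sub2 = {n * k for k in range(m)}
    return sorted(sub1 | sub2)


def difference_coarray(positions):
    """Unique pairwise position differences (virtual sensor lags)."""
    return sorted({p - q for p in positions for q in positions})


def holes(lags):
    """Integer lags missing between the smallest and largest lag."""
    present = set(lags)
    return [k for k in range(min(lags), max(lags) + 1) if k not in present]


positions = coprime_positions(3, 5)
lags = difference_coarray(positions)
missing = holes(lags)

print("physical sensors:", positions)
print("number of virtual lags:", len(lags))
print("holes in the coarray:", missing)
```

With these assumed parameters, 7 physical sensors yield 21 distinct lags, which is where the gain in degrees of freedom comes from; the lags at ±8 and ±11 are absent, so the coarray is not a full ULA until those positions are interpolated.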

    Balanced Convolutional Neural Networks for Pneumoconiosis Detection

    Pneumoconiosis remains one of the most common and harmful occupational diseases in China, leading to huge economic losses to society through its high prevalence and costly treatment. Diagnosis of pneumoconiosis still depends strongly on the experience of radiologists, which hinders rapid detection in large populations. Recent research focuses on computer-aided detection based on machine learning. These methods have achieved high accuracy, and among them artificial neural networks (ANNs) show excellent performance. However, due to imbalanced samples and a lack of interpretability, wide adoption in clinical practice remains difficult. To address these problems, we first establish a pneumoconiosis radiograph dataset that includes both positive and negative samples. Second, deep convolutional diagnosis approaches are compared in pneumoconiosis detection, and balanced training is adopted to improve recall. Comprehensive experiments conducted on this dataset demonstrate high accuracy (88.6%). Third, we explain diagnosis results by visualizing suspected opacities on pneumoconiosis radiographs, which could provide a solid diagnostic reference for surgeons.