CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Instruction tuning has recently been recognized as an effective way of
aligning Large Language Models (LLMs) to enhance their generalization ability
across various tasks. However, when tuning publicly accessible, centralized
LLMs with private instruction data, privacy concerns are inevitable. While
direct transfer of parameterized modules between models is a plausible approach
to address this, its implications and effectiveness need further exploration.
This paper focuses on Offsite-Tuning (OFT), a representative technique that
transfers transformer blocks between centralized LLMs and downstream emulators.
Given the limited understanding of the underlying mechanism of OFT, we perform
an empirical analysis on LLMs from the perspectives of representation and
functional similarity. Interestingly, our findings reveal a unique modular
structure within the layers of LLMs that appears to emerge as the model size
expands. Simultaneously, we note subtle but potentially significant changes in
representation and intermediate predictions across the layers. Inspired by
these observations, we propose CRaSh, involving Clustering, Removing, and
Sharing, a training-free strategy to derive improved emulators from LLMs. CRaSh
significantly boosts the performance of OFT with billions of parameters.
Furthermore, we investigate the optimal solutions yielded by fine-tuning with
and without the full model through the lens of the loss landscape. Our findings
demonstrate a linear connectivity among these optima falling over the same
basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code
is publicly available at https://github.com/TsinghuaC3I/CRaSh.
Comment: Accepted to EMNLP 2023 (Main Conference)
Large Language Models are Zero Shot Hypothesis Proposers
Significant scientific discoveries have driven the progress of human
civilisation. The explosion of scientific literature and data has created
information barriers across disciplines that have slowed the pace of scientific
discovery. Large Language Models (LLMs) hold a wealth of global and
interdisciplinary knowledge that promises to break down these information
barriers and foster a new wave of scientific discovery. However, the potential
of LLMs for scientific discovery has not been formally explored. In this paper,
we start by investigating whether LLMs can propose scientific hypotheses. To
this end, we construct a dataset consisting of background knowledge and hypothesis
pairs from biomedical literature. The dataset is divided into training, seen,
and unseen test sets based on the publication date to control visibility. We
subsequently evaluate the hypothesis generation capabilities of various
top-tier instructed models in zero-shot, few-shot, and fine-tuning settings,
including both closed and open-source LLMs. Additionally, we introduce an
LLM-based multi-agent cooperative framework with different role designs and
external tools to enhance the capabilities related to generating hypotheses. We
also design four metrics through a comprehensive review to evaluate the
generated hypotheses for both ChatGPT-based and human evaluations. Through
experiments and analyses, we arrive at the following findings: 1) LLMs
surprisingly generate untrained yet validated hypotheses from testing
literature. 2) Increasing uncertainty facilitates candidate generation,
potentially enhancing zero-shot hypothesis generation capabilities. These
findings strongly support the potential of LLMs as catalysts for new scientific
discoveries and guide further exploration.
Comment: Instruction Workshop @ NeurIPS 202
Unified Multi-Modal Image Synthesis for Missing Modality Imputation
Multi-modal medical images provide complementary soft-tissue characteristics
that aid in the screening and diagnosis of diseases. However, limited scanning
time, image corruption and various imaging protocols often result in incomplete
multi-modal images, thus limiting the usage of multi-modal data for clinical
purposes. To address this issue, in this paper, we propose a novel unified
multi-modal image synthesis method for missing modality imputation. Overall, our
method adopts a generative adversarial architecture, which aims to synthesize
missing modalities from any combination of available ones with a single model.
To this end, we specifically design a Commonality- and Discrepancy-Sensitive
Encoder for the generator to exploit both modality-invariant and modality-specific
information contained in input modalities. The incorporation of both types of
information facilitates the generation of images with consistent anatomy and
realistic details of the desired distribution. Besides, we propose a Dynamic
Feature Unification Module to integrate information from a varying number of
available modalities, which enables the network to be robust to random missing
modalities. The module performs both hard integration and soft integration,
ensuring the effectiveness of feature combination while avoiding information
loss. Verified on two public multi-modal magnetic resonance datasets, the
proposed method is effective in handling various synthesis tasks and shows
superior performance compared to previous methods.
Comment: 10 pages, 9 figures
Related-Key Almost Universal Hash Functions: Definitions, Constructions and Applications
Universal hash functions (UHFs) have been extensively used in the design of cryptographic schemes. Under related-key attacks (RKAs), some of these UHF-based schemes may not be secure, especially those that use the UHF key as part of the scheme's whole key, because UHFs are weak in the RKA setting. To solve this issue, we propose a new concept of related-key almost universal hash function, which is a natural extension of the almost universal hash function to the RKA setting. We define the related-key almost universal (RKA-AU) hash function and the related-key almost XOR universal (RKA-AXU) hash function. However, almost all existing UHFs do not satisfy the new definitions. We construct one fixed-input-length universal hash function named RH1 and two variable-input-length universal hash functions named RH2 and RH3. We show that RH1 and RH2 are both RKA-AXU, and RH3 is RKA-AU for the RKD set . Furthermore, RH1, RH2 and RH3 are nearly as efficient as previous similar constructions. RKA-AU (RKA-AXU) hash functions can be used as components in related-key secure cryptographic schemes. If we replace the universal hash functions in these schemes with our corresponding constructions, the problems with related-key attacks can be solved for some RKD sets. More specifically, we give four concrete applications of RKA-AU and RKA-AXU in related-key secure message authentication codes and tweakable block ciphers.
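The weakness of ordinary UHFs under related keys can be illustrated with a toy example. The sketch below is not RH1, RH2 or RH3 from the abstract; it assumes a simple multiplication hash H_K(M) = K·M mod P and a hypothetical related-key derivation φ(K) = 2K mod P, both chosen only for illustration. Under a single random nonzero key, two distinct messages never collide, yet across the related keys K and φ(K) a collision can be forced for every key.

```python
import random

# Toy illustration only: a multiplication hash over a prime field.
# This is NOT one of the constructions (RH1, RH2, RH3) from the abstract.
P = 2**61 - 1  # a Mersenne prime, chosen here for convenience


def mult_hash(key: int, msg: int) -> int:
    """H_K(M) = K * M mod P."""
    return (key * msg) % P


def related_key(key: int) -> int:
    """Hypothetical related-key derivation: phi(K) = 2K mod P."""
    return (2 * key) % P


def single_key_collisions(m1: int, m2: int, keys) -> int:
    """Count keys K with H_K(m1) == H_K(m2) (ordinary universal-hash setting)."""
    return sum(mult_hash(k, m1) == mult_hash(k, m2) for k in keys)


def related_key_collisions(m: int, keys) -> int:
    """Count keys K with H_{phi(K)}(m) == H_K(2m), a forced related-key collision."""
    m_doubled = (2 * m) % P
    return sum(mult_hash(related_key(k), m) == mult_hash(k, m_doubled) for k in keys)


rng = random.Random(0)
keys = [rng.randrange(1, P) for _ in range(1000)]  # nonzero keys only

print(single_key_collisions(5, 7, keys))   # distinct messages, same key
print(related_key_collisions(5, keys))     # messages 5 and 10, related keys
```

The second count equals the number of keys because (2K)·M and K·(2M) are the same field element, so the collision does not depend on K at all.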
Discriminating Origin Tissues of Tumor Cell Lines by Methylation Signatures and Dys-Methylated Rules
Loss of COPZ1 induces NCOA4 mediated autophagy and ferroptosis in glioblastoma cell lines
Dysregulated iron metabolism is a hallmark of many cancers, including glioblastoma (GBM). However, its role in tumor progression remains unclear. Herein, we identified coatomer protein complex subunit zeta 1 (COPZ1) as a candidate therapeutic target that significantly dysregulated iron metabolism in GBM cells. Overexpression of COPZ1 was associated with increasing tumor grade and poor prognosis in glioma patients, based on analysis of expression data from the publicly available database The Cancer Genome Atlas (P < 0.001). Protein levels of COPZ1 were significantly increased in GBM compared to non-neoplastic brain tissue samples in immunohistochemistry and western blot analysis. siRNA knockdown of COPZ1 suppressed proliferation of U87MG, U251 and P3#GBM in vitro. Stable expression of a COPZ1 shRNA construct in U87MG inhibited tumor growth in vivo by ~60% relative to controls at day 21 after implantation (P < 0.001). Kaplan–Meier analysis of the survival data demonstrated that the overall survival of tumor-bearing animals increased from 20.8 days (control) to 27.8 days (knockdown, P < 0.05). COPZ1 knockdown also led to an increase in nuclear receptor coactivator 4 (NCOA4), resulting in the degradation of ferritin, a subsequent increase in the intracellular levels of ferrous iron, and ultimately ferroptosis. These data demonstrate that COPZ1 is a critical mediator in iron metabolism. The COPZ1/NCOA4/FTH1 axis is therefore a novel therapeutic target for the treatment of human GBM.
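For readers unfamiliar with the Kaplan–Meier analysis mentioned above, the following is a minimal sketch of the product-limit estimator. The function name and the survival times are hypothetical and are not data from the study; the sketch only shows how the survival curve is computed from observed times and censoring flags.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : observed time for each subject (e.g. days after implantation)
    events : 1 if death was observed at that time, 0 if the subject was censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical example: five animals, one censored at day 21.
curve = kaplan_meier([18, 20, 21, 22, 23], [1, 1, 0, 1, 1])
for day, s in curve:
    print(day, round(s, 3))
```

Comparing two such curves (for instance control versus knockdown) would additionally require a test such as the log-rank test, which is not shown here.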
Primary tumor site specificity is preserved in patient-derived tumor xenograft models
Patient-derived tumor xenograft (PDX) mouse models are widely used for drug screening. The underlying assumption is that PDX tissue is very similar to the original patient tissue and responds to drug treatment in the same way. To investigate whether primary tumor site information is well preserved in PDX, we analyzed the gene expression profiles of PDX mouse models originating from different tissues, including breast, kidney, large intestine, lung, ovary, pancreas, skin, and soft tissues. The popular Monte Carlo feature selection method was employed to analyze the expression profile, yielding a feature list. From this list, incremental feature selection and a support vector machine (SVM) were adopted to extract distinctively expressed genes in PDXs from different primary tumor sites and to build an optimal SVM classifier. In addition, we set up a group of quantitative rules to identify primary tumor sites. A total of 755 genes were extracted by the feature selection procedures, on which the SVM classifier achieved high performance, with an MCC of 0.986 in classifying primary tumor sites originating from different tissues. Furthermore, we obtained 16 classification rules, which gave a lower accuracy but clear classification procedures. These results validated that primary tumor site specificity was well preserved in PDX, as the PDXs from different primary tumor sites were still very different and these PDX differences were similar to the differences observed in patients with tumors. For example, VIM and ABHD17C were highly expressed in the PDX from breast tissue and also highly expressed in breast cancer patients.
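The MCC reported above is the Matthews correlation coefficient. As a rough sketch of how a multi-class MCC can be computed, the code below uses the common multi-class generalization based on per-class counts; whether the study used exactly this variant is an assumption, and the labels in the example are made up.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient.

    Uses c (correct predictions), s (total samples), t_k (true count of
    class k) and p_k (predicted count of class k):
        MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2)(s^2 - sum_k t_k^2))
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in classes)
    sum_pp = sum(p_counts[k] ** 2 for k in classes)
    sum_tt = sum(t_counts[k] ** 2 for k in classes)
    denom = sqrt((s * s - sum_pp) * (s * s - sum_tt))
    if denom == 0:
        return 0.0  # convention when all samples fall in a single class
    return (c * s - sum_pt) / denom


# Hypothetical labels for three tumor sites; one sample is misclassified.
y_true = ["breast", "breast", "lung", "lung", "skin", "skin"]
y_pred = ["breast", "breast", "lung", "lung", "skin", "lung"]
print(round(multiclass_mcc(y_true, y_pred), 4))
```

A value of 1 means every sample is classified correctly, and values near 0 mean the predictions are no better than chance.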
Regenerated woody plants influence soil microbial communities in a subtropical forest
10 pages, 4 figures, 3 tables, references. Supplementary data to this article can be found online at https://doi.org/10.1016/j.apsoil.2023.104890
Forests are critical for supporting multiple ecosystem services such as climate change mitigation. Microbial diversity in soil provides important functions to maintain and regenerate forest ecosystems, and yet a critical knowledge gap remains in identifying the linkage between attributes of regenerated woody plant (RWP) communities and the diversity patterns of soil microbial communities in subtropical plantations. Here, we investigated the changes in soil microbial communities and plant traits in a nine-hectare Chinese fir (Cunninghamia lanceolata; CF) plantation to assess how non-planted RWP communities regulate soil bacterial and fungal diversity, and to further explore the potential mechanisms that structure their interaction. Our study revealed that soil bacterial richness was positively associated with RWP richness, whereas soil fungal richness was negatively associated with RWP basal area. Meanwhile, RWP richness was positively correlated with ectomycorrhizal (ECM) fungal richness but negatively correlated with the richness of both pathogenic and saprotrophic fungi, suggesting that the RWP-fungal richness relationship was trophic guild-specific. Soil microbial community beta diversity (i.e., dissimilarity in community composition) was strongly coupled with both RWP beta diversity and the heterogeneity of RWP basal area. Our study highlights the importance of community-level RWP attributes for the regulation of microbial biodiversity in plantation systems, which should be considered in future forest management programs.
This work was funded by the National Key Research and Development Program of China (2021YFD2201301 and 2022YFF1303003), the National Natural Science Foundation of China (U22A20612), and the Key Project of Jiangxi Province Natural Science Foundation of China (20224ACB205003).
Peer reviewed
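Beta diversity, described above as dissimilarity in community composition, can be quantified with several indices. The sketch below uses Bray–Curtis dissimilarity as one common choice; the abstract does not say which index the study used, so this choice and the abundance values are assumptions made for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    a, b : abundances of the same taxa, in the same order, at two sites.
    Returns 0 for identical communities and 1 for communities sharing no taxa.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts of three taxa in two soil plots.
plot_1 = [10, 0, 5]
plot_2 = [5, 5, 5]
print(round(bray_curtis(plot_1, plot_2), 3))
```

Computing this value for every pair of plots gives a dissimilarity matrix, which is the usual starting point for relating microbial beta diversity to plant community differences.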