40 research outputs found
Exploring Model Transferability through the Lens of Potential Energy
Transfer learning has become crucial in computer vision tasks due to the vast
availability of pre-trained deep learning models. However, selecting the
optimal pre-trained model from a diverse pool for a specific downstream task
remains a challenge. Existing methods for measuring the transferability of
pre-trained models rely on statistical correlations between encoded static
features and task labels, but they overlook the impact of underlying
representation dynamics during fine-tuning, leading to unreliable results,
especially for self-supervised models. In this paper, we present an insightful
physics-inspired approach named PED to address these challenges. We reframe the
challenge of model selection through the lens of potential energy and directly
model the interaction forces that influence fine-tuning dynamics. By capturing
how the dynamic representations move to reduce the potential energy within a
force-driven physical model, we obtain an enhanced and more stable
observation for estimating transferability. The experimental results on 10
downstream tasks and 12 self-supervised models demonstrate that our approach
can seamlessly integrate into existing ranking techniques and enhance their
performances, revealing its effectiveness for the model selection task and its
potential for understanding the mechanism in transfer learning. Code will be
available at https://github.com/lixiaotong97/PED. Comment: Accepted by ICCV 202
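The force-driven intuition can be sketched in a toy setting. The quadratic potential, the centroid-attraction update, and all names below are illustrative assumptions, not PED's actual formulation:

```python
# Hypothetical sketch: treat features as particles attracted to their class
# centroids by a spring-like force, so each update step lowers a quadratic
# potential energy. Illustrative only; not the paper's actual method.

def class_means(feats, labels):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, y in zip(feats, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def potential_energy(feats, labels, means):
    """Sum of squared distances to class centroids (a quadratic potential)."""
    return sum(sum((xi - mi) ** 2 for xi, mi in zip(x, means[y]))
               for x, y in zip(feats, labels))

def force_step(feats, labels, step=0.1):
    """Move each feature slightly along the attractive force toward its
    class centroid; this strictly decreases the potential energy."""
    means = class_means(feats, labels)
    return [[xi + step * (mi - xi) for xi, mi in zip(x, means[y])]
            for x, y in zip(feats, labels)]

feats = [[0.0, 1.0], [2.0, 1.0], [5.0, 4.0], [7.0, 4.0]]
labels = [0, 0, 1, 1]
before = potential_energy(feats, labels, class_means(feats, labels))
moved = force_step(feats, labels)
after = potential_energy(moved, labels, class_means(moved, labels))
assert after < before  # the representation settles into a lower-energy state
```

The refined, lower-energy features stand in for the "more stable observation" from which a transferability score would then be computed.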
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Image BERT pre-training with masked image modeling (MIM) has become a popular
practice for self-supervised representation learning. A seminal work,
BEiT, casts MIM as a classification task with a visual vocabulary, tokenizing
the continuous visual signals into discrete vision tokens using a pre-learned
dVAE. Although this is a feasible solution, the improper discretization hinders
further improvements of image pre-training. Since image discretization has no
ground-truth answers, we believe that the masked patch should not be assigned
with a unique token id even if a better tokenizer can be obtained. In this
work, we introduce an improved BERT-style image pre-training method, namely
mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice
training objectives. Specifically, the multi-choice supervision for the masked
image patches is formed by the soft probability vectors of the discrete token
ids, which are predicted by the off-the-shelf image tokenizer and further
refined by high-level inter-patch perceptions resorting to the observation that
similar patches should share their choices. Extensive experiments on
classification, segmentation, and detection tasks demonstrate the superiority
of our method, e.g., the pre-trained ViT-B achieves 84.1% top-1 fine-tuning
accuracy on ImageNet-1K classification, 50.8% mIoU on ADE20K semantic
segmentation, and 51.2% AP^b and 44.3% AP^m for object detection and instance
segmentation on COCO, outperforming the competitive counterparts.
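The multi-choice objective can be illustrated with a soft cross-entropy in a toy setting; the vocabulary size, logits, and target values below are assumptions for illustration, not mc-BEiT's actual tokenizer outputs:

```python
# Minimal sketch of the multi-choice idea: the masked patch is supervised with
# a soft probability vector over the visual vocabulary rather than a single
# one-hot token id, so visually similar tokens share the credit.
import math

def soft_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and an arbitrary target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))

one_hot = [0.0, 1.0, 0.0, 0.0]       # a single "correct" token id
soft = [0.05, 0.7, 0.2, 0.05]        # token 2 is a plausible alternative

prefers_1 = [0.1, 2.0, 1.5, 0.1]     # prediction peaked on token 1
prefers_2 = [0.1, 1.5, 2.0, 0.1]     # prediction peaked on similar token 2

# Shifting the prediction from token 1 to the similar token 2 is penalized
# much less under the soft target than under the hard one-hot target.
delta_soft = soft_cross_entropy(prefers_2, soft) - soft_cross_entropy(prefers_1, soft)
delta_hard = soft_cross_entropy(prefers_2, one_hot) - soft_cross_entropy(prefers_1, one_hot)
assert delta_soft < delta_hard
```

This eased penalty is what lets similar patches "share their choices" instead of competing for a unique token id.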
Cross Entropy versus Label Smoothing: A Neural Collapse Perspective
Label smoothing loss is a widely adopted technique to mitigate overfitting in
deep neural networks. This paper studies label smoothing from the perspective
of Neural Collapse (NC), a powerful empirical and theoretical framework which
characterizes model behavior during the terminal phase of training. We first
show empirically that models trained with label smoothing converge faster to
neural collapse solutions and attain a stronger level of neural collapse.
Additionally, we show that at the same level of NC1, models under label
smoothing loss exhibit intensified NC2. These findings provide valuable
insights into the performance benefits and enhanced model calibration under
label smoothing loss. We then leverage the unconstrained feature model to
derive closed-form solutions for the global minimizers for both loss functions
and further demonstrate that models under label smoothing have a lower
condition number and, therefore, theoretically converge faster. Our study,
combining empirical evidence and theoretical results, not only provides nuanced
insights into the differences between label smoothing and cross-entropy losses,
but also serves as an example of how the powerful neural collapse framework can
be used to improve our understanding of DNNs.
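The two losses being compared follow the standard label-smoothing formulation; the class count, smoothing factor, and logits below are illustrative assumptions:

```python
# Minimal sketch: the one-hot target is mixed with a uniform distribution
# using smoothing factor alpha; alpha = 0 recovers plain cross-entropy.
import math

def smoothed_target(num_classes, true_class, alpha=0.1):
    """(1 - alpha) on the true class, alpha spread uniformly over all classes."""
    return [(1 - alpha) * (1.0 if k == true_class else 0.0) + alpha / num_classes
            for k in range(num_classes)]

def cross_entropy(logits, target):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))

logits = [3.0, 0.5, -1.0]
hard = smoothed_target(3, 0, alpha=0.0)   # reduces to the one-hot target
soft = smoothed_target(3, 0, alpha=0.1)
assert hard == [1.0, 0.0, 0.0]
assert abs(sum(soft) - 1.0) < 1e-12
# Smoothing keeps the loss bounded away from zero, discouraging ever-growing
# logits on the true class even for a confident, correct prediction.
assert cross_entropy(logits, soft) > cross_entropy(logits, hard)
```
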
SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT
Spiking neural networks (SNNs) offer a promising avenue to implement deep
neural networks in a more energy-efficient way. However, the network
architectures of existing SNNs for language tasks are too simplistic, and deep
architectures have not been fully explored, resulting in a significant
performance gap compared to mainstream transformer-based networks such as BERT.
To this end, we improve a recently proposed spiking transformer (i.e.,
Spikformer) so that it can process language tasks, and we propose a
two-stage knowledge distillation method for training it: first,
pre-training by distilling knowledge from BERT on a large collection of
unlabelled texts; second, fine-tuning on task-specific instances by
distilling again from a BERT model fine-tuned on the same training examples.
Through extensive experimentation, we show that the models trained with our
method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve
comparable results to BERT on text classification tasks in both English and
Chinese, with much less energy consumption.
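The core distillation objective in both stages can be sketched with the standard temperature-softened KL divergence; the temperature, logits, and the omission of any feature-level terms are assumptions for illustration:

```python
# Hedged sketch: the student (spiking model) matches the teacher's (BERT's)
# temperature-softened output distribution via KL divergence, scaled by T^2
# as in standard knowledge distillation. Illustrative values only.
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

teacher = [4.0, 1.0, -2.0]
aligned = [3.8, 1.1, -1.9]      # student close to the teacher's distribution
misaligned = [-2.0, 1.0, 4.0]   # student far from the teacher's distribution
assert kd_loss(aligned, teacher) < kd_loss(misaligned, teacher)
```
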
Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons
Personality plays a pivotal role in shaping human expression patterns, thus
regulating the personality of large language models (LLMs) holds significant
potential in enhancing the user experience of LLMs. Previous methods either
relied on fine-tuning LLMs on specific corpora or necessitated manually crafted
prompts to elicit specific personalities from LLMs. However, the former
approach is inefficient and costly, while the latter cannot precisely
manipulate personality traits at a fine-grained level. To address the above
challenges, we employ novel Unsupervisedly-Built Personalized Lexicons
(UBPL) in a pluggable manner during the decoding phase of LLMs to manipulate
their personality traits. UBPL is a lexicon built through an unsupervised
approach from a situational judgment test dataset (SJTs4LLM). Users can utilize
UBPL to adjust the probability vectors of predicted words in the decoding phase
of LLMs, thus influencing the personality expression of LLMs. Extensive
experiments demonstrate the remarkable effectiveness and pluggability of
our method for fine-grained manipulation of LLM personality. Comment: Work in progress
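The pluggable decoding-time adjustment can be sketched as follows; the lexicon contents, the multiplicative boost, and the renormalization are assumptions for illustration, not UBPL's actual formulation:

```python
# Illustrative sketch: a personalized lexicon assigns weights to words, and
# those weights nudge the model's next-word probabilities before sampling.

def adjust_probs(vocab, probs, lexicon, strength=0.5):
    """Boost words favored by the lexicon, then renormalize to a distribution."""
    boosted = [p * (1.0 + strength * lexicon.get(w, 0.0))
               for w, p in zip(vocab, probs)]
    total = sum(boosted)
    return [b / total for b in boosted]

vocab = ["wonderful", "fine", "terrible"]
probs = [0.3, 0.5, 0.2]                 # base next-word distribution
extrovert_lexicon = {"wonderful": 1.0}  # hypothetical trait-linked word weight

adjusted = adjust_probs(vocab, probs, extrovert_lexicon)
assert abs(sum(adjusted) - 1.0) < 1e-12
assert adjusted[0] > probs[0]           # lexicon word becomes more likely
assert adjusted[2] < probs[2]           # other words are slightly suppressed
```

Because the adjustment touches only the output distribution, it plugs into any decoder without fine-tuning the underlying model.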
Elucidating the multifaceted roles of GPR146 in non-specific orbital inflammation: a concerted analytical approach through the prisms of bioinformatics and machine learning
Background: Non-specific Orbital Inflammation (NSOI) is a chronic idiopathic condition marked by extensive polymorphic lymphoid infiltration in the orbital area. The integration of metabolic and immune pathways suggests potential therapeutic roles for C-peptide and G protein-coupled receptor 146 (GPR146) in diabetes and its sequelae. However, the specific mechanisms through which GPR146 modulates immune responses remain poorly understood. Furthermore, the utility of GPR146 as a diagnostic or prognostic marker for NSOI has not been conclusively demonstrated.
Methods: We adopted a comprehensive analytical strategy, merging differentially expressed genes (DEGs) from the Gene Expression Omnibus (GEO) datasets GSE58331 and GSE105149 with immune-related genes from the ImmPort database. Our methodology combined LASSO regression and support vector machine-recursive feature elimination (SVM-RFE) for feature selection, followed by Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) to explore gene sets co-expressed with GPR146, identifying a significant enrichment in immune-related pathways. The tumor microenvironment's immune composition was quantified using the CIBERSORT algorithm and the ESTIMATE method, which confirmed a positive correlation between GPR146 expression and immune cell infiltration. Validation of GPR146 expression was performed using the GSE58331 dataset.
Results: Analysis identified 113 DEGs associated with GPR146, with a significant subset showing distinct expression patterns. Using LASSO and SVM-RFE, we pinpointed 15 key hub genes. Functionally, these genes and GPR146 were predominantly linked to receptor ligand activity, immune receptor activity, and cytokine-mediated signaling. Specific immune cells, such as memory B cells, M2 macrophages, resting mast cells, monocytes, activated NK cells, plasma cells, and CD8+ T cells, were positively associated with GPR146 expression. In contrast, M0 macrophages, naive B cells, M1 macrophages, activated mast cells, activated memory CD4+ T cells, naive CD4+ T cells, and gamma delta T cells showed inverse correlations. Notably, our findings underscore the potential diagnostic relevance of GPR146 in distinguishing NSOI.
Conclusion: Our study elucidates the immunological signatures associated with GPR146 in the context of NSOI, highlighting its prognostic and diagnostic potential. These insights pave the way for GPR146 to serve as a novel biomarker for monitoring the progression of NSOI, providing a foundation for future therapeutic strategies targeting immune-metabolic pathways.
Experimental quantum adversarial learning with programmable superconducting qubits
Quantum computing promises to enhance machine learning and artificial
intelligence. Different quantum algorithms have been proposed to improve a wide
spectrum of machine learning tasks. Yet, recent theoretical works show that,
similar to traditional classifiers based on deep classical neural networks,
quantum classifiers would suffer from the same vulnerability problem: adding
tiny, carefully crafted perturbations to legitimate data samples can induce
incorrect predictions at a notably high confidence level. This will
pose serious problems for future quantum machine learning applications in
safety and security-critical scenarios. Here, we report the first experimental
demonstration of quantum adversarial learning with programmable superconducting
qubits. We train quantum classifiers, which are built upon variational quantum
circuits consisting of ten transmon qubits featuring average lifetimes of 150
μs, and average fidelities of simultaneous single- and two-qubit gates
above 99.94% and 99.4% respectively, with both real-life images (e.g., medical
magnetic resonance imaging scans) and quantum data. We demonstrate that these
well-trained classifiers (with testing accuracy up to 99%) can be practically
deceived by small adversarial perturbations, whereas an adversarial training
process would significantly enhance their robustness to such perturbations. Our
results reveal experimentally a crucial vulnerability aspect of quantum
learning systems under adversarial scenarios and demonstrate an effective
defense strategy against adversarial attacks, providing a valuable guide
for quantum artificial intelligence applications on both near-term and future
quantum devices. Comment: 26 pages, 17 figures, 8 algorithms
CCL21/CCR7 Prevents Apoptosis via the ERK Pathway in Human Non-Small Cell Lung Cancer Cells
Previously, we confirmed that C-C chemokine receptor 7 (CCR7) promotes cell proliferation via the extracellular signal-regulated kinase (ERK) pathway, but its role in apoptosis of non-small cell lung cancer (NSCLC) cell lines remains unknown. The NSCLC cell lines A549 and H460 were used to examine the effect of CCL21/CCR7 on apoptosis by flow cytometry. The results showed that activation of CCR7 by its specific ligand, exogenous chemokine ligand 21 (CCL21), was associated with a significant decline in the percentage of apoptotic cells. Western blot and real-time PCR assays indicated that activation of CCR7 significantly upregulated anti-apoptotic bcl-2 and downregulated pro-apoptotic bax and caspase-3, but not p53, at both the protein and mRNA levels. CCR7 small interfering RNA significantly attenuated these effects of exogenous CCL21. In addition, PD98059, a selective inhibitor of MEK that disrupts the activation of downstream ERK, significantly abolished these effects of CCL21/CCR7. Coimmunoprecipitation further confirmed an interaction between p-ERK and bcl-2, bax, or caspase-3, particularly in the presence of CCL21. These results strongly suggest that, in A549 and H460 NSCLC cells, CCL21/CCR7 prevents apoptosis by upregulating the expression of bcl-2 and downregulating the expression of bax and caspase-3, potentially via the ERK pathway.
Impact of opioid-free analgesia on pain severity and patient satisfaction after discharge from surgery: multispecialty, prospective cohort study in 25 countries
Background: Balancing opioid stewardship and the need for adequate analgesia following discharge after surgery is challenging. This study aimed to compare the outcomes for patients discharged with opioid versus opioid-free analgesia after common surgical procedures.
Methods: This international, multicentre, prospective cohort study collected data from patients undergoing common acute and elective general surgical, urological, gynaecological, and orthopaedic procedures. The primary outcomes were patient-reported time in severe pain, measured on a numerical analogue scale from 0 to 100%, and patient-reported satisfaction with pain relief during the first week following discharge. Data were collected by in-hospital chart review and patient telephone interview 1 week after discharge.
Results: The study recruited 4273 patients from 144 centres in 25 countries; 1311 patients (30.7%) were prescribed opioid analgesia at discharge. Patients reported being in severe pain for 10 (i.q.r. 1-30)% of the first week after discharge and rated satisfaction with analgesia as 90 (i.q.r. 80-100) of 100. After adjustment for confounders, opioid analgesia on discharge was independently associated with increased pain severity (risk ratio 1.52, 95% c.i. 1.31 to 1.76; P < 0.001) and re-presentation to healthcare providers owing to side-effects of medication (OR 2.38, 95% c.i. 1.36 to 4.17; P = 0.004), but not with satisfaction with analgesia (beta coefficient 0.92, 95% c.i. -1.52 to 3.36; P = 0.468), compared with opioid-free analgesia. Although opioid prescribing varied greatly between high-income and low- and middle-income countries, patient-reported outcomes did not.
Conclusion: Opioid analgesia prescription on surgical discharge is associated with a higher risk of re-presentation owing to side-effects of medication and increased patient-reported pain, but not with changes in patient-reported satisfaction. Opioid-free discharge analgesia should be adopted routinely.
Background: Balancing opioid stewardship and the need for adequate analgesia following discharge after surgery is challenging. This study aimed to compare the outcomes for patients discharged with opioid versus opioid-free analgesia after common surgical procedures.Methods: This international, multicentre, prospective cohort study collected data from patients undergoing common acute and elective general surgical, urological, gynaecological, and orthopaedic procedures. The primary outcomes were patient-reported time in severe pain measured on a numerical analogue scale from 0 to 100% and patient-reported satisfaction with pain relief during the first week following discharge. Data were collected by in-hospital chart review and patient telephone interview 1 week after discharge.Results: The study recruited 4273 patients from 144 centres in 25 countries; 1311 patients (30.7%) were prescribed opioid analgesia at discharge. Patients reported being in severe pain for 10 (i.q.r. 1-30)% of the first week after discharge and rated satisfaction with analgesia as 90 (i.q.r. 80-100) of 100. After adjustment for confounders, opioid analgesia on discharge was independently associated with increased pain severity (risk ratio 1.52, 95% c.i. 1.31 to 1.76; P < 0.001) and re-presentation to healthcare providers owing to side-effects of medication (OR 2.38, 95% c.i. 1.36 to 4.17; P = 0.004), but not with satisfaction with analgesia (beta coefficient 0.92, 95% c.i. -1.52 to 3.36; P = 0.468) compared with opioid-free analgesia. Although opioid prescribing varied greatly between high-income and low- and middle-income countries, patient-reported outcomes did not.Conclusion: Opioid analgesia prescription on surgical discharge is associated with a higher risk of re-presentation owing to side-effects of medication and increased patient-reported pain, but not with changes in patient-reported satisfaction. Opioid-free discharge analgesia should be adopted routinely