95 research outputs found
The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need
Vision Transformers have been rapidly rising in computer vision thanks to
their outstanding scaling trends, and are gradually replacing convolutional
neural networks (CNNs). Recent works on self-supervised learning (SSL)
introduce siamese pre-training tasks, on which Transformer backbones continue
to demonstrate ever stronger results than CNNs. Many have come to believe that
Transformers or self-attention modules are inherently more suitable than CNNs
Transformers or self-attention modules are inherently more suitable than CNNs
in the context of SSL. However, it is noteworthy that most, if not all, prior
SSL work with CNNs chose standard ResNets as backbones, whose architectural
effectiveness is known to lag behind that of advanced Vision Transformers. It
therefore remains unclear whether the self-attention operation is crucial for
the recent advances in SSL, or whether CNNs can deliver the same excellence
with more advanced designs. Can we close the SSL
performance gap between Transformers and CNNs? To answer these intriguing
questions, we apply self-supervised pre-training to the recently proposed,
stronger larger-kernel CNN architecture and conduct an apples-to-apples
comparison with Transformers on SSL performance. Our results show that we are able
to build pure CNN SSL architectures that perform on par with or better than the
best SSL-trained Transformers, by just scaling up convolutional kernel sizes
alongside a few other small tweaks. Impressively, when transferred to the
downstream tasks of \texttt{MS COCO} detection and segmentation, our SSL
pre-trained CNN model (trained for 100 epochs) matches the performance of its
300-epoch pre-trained Transformer counterpart. We hope this work helps to
better understand what is essential (or not) for self-supervised learning
backbones.
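The core architectural idea, scaling up the kernel of a depthwise convolution, can be sketched in a toy form. This NumPy routine illustrates depthwise convolution with a kernel that is large relative to the feature map; it is an illustration of the mechanism, not the paper's actual backbone:

```python
import numpy as np

def depthwise_conv2d(x, kernels, pad):
    """Naive depthwise 2D convolution: one kernel per channel, no channel mixing.
    x: (C, H, W), kernels: (C, k, k). Illustrative sketch only."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # Each output position is a k x k window weighted by this
                # channel's own kernel.
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

x = np.random.randn(4, 8, 8)
k = 7  # a "large" kernel relative to an 8x8 feature map
kernels = np.random.randn(4, k, k)
y = depthwise_conv2d(x, kernels, pad=k // 2)
print(y.shape)  # (4, 8, 8)
```

Because each channel has its own kernel and there is no cross-channel mixing, the parameter count grows only linearly with kernel area per channel, which is what makes very large kernels affordable in practice.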
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
The Diffusion model, a prevalent framework for image generation, encounters
significant challenges in terms of broad applicability due to its extended
inference times and substantial memory requirements. Efficient Post-training
Quantization (PTQ) is pivotal for addressing these issues in traditional
models. Different from traditional models, diffusion models heavily depend on
the time-step to achieve satisfactory multi-round denoising. Usually, the
time-step, drawn from a finite set, is encoded into a temporal feature by a
few modules that are entirely independent of the sampling data. However, existing PTQ
methods do not optimize these modules separately. They adopt inappropriate
reconstruction targets and complex calibration methods, resulting in a severe
disturbance of the temporal feature and denoising trajectory, as well as a low
compression efficiency. To solve these, we propose a Temporal Feature
Maintenance Quantization (TFMQ) framework building upon a Temporal Information
Block that depends only on the time-step and not on the sampling data.
Building on this block design, we devise temporal information
aware reconstruction (TIAR) and finite set calibration (FSC) to align the
full-precision temporal features in a limited time. Equipped with the
framework, we can preserve most of the temporal information and ensure the
end-to-end generation quality. Extensive experiments on various datasets and
diffusion models prove our state-of-the-art results. Remarkably, our
quantization approach, for the first time, achieves model performance nearly on
par with the full-precision model under 4-bit weight quantization.
Additionally, our method incurs almost no extra computational cost and
accelerates the quantization process on LSUN-Bedrooms compared to previous
works. Our code is publicly available at
https://github.com/ModelTC/TFMQ-DM.
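The key idea of calibrating over the finite set of time-steps alone, with no sampled image data, can be sketched in a toy form. The embedding module, the scale search, and all names below are illustrative assumptions, not the TFMQ implementation:

```python
import numpy as np

def uniform_quantize(w, n_bits, scale):
    """Symmetric uniform quantization of weights w with a given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_over_timesteps(embed, weights, timesteps, n_bits=4):
    """Choose the weight-quantization scale that best preserves the temporal
    feature over the *finite* set of time-steps (no sampled images needed).
    `embed` maps a time-step to a feature via `weights`. Illustrative only."""
    qmax = 2 ** (n_bits - 1) - 1
    best_scale, best_err = None, np.inf
    for mult in np.linspace(0.5, 1.0, 20):  # search a few candidate scales
        scale = mult * np.abs(weights).max() / qmax
        qw = uniform_quantize(weights, n_bits, scale)
        # Reconstruction target: the temporal feature itself, summed over
        # every time-step in the finite set.
        err = sum(np.sum((embed(t, weights) - embed(t, qw)) ** 2)
                  for t in timesteps)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Toy temporal-feature module: sinusoidal encoding followed by a linear layer.
def embed(t, W):
    enc = np.array([np.sin(t / 10.0 ** (2 * i / 8)) for i in range(8)])
    return W @ enc

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
scale = calibrate_over_timesteps(embed, W, timesteps=range(1, 51))
print(scale > 0)  # True
```

The point of the sketch is the reconstruction target: the calibration loss is computed purely from the time-step embeddings, so the temporal feature path can be quantized without any denoising samples.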
Regularity of a Stochastic Fractional Delayed Reaction-Diffusion Equation Driven by Lévy Noise
The current paper is devoted to the regularity of the mild solution of a stochastic fractional delayed reaction-diffusion equation driven by Lévy space-time white noise. By the Banach fixed point theorem, the existence and uniqueness of the mild solution are proved in a suitable working function space that is affected by the delays. Furthermore, the time regularity and space regularity of the mild solution are established, respectively. The main results show that both the time regularity and the space regularity of the mild solution depend on the regularity of the initial value and the order of the fractional operator. In particular, the time regularity is affected by the regularity of the initial value with delays.
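Schematically, equations of this class can be written in the following generic form; this is an illustrative template assembled from the abstract's ingredients (fractional operator, delay, Lévy noise), not the paper's exact equation:

```latex
% Generic form of a stochastic fractional delayed reaction-diffusion
% equation driven by Lévy space-time white noise (illustrative):
\begin{equation}
  \mathrm{d}u(t,x) = \Big[ -(-\Delta)^{\alpha/2}\, u(t,x)
    + f\big(u(t,x),\, u(t-\tau,x)\big) \Big]\,\mathrm{d}t
    + g\big(u(t^{-},x)\big)\,\mathrm{d}L(t,x),
\end{equation}
% where $(-\Delta)^{\alpha/2}$ is the fractional Laplacian of order $\alpha$,
% $\tau > 0$ is the delay, and $L$ is a Lévy space-time white noise.
```

In this reading, the abstract's main result says that the Hölder exponents in time and space are governed by $\alpha$ and by the regularity of the initial data on the delay interval.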
SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT
Spiking neural networks (SNNs) offer a promising avenue to implement deep
neural networks in a more energy-efficient way. However, the network
architectures of existing SNNs for language tasks are too simplistic, and deep
architectures have not been fully explored, resulting in a significant
performance gap compared to mainstream transformer-based networks such as BERT.
To this end, we improve a recently proposed spiking transformer (i.e.,
Spikformer) so that it can process language tasks, and we propose a two-stage
knowledge distillation method for training it. The first stage pre-trains the
model by distilling knowledge from BERT on a large collection of unlabelled
text; the second stage fine-tunes it on task-specific instances by again
distilling from a BERT that was fine-tuned on the same training examples.
Through extensive experimentation, we show that the models trained with our
method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve
results comparable to BERT on text classification tasks in both English and
Chinese, with much lower energy consumption.
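At its core, each distillation stage is a teacher-student loss on softened output distributions. The sketch below shows the standard temperature-scaled KL formulation, assumed here for illustration rather than SpikeBERT's exact objective:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable temperature-scaled softmax."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    averaged over the batch and scaled by T^2 as in standard logit
    distillation. A sketch, not SpikeBERT's exact loss."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    return float(kl * T * T / len(p_t))

# Toy batch of one example with a 3-class output.
student = np.array([[2.0, 0.5, -1.0]])
teacher = np.array([[1.5, 0.8, -0.5]])
loss = distillation_loss(student, teacher)
print(loss >= 0.0)  # True (KL divergence is non-negative)
```

In the two-stage recipe described above, the same loss shape would be applied first against a pre-trained BERT teacher on unlabelled text, then against a task-fine-tuned BERT teacher on the labelled training examples.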
Anisotropic magnetic properties and tunable conductivity in two-dimensional layered NaCrX2 (X=Te,Se,S) single crystals
Monolayer NaCrX2 (X=Te,Se,S) has been theoretically proposed as a family of
two-dimensional intrinsic ferromagnetic semiconductors, yet its physical
properties have not been thoroughly investigated in bulk single crystals. We
report the single-crystal growth, structural, magnetic and electronic transport
properties of NaCr(Te1-xSex)2 (0 ≤ x ≤ 1) and NaCrS2. For NaCr(Te1-xSex)2, the
strong perpendicular magnetic anisotropy of NaCrTe2 can be gradually tuned to
be a nearly isotropic one by Se-doping. Meanwhile, a systematic change in the
conductivity with increasing x is observed, displaying a doping-induced
metal-insulator-like transition. Under magnetic fields larger than 30 kOe, both
NaCrTe2 and NaCrSe2 can be polarized into a ferromagnetic state. For NaCrS2,
in contrast, robust antiferromagnetism is observed up to 70 kOe, and two
field-induced metamagnetic transitions are identified along H||ab. These
intriguing properties, together with the potential to be exfoliated down to
few-layer thickness, make NaCrX2 (X=Te,Se,S) promising for spintronic
applications.
Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons
Personality plays a pivotal role in shaping human expression patterns, thus
regulating the personality of large language models (LLMs) holds significant
potential in enhancing the user experience of LLMs. Previous methods either
relied on fine-tuning LLMs on specific corpora or necessitated manually crafted
prompts to elicit specific personalities from LLMs. However, the former
approach is inefficient and costly, while the latter cannot precisely
manipulate personality traits at a fine-grained level. To address the above
challenges, we employ novel Unsupervisedly-Built Personalized Lexicons
(UBPL) in a pluggable manner during the decoding phase of LLMs to manipulate
their personality traits. UBPL is a lexicon built through an unsupervised
approach from a situational judgment test dataset (SJTs4LLM). Users can utilize
UBPL to adjust the probability vectors of predicted words in the decoding phase
of LLMs, thus influencing the personality expression of LLMs. Extensive
experimentation demonstrates the remarkable effectiveness and pluggability of
our method for fine-grained manipulation of LLM's personality.Comment: Work in progres
You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets
Recent works have impressively demonstrated that there exists a subnetwork in
randomly initialized convolutional neural networks (CNNs) that can match the
performance of the fully trained dense networks at initialization, without any
optimization of the weights of the network (i.e., untrained networks). However,
the presence of such untrained subnetworks in graph neural networks (GNNs)
remains mysterious. In this paper, we carry out a first-of-its-kind
exploration of discovering matching untrained GNNs. With sparsity as the core
tool, we can find \textit{untrained sparse subnetworks} at the initialization,
that can match the performance of \textit{fully trained dense} GNNs. Besides
this already encouraging finding of comparable performance, we show that the
found untrained subnetworks can substantially mitigate the GNN over-smoothing
problem, hence becoming a powerful tool to enable deeper GNNs without bells and
whistles. We also observe that such sparse untrained subnetworks have appealing
performance in out-of-distribution detection and robustness to input
perturbations. We evaluate our method across widely used GNN architectures on
various popular datasets, including the Open Graph Benchmark (OGB).
Comment: Accepted by the LoG Conference 2022 as a spotlight.
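Extracting a sparse subnetwork from a randomly initialized network without ever training its weights can be sketched as masking by a per-weight score. In the paper the mask itself is optimized; the simple magnitude score below is a stand-in used only to illustrate the mechanism:

```python
import numpy as np

def untrained_subnetwork(weights, sparsity):
    """Keep the top-(1 - sparsity) fraction of randomly initialized weights
    by score and zero out the rest; the surviving weights are never trained.
    Magnitude scores here are an illustrative stand-in for a learned mask."""
    scores = np.abs(weights)
    k = int(weights.size * (1 - sparsity))
    # Threshold at the k-th largest score so exactly k weights survive
    # (assuming no ties, which holds for continuous random draws).
    thresh = np.sort(scores, axis=None)[::-1][k - 1]
    mask = (scores >= thresh).astype(weights.dtype)
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # randomly initialized layer weights
W_sparse, mask = untrained_subnetwork(W, sparsity=0.75)
print(int(mask.sum()))  # 16 of 64 weights survive
```

For a GNN, the same masking would be applied per message-passing layer; sparsity is the core tool because it is the mask, not the weight values, that carries the subnetwork's expressive power.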
Using Google web search to analyze and evaluate the application of ChatGPT in femoroacetabular impingement syndrome
Background: Chat Generative Pre-trained Transformer (ChatGPT) is a new machine learning tool that allows patients to access health information online, compared here to Google, the most commonly used search engine in the United States. Patients can use ChatGPT to better understand medical issues. This study compared the two tools based on: (i) frequently asked questions (FAQs) about Femoroacetabular Impingement Syndrome (FAI), (ii) the corresponding answers to these FAQs, and (iii) the most common FAQs yielding a numerical response.
Purpose: To assess the suitability of ChatGPT as an online health information resource for patients by replicating their internet searches.
Study design: Cross-sectional study.
Methods: The same keywords were used to search the 10 most common questions about FAI on both Google and ChatGPT. The responses from both tools were recorded and analyzed.
Results: Of the 20 questions, 8 (40%) were similar. Among the 10 questions searched on Google, 7 were provided by a medical practice. For numerical questions, there was a notable difference between Google's and ChatGPT's answers for 3 of the top 5 most common questions (60%). Expert evaluation indicated that 67.5% of experts were satisfied or highly satisfied with the accuracy of ChatGPT's descriptions of both conservative and surgical treatment options for FAI. Additionally, 62.5% of experts were satisfied or highly satisfied with the safety of the information provided. Regarding the etiology of FAI, including cam and pincer impingement, 52.5% of experts expressed satisfaction or high satisfaction with ChatGPT's explanations. Overall, 62.5% of experts affirmed that ChatGPT could serve effectively as a reliable medical resource for initial information retrieval.
Conclusion: This study confirms that ChatGPT, despite being a new tool, shows significant potential as a supplementary resource for health information on FAI. Expert evaluations commend its capacity to provide accurate and comprehensive responses, valued by medical professionals for relevance and safety. Nonetheless, continuous improvements in the depth and precision of its medical content are recommended for ongoing reliability. While ChatGPT offers a promising alternative to traditional search engines, meticulous validation is imperative before it can be fully embraced as a trusted medical resource.
Nobiletin Inhibits IL-1β-Induced Inflammation in Chondrocytes via Suppression of NF-κB Signaling and Attenuates Osteoarthritis in Mice
Osteoarthritis (OA), a common degenerative joint disease, is principally characterized by inflammation and destruction of cartilage. Nobiletin, an extract of the peel of citrus fruits, is known to have anti-inflammatory properties. However, the mechanisms by which nobiletin plays a protective role in OA are not completely understood. In the present study, we investigated the anti-inflammatory effects of nobiletin on the progression of OA in both in vitro and in vivo experiments. Mouse chondrocytes were pretreated with nobiletin (0, 10, 20, 40 μM) for 24 h and then incubated with IL-1β (10 ng/ml, 24 h) in vitro. The generation of PGE2 and NO was evaluated by the Griess reaction and ELISAs. The protein expression of inducible nitric oxide synthase, matrix metalloproteinase-3, matrix metalloproteinase-13, a disintegrin and metalloproteinase with thrombospondin motifs-5 (ADAMTS5), cyclooxygenase-2, collagen II, and aggrecan was analyzed by Western blotting. Immunofluorescence and Western blot analysis were used to detect nuclear factor-κB (NF-κB) signaling molecules. Induction of proinflammatory and catabolic mediators by IL-1β stimulation of mouse chondrocytes could be partially blocked by treatment with nobiletin or ammonium pyrrolidine dithiocarbamate (an NF-κB inhibitor). Furthermore, our results indicated that nobiletin exhibited a therapeutic effect through active inhibition of the NF-κB signaling pathway. In a mouse model of OA, injection of nobiletin (20 mg/kg) every 2 days for 8 weeks after surgery inhibited cartilage destruction and synovitis. Taken together, our findings suggest that nobiletin may be a potential therapeutic agent for the treatment of OA.