95 research outputs found

    The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need

    Vision Transformers have been rapidly rising in computer vision thanks to their outstanding scaling trends, gradually replacing convolutional neural networks (CNNs). Recent works on self-supervised learning (SSL) introduce siamese pre-training tasks, on which Transformer backbones continue to demonstrate ever stronger results than CNNs. People have come to believe that Transformers, or self-attention modules, are inherently more suitable than CNNs in the context of SSL. However, it is noteworthy that most if not all prior art on SSL with CNNs chose standard ResNets as backbones, whose architectural effectiveness is known to lag behind that of advanced Vision Transformers. Therefore, it remains unclear whether the self-attention operation is crucial for the recent advances in SSL, or whether CNNs can deliver the same excellence with more advanced designs. Can we close the SSL performance gap between Transformers and CNNs? To answer these intriguing questions, we apply self-supervised pre-training to the recently proposed, stronger larger-kernel CNN architecture and conduct an apples-to-apples comparison with Transformers on their SSL performance. Our results show that we are able to build pure CNN SSL architectures that perform on par with or better than the best SSL-trained Transformers, by simply scaling up convolutional kernel sizes along with a few other small tweaks. Impressively, when transferring to the downstream tasks of \texttt{MS COCO} detection and segmentation, our SSL pre-trained CNN model (trained for 100 epochs) achieves the same strong performance as its 300-epoch pre-trained Transformer counterpart. We hope this work can help to better understand what is essential (or not) for self-supervised learning backbones.
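    The abstract's central claim is that kernel size alone sets how much context a convolution sees, so scaling it up can substitute for self-attention's global view. This can be sketched with a toy depthwise 1-D convolution (illustrative only; this is not the paper's architecture, and the averaging kernels below are hypothetical):

```python
# A minimal depthwise 1-D convolution, showing how the kernel length alone
# determines the receptive field of a conv layer.
def depthwise_conv1d(x, kernel):
    """Convolve a single channel `x` with `kernel` using 'same' zero padding."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(x))]

# A 3-tap kernel sees 3 inputs per output position; a 31-tap kernel sees 31,
# widening the receptive field with no attention mechanism involved.
small = [1.0 / 3] * 3     # hypothetical averaging kernel, for illustration
large = [1.0 / 31] * 31
```

    Stacking such layers multiplies the effective receptive field, which is why a modest kernel-size increase per layer changes what the whole network can see.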

    TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

    The diffusion model, a prevalent framework for image generation, encounters significant challenges to broad applicability due to its extended inference times and substantial memory requirements. Efficient post-training quantization (PTQ) is pivotal for addressing these issues in traditional models. Unlike traditional models, diffusion models heavily depend on the time step t to achieve satisfactory multi-round denoising. Usually, t, drawn from the finite set {1, …, T}, is encoded to a temporal feature by a few modules that are entirely independent of the sampling data. However, existing PTQ methods do not optimize these modules separately. They adopt inappropriate reconstruction targets and complex calibration methods, resulting in severe disturbance of the temporal feature and the denoising trajectory, as well as low compression efficiency. To solve these problems, we propose a Temporal Feature Maintenance Quantization (TFMQ) framework built upon a Temporal Information Block that depends only on the time step t and is unrelated to the sampling data. Powered by this pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features in a limited time. Equipped with the framework, we can maintain most of the temporal information and ensure end-to-end generation quality. Extensive experiments on various datasets and diffusion models demonstrate our state-of-the-art results. Remarkably, our quantization approach, for the first time, achieves model performance nearly on par with the full-precision model under 4-bit weight quantization. Additionally, our method incurs almost no extra computational cost and accelerates quantization time by 2.0× on LSUN-Bedrooms 256×256 compared to previous works. Our code is publicly available at https://github.com/ModelTC/TFMQ-DM.
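    Two ideas from the abstract can be sketched generically: uniform 4-bit quantization (the standard PTQ primitive, not the paper's TIAR/FSC procedure), and the observation that a finite time-step set makes per-step calibration cheap. The `toy_temporal_embed` function below is a made-up stand-in for the real time-embedding modules:

```python
def quantize_uniform(w, bits=4):
    """Round each value to the nearest point on a 2**bits-level uniform grid
    spanning [min(w), max(w)], then map codes back to floats (dequantize)."""
    lo, hi = min(w), max(w)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in w]

# Because t lives in the finite set {1, ..., T}, the quantized temporal
# feature can in principle be handled (here: simply precomputed) for every
# t up front, independent of any sampling data.
def toy_temporal_embed(t, dim=4):
    """Hypothetical deterministic embedding of a time step, for illustration."""
    return [((t * (i + 1)) % 7) / 7.0 for i in range(dim)]

T = 10
calibrated = {t: quantize_uniform(toy_temporal_embed(t)) for t in range(1, T + 1)}
```

    With only 15 grid intervals, each dequantized value sits within half a grid step of its original, which is why 4-bit weights are aggressive but workable.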

    Regularity of a Stochastic Fractional Delayed Reaction-Diffusion Equation Driven by Lévy Noise

    The current paper is devoted to the regularity of the mild solution of a stochastic fractional delayed reaction-diffusion equation driven by Lévy space-time white noise. By the Banach fixed point theorem, the existence and uniqueness of the mild solution are proved in a suitable working function space that is affected by the delays. Furthermore, the time regularity and space regularity of the mild solution are established respectively. The main results show that both the time regularity and the space regularity of the mild solution depend on the regularity of the initial value and the order of the fractional operator. In particular, the time regularity is affected by the regularity of the initial value with delays.

    SpikeBERT: A Language Spikformer Trained with Two-Stage Knowledge Distillation from BERT

    Spiking neural networks (SNNs) offer a promising avenue to implement deep neural networks in a more energy-efficient way. However, the network architectures of existing SNNs for language tasks are too simplistic, and deep architectures have not been fully explored, resulting in a significant performance gap compared to mainstream transformer-based networks such as BERT. To this end, we improve a recently proposed spiking transformer (i.e., Spikformer) to make it possible to process language tasks, and propose a two-stage knowledge distillation method for training it: pre-training by distilling knowledge from BERT with a large collection of unlabelled texts, followed by fine-tuning with task-specific instances via knowledge distillation again, from a BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve results comparable to BERT on text classification tasks for both English and Chinese, with much less energy consumption.
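    The objective underlying both distillation stages can be sketched generically as a temperature-softened KL divergence between teacher and student output distributions (the classic Hinton-style formulation; SpikeBERT's exact losses and spiking dynamics are not reproduced here):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions: the student is pushed
    toward the teacher's full output distribution, not just its argmax."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

    A temperature above 1 flattens the teacher's distribution, so the student also learns from the relative scores of wrong classes rather than from hard labels alone.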

    Anisotropic magnetic properties and tunable conductivity in two-dimensional layered NaCrX2 (X=Te,Se,S) single crystals

    Monolayer NaCrX2 (X = Te, Se, S) were theoretically proposed to be two-dimensional intrinsic ferromagnetic semiconductors, but their physical properties have not been thoroughly investigated in bulk single crystals. We report the single-crystal growth and the structural, magnetic, and electronic transport properties of NaCr(Te1-xSex)2 (0 ≤ x ≤ 1) and NaCrS2. For NaCr(Te1-xSex)2, the strong perpendicular magnetic anisotropy of NaCrTe2 can be gradually tuned to nearly isotropic by Se doping. Meanwhile, a systematic change in the conductivity with increasing x is observed, displaying a doping-induced metal-insulator-like transition. Under magnetic fields larger than 30 kOe, both NaCrTe2 and NaCrSe2 can be polarized to a ferromagnetic state. For NaCrS2, in contrast, robust antiferromagnetism is observed up to 70 kOe, and two field-induced metamagnetic transitions are identified along H||ab. These intriguing properties, together with the potential to be exfoliated down to few-layer thickness, make NaCrX2 (X = Te, Se, S) promising for exploring spintronic applications.

    Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

    Personality plays a pivotal role in shaping human expression patterns, so regulating the personality of large language models (LLMs) holds significant potential for enhancing their user experience. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is inefficient and costly, while the latter cannot precisely manipulate personality traits at a fine-grained level. To address the above challenges, we employ novel Unsupervisedly-Built Personalized Lexicons (UBPL) in a pluggable manner during the decoding phase of LLMs to manipulate their personality traits. UBPL is a lexicon built through an unsupervised approach from a situational judgment test dataset (SJTs4LLM). Users can utilize UBPL to adjust the probability vectors of predicted words in the decoding phase of LLMs, thus influencing the personality expression of LLMs. Extensive experimentation demonstrates the remarkable effectiveness and pluggability of our method for fine-grained manipulation of LLM personality.
    Comment: Work in progress.
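    The decoding-phase mechanism the abstract describes, adjusting next-token probabilities with lexicon weights, can be sketched as simple logit biasing. This is an assumed interface for illustration: the vocabulary, lexicon entries, and bias strength below are hypothetical and not taken from SJTs4LLM:

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_lexicon_bias(logits, vocab, lexicon, strength=1.0):
    """Add strength * lexicon weight to the logit of each word present in the
    lexicon; words outside the lexicon are left untouched."""
    return [l + strength * lexicon.get(w, 0.0) for l, w in zip(logits, vocab)]

vocab = ["calm", "bold", "shy"]    # hypothetical 3-word vocabulary
logits = [1.0, 0.0, 0.5]           # model's raw next-token scores
lexicon = {"bold": 2.0}            # personalized lexicon favoring "bold"

before = softmax(logits)
after = softmax(apply_lexicon_bias(logits, vocab, lexicon))
```

    Because the bias is applied per decoding step and requires no gradient updates, such a lexicon is pluggable: it can be attached to or detached from any frozen model.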

    You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets

    Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of fully trained dense networks at initialization, without any optimization of the network's weights (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) still remains mysterious. In this paper we carry out a first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find \textit{untrained sparse subnetworks} at initialization that match the performance of \textit{fully trained dense} GNNs. Beyond this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations. We evaluate our method across widely used GNN architectures on various popular datasets, including the Open Graph Benchmark (OGB).
    Comment: Accepted by the LoG conference 2022 as a spotlight.
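    The basic ingredient, a sparse mask over frozen random weights, can be sketched with magnitude masking (for illustration only; the paper searches for the mask rather than deriving it from magnitudes, and that search procedure is not reproduced here):

```python
def magnitude_mask(weights, sparsity):
    """Return a copy of `weights` with the `sparsity` fraction of
    smallest-magnitude entries zeroed out. Surviving entries keep their
    random initial values, i.e. no weight is ever trained."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > threshold else 0.0 for w in weights]
```

    The key design point is that only the binary keep/drop pattern carries information; the surviving values are untouched random initializations, which is what makes the resulting subnetwork "untrained".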

    Using Google web search to analyze and evaluate the application of ChatGPT in femoroacetabular impingement syndrome

    Background: Chat Generative Pre-trained Transformer (ChatGPT) is a new machine learning tool that allows patients to access health information online; this study compares it specifically with Google, the most commonly used search engine in the United States. Patients can use ChatGPT to better understand medical issues. This study compared the two tools based on: (i) frequently asked questions (FAQs) about femoroacetabular impingement syndrome (FAI), (ii) the corresponding answers to these FAQs, and (iii) the most common FAQs yielding a numerical response. Purpose: To assess the suitability of ChatGPT as an online health information resource for patients by replicating their internet searches. Study design: Cross-sectional study. Methods: The same keywords were used to search the 10 most common questions about FAI on both Google and ChatGPT. The responses from both were recorded and analyzed. Results: Of the 20 questions, 8 (40%) were similar. Among the 10 questions searched on Google, 7 answers were provided by a medical practice. For numerical questions, there was a notable difference between Google's and ChatGPT's answers for 3 of the top 5 most common questions (60%). Expert evaluation indicated that 67.5% of experts were satisfied or highly satisfied with the accuracy of ChatGPT's descriptions of both conservative and surgical treatment options for FAI. Additionally, 62.5% of experts were satisfied or highly satisfied with the safety of the information provided. Regarding the etiology of FAI, including cam and pincer impingement, 52.5% of experts expressed satisfaction or high satisfaction with ChatGPT's explanations. Overall, 62.5% of experts affirmed that ChatGPT could serve effectively as a reliable medical resource for initial information retrieval. Conclusion: This study confirms that ChatGPT, despite being a new tool, shows significant potential as a supplementary resource for health information on FAI. Expert evaluations commend its capacity to provide accurate and comprehensive responses, which medical professionals valued for relevance and safety. Nonetheless, continuous improvements in the depth and precision of its medical content are recommended for ongoing reliability. While ChatGPT offers a promising alternative to traditional search engines, meticulous validation is imperative before it can be fully embraced as a trusted medical resource.

    Nobiletin Inhibits IL-1β-Induced Inflammation in Chondrocytes via Suppression of NF-κB Signaling and Attenuates Osteoarthritis in Mice

    Osteoarthritis (OA), a common degenerative joint disease, is principally characterized by inflammation and destruction of cartilage. Nobiletin, an extract of the peel of citrus fruits, is known to have anti-inflammatory properties. However, the mechanisms by which nobiletin plays a protective role in OA are not completely understood. In the present study, we investigated the anti-inflammatory effects of nobiletin on the progression of OA in both in vitro and in vivo experiments. Mouse chondrocytes were pretreated with nobiletin (0, 10, 20, 40 μM) for 24 h and then incubated with IL-1β (10 ng/ml, 24 h) in vitro. The generation of PGE2 and NO was evaluated by the Griess reaction and ELISAs. The protein expression of inducible nitric oxide synthase, matrix metalloproteinase-3, matrix metalloproteinase-13, A disintegrin and metalloproteinase with thrombospondin motifs-5 (ADAMTS5), cyclooxygenase-2, collagen II, and aggrecan was analyzed by Western blotting. Immunofluorescence and Western blot analysis were used to detect nuclear factor-κB (NF-κB) signaling molecules. Induction of proinflammatory and catabolic mediators by IL-1β stimulation of mouse chondrocytes could be partially blocked by treatment with nobiletin or ammonium pyrrolidine dithiocarbamate (an NF-κB inhibitor). Furthermore, our results indicated that nobiletin exhibited a therapeutic effect through active inhibition of the NF-κB signaling pathway. In a mouse model of OA, injection of nobiletin (20 mg/kg) every 2 days for 8 weeks after surgery inhibited cartilage destruction and synovitis. Taken together, our findings suggest that nobiletin may be a potential therapeutic agent for the treatment of OA.