32 research outputs found

    Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

    The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By integrating ultra-lightweight convolutional parameters into Low-Rank Adaptation (LoRA), Conv-LoRA can inject image-related inductive biases into the plain ViT encoder, further reinforcing SAM's local prior assumption. Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge but also revives its capacity for learning high-level image semantics, which is constrained by SAM's foreground-background segmentation pretraining. Comprehensive experimentation across diverse benchmarks spanning multiple domains underscores Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks. Comment: Accepted at ICLR 2024 Conference.

    Towards Robust Text Retrieval with Progressive Learning

    Retrieval augmentation has become an effective way to equip large language models (LLMs) with external, verified knowledge from a database, which mitigates the limitations and hallucinations of LLMs in handling up-to-date and domain-specific information. However, existing embedding models for text retrieval usually have three non-negligible limitations. First, the number and diversity of samples in a batch are too restricted to supervise the modeling of textual nuances at scale. Second, the high proportion of noise is detrimental to the semantic correctness and consistency of embeddings. Third, treating easy and difficult samples equally causes suboptimal convergence of embeddings and poorer generalization. In this paper, we propose PEG, progressively learned embeddings for robust text retrieval. Specifically, we increase the number of in-batch negative samples during training to 80,000, and for each query, we extract five hard negatives. Concurrently, we incorporate a progressive learning mechanism, enabling the model to dynamically modulate its attention to the samples throughout the entire training process. Additionally, PEG is trained on more than 100 million data points, encompassing a wide range of domains (e.g., finance, medicine, and tourism) and covering various tasks (e.g., question-answering, machine reading comprehension, and similarity matching). Extensive experiments conducted on C-MTEB and DuReader demonstrate that PEG surpasses state-of-the-art embeddings in retrieving true positives, highlighting its significant potential for applications in LLMs. Our model is publicly available at https://huggingface.co/TownsWu/PEG
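
    The in-batch negatives mentioned in this abstract can be illustrated with a generic contrastive (InfoNCE-style) loss. This is a minimal sketch, not PEG's training code: the function name, the temperature value, and the toy vectors are assumptions, and the hard negatives and progressive sample weighting described in the abstract are left out.

```python
import math

def in_batch_contrastive_loss(queries, passages, temperature=0.05):
    """Generic InfoNCE loss: passages[i] is the positive for queries[i];
    every other passage in the batch acts as a negative.
    (Hypothetical helper for illustration, not taken from the paper.)"""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    losses = []
    for i, q in enumerate(queries):
        # Similarity of this query to every passage in the batch
        logits = [dot(q, p) / temperature for p in passages]
        # Numerically stable log-sum-exp over the batch
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true passage
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two query/passage pairs (made-up 2-d embeddings)
aligned = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[0.0, 1.0], [1.0, 0.0]])
```

    Under this formulation a larger batch supplies more negatives per query at no extra labeling cost, which is one plausible reading of why the abstract emphasizes batch size.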

    ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

    Despite the remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbot. This benchmark contains rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference compared to social media content. Our systematic evaluation of models trained on existing toxicity datasets shows their shortcomings when applied to the unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

    SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger

    Over the past two years, vision-language pre-training has achieved noteworthy success on several downstream tasks. Nevertheless, acquiring high-quality image-text pairs, where the pairs are mutually exclusive, remains a challenging task, and noise exists in the commonly used datasets. To address this issue, we propose SoftCLIP, a novel approach that relaxes the strict one-to-one constraint and achieves a soft cross-modal alignment by introducing a softened target, which is generated from the fine-grained intra-modal self-similarity. The intra-modal guidance allows two pairs to share some local similarities and models many-to-many relationships between the two modalities. Besides, since the positive still dominates in the softened target distribution, we disentangle the negatives in the distribution to further boost the relation alignment with the negatives in cross-modal learning. Extensive experiments demonstrate the effectiveness of SoftCLIP. In particular, on the ImageNet zero-shot classification task, using CC3M/CC12M as the pre-training dataset, SoftCLIP brings a top-1 accuracy improvement of 6.8%/7.2% over the CLIP baseline.

    Sinkhorn Distance Minimization for Knowledge Distillation

    Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences. However, due to limitations inherent in their assumptions and definitions, these measures fail to deliver effective supervision when little distribution overlap exists between the teacher and the student. In this paper, we show that the aforementioned KL, RKL, and JS divergences respectively suffer from issues of mode-averaging, mode-collapsing, and mode-underestimation, which degrade logits-based KD for diverse NLP tasks. We propose Sinkhorn Knowledge Distillation (SinKD), which exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions. Moreover, by exploiting properties of the Sinkhorn metric, we can dispense with sample-wise KD, which restricts the perception of divergence to each teacher-student sample pair. Instead, we propose a batch-wise reformulation to capture geometric intricacies of distributions across samples in the high-dimensional space. Comprehensive evaluation on GLUE and SuperGLUE, in terms of comparability, validity, and generalizability, highlights our superiority over state-of-the-art methods on all kinds of LLMs with encoder-only, encoder-decoder, and decoder-only architectures. Comment: Accepted by COLING 202
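
    For readers unfamiliar with the Sinkhorn distance, the following is a minimal sketch of the standard entropy-regularized optimal transport iteration between two discrete distributions. It is a generic illustration and not the batch-wise SinKD formulation from the paper; the function name, the regularization strength `eps`, the iteration count, and the toy cost matrix are all assumptions.

```python
import math

def sinkhorn_distance(p, q, cost, eps=0.1, n_iters=200):
    """Approximate transport cost between discrete distributions p and q
    using Sinkhorn's alternating scaling (hypothetical helper)."""
    n, m = len(p), len(q)
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so that the plan's
        # marginals match p and q
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan T[i][j] = u[i] * K[i][j] * v[j]; return its total cost
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Toy example: two classes, unit cost for moving mass between them
C = [[0.0, 1.0], [1.0, 0.0]]
disjoint = sinkhorn_distance([1.0, 0.0], [0.0, 1.0], C)
identical = sinkhorn_distance([0.5, 0.5], [0.5, 0.5], C)
```

    The `disjoint` case has no overlap between the two distributions, yet the distance stays finite and reflects how far the mass must move, which is the property the abstract contrasts with the KL, RKL, and JS divergences.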

    Challenges and recent advancements of functionalization of two-dimensional nanostructured molybdenum trioxide and dichalcogenides

    Atomically-thin two-dimensional (2D) semiconductors are the thinnest functional semiconducting materials available today. Among them, molybdenum trioxide and dichalcogenides (MT&Ds) represent key components within the family of 2D semiconductors for various electronic, optoelectronic and electrochemical applications due to their unique electronic, optical, mechanical and electrochemical properties. However, despite great progress over the last decade in research dedicated to the development and fabrication of 2D MT&Ds, significant challenges remain concerning their charge transport behavior, their fabrication on a large scale, and the strong dependence of carrier mobility on thickness. In this article, we review the recent progress on the carrier mobility engineering of 2D MT&Ds and elaborate on strategies devised to optimize MT&D properties. Specifically, the latest physical and chemical methods for surface functionalization and for optimizing the major factors influencing the extrinsic transport at the electrode-2D semiconductor interface are discussed.

    LightMixer: A novel lightweight convolutional neural network for tomato disease detection

    Tomatoes are among the most important crops grown worldwide. However, tomato diseases can harm the health of tomato plants during growth and reduce tomato yields over large areas. The development of computer vision technology offers the prospect of solving this problem. However, traditional deep learning algorithms incur a high computational cost and require a large number of parameters. Therefore, a lightweight tomato leaf disease identification model called LightMixer was designed in this study. The LightMixer model comprises a depth convolution with a Phish module and a light residual module. Depth convolution with the Phish module is a lightweight convolution module designed to splice nonlinear activation functions with depth convolution as the backbone; it also focuses on lightweight convolutional feature extraction to facilitate deep feature fusion. The light residual module was built from lightweight residual blocks to improve the computational efficiency of the entire network architecture and reduce the information loss of disease features. Experimental results show that the proposed LightMixer model achieved 99.3% accuracy on public datasets while requiring only 1.5 M parameters, an improvement over other classical convolutional neural networks and lightweight models, and can be used for automatic tomato leaf disease identification on mobile devices.

    Research on Environmental Suitability Evaluation of the Transfer Spaces in Urban Subway Stations

    The transfer space provides connectivity at subway interchanges. Passengers generally report a poor experience when using this space, so improving the environmental suitability of transfer spaces at subway stations is a top priority. Based on a literature review and field research, this study established an environmental suitability evaluation system for transfer spaces and used the fuzzy comprehensive evaluation method to evaluate the environmental suitability of eight samples in Shanghai. The eight samples were ranked as follows: Hanzhong Road Station > People’s Square Station > East Nanjing Road Station > Century Avenue Station > Xujiahui Station > Laoximen Station > Jiangsu Road Station > Shanghai Railway Station. Analysis of the relationship between the indicators showed that the environmental suitability of a transfer space is greatly affected by safety and convenience, while practicality, comfort, and aesthetics have a weak influence. These evaluation methods and results provide a reference for improving the environmental quality of subway transfer spaces in other cities.
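
    The fuzzy comprehensive evaluation method named in this abstract is commonly computed as a weighted combination of indicator membership vectors. The sketch below shows that generic calculation only. The five indicator names come from the abstract, but the weights, membership degrees, grade scores, and function name are invented for illustration and are not the study's data.

```python
def fuzzy_evaluate(weights, membership, grade_scores):
    """Weighted-average fuzzy comprehensive evaluation (hypothetical helper).
    weights[i]       : importance of indicator i (should sum to 1)
    membership[i][k] : degree to which indicator i belongs to grade k
    grade_scores[k]  : numeric score attached to grade k
    """
    n_grades = len(grade_scores)
    # B = W . R : combined membership of the sample in each grade
    b = [sum(w * row[k] for w, row in zip(weights, membership))
         for k in range(n_grades)]
    # Normalize so the combined memberships sum to 1
    total = sum(b)
    b = [x / total for x in b]
    # Collapse the grade vector into a single comparable score
    score = sum(x * s for x, s in zip(b, grade_scores))
    return b, score

# Illustrative values only: safety, convenience, practicality, comfort, aesthetics
weights = [0.30, 0.30, 0.15, 0.15, 0.10]
membership = [
    [0.5, 0.3, 0.2],  # safety: good / fair / poor
    [0.4, 0.4, 0.2],  # convenience
    [0.2, 0.5, 0.3],  # practicality
    [0.3, 0.4, 0.3],  # comfort
    [0.1, 0.5, 0.4],  # aesthetics
]
grades, score = fuzzy_evaluate(weights, membership, [100, 80, 60])
```

    Computing one such score per station would give a ranking of the kind reported above, assuming each station has its own membership matrix derived from survey or field data.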

    Effect of Negative Valve Overlap on Combustion and Emissions of CNG-Fueled HCCI Engine with Hydrogen Addition

    To study the effect of negative valve overlap (NVO) on the combustion and emission characteristics of a homogeneous charge compression ignition (HCCI) engine fueled with natural gas and hydrogen, tests and simulations were conducted under different valve timing conditions using an engine cycle model coupled with a chemical kinetic reaction mechanism. Results show that the internal EGR formed by negative valve overlap can heat the inlet mixture and improve the spontaneous ignition characteristics of the engine. The residual exhaust gas can slow the heat release rate, decrease the pressure rise rate and the maximum combustion temperature, and reduce NOx emissions at the same time. Among the three NVO schemes, changing only the intake valve opening timing causes the least power loss, while the symmetric NVO strategy, which changes both the exhaust valve closing timing and the intake valve opening timing simultaneously, achieves the best heating of the inlet mixture, a satisfactory decrease in combustion temperature, and the largest reduction in NOx emissions.

    Spatial-temporal evolution of overweight and obesity among Chinese adolescents from 2016 to 2020

    Summary: This study examines the spatial-temporal evolution of overweight and obesity among Chinese adolescents aged 14–17. Data from five national surveys conducted between 2016 and 2020 were analyzed to determine distribution patterns and trends. Results showed that overweight and obesity exhibit spatial clustering, with greater severity in the north and less severity in the south. The issue has spread from the northeast to the southwest of Mainland China. Using a local autocorrelation model, the regions were divided into a northern disease cold spot area (Inner Mongolia) and a southern disease hot spot area (Guangxi). Over the past five years, overweight rates among Chinese adolescents have not been effectively curbed, whereas obesity was brought under some control and partly reversed until 2019. Future efforts should focus on the spatial-temporal pattern of disease spread, targeting hotspot areas and abnormal values for regional synergy and precise prevention and control.