Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
The Segment Anything Model (SAM) stands as a foundational framework for image
segmentation. While it exhibits remarkable zero-shot generalization in typical
scenarios, its advantage diminishes when applied to specialized domains like
medical imagery and remote sensing. To address this limitation, this paper
introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning
approach. By integrating ultra-lightweight convolutional parameters into
Low-Rank Adaptation (LoRA), Conv-LoRA can inject image-related inductive biases
into the plain ViT encoder, further reinforcing SAM's local prior assumption.
Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge
but also revives its capacity for learning high-level image semantics, which is
constrained by SAM's foreground-background segmentation pretraining.
Comprehensive experimentation across diverse benchmarks spanning multiple
domains underscores Conv-LoRA's superiority in adapting SAM to real-world
semantic segmentation tasks.
Comment: Accepted at ICLR 2024 Conference
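The core idea, inserting a tiny depthwise convolution into LoRA's rank-r bottleneck so that the low-rank update sees the 2D patch layout, can be sketched roughly as follows. All sizes, the zero-initialization of B, and the exact placement of the conv branch are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ViT hidden dim d, LoRA rank r, a 16x16 patch grid.
d, r, grid = 64, 4, 16
n_tokens = grid * grid

W = rng.normal(scale=0.02, size=(d, d))     # frozen pretrained weight
A = rng.normal(scale=0.02, size=(d, r))     # LoRA down-projection
B = np.zeros((r, d))                        # LoRA up-projection, zero-init
K = rng.normal(scale=0.02, size=(r, 3, 3))  # ultra-light 3x3 depthwise kernel

def depthwise_conv3x3(x):
    """x: (H, W, r) -> same-padded 3x3 depthwise convolution."""
    H, Wd, _ = x.shape
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += p[i:i + H, j:j + Wd, :] * K[:, i, j]
    return out

def conv_lora_forward(x):
    """x: (n_tokens, d). Frozen path plus conv-augmented low-rank path."""
    h = x @ A                                 # down-project to rank r
    g = h.reshape(grid, grid, r)              # restore the 2D patch layout
    h = h + depthwise_conv3x3(g).reshape(n_tokens, r)  # inject local prior
    return x @ W + h @ B                      # frozen output + LoRA update

x = rng.normal(size=(n_tokens, d))
y = conv_lora_forward(x)
```

Because B starts at zero, fine-tuning begins exactly at the frozen model, matching standard LoRA practice; only A, B, and the small kernel K would be trained.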
Towards Robust Text Retrieval with Progressive Learning
Retrieval augmentation has become an effective solution to empower large
language models (LLMs) with external and verified knowledge sources from the
database, which overcomes the limitations and hallucinations of LLMs in
handling up-to-date and domain-specific information. However, existing
embedding models for text retrieval usually have three non-negligible
limitations. First, the number and diversity of samples in a batch are too
restricted to supervise the modeling of textual nuances at scale. Second, the
high proportion of noise is detrimental to the semantic correctness and
consistency of embeddings. Third, treating easy and difficult samples equally
leads to sub-optimal convergence of embeddings with poorer generalization. In
this paper, we propose PEG, progressively learned embeddings for robust text
retrieval. Specifically, we increase the number of in-batch negative samples
to 80,000 and extract five hard negatives for each query. Concurrently, we
incorporate a progressive learning mechanism, enabling the model to
dynamically modulate its attention to samples throughout the entire training
process. Additionally, PEG is trained on more than 100 million samples,
encompassing a wide range of domains (e.g., finance, medicine, and tourism)
and covering various tasks (e.g., question answering, machine reading
comprehension, and similarity matching). Extensive experiments
conducted on C-MTEB and DuReader demonstrate that PEG surpasses
state-of-the-art embeddings in retrieving true positives, highlighting its
significant potential for applications in LLMs. Our model is publicly available
at https://huggingface.co/TownsWu/PEG
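The contrastive objective behind such large-negative-pool training can be sketched with the generic InfoNCE form: one positive and a pool of in-batch plus mined hard negatives per query. The temperature, vector sizes, and loss form below are illustrative assumptions; PEG's actual progressive sample weighting is not shown.

```python
import numpy as np

def normalize(v):
    """L2-normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Contrastive loss for one query against its positive and a pool of
    negatives (in-batch plus mined hard negatives). Inputs L2-normalized."""
    logits = np.concatenate([[query @ positive], negatives @ query])
    logits = logits / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive sits at index 0

rng = np.random.default_rng(1)
query = normalize(rng.normal(size=8))
hard_negs = normalize(rng.normal(size=(5, 8)))
loss = info_nce_loss(query, query, hard_negs)   # perfectly aligned positive
```

Enlarging the negative pool tightens this objective's approximation of full-corpus ranking, which is one way to read the paper's 80,000-negative design choice.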
ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation
Despite remarkable advances that large language models have achieved in
chatbots, maintaining a non-toxic user-AI interaction environment has become
increasingly critical. However, previous efforts in toxicity detection
have been mostly based on benchmarks derived from social media content, leaving
the unique challenges inherent to real-world user-AI interactions
insufficiently explored. In this work, we introduce ToxicChat, a novel
benchmark based on real user queries from an open-source chatbot. This
benchmark contains the rich, nuanced phenomena that can be tricky for current
toxicity detection models to identify, revealing a significant domain
difference compared to social media content. Our systematic evaluation of
models trained on existing toxicity datasets has shown their shortcomings when
applied to this unique domain of ToxicChat. Our work illuminates the
potentially overlooked challenges of toxicity detection in real-world user-AI
conversations. In the future, ToxicChat can be a valuable resource to drive
further advancements toward building a safe and healthy environment for user-AI
interactions.
SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger
Over the past two years, vision-language pre-training has achieved noteworthy
success on several downstream tasks. Nevertheless, acquiring high-quality
image-text pairs, in which the pairs are entirely exclusive of each other,
remains challenging, and noise exists in commonly used datasets. To address
this issue, we propose SoftCLIP, a novel approach that relaxes the strict
one-to-one constraint and achieves a soft cross-modal alignment by introducing
a softened target generated from fine-grained intra-modal self-similarity. The
intra-modal guidance allows two pairs to share some local similarities and
enables the model to capture many-to-many relationships between the two
modalities. Moreover, since the positive still dominates the softened target
distribution, we disentangle the negatives in the distribution to further
boost relation alignment with the negatives in cross-modal learning. Extensive
experiments demonstrate the effectiveness of SoftCLIP. In particular, on the
ImageNet zero-shot classification task, using CC3M/CC12M as the pre-training
dataset, SoftCLIP brings a top-1 accuracy improvement of 6.8%/7.2% over the
CLIP baseline.
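The softened-target idea admits a minimal sketch: the image-to-image self-similarity distribution serves as the target for the image-to-text prediction, and a KL term replaces the hard one-hot cross-entropy. The temperature value, the use of KL, and the choice of the image side for intra-modal guidance are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, tau=0.07):
    """Temperature-scaled softmax along the last axis."""
    x = x / tau
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def soft_clip_loss(img, txt, tau=0.07, eps=1e-12):
    """img, txt: (n, d) L2-normalized batch features.
    Intra-modal image self-similarity supplies the softened target."""
    target = softmax(img @ img.T, tau)   # soft, many-to-many target rows
    pred = softmax(img @ txt.T, tau)     # cross-modal prediction rows
    # KL(target || pred), averaged over the batch, replaces one-hot CE.
    kl = target * (np.log(target + eps) - np.log(pred + eps))
    return kl.sum(axis=-1).mean()

rng = np.random.default_rng(2)
f = rng.normal(size=(4, 8))
f /= np.linalg.norm(f, axis=1, keepdims=True)
g = rng.normal(size=(4, 8))
g /= np.linalg.norm(g, axis=1, keepdims=True)
```

Because the diagonal of the self-similarity matrix is always the maximum, the positive still dominates each target row, which is the property the disentangling step in the paper then addresses.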
Sinkhorn Distance Minimization for Knowledge Distillation
Knowledge distillation (KD) has been widely adopted to compress large
language models (LLMs). Existing KD methods investigate various divergence
measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL),
and Jensen-Shannon (JS) divergences. However, due to limitations inherent in
their assumptions and definitions, these measures fail to deliver effective
supervision when there is little distribution overlap between the teacher and
the student. In this paper, we show that the aforementioned KL, RKL, and JS
divergences respectively suffer from mode-averaging, mode-collapsing, and
mode-underestimation, which deteriorate logits-based KD on diverse NLP tasks.
We propose Sinkhorn Knowledge Distillation (SinKD), which exploits the
Sinkhorn distance to ensure a nuanced and precise assessment of the disparity
between teacher and student distributions. Moreover, by exploiting properties
of the Sinkhorn metric, we dispense with sample-wise KD, which restricts the
perception of divergence to each teacher-student sample pair. Instead, we
propose a batch-wise reformulation to capture geometric intricacies of
distributions across samples in high-dimensional space. Comprehensive
evaluation on GLUE and SuperGLUE, in terms of comparability, validity, and
generalizability, highlights our superiority over state-of-the-art methods on
all kinds of LLMs with encoder-only, encoder-decoder, and decoder-only
architectures.
Comment: Accepted by COLING 2024
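For reference, the entropy-regularized Sinkhorn distance at the heart of SinKD can be computed with alternating scaling updates. The cost matrix, regularization strength, and iteration count below are illustrative assumptions, and SinKD's batch-wise reformulation is not shown:

```python
import numpy as np

def sinkhorn_distance(p, q, C, eps=0.1, n_iters=200):
    """Entropy-regularized optimal-transport distance between discrete
    distributions p and q under ground-cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel from the cost matrix
    u = np.ones_like(p)
    for _ in range(n_iters):
        v = q / (K.T @ u)             # alternate scaling updates enforce
        u = p / (K @ v)               # the row and column marginals
    T = u[:, None] * K * v[None, :]   # resulting transport plan
    return float((T * C).sum())       # expected cost under the plan

# Toy teacher/student "logit" distributions over 4 classes,
# with unit cost for moving mass between distinct classes.
p = np.full(4, 0.25)
C = 1.0 - np.eye(4)
q_skewed = np.array([0.7, 0.1, 0.1, 0.1])
```

Unlike KL-family divergences, this distance stays finite and informative even when the two distributions share little support, which is the motivation the abstract gives for adopting it.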
Challenges and recent advancements of functionalization of two-dimensional nanostructured molybdenum trioxide and dichalcogenides
Atomically thin two-dimensional (2D) semiconductors are the thinnest functional semiconducting materials available today. Among them, molybdenum trioxide and the molybdenum dichalcogenides (MT&Ds) represent key components within the family of 2D semiconductors for various electronic, optoelectronic, and electrochemical applications due to their unique electronic, optical, mechanical, and electrochemical properties. However, despite great progress in research dedicated to the development and fabrication of 2D MT&Ds over the last decade, significant challenges remain that affect their charge-transport behavior and large-scale fabrication, as well as the strong dependence of carrier mobility on thickness. In this article, we review recent progress on carrier-mobility engineering of 2D MT&Ds and elaborate on strategies devised to optimize MT&D properties. Specifically, the latest physical and chemical methods for surface functionalization, and for optimizing the major factors influencing extrinsic transport at the electrode-2D semiconductor interface, are discussed.
LightMixer: A novel lightweight convolutional neural network for tomato disease detection
Tomatoes are among the most important crops grown worldwide. However, tomato diseases can harm the health of tomato plants during growth and reduce yields over large areas. The development of computer vision technology offers the prospect of solving this problem, but traditional deep learning algorithms incur a high computational cost and require many parameters. Therefore, a lightweight tomato leaf disease identification model called LightMixer was designed in this study. The LightMixer model comprises a depth convolution with a Phish module and a light residual module. The depth convolution with the Phish module is a lightweight convolution module that splices nonlinear activation functions with depth convolution as the backbone; it also focuses on lightweight convolutional feature extraction to facilitate deep feature fusion. The light residual module was built from lightweight residual blocks to accelerate the computational efficiency of the entire network architecture and reduce the information loss of disease features. Experimental results show that the proposed LightMixer model achieved 99.3% accuracy on public datasets while requiring only 1.5 M parameters, an improvement over other classical convolutional neural networks and lightweight models, and it can be used for automatic tomato leaf disease identification on mobile devices.
Research on Environmental Suitability Evaluation of the Transfer Spaces in Urban Subway Stations
The transfer space realizes the connectivity of subway intersections. Passengers generally report a poor experience when using this space, so improving the environmental suitability of transfer spaces at subway stations is a top priority. Based on a literature review and field research, this study established an environmental suitability evaluation system for transfer spaces and used the fuzzy comprehensive evaluation method to evaluate the environmental suitability of eight samples in Shanghai. The eight samples were ranked as follows: Hanzhong Road Station > People’s Square Station > East Nanjing Road Station > Century Avenue Station > Xujiahui Station > Laoximen Station > Jiangsu Road Station > Shanghai Railway Station. Analysis of the relationships between the indicators showed that the environmental suitability of a transfer space is strongly affected by safety and convenience, while practicality, comfort, and aesthetics have only a weak influence. These evaluation methods and results provide a reference for improving the environmental quality of subway transfer spaces in other cities.
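The fuzzy comprehensive evaluation step behind such a ranking can be sketched as a criterion weight vector applied to a criterion-by-grade membership matrix. The weights, grade scores, and membership values below are invented for illustration and are not the study's data:

```python
import numpy as np

# Assumed weights over the five criteria from the abstract:
# safety, convenience, practicality, comfort, aesthetics (sum to 1).
W = np.array([0.30, 0.28, 0.16, 0.14, 0.12])

# Membership matrix R: one row per criterion, one column per rating grade
# (e.g. excellent / good / fair / poor); each row sums to 1.
R = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.2, 0.3, 0.3, 0.2],
    [0.3, 0.3, 0.2, 0.2],
    [0.2, 0.4, 0.2, 0.2],
])

B = W @ R                            # fuzzy evaluation vector over grades
grades = np.array([90, 75, 60, 40])  # assumed numeric grade scores
score = float(B @ grades)            # scalar suitability score for ranking
```

Computing such a score for each station and sorting the results reproduces the kind of ordered ranking reported in the study.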
Effect of Negative Valve Overlap on Combustion and Emissions of CNG-Fueled HCCI Engine with Hydrogen Addition
In order to study the effect of negative valve overlap (NVO) on the combustion and emission characteristics of a homogeneous charge compression ignition engine fueled with natural gas and hydrogen, tests and simulations were conducted using an engine cycle model coupled with a chemical kinetic reaction mechanism under different valve-timing conditions. Results show that the internal EGR formed by negative valve overlap can heat the inlet mixture and improve the spontaneous-ignition characteristics of the engine. The residual exhaust gas can slow the heat release rate, decrease the pressure rise rate and the maximum combustion temperature, and simultaneously reduce NOx emissions. Among the three NVO schemes, changing the intake valve opening timing alone incurs the least power loss, while the symmetric NVO strategy, which changes both the exhaust valve closing timing and the intake valve opening timing simultaneously, achieves the best heating of the inlet mixture, a satisfactory decrease in combustion temperature, and the largest reduction in NOx emissions.
Spatial-temporal evolution of overweight and obesity among Chinese adolescents from 2016 to 2020
Summary: This study examines the spatial-temporal evolution of overweight and obesity among Chinese adolescents aged 14–17. Data from five national surveys conducted between 2016 and 2020 were analyzed to determine distribution patterns and trends. The results showed that overweight and obesity exhibit spatial clustering, with greater severity in the north and less severity in the south, and that the issue has spread from the northeast to the southwest of Mainland China. Using a local autocorrelation model, the regions were divided into a northern disease cold-spot area (Inner Mongolia) and a southern disease hot-spot area (Guangxi). Over the past five years, overweight rates among Chinese adolescents were not effectively curbed, although obesity control showed some success, with a reversal lasting until 2019. Future efforts should focus on the spatial-temporal pattern of disease spread, targeting hotspot areas and abnormal values for regional synergy and precise prevention and control.