A Catalog of GAL4 Drivers for Labeling and Manipulating Circadian Clock Neurons in Drosophila melanogaster
Daily rhythms of physiology, metabolism, and behavior are orchestrated by a central circadian clock. In mice, this clock is coordinated by the suprachiasmatic nucleus, which consists of 20,000 neurons, making it challenging to characterize individual neurons. In Drosophila, the clock is controlled by only 150 clock neurons distributed across the fly's brain. Here, we describe a comprehensive set of genetic drivers to facilitate individual characterization of Drosophila clock neurons. We screened GAL4 lines that were obtained from Drosophila stock centers and identified 63 lines that exhibit expression in subsets of central clock neurons. Furthermore, we generated split-GAL4 lines that exhibit specific expression in subsets of clock neurons such as the 2 DN2 neurons and the 6 LPN neurons. Together with existing driver lines, these newly identified ones are versatile tools that will facilitate a better understanding of the Drosophila central circadian clock.
Towards Omni-supervised Referring Expression Segmentation
Referring Expression Segmentation (RES) is an emerging task in computer
vision, which segments the target instances in images based on text
descriptions. However, its development is plagued by the expensive segmentation
labels. To address this issue, we propose a new learning task for RES called
Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to
make full use of unlabeled, fully labeled and weakly labeled data, e.g.,
referring points or grounding boxes, for efficient RES training. To accomplish
this task, we also propose a novel yet strong baseline method for Omni-RES
based on the recently popular teacher-student learning, where the weak labels
are not directly transformed into supervision signals but used as a yardstick
to select and refine high-quality pseudo-masks for teacher-student learning. To
validate the proposed Omni-RES method, we apply it to a set of state-of-the-art
RES models and conduct extensive experiments on several RES datasets. The
experimental results show clear advantages of Omni-RES over the
fully supervised and semi-supervised training schemes. For instance, with only
10% fully labeled data, Omni-RES can help the base model match 100% of fully
supervised performance, and it also outperforms the semi-supervised alternative
by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+.
More importantly, Omni-RES also enables the use of large-scale
vision-language datasets such as Visual Genome to facilitate low-cost RES
training, and achieves new SOTA performance on RES, e.g., 80.66 on RefCOCO.
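The idea of using a weak label as a yardstick rather than as direct supervision can be illustrated with a minimal sketch. It assumes the weak label is a grounding box and that a pseudo-mask is kept only when its bounding box overlaps that box well enough; the function names, the IoU criterion, and the 0.5 threshold are hypothetical choices for illustration, not the selection rule used in the paper.

```python
def mask_bbox(mask):
    """Return the inclusive (x0, y0, x1, y1) box around the nonzero pixels, or None."""
    ys = [r for r, row in enumerate(mask) if any(row)]
    xs = [c for row in mask for c, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box):
    """Area of a box given in inclusive pixel coordinates."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    return inter / (box_area(a) + box_area(b) - inter)


def select_and_refine(pseudo_mask, grounding_box, iou_threshold=0.5):
    """Keep a teacher pseudo-mask only if it agrees with the weak box label.

    Assumption: agreement is measured by box IoU, and refinement simply
    clears pixels that fall outside the grounding box.
    """
    bbox = mask_bbox(pseudo_mask)
    if bbox is None or box_iou(bbox, grounding_box) < iou_threshold:
        return None  # rejected: not used to train the student
    x0, y0, x1, y1 = grounding_box
    return [
        [v if (x0 <= c <= x1 and y0 <= r <= y1) else 0 for c, v in enumerate(row)]
        for r, row in enumerate(pseudo_mask)
    ]
```

In this sketch the box never becomes a training target itself; it only decides which pseudo-masks survive and trims the ones that do.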
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Recently, there has been growing interest in extending the multimodal
capability of large language models (LLMs), e.g., vision-language (VL)
learning, which is regarded as the next milestone of artificial general
intelligence. However, existing solutions are prohibitively expensive: they
not only need to optimize an excessive number of parameters but also require
another round of large-scale pre-training before VL instruction tuning. In this
paper, we propose a novel and affordable solution for the effective VL
adaptation of LLMs,
called Mixture-of-Modality Adaptation (MMA). Instead of using large neural
networks to connect the image encoder and LLM, MMA adopts lightweight modules,
i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables
the joint optimization of the image and language models. Meanwhile, MMA is also
equipped with a routing algorithm to help LLMs achieve an automatic shift
between single- and multi-modal instructions without compromising their
natural language understanding ability. We apply MMA to a recent LLM
called LLaMA and call the resulting large vision-language instructed model
LaVIN. To validate MMA and LaVIN, we conduct extensive experiments under two
setups, namely multimodal science question answering and multimodal dialogue.
The experimental results not only demonstrate the competitive performance and
the superior training efficiency of LaVIN compared with existing multimodal
LLMs, but also confirm its great potential as a general-purpose chatbot. More
importantly, the actual cost of LaVIN is very low, e.g., only 1.4
training hours with 3.8M trainable parameters, confirming the
effectiveness of MMA. Our project is released at
https://luogen1996.github.io/lavin
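A routed mixture of lightweight adapters can be sketched as follows. This is only an illustration under stated assumptions: the router is a softmax over per-path logits, each adapter is an arbitrary function on a feature vector, and the mixed output is added back as a residual. The names `softmax` and `mixture_of_adapters` and the temperature parameter are hypothetical and are not taken from the LaVIN code base.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of router logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_adapters(x, adapters, router_logits, temperature=1.0):
    """Blend the outputs of several adapter paths with routing weights.

    Assumption: in a real model the logits would be predicted from the input
    (for example, text-only versus image-plus-text); here they are passed in.
    """
    weights = softmax(router_logits, temperature)
    outputs = [adapter(x) for adapter in adapters]
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(x))
    ]
    # Residual connection: the frozen backbone feature plus the routed update.
    return [xi + mi for xi, mi in zip(x, mixed)], weights
```

When the logits strongly favor one path, the mixture reduces to that single adapter, which is one way a model could shift between text-only and multimodal instructions.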
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
In 3D Referring Expression Segmentation (3D-RES), earlier approaches adopt
a two-stage paradigm, extracting segmentation proposals and then matching them
with referring expressions. However, this conventional paradigm encounters
significant challenges, most notably low-quality initial proposals and a
pronounced slowdown in inference speed. Recognizing
these limitations, we introduce an innovative end-to-end Superpoint-Text
Matching Network (3D-STMN) that is enriched by dependency-driven insights. One
of the keystones of our model is the Superpoint-Text Matching (STM) mechanism.
Unlike traditional methods that navigate through instance proposals, STM
directly correlates linguistic indications with their respective superpoints,
clusters of semantically related points. This architectural decision empowers
our model to efficiently harness cross-modal semantic relationships, primarily
leveraging densely annotated superpoint-text pairs, as opposed to the more
sparse instance-text pairs. In pursuit of enhancing the role of text in guiding
the segmentation process, we further incorporate the Dependency-Driven
Interaction (DDI) module to deepen the network's semantic comprehension of
referring expressions. Using the dependency trees as a beacon, this module
discerns the intricate relationships between primary terms and their associated
descriptors in expressions, thereby elevating both the localization and
segmentation capacities of our model. Comprehensive experiments on the
ScanRefer benchmark reveal that our model not only sets new performance
standards, registering an mIoU gain of 11.7 points, but also achieves a
substantial improvement in inference speed, surpassing traditional methods by
95.7 times. The code and models are available at
https://github.com/sosppxo/3D-STMN
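Matching text directly against superpoints, rather than against instance proposals, can be illustrated with a small sketch. It assumes each superpoint and the referring expression are already embedded in a shared feature space, scores them with a dot product, and copies each superpoint's decision to its member points. The function name, the dot-product score, and the threshold are hypothetical simplifications, not the STM mechanism as implemented in the released code.

```python
def superpoint_text_mask(superpoint_feats, text_feat, point_to_superpoint, threshold=0.0):
    """Predict a per-point mask by scoring superpoints against a text embedding.

    superpoint_feats: one feature vector per superpoint.
    text_feat: a single feature vector for the referring expression.
    point_to_superpoint: for every point, the index of its superpoint.
    """
    scores = [
        sum(a * b for a, b in zip(feat, text_feat))
        for feat in superpoint_feats
    ]
    selected = [score > threshold for score in scores]
    # Every point inherits the decision made for its superpoint.
    return [selected[idx] for idx in point_to_superpoint]
```

Because the number of superpoints is much smaller than the number of points, a single pass of this kind avoids generating and ranking instance proposals first.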
Towards Efficient Visual Adaption via Structural Re-parameterization
Parameter-efficient transfer learning (PETL) is an emerging research area
aimed at inexpensively adapting large-scale pre-trained models to downstream
tasks. Recent advances have achieved great success in saving storage costs for
various vision tasks by updating or injecting a small number of parameters
instead of full fine-tuning. However, we notice that most existing PETL methods
still incur non-negligible latency during inference. In this paper, we propose
a parameter-efficient and computationally friendly adapter for giant vision
models, called RepAdapter. Specifically, we prove that the adaption modules,
even with a complex structure, can be seamlessly integrated into most giant
vision models via structural re-parameterization. This property makes
RepAdapter zero-cost during inference. In addition to computation efficiency,
RepAdapter is more effective and lightweight than existing PETL methods due to
its sparse structure and our careful deployment. To validate RepAdapter, we
conduct extensive experiments on 27 benchmark datasets of three vision tasks,
i.e., image classification, video classification, and semantic segmentation.
Experimental results show the superior performance and efficiency of RepAdapter
over state-of-the-art PETL methods. For instance, by updating only 0.6% of the
parameters, we can improve the performance of ViT from 38.8 to 55.1 on Sun397.
Its generalizability is also well validated on several vision models, namely ViT,
CLIP, Swin-Transformer and ConvNeXt. Our source code is released at
https://github.com/luogen1996/RepAdapter
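The zero-cost inference claim rests on the fact that consecutive linear maps can be folded into one. The sketch below shows this for the simplest case, assuming a purely linear residual adapter y = x + s(x W_a + b_a) placed directly before a linear layer z = y W + b, so that the merged layer uses W' = (I + s W_a) W and b' = s b_a W + b. The function names and this particular adapter form are assumptions for illustration; the paper's adapter structure and placement may differ.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(x, w, b):
    """Apply z = x @ w + b to a single row vector x."""
    return [v + c for v, c in zip(matmul([x], w)[0], b)]


def adapter_then_linear(x, w_adapter, b_adapter, w_next, b_next, scale=1.0):
    """Training-time path: residual linear adapter followed by a linear layer."""
    delta = linear(x, w_adapter, b_adapter)
    y = [xi + scale * di for xi, di in zip(x, delta)]
    return linear(y, w_next, b_next)


def merge_adapter(w_adapter, b_adapter, w_next, b_next, scale=1.0):
    """Fold the adapter into the next layer's weights for inference."""
    d = len(w_adapter)
    # (I + scale * W_a): identity from the residual plus the scaled adapter.
    combined = [
        [(1.0 if i == j else 0.0) + scale * w_adapter[i][j] for j in range(d)]
        for i in range(d)
    ]
    w_merged = matmul(combined, w_next)
    b_proj = matmul([[scale * v for v in b_adapter]], w_next)[0]
    b_merged = [p + q for p, q in zip(b_proj, b_next)]
    return w_merged, b_merged
```

After merging, `linear(x, w_merged, b_merged)` gives the same result as `adapter_then_linear(...)` up to floating-point rounding, so the adapter adds no extra layers at inference time. The equivalence holds only while no nonlinearity sits between the adapter and the layer it is merged into.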