Geranylgeranyltransferase I is essential for dendritic development of cerebellar Purkinje cells
Background: During cerebellar development, Purkinje cells (PCs) form the most elaborate dendritic trees among neurons in the brain, but the mechanism regulating PC arborization remains largely unknown. Geranylgeranyltransferase I (GGT) is a prenyltransferase responsible for lipid modification of several signaling proteins, such as the Rho family small GTPase Rac1, which has been shown to be involved in neuronal morphogenesis. Here we show that GGT plays an important role in dendritic development of PCs.
Results: We found that GGT was abundantly expressed in the developing rat cerebellum, in particular in the molecular layer (ML), the region enriched with PC dendrites. Inhibition or down-regulation of GGT using small interfering RNA (siRNA) inhibited dendritic development of PCs. In contrast, up-regulation of GGT promoted dendritic arborization of PCs. Furthermore, neuronal depolarization induced by high K+ or treatment with brain-derived neurotrophic factor (BDNF) promoted membrane association of Rac1 and dendritic development of PCs in cultured cerebellar slices. The effect of BDNF or high K+ was blocked by inhibition or down-regulation of GGT.
Conclusion: Our results indicate that GGT plays an important role in Purkinje cell development, and suggest a novel role of GGT in neuronal morphogenesis in vivo.
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Pre-training on large-scale video data has become a common recipe for
learning transferable spatiotemporal representations in recent years. Despite
some progress, existing methods are mostly limited to highly curated datasets
(e.g., K400) and exhibit unsatisfactory out-of-the-box representations. We
argue that this is because they capture only pixel-level knowledge
rather than spatiotemporal commonsense, which is far from cognition-level
video understanding. Inspired by the great success of image-text pre-training
(e.g., CLIP), we take the first step to exploit language semantics to boost
transferable spatiotemporal representation learning. We introduce a new pretext
task, Turning to Video for Transcript Sorting (TVTS), which sorts shuffled ASR
scripts by attending to learned video representations. We do not rely on
descriptive captions but learn purely from video, i.e., leveraging naturally
transcribed speech to provide noisy but useful semantics over time.
Furthermore, rather than the simple concept learning in vision-caption
contrast, we encourage cognition-level temporal commonsense reasoning via
narrative reorganization. The advantages enable our model to contextualize what
is happening like human beings and seamlessly apply to large-scale uncurated
video data in the real world. Note that our method differs from ones designed
for video-text alignment (e.g., Frozen) and multimodal representation learning
(e.g., Merlot). Our method demonstrates strong out-of-the-box spatiotemporal
representations on diverse video benchmarks, e.g., +13.6% gains over VideoMAE
on SSV2 via linear probing.
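To make the pretext task above concrete, here is a minimal sketch of a transcript-sorting objective: shuffled ASR sentence embeddings attend to video tokens and a small head predicts each sentence's original temporal position. All module and variable names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a transcript-sorting pretext objective in the spirit of TVTS.
import torch
import torch.nn as nn

class TranscriptSorter(nn.Module):
    def __init__(self, dim=512, num_positions=8, num_heads=8):
        super().__init__()
        # Cross-attention: transcript sentences (queries) attend to video tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.position_head = nn.Linear(dim, num_positions)

    def forward(self, video_tokens, shuffled_text, true_order):
        # video_tokens:  (B, T, dim) spatiotemporal features from the video encoder
        # shuffled_text: (B, N, dim) embeddings of N shuffled ASR sentences
        # true_order:    (B, N) original index of each shuffled sentence
        attended, _ = self.cross_attn(shuffled_text, video_tokens, video_tokens)
        logits = self.position_head(attended)            # (B, N, num_positions)
        return nn.functional.cross_entropy(
            logits.flatten(0, 1), true_order.flatten()
        )

# Toy usage with random tensors.
sorter = TranscriptSorter()
loss = sorter(torch.randn(2, 16, 512), torch.randn(2, 8, 512),
              torch.stack([torch.randperm(8) for _ in range(2)]))
loss.backward()
```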
Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models
Recent progress on vision-language foundation models has brought significant
advances to building general-purpose robots. By using the pre-trained models
to encode the scene and instructions as inputs for decision making, the
instruction-conditioned policy can generalize across different objects and
tasks. While this is encouraging, the policy still fails in most cases given an
unseen task or environment. To adapt the policy to unseen tasks and
environments, we explore a new paradigm on leveraging the pre-trained
foundation models with Self-PLAY and Self-Describe (SPLAYD). When deploying the
trained policy to a new task or a new environment, we first let the policy
self-play with randomly generated instructions to record the demonstrations.
While the execution could be wrong, we can use the pre-trained foundation
models to accurately self-describe (i.e., re-label or classify) the
demonstrations. This automatically provides new pairs of
demonstration-instruction data for policy fine-tuning. We evaluate our method
in a broad range of experiments with a focus on generalization to unseen
objects, unseen tasks, unseen environments, and sim-to-real transfer. We show
SPLAYD improves baselines by a large margin in all cases. Our project page is
available at https://geyuying.github.io/SPLAYD/
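As a rough illustration of the self-play and self-describe loop, the sketch below collects rollouts under randomly chosen instructions and relabels them with a stand-in vision-language describer. The callables `rollout` and `describe` and the instruction list are hypothetical placeholders, not the SPLAYD codebase.

```python
# Illustrative self-play / self-describe data collection loop.
import random

def self_play_and_describe(rollout, describe, random_instructions, num_episodes=100):
    """Collect rollouts under random instructions, then relabel them with a
    pre-trained vision-language model to build demonstration-instruction pairs."""
    dataset = []
    for _ in range(num_episodes):
        instruction = random.choice(random_instructions)  # self-play prompt
        trajectory = rollout(instruction)                  # execution may fail the task
        relabeled = describe(trajectory)                   # VLM names what actually happened
        dataset.append((trajectory, relabeled))            # new fine-tuning pair
    return dataset

# Toy usage with stand-in callables.
pairs = self_play_and_describe(
    rollout=lambda instr: f"frames-while-attempting({instr})",
    describe=lambda traj: "pick up the red block",
    random_instructions=["open the drawer", "push the cube"],
    num_episodes=4,
)
```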
Cached Transformers: Improving Transformers with Differentiable Memory Cache
This work introduces a new Transformer model called Cached Transformer, which
uses Gated Recurrent Cached (GRC) attention to extend the self-attention
mechanism with a differentiable memory cache of tokens. GRC attention enables
attending to both past and current tokens, increasing the receptive field of
attention and allowing for exploring long-range dependencies. By utilizing a
recurrent gating unit to continuously update the cache, our model achieves
significant advancements in six language and vision tasks, including
language modeling, machine translation, ListOPs, image classification, object
detection, and instance segmentation. Furthermore, our approach surpasses
previous memory-based techniques in tasks such as language modeling and
can be applied to a broader range of situations.
Comment: AAAI 202
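A hedged sketch of the gated recurrent cache idea follows, assuming a PyTorch-style module and a simplified mean-pooled update rule; it illustrates attending over cached plus current tokens, not the paper's exact GRC formulation.

```python
# Simplified gated recurrent token cache: attention over [cache; current tokens],
# with a learned gate blending the old cache and a summary of the new tokens.
import torch
import torch.nn as nn

class GatedCacheAttention(nn.Module):
    def __init__(self, dim=256, cache_len=32, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)
        self.cache_len = cache_len

    def forward(self, x, cache):
        # x: (B, L, dim) current tokens; cache: (B, cache_len, dim) past summary
        keys = torch.cat([cache, x], dim=1)        # attend to past and present
        out, _ = self.attn(x, keys, keys)
        # Recurrent gated update of the cache from a pooled view of the new tokens.
        pooled = x.mean(dim=1, keepdim=True).expand_as(cache)
        g = torch.sigmoid(self.gate(torch.cat([cache, pooled], dim=-1)))
        new_cache = g * cache + (1 - g) * pooled
        return out, new_cache

layer = GatedCacheAttention()
tokens, cache = torch.randn(2, 50, 256), torch.zeros(2, 32, 256)
out, cache = layer(tokens, cache)   # cache carries context to the next step
```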
Advancing Vision Transformers with Group-Mix Attention
Vision Transformers (ViTs) have been shown to enhance visual recognition
through modeling long-range dependencies with multi-head self-attention (MHSA),
which is typically formulated as Query-Key-Value computation. However, the
attention map generated from the Query and Key captures only token-to-token
correlations at one single granularity. In this paper, we argue that
self-attention should have a more comprehensive mechanism to capture
correlations among tokens and groups (i.e., multiple adjacent tokens) for
higher representational capacity. Therefore, we propose Group-Mix Attention (GMA)
as an advanced replacement for traditional self-attention, which can
simultaneously capture token-to-token, token-to-group, and group-to-group
correlations with various group sizes. To this end, GMA splits the Query, Key,
and Value into segments uniformly and performs different group aggregations to
generate group proxies. The attention map is computed based on the mixtures of
tokens and group proxies and used to re-combine the tokens and groups in Value.
Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which
achieves state-of-the-art performance in image classification, object
detection, and semantic segmentation with fewer parameters than existing
models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input)
attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while
GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.
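For illustration only, the sketch below mixes per-token and group-aggregated segments of the Query/Key/Value before standard attention; the channel-wise split, the average-pooling aggregator, and the group sizes are assumptions rather than the GMA design details.

```python
# Toy group-mixed attention: some channel segments keep per-token features,
# others are pooled over adjacent tokens to form group proxies.
import torch
import torch.nn.functional as F

def group_mix_attention(q, k, v, group_sizes=(1, 2, 4)):
    # q, k, v: (B, N, C); channels are split evenly into len(group_sizes) segments.
    def mix(x):
        segments = x.chunk(len(group_sizes), dim=-1)
        mixed = []
        for seg, g in zip(segments, group_sizes):
            if g > 1:
                # Aggregate each token with its neighbours to build a group proxy.
                seg = F.avg_pool1d(seg.transpose(1, 2), g, stride=1,
                                   padding=g // 2, count_include_pad=False)
                seg = seg.transpose(1, 2)[:, : x.shape[1]]
            mixed.append(seg)
        return torch.cat(mixed, dim=-1)

    q, k, v = mix(q), mix(k), mix(v)
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

out = group_mix_attention(*(torch.randn(2, 49, 96) for _ in range(3)))
```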
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
Foundation models have achieved great advances in multi-task learning with a
unified interface of unimodal and multimodal tasks. However, the potential of
such multi-task learners has not been exploited during transfer learning. In
this work, we present a universal parameter-efficient transfer learning method,
termed Predict-Interpolate Tuning (π-Tuning), for vision, language, and
vision-language tasks. It aggregates the parameters of lightweight
task-specific experts learned from similar tasks to aid the target downstream
task. The task similarities are predicted in a unified modality-independent
space, yielding a scalable graph to demonstrate task relationships.
π-Tuning has several appealing benefits. First, it flexibly explores both
intra- and inter-modal transferability between similar tasks to improve the
accuracy and robustness of transfer learning, especially in data-scarce
scenarios. Second, it offers a systematic solution for transfer learning with
multi-task prediction-and-then-interpolation, compatible with diverse types of
parameter-efficient experts, such as prompt and adapter. Third, an extensive
study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets
shows that π-Tuning surpasses fine-tuning and other parameter-efficient
transfer learning methods in both full-shot and low-shot regimes. The task
graph also enables an in-depth, interpretable analysis of task transferability
across modalities.
Comment: To appear in ICML 202
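A minimal sketch of the predict-then-interpolate idea, assuming experts are stored as parameter dictionaries and task similarities have already been predicted; the top-k selection and softmax weighting here are illustrative choices, not the paper's recipe.

```python
# Weighted interpolation of lightweight task-specific experts by predicted similarity.
import torch

def interpolate_experts(experts, similarities, top_k=3):
    # experts: list of state dicts (e.g., adapter or prompt parameters per source task)
    # similarities: (num_tasks,) predicted similarity of each source task to the target
    weights = torch.softmax(similarities, dim=0)
    top = torch.topk(weights, k=min(top_k, len(experts)))
    merged = {}
    for name in experts[0]:
        merged[name] = sum(top.values[i] * experts[int(top.indices[i])][name]
                           for i in range(len(top.indices)))
        merged[name] = merged[name] / top.values.sum()  # renormalize over selected experts
    return merged

# Toy usage: three single-tensor "experts" merged by similarity to the target task.
experts = [{"adapter.weight": torch.randn(4, 4)} for _ in range(3)]
merged = interpolate_experts(experts, torch.tensor([0.9, 0.1, 0.5]), top_k=2)
```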
Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
This paper addresses an important problem of ranking the pre-trained deep
neural networks and screening the most transferable ones for downstream tasks.
It is challenging because the ground-truth model ranking for each task can only
be generated by fine-tuning the pre-trained models on the target dataset, which
is brute-force and computationally expensive. Recent advanced methods proposed
several lightweight transferability metrics to predict the fine-tuning results.
However, these approaches only capture static representations but neglect the
fine-tuning dynamics. To this end, this paper proposes a new transferability
metric, called Self-challenging Fisher Discriminant
Analysis (SFDA), which has many appealing benefits that
existing works do not have. First, SFDA can embed the static features into a
Fisher space and refine them for better separability between classes. Second,
SFDA uses a self-challenging mechanism to encourage different pre-trained
models to differentiate on hard examples. Third, SFDA can easily select
multiple pre-trained models for the model ensemble. Extensive experiments on
pre-trained models across downstream tasks show that SFDA is efficient,
effective, and robust when measuring the transferability of pre-trained models.
For instance, compared with the state-of-the-art method NLEEP, SFDA
demonstrates an average of \% gain while bringing x speedup in
wall-clock time. The code will be available at
https://github.com/TencentARC/SFDA.
Comment: ECCV 2022 camera ready. 24 pages, 11 tables, 5 figures
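To illustrate the flavor of Fisher-space transferability scoring, the sketch below fits a standard linear discriminant analysis on frozen features from each candidate model and scores models by the mean probability assigned to the true class. It omits SFDA's self-challenging step and uses scikit-learn's LDA as a stand-in; the scoring rule is an assumption, not the paper's exact metric.

```python
# Rank candidate pre-trained models by class separability of their frozen features
# in a Fisher discriminant space (simplified stand-in for SFDA-style scoring).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fisher_transferability_score(features, labels):
    # features: (num_samples, dim) frozen features from one candidate model
    # labels:   (num_samples,) target-task class labels
    lda = LinearDiscriminantAnalysis()
    lda.fit(features, labels)
    probs = lda.predict_proba(features)
    return float(np.mean(probs[np.arange(len(labels)), labels]))

# Toy usage: score two candidate models on the same target data and rank them.
rng = np.random.default_rng(0)
feats = {"model_a": rng.normal(size=(200, 64)), "model_b": rng.normal(size=(200, 64))}
labels = rng.integers(0, 5, size=200)
ranking = sorted(feats, key=lambda m: fisher_transferability_score(feats[m], labels),
                 reverse=True)
```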