23 research outputs found
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
The user base of short video apps has experienced unprecedented growth in
recent years, resulting in a significant demand for video content analysis. In
particular, text-video retrieval, which aims to find the top matching videos
given text descriptions from a vast video corpus, is an essential function, the
primary challenge of which is to bridge the modality gap. Nevertheless, most
existing approaches treat texts merely as discrete tokens and neglect their
syntax structures. Moreover, the abundant spatial and temporal clues in videos
are often underutilized due to the lack of interaction with text. To address
these issues, we argue that using texts as guidance to focus on relevant
temporal frames and spatial regions within videos is beneficial. In this paper,
we propose a novel Syntax-Hierarchy-Enhanced text-video retrieval method
(SHE-Net) that exploits the inherent semantic and syntax hierarchy of texts to
bridge the modality gap from two perspectives. First, to facilitate a more
fine-grained integration of visual content, we employ the text syntax
hierarchy, which reveals the grammatical structure of text descriptions, to
guide the visual representations. Second, to further enhance the multi-modal
interaction and alignment, we also utilize the syntax hierarchy to guide the
similarity calculation. We evaluated our method on four public text-video
retrieval datasets of MSR-VTT, MSVD, DiDeMo, and ActivityNet. The experimental
results and ablation studies confirm the advantages of our proposed method.
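The retrieval task described in this abstract (ranking videos against a text query) can be illustrated with a minimal sketch. This is not SHE-Net itself: it assumes each text and video has already been encoded into a fixed-length vector and ranks by plain cosine similarity. The names `cosine` and `top_k_videos` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_videos(text_vec, video_vecs, k=5):
    """Return the indices of the k videos most similar to the text query.

    Assumption: embeddings are precomputed by some encoder; ties are broken
    by the lower video index.
    """
    scored = [(cosine(text_vec, vec), idx) for idx, vec in enumerate(video_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]
```

For example, `top_k_videos([1, 0], [[0, 1], [1, 0], [1, 1]], k=2)` ranks the second video first because its embedding points in the same direction as the query.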
EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints
Motivated by the superior performance of image diffusion models, more and
more researchers strive to extend these models to the text-based video editing
task. Nevertheless, current video editing tasks mainly suffer from the dilemma
between the high fine-tuning cost and the limited generation capacity. Compared
with images, we conjecture that videos necessitate more constraints to preserve
the temporal consistency during editing. Towards this end, we propose EVE, a
robust and efficient zero-shot video editing method. Under the guidance of
depth maps and temporal consistency constraints, EVE derives satisfactory video
editing results with an affordable computational and time cost. Moreover,
recognizing the absence of a publicly available video editing dataset for fair
comparisons, we construct a new benchmark ZVE-50 dataset. Through comprehensive
experimentation, we validate that EVE could achieve a satisfactory trade-off
between performance and efficiency. We will release our dataset and codebase to
facilitate future researchers.
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining
Vision-language foundation models like CLIP have revolutionized the field of
artificial intelligence. Nevertheless, vision-language models supporting multiple
languages, e.g., both Chinese and English, have lagged due to the relative scarcity of
large-scale pretraining datasets. Toward this end, we introduce a comprehensive
bilingual (Chinese-English) dataset BM-6B with over 6 billion image-text pairs,
aimed at enhancing multimodal foundation models to understand images well in
both languages. To handle a dataset of this scale, we propose a novel grouped
aggregation approach for image-text contrastive loss computation, which reduces
the communication overhead and GPU memory demands significantly, facilitating a
60% increase in training speed. We pretrain a series of bilingual image-text
foundation models with an enhanced fine-grained understanding ability on BM-6B.
The resulting models, dubbed M2-Encoders (pronounced "M-Square"), set new
benchmarks in both languages for multimodal retrieval and classification tasks.
Notably, our largest M2-Encoder-10B model has achieved top-1 accuracies of
88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot classification
setting, surpassing previously reported SoTA methods by 2.2% and 21.1%,
respectively. The M2-Encoder series represents one of the most comprehensive
bilingual image-text foundation models to date, so we are making it available
to the research community for further exploration and development.
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
In recent years, the explosion of web videos has made text-video retrieval
increasingly essential and popular for video filtering, recommendation, and
search. Text-video retrieval aims to rank relevant text/video higher than
irrelevant ones. The core of this task is to precisely measure the cross-modal
similarity between texts and videos. Recently, contrastive learning methods
have shown promising results for text-video retrieval, most of which focus on
the construction of positive and negative pairs to learn text and video
representations. Nevertheless, they do not pay enough attention to hard
negative pairs and lack the ability to model different levels of semantic
similarity. To address these two issues, this paper improves contrastive
learning using two novel techniques. First, to exploit hard examples for robust
discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module
(DMAE) to mine hard negative pairs from textual and visual clues. By further
introducing a Negative-aware InfoNCE (NegNCE) loss, we are able to adaptively
identify all these hard negatives and explicitly highlight their impacts in the
training loss. Second, our work argues that triplet samples can better model
fine-grained semantic similarity compared to pairwise samples. We thereby
present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to
construct partial order triplet samples by automatically generating
fine-grained hard negatives for matched text-video pairs. The proposed TPM-CL
designs an adaptive token masking strategy with cross-modal interaction to
model subtle semantic differences. Extensive experiments demonstrate that the
proposed approach outperforms existing methods on four widely-used text-video
retrieval datasets, including MSR-VTT, MSVD, DiDeMo and ActivityNet.
Comment: Accepted by ACM MM 202
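As background for the contrastive learning discussed in this abstract, the following is a minimal sketch of the standard symmetric InfoNCE loss with in-batch negatives. It is not the paper's NegNCE or TPM-CL objective, whose exact form the abstract does not give. The function names, the square similarity-matrix input, and the temperature value of 0.05 are assumptions made for illustration.

```python
import math


def diagonal_cross_entropy(rows, temperature):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(rows):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += logits[i] - log_z
    return -total / len(rows)


def info_nce(sim, temperature=0.05):
    """Symmetric InfoNCE over a square similarity matrix.

    sim[i][j] is the similarity between text i and video j. Matched pairs
    sit on the diagonal; every other entry in the batch acts as a negative.
    """
    text_to_video = diagonal_cross_entropy(sim, temperature)
    video_to_text = diagonal_cross_entropy([list(col) for col in zip(*sim)], temperature)
    return 0.5 * (text_to_video + video_to_text)
```

In this standard form every negative contributes through the same softmax denominator. The abstract states that NegNCE goes further by explicitly highlighting hard negatives in the training loss.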
Inhibition of HDAC activity directly reprograms murine embryonic stem cells to trophoblast stem cells
Embryonic stem cells (ESCs) can differentiate into all cell types of the embryonic germ layers. ESCs can also generate totipotent 2C-like cells and trophectodermal cells. However, these latter transitions occur at low frequency due to epigenetic barriers, the nature of which is not fully understood. Here, we show that treating mouse ESCs with sodium butyrate (NaB) increases the population of 2C-like cells and enables direct reprogramming of ESCs into trophoblast stem cells (TSCs) without a transition through a 2C-like state. Mechanistically, NaB inhibits histone deacetylase activities in the LSD1-HDAC1/2 corepressor complex. This increases acetylation levels in the regulatory regions of both 2C- and TSC-specific genes, promoting their expression. In addition, NaB-treated cells acquire the capacity to generate blastocyst-like structures that can develop beyond the implantation stage in vitro and form deciduae in vivo. These results identify how epigenetics restrict the totipotent and trophectoderm fate in mouse ESCs.
Encroachment order and spatial patterns of broad-leaf tree species in the naturalization of a Cunninghamia lanceolata plantation
The authors state that the aim of this study was to understand the encroachment order, spatial patterns, interspecific associations, and species diversity of a Cunninghamia lanceolata plantation and to provide context for how to improve the spatial structure of C. lanceolata plantations. Their results show that the encroachment of broad-leaf trees into the C. lanceolata plantation followed a clear successional sequence of tree community assembly: intolerant tree species, such as Alniphyllum fortunei and Liquidambar formosana, encroached first; neutral tree species, such as Daphniphyllum oldhamii and Schima superba, encroached next; and shade-tolerant tree species, such as Castanopsis eyrei and Castanopsis tibetana, encroached last.
Uncovering the Functional Link Between SHANK3 Deletions and Deficiency in Neurodevelopment Using iPSC-Derived Human Neurons
SHANK3 mutations, including de novo deletions, have been associated with autism spectrum disorders (ASD). However, the effects of SHANK3 loss of function on neurodevelopment remain poorly understood. Here we generated human induced pluripotent stem cells (iPSCs) in vitro, followed by neuro-differentiation and lentivirus-mediated shRNA expression to evaluate how SHANK3 knockdown affects the in vitro neurodevelopmental process at multiple time points (up to 4 weeks). We found that SHANK3 knockdown impaired both the early stage of neuronal development and mature neuronal function, as demonstrated by a reduction in neuronal soma size, growth cone area, neurite length and branch numbers. Notably, electrophysiology analyses showed defects in excitatory and inhibitory synaptic transmission. Furthermore, transcriptome analyses revealed that multiple biological pathways related to neuron projection, motility and regulation of neurogenesis were disrupted in cells with SHANK3 knockdown. In conclusion, utilizing a human iPSC-based neural induction model, this study presents combined morphological, electrophysiological and transcriptional evidence supporting SHANK3 as an intrinsic, cell-autonomous factor that controls cellular function development in human neurons.
Monasone Naphthoquinone Biosynthesis and Resistance in Monascus Fungi
The genes for Monascus naphthoquinone (monasone) biosynthesis are embedded in and form a composite supercluster with the Monascus azaphilone pigment biosynthetic gene cluster. Early biosynthetic intermediates are shared by the two pathways. Some enzymes encoded by the supercluster play double duty in contributing to both pathways, while others are specific for one or the other pathway. The monasone subcluster is independently regulated and inducible by elicitation with competing microorganisms. This study illustrates genomic and biosynthetic parsimony in fungi and proposes a potential path for the evolution of the mosaic-like azaphilone-naphthoquinone supercluster. The monasone subcluster also encodes a two-tiered self-resistance mechanism that models resistance determinants that may transfer to target microorganisms or emerge in cancer cells in case of naphthoquinone-type cytotoxic agents. Despite the important biological activities of natural product naphthoquinones, the biosynthetic pathways of and resistance mechanisms against such compounds remain poorly understood in fungi. Here, we report that the genes responsible for the biosynthesis of Monascus naphthoquinones (monasones) reside within the gene cluster for Monascus azaphilone pigments (MonAzPs). We elucidate the biosynthetic pathway of monasones by a combination of comparative genome analysis, gene knockouts, heterologous coexpression, and in vivo and in vitro enzymatic reactions to show that this pathway branches from the first polyketide intermediate of MonAzPs. Furthermore, we propose that the monasone subset of biosynthetic genes also encodes a two-tiered resistance strategy in which an inducible monasone-specific exporter expels monasones from the mycelia, while residual intracellular monasones may be rendered nontoxic through a multistep reduction cascade.