23 research outputs found

    SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

    The user base of short video apps has experienced unprecedented growth in recent years, resulting in a significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap. Nevertheless, most existing approaches treat texts merely as discrete tokens and neglect their syntax structures. Moreover, the abundant spatial and temporal clues in videos are often underutilized due to the lack of interaction with text. To address these issues, we argue that using texts as guidance to focus on relevant temporal frames and spatial regions within videos is beneficial. In this paper, we propose a novel Syntax-Hierarchy-Enhanced text-video retrieval method (SHE-Net) that exploits the inherent semantic and syntax hierarchy of texts to bridge the modality gap from two perspectives. First, to facilitate a more fine-grained integration of visual content, we employ the text syntax hierarchy, which reveals the grammatical structure of text descriptions, to guide the visual representations. Second, to further enhance the multi-modal interaction and alignment, we also utilize the syntax hierarchy to guide the similarity calculation. We evaluated our method on four public text-video retrieval datasets: MSR-VTT, MSVD, DiDeMo, and ActivityNet. The experimental results and ablation studies confirm the advantages of our proposed method.

    EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

    Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing tasks mainly suffer from the dilemma between the high fine-tuning cost and the limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing. Towards this end, we propose EVE, a robust and efficient zero-shot video editing method. Under the guidance of depth maps and temporal consistency constraints, EVE derives satisfactory video editing results with an affordable computational and time cost. Moreover, recognizing the absence of a publicly available video editing dataset for fair comparisons, we construct a new benchmark dataset, ZVE-50. Through comprehensive experimentation, we validate that EVE could achieve a satisfactory trade-off between performance and efficiency. We will release our dataset and codebase to facilitate future research.

    M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

    Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, vision-language models supporting multiple languages, e.g., both Chinese and English, have lagged due to the relative scarcity of large-scale pretraining datasets. Toward this end, we introduce a comprehensive bilingual (Chinese-English) dataset BM-6B with over 6 billion image-text pairs, aimed at enhancing multimodal foundation models to well understand images in both languages. To handle such a scale of dataset, we propose a novel grouped aggregation approach for image-text contrastive loss computation, which reduces the communication overhead and GPU memory demands significantly, facilitating a 60% increase in training speed. We pretrain a series of bilingual image-text foundation models with an enhanced fine-grained understanding ability on BM-6B; the resulting models, dubbed M^2-Encoders (pronounced "M-Square"), set new benchmarks in both languages for multimodal retrieval and classification tasks. Notably, our largest M^2-Encoder-10B model has achieved top-1 accuracies of 88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot classification setting, surpassing previously reported SoTA methods by 2.2% and 21.1%, respectively. The M^2-Encoder series represents one of the most comprehensive bilingual image-text foundation models to date, so we are making it available to the research community for further exploration and development.

    Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning

    In recent years, the explosion of web videos has made text-video retrieval increasingly essential and popular for video filtering, recommendation, and search. Text-video retrieval aims to rank relevant text/video higher than irrelevant ones. The core of this task is to precisely measure the cross-modal similarity between texts and videos. Recently, contrastive learning methods have shown promising results for text-video retrieval, most of which focus on the construction of positive and negative pairs to learn text and video representations. Nevertheless, they do not pay enough attention to hard negative pairs and lack the ability to model different levels of semantic similarity. To address these two issues, this paper improves contrastive learning using two novel techniques. First, to exploit hard examples for robust discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module (DMAE) to mine hard negative pairs from textual and visual clues. By further introducing a Negative-aware InfoNCE (NegNCE) loss, we are able to adaptively identify all these hard negatives and explicitly highlight their impacts in the training loss. Second, our work argues that triplet samples can better model fine-grained semantic similarity compared to pairwise samples. We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs. The proposed TPM-CL designs an adaptive token masking strategy with cross-modal interaction to model subtle semantic differences. Extensive experiments demonstrate that the proposed approach outperforms existing methods on four widely used text-video retrieval datasets, including MSR-VTT, MSVD, DiDeMo and ActivityNet. Comment: Accepted by ACM MM 202

    Inhibition of HDAC activity directly reprograms murine embryonic stem cells to trophoblast stem cells

    Embryonic stem cells (ESCs) can differentiate into all cell types of the embryonic germ layers. ESCs can also generate totipotent 2C-like cells and trophectodermal cells. However, these latter transitions occur at low frequency due to epigenetic barriers, the nature of which is not fully understood. Here, we show that treating mouse ESCs with sodium butyrate (NaB) increases the population of 2C-like cells and enables direct reprogramming of ESCs into trophoblast stem cells (TSCs) without a transition through a 2C-like state. Mechanistically, NaB inhibits histone deacetylase activities in the LSD1-HDAC1/2 corepressor complex. This increases acetylation levels in the regulatory regions of both 2C- and TSC-specific genes, promoting their expression. In addition, NaB-treated cells acquire the capacity to generate blastocyst-like structures that can develop beyond the implantation stage in vitro and form deciduae in vivo. These results identify how epigenetics restrict the totipotent and trophectoderm fate in mouse ESCs.

    Uncovering the Functional Link Between SHANK3 Deletions and Deficiency in Neurodevelopment Using iPSC-Derived Human Neurons

    SHANK3 mutations, including de novo deletions, have been associated with autism spectrum disorders (ASD). However, the effects of SHANK3 loss of function on neurodevelopment remain poorly understood. Here we generated human induced pluripotent stem cells (iPSC) in vitro, followed by neuro-differentiation and lentivirus-mediated shRNA expression to evaluate how SHANK3 knockdown affects the in vitro neurodevelopmental process at multiple time points (up to 4 weeks). We found that SHANK3 knockdown impaired both the early stage of neuronal development and mature neuronal function, as demonstrated by a reduction in neuronal soma size, growth cone area, neurite length and branch numbers. Notably, electrophysiology analyses showed defects in excitatory and inhibitory synaptic transmission. Furthermore, transcriptome analyses revealed that multiple biological pathways related to neuron projection, motility and regulation of neurogenesis were disrupted in cells with SHANK3 knockdown. In conclusion, utilizing a human iPSC-based neural induction model, this study presented combined morphological, electrophysiological and transcriptional evidence that supports SHANK3 as an intrinsic, cell-autonomous factor that controls cellular function development in human neurons.

    Monasone Naphthoquinone Biosynthesis and Resistance in Monascus Fungi

    The genes for Monascus naphthoquinone (monasone) biosynthesis are embedded in and form a composite supercluster with the Monascus azaphilone pigment biosynthetic gene cluster. Early biosynthetic intermediates are shared by the two pathways. Some enzymes encoded by the supercluster play double duty in contributing to both pathways, while others are specific for one or the other pathway. The monasone subcluster is independently regulated and inducible by elicitation with competing microorganisms. This study illustrates genomic and biosynthetic parsimony in fungi and proposes a potential path for the evolution of the mosaic-like azaphilone-naphthoquinone supercluster. The monasone subcluster also encodes a two-tiered self-resistance mechanism that models resistance determinants that may transfer to target microorganisms or emerge in cancer cells in case of naphthoquinone-type cytotoxic agents. Despite the important biological activities of natural product naphthoquinones, the biosynthetic pathways of and resistance mechanisms against such compounds remain poorly understood in fungi. Here, we report that the genes responsible for the biosynthesis of Monascus naphthoquinones (monasones) reside within the gene cluster for Monascus azaphilone pigments (MonAzPs). We elucidate the biosynthetic pathway of monasones by a combination of comparative genome analysis, gene knockouts, heterologous coexpression, and in vivo and in vitro enzymatic reactions to show that this pathway branches from the first polyketide intermediate of MonAzPs. Furthermore, we propose that the monasone subset of biosynthetic genes also encodes a two-tiered resistance strategy in which an inducible monasone-specific exporter expels monasones from the mycelia, while residual intracellular monasones may be rendered nontoxic through a multistep reduction cascade.

    Monasone Naphthoquinone Biosynthesis and Resistance in Monascus Fungi

    Despite the important biological activities of natural product naphthoquinones, the biosynthetic pathways of and resistance mechanisms against such compounds remain poorly understood in fungi. Here, we report that the genes responsible for the biosynthesis of Monascus naphthoquinones (monasones) reside within the gene cluster for Monascus azaphilone pigments (MonAzPs). We elucidate the biosynthetic pathway of monasones by a combination of comparative genome analysis, gene knockouts, heterologous coexpression, and in vivo and in vitro enzymatic reactions to show that this pathway branches from the first polyketide intermediate of MonAzPs. Furthermore, we propose that the monasone subset of biosynthetic genes also encodes a two-tiered resistance strategy in which an inducible monasone-specific exporter expels monasones from the mycelia, while residual intracellular monasones may be rendered nontoxic through a multistep reduction cascade. IMPORTANCE: The genes for Monascus naphthoquinone (monasone) biosynthesis are embedded in and form a composite supercluster with the Monascus azaphilone pigment biosynthetic gene cluster. Early biosynthetic intermediates are shared by the two pathways. Some enzymes encoded by the supercluster play double duty in contributing to both pathways, while others are specific for one or the other pathway. The monasone subcluster is independently regulated and inducible by elicitation with competing microorganisms. This study illustrates genomic and biosynthetic parsimony in fungi and proposes a potential path for the evolution of the mosaic-like azaphilone-naphthoquinone supercluster. The monasone subcluster also encodes a two-tiered self-resistance mechanism that models resistance determinants that may transfer to target microorganisms or emerge in cancer cells in case of naphthoquinone-type cytotoxic agents.