    A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

    Body language (BL) refers to non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without spoken or written words, plays a crucial role in interpersonal interactions, and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. This survey emphasizes their applications to BL generation and recognition. Several common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we conduct an analysis and establish the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are collected and organized, along with an evaluation of SOTA methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey provides, for the first time, a comprehensive understanding of deep multi-modal learning for various BL generation and recognition tasks. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners in advancing this field. In addition, we maintain a continuously updated paper list for deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language

    Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

    This paper introduces innovative solutions to enhance spatial controllability in diffusion models that rely on text queries. We present two key innovations: vision guidance and the Layered Rendering Diffusion (LRDiff) framework. Vision guidance, a spatial layout condition, acts as a clue in the perturbed distribution, greatly narrowing the search space and focusing the image sampling process on adherence to the spatial layout condition. The LRDiff framework constructs an image-rendering process with multiple layers, each of which applies the vision guidance to instructively estimate the denoising direction for a single object. This layered rendering strategy effectively prevents issues such as unintended conceptual blending or mismatches, while allowing for more coherent and contextually accurate image synthesis. The proposed method provides a more efficient and accurate means of synthesising images that align with specific spatial and contextual requirements. Our experiments demonstrate that our method yields better results than existing techniques, both quantitatively and qualitatively. We apply it to three practical applications: bounding-box-to-image, semantic-mask-to-image, and image editing.
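    The layered composition described in this abstract — one spatial mask per object, each contributing a masked denoising direction — can be illustrated with a minimal conceptual sketch. Everything below (the function names, the layer representation as (mask, prompt) pairs, the mask-weighted sum) is an illustrative assumption for intuition only, not the authors' actual method or API.

```python
import numpy as np

def layered_denoising_direction(noisy_image, layers, denoiser):
    """Conceptual sketch of a layered rendering step.

    Each layer carries a spatial mask (playing the role of 'vision
    guidance') and a text prompt for one object. The overall update
    is the mask-weighted sum of per-object denoising directions, so
    each object's guidance only acts inside its own region. All
    names and shapes here are hypothetical, not the paper's API.
    """
    direction = np.zeros_like(noisy_image)
    for mask, prompt in layers:
        # Estimate a denoising direction for this object alone,
        # then restrict it to the region covered by its mask.
        eps = denoiser(noisy_image, prompt)
        direction += mask * eps
    return direction
```

Under this (assumed) formulation, keeping each object's direction confined to its mask is what prevents the conceptual blending the abstract mentions: prompts cannot leak guidance into each other's regions.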

    Diversification of importin-α isoforms in cellular trafficking and disease states.

    The human genome encodes seven isoforms of importin-α, which are grouped into three subfamilies known as α1, α2, and α3. All isoforms share a fundamentally conserved architecture consisting of an N-terminal, autoinhibitory, importin-β-binding (IBB) domain and a C-terminal Armadillo (Arm) core that associates with nuclear localization signal (NLS) cargoes. Despite striking similarity in amino acid sequence and 3D structure, importin-α isoforms display remarkable substrate specificity in vivo. In the present review, we examine key differences among importin-α isoforms and provide a comprehensive inventory of known viral and cellular cargoes that have been shown to associate preferentially with specific isoforms. We illustrate how the diversification of the adaptor importin-α into seven isoforms expands the dynamic range and regulatory control of nucleocytoplasmic transport, offering unexpected opportunities for pharmacological intervention. The emerging view of importin-α is that of a key signalling molecule, with isoforms that confer preferential nuclear entry and spatiotemporal specificity on viral and cellular cargoes directly linked to human diseases.

    NMDA Receptors on Non-Dopaminergic Neurons in the VTA Support Cocaine Sensitization

    The initiation of behavioral sensitization to cocaine and other psychomotor stimulants is thought to reflect N-methyl-D-aspartate receptor (NMDAR)-mediated synaptic plasticity in the mesolimbic dopamine (DA) circuitry. The importance of drug-induced, NMDAR-mediated adaptations in ventral tegmental area (VTA) DA neurons, and their association with drug-seeking behaviors, has recently been evaluated in Cre-loxP mice lacking functional NMDARs in DA neurons expressing Cre recombinase under the control of the endogenous dopamine transporter gene (NR1(DATCre) mice). Using an additional NR1(DATCre) mouse transgenic model, we demonstrate that while the selective inactivation of NMDARs in DA neurons eliminates the induction of molecular changes leading to synaptic strengthening, behavioral measures such as cocaine-induced locomotor sensitization and conditioned place preference remain intact in NR1(DATCre) mice. Since VTA DA neurons projecting to the prefrontal cortex and amygdala express little or no detectable dopamine transporter, it has been speculated that NMDA receptors in DA neurons projecting to these brain areas may have been spared in NR1(DATCre) mice. Here we demonstrate that the NMDA receptor gene is ablated in the majority of VTA DA neurons, including those exhibiting undetectable DAT expression levels in our NR1(DATCre) transgenic model, and that application of an NMDAR antagonist within the VTA of NR1(DATCre) animals still blocks sensitization to cocaine. These results eliminate the possibility of NMDAR-mediated neuroplasticity in the different DA neuronal subpopulations in our NR1(DATCre) mouse model and therefore suggest that NMDARs on non-DA neurons within the VTA must play a major role in cocaine-related addictive behavior.

    Mitochondrial Localized STAT3 Is Involved in NGF Induced Neurite Outgrowth

    Background: Signal transducer and activator of transcription 3 (STAT3) plays critical roles in neural development and is increasingly recognized as a major mediator of injury response in the nervous system. Cytokines and growth factors are known to phosphorylate STAT3 at tyrosine 705, with or without concomitant phosphorylation at serine 727, resulting in the nuclear localization of STAT3 and subsequent transcriptional activation of genes. Recent evidence suggests that STAT3 may control cell function via alternative mechanisms independent of its transcriptional activity. Currently, the involvement of STAT3 mono-phosphorylated at serine 727 (P-Ser-STAT3) in neurite outgrowth, and the underlying mechanism, is largely unknown. Principal Findings: In this study, we investigated the role of nerve growth factor (NGF)-induced P-Ser-STAT3 in mediating neurite outgrowth. NGF induced the phosphorylation of serine 727 but not tyrosine 705 of STAT3 in PC12 and primary cortical neuronal cells. In PC12 cells, the serine but not the tyrosine dominant-negative mutant of STAT3 was found to impair NGF-induced neurite outgrowth. Unexpectedly, NGF-induced P-Ser-STAT3 was localized to the mitochondria but not the nucleus. Mitochondrial STAT3 was further found to be intimately involved in NGF-induced neurite outgrowth and the production of reactive oxygen species (ROS). Conclusion: Taken together, the findings herein demonstrate a hitherto unrecognized, novel transcription-independent …

    PatFig: Generating Short and Long Captions for Patent Figures

    This paper introduces Qatent PatFig, a novel large-scale patent figure dataset comprising 30,000+ patent figures from over 11,000 European patent applications. For each figure, the dataset provides short and long captions, reference numerals, their corresponding terms, and the minimal claim set that describes the interactions between the components of the image. To assess the usability of the dataset, we finetune an LVLM model on Qatent PatFig to generate short and long descriptions, and we investigate the effects of incorporating various text-based cues at the prediction stage of the patent figure captioning process.
