46 research outputs found

    Empowering CAM-Based Methods with Capability to Generate Fine-Grained and High-Faithfulness Explanations

    Recently, the explanation of neural network models has garnered considerable research attention. In computer vision, CAM (Class Activation Map)-based methods and LRP (Layer-wise Relevance Propagation) are two common families of explanation methods. However, since most CAM-based methods can only generate global weights, they produce only coarse-grained explanations at a deep layer. LRP and its variants, on the other hand, can generate fine-grained explanations, but the faithfulness of those explanations is low. To address these challenges, we propose FG-CAM (Fine-Grained CAM), which extends CAM-based methods to generate fine-grained, high-faithfulness explanations. FG-CAM uses the relationship between two adjacent layers of feature maps with differing resolutions to gradually increase the explanation resolution, identifying the contributing pixels and filtering out those that do not contribute. Our method not only remedies the shortcoming of CAM-based methods without changing their characteristics, but also generates fine-grained explanations with higher faithfulness than LRP and its variants. We also present FG-CAM with denoising, a variant of FG-CAM that generates less noisy explanations with almost no loss of faithfulness. Experimental results show that the performance of FG-CAM is almost unaffected by the explanation resolution: FG-CAM significantly outperforms existing CAM-based methods in both shallow and intermediate layers, and significantly outperforms LRP and its variants in the input layer. Our code is available at https://github.com/dongmo-qcq/FG-CAM.
    Comment: This paper has been accepted by AAAI202
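The resolution-growing idea described in the abstract can be sketched in a few lines. This is a toy illustration under assumed shapes, not the authors' implementation: the nearest-neighbour upsampling, the ReLU-based contribution filter, and the function names are all illustrative choices.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling of a 2-D map by a factor of 2.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fg_cam_step(deep_expl, shallow_feats):
    """One resolution-doubling step in the spirit of FG-CAM (toy sketch).

    deep_expl:     (H, W) coarse explanation at the deeper layer
    shallow_feats: (C, 2H, 2W) feature maps at the adjacent shallower layer
    """
    up = upsample2x(deep_expl)                     # spread coarse relevance
    contrib = np.maximum(shallow_feats, 0).sum(0)  # keep contributing pixels only
    fine = up * contrib                            # redistribute relevance
    if fine.max() > 0:
        fine /= fine.max()                         # normalise for visualisation
    return fine
```

Chaining such steps from a deep layer back toward the input would gradually raise the explanation resolution while zeroing out non-contributing pixels, which is the mechanism the abstract describes.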

    Contextual Dictionary Lookup for Knowledge Graph Completion

    Knowledge graph completion (KGC) aims to resolve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples. Numerous knowledge graph embedding (KGE) models have been proposed to perform KGC by learning embeddings. Nevertheless, most existing embedding models map each relation to a unique vector, overlooking the fine-grained semantics a relation takes on under different entities. Additionally, the few available fine-grained semantic models rely on clustering algorithms, which limits their performance and applicability due to a cumbersome two-stage training process. In this paper, we present a novel method based on contextual dictionary lookup that enables conventional embedding models to learn fine-grained relation semantics in an end-to-end manner. More specifically, we represent each relation with a dictionary that contains multiple latent semantics. The composition of a given entity and the dictionary's central semantics serves as the context for generating a lookup, thus determining the fine-grained semantics of the relation adaptively. The proposed loss function optimizes both the central and fine-grained semantics simultaneously to ensure their semantic consistency. Besides, we introduce two metrics to assess the validity and accuracy of the dictionary lookup operation. Extending several KGE models with the method yields substantial performance improvements on widely used benchmark datasets.
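The lookup mechanism can be pictured as soft attention over a relation's dictionary entries, conditioned on the entity. The sketch below is a minimal interpretation of the abstract, not the paper's code: the element-wise composition, the dot-product scoring, and all names are assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def contextual_lookup(head, central, dictionary):
    """Toy contextual dictionary lookup for a relation (illustrative sketch).

    head:       (d,)   head-entity embedding
    central:    (d,)   the relation's central semantics
    dictionary: (K, d) K latent semantics stored for the relation
    """
    context = head * central       # composition of entity and central semantics
    scores = dictionary @ context  # score each latent semantic against the context
    weights = softmax(scores)      # soft lookup over the K dictionary entries
    return weights @ dictionary    # adaptively blended fine-grained relation vector
```

The returned vector could then replace the single shared relation vector inside a conventional KGE scoring function, giving the relation entity-dependent semantics end to end.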

    3D-VLA: A 3D Vision-Language-Action Generative World Model

    Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader 3D physical world. Furthermore, they predict actions by learning a direct mapping from perception to action, neglecting the vast dynamics of the world and the relations between actions and dynamics. In contrast, human beings are endowed with world models that depict imagined future scenarios, enabling them to plan actions accordingly. To this end, we propose 3D-VLA, a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action through a generative world model. Specifically, 3D-VLA is built on top of a 3D-based large language model (LLM), and a set of interaction tokens is introduced to engage with the embodied environment. Furthermore, to inject generation abilities into the model, we train a series of embodied diffusion models and align them with the LLM to predict goal images and point clouds. To train 3D-VLA, we curate a large-scale 3D embodied instruction dataset by extracting vast 3D-related information from existing robotics datasets. Our experiments on held-in datasets demonstrate that 3D-VLA significantly improves reasoning, multimodal generation, and planning capabilities in embodied environments, showcasing its potential in real-world applications.
    Comment: Project page: https://vis-www.cs.umass.edu/3dvla
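The "interaction tokens" mentioned in the abstract amount to extending the language backbone's vocabulary with special symbols for embodied inputs and outputs. The sketch below shows only that vocabulary-extension pattern; the token names are guesses for illustration, not the paper's actual token set.

```python
# Toy sketch: appending special interaction tokens to a tokenizer vocabulary
# so a language backbone can reference scenes, objects, and actions.
base_vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
interaction_tokens = ["<scene>", "</scene>", "<obj>", "</obj>",
                      "<loc>", "<action>", "<img>", "<pcd>"]

vocab = dict(base_vocab)
for tok in interaction_tokens:
    vocab[tok] = len(vocab)  # assign fresh ids after the base vocabulary

def encode(tokens):
    # Map a sequence of known tokens to ids (a real tokenizer would also
    # handle ordinary words and unknown symbols).
    return [vocab[t] for t in tokens]

prompt = ["<bos>", "<scene>", "<obj>", "</obj>", "</scene>", "<action>", "<eos>"]
ids = encode(prompt)
```

In the full model, the embeddings of such tokens would be trained jointly with the LLM so that, e.g., spans wrapped in object tokens ground language to 3D scene content.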

    DIAMOND: Taming Sample and Communication Complexities in Decentralized Bilevel Optimization

    Decentralized bilevel optimization has received increasing attention recently due to its foundational role in many emerging multi-agent learning paradigms (e.g., multi-agent meta-learning and multi-agent reinforcement learning) over peer-to-peer edge networks. However, to work with the limited computation and communication capabilities of edge networks, a major challenge in developing decentralized bilevel optimization techniques is to lower sample and communication complexities. This motivates us to develop a new decentralized bilevel optimization framework called DIAMOND (decentralized single-timescale stochastic approximation with momentum and gradient-tracking). The contributions of this paper are as follows: i) our DIAMOND algorithm adopts a single-loop structure rather than following the natural double-loop structure of bilevel optimization, which offers low computation and implementation complexity; ii) compared to existing approaches, the DIAMOND algorithm does not require any full gradient evaluations, which further reduces both sample and computational complexities; iii) through a careful integration of momentum information and gradient tracking techniques, we show that the DIAMOND algorithm enjoys $\mathcal{O}(\epsilon^{-3/2})$ sample and communication complexities for achieving an $\epsilon$-stationary solution, both of which are independent of the dataset sizes and significantly outperform existing works. Extensive experiments also verify our theoretical findings.
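The single-loop combination of momentum-corrected gradient estimates and gradient tracking can be illustrated on a toy single-level consensus problem (DIAMOND's bilevel machinery is omitted here; the agents, quadratic objectives, step sizes, and mixing matrix below are all assumptions for illustration, not the paper's setup).

```python
import numpy as np

def decentralized_single_loop(bs, W, steps=800, lr=0.05, a=0.1):
    """Single-loop decentralized descent with momentum-corrected gradient
    estimates and gradient tracking (toy sketch, not DIAMOND itself).

    bs: (m, d) agent i holds the local objective f_i(x) = 0.5 * ||x - bs[i]||^2
    W:  (m, m) doubly stochastic mixing matrix of the peer-to-peer network
    """
    m, d = bs.shape
    x = np.zeros((m, d))           # each row is one agent's iterate
    g_old = x - bs                 # local gradients at the current iterates
    v = g_old.copy()               # momentum-based gradient estimates
    y = v.copy()                   # gradient trackers (track the global gradient)
    for _ in range(steps):
        x = W @ x - lr * y                    # mix with neighbours, descend on tracker
        g_new = x - bs                        # fresh local gradients
        v_new = g_new + (1 - a) * (v - g_old) # momentum-style estimator update
        y = W @ y + v_new - v                 # gradient tracking update
        v, g_old = v_new, g_new
    return x  # every row should approach the global minimizer, mean(bs)
```

With deterministic gradients the momentum correction is exact and the scheme reduces to standard gradient tracking; the stochastic, bilevel version analyzed in the paper replaces the local gradients with sampled hypergradient estimates while keeping this single-loop shape.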

    Regulation of hepatic autophagy by stress‐sensing transcription factor CREBH

    Autophagy, a lysosomal degradative pathway activated in response to nutrient limitation, plays an important regulatory role in lipid homeostasis upon energy demands. Here, we demonstrated that the endoplasmic reticulum-tethered, stress-sensing transcription factor cAMP-responsive element-binding protein, hepatic-specific (CREBH) functions as a major transcriptional regulator of hepatic autophagy and lysosomal biogenesis in response to nutritional or circadian signals. CREBH deficiency led to decreased hepatic autophagic activity and increased hepatic lipid accumulation upon starvation. Under unfed conditions or during energy-demanding phases of the circadian cycle, CREBH is activated to drive expression of genes encoding key enzymes or regulators of autophagosome formation and the autophagic process, including microtubule-associated protein 1B-light chain 3, autophagy-related protein (ATG)7, ATG2b, and Unc-51-like kinase 1, which functions in autophagosome formation, as well as genes with functions in lysosomal biogenesis and homeostasis. Upon nutrient starvation, CREBH regulates and interacts with peroxisome proliferator-activated receptor α (PPARα) and PPARÎł coactivator 1α to synergistically drive expression of the key autophagy genes and transcription factor EB, a master regulator of lysosomal biogenesis. Furthermore, CREBH regulates rhythmic expression of the key autophagy genes in the liver in a circadian-dependent manner. In summary, we identified CREBH as a key transcriptional regulator of hepatic autophagy and lysosomal biogenesis that serves to maintain hepatic lipid homeostasis under nutritional stress or circadian oscillation.—Kim, H., Williams, D., Qiu, Y., Song, Z., Yang, Z., Kimler, V., Goldberg, A., Zhang, R., Yang, Z., Chen, X., Wang, L., Fang, D., Lin, J. D., Zhang, K. Regulation of hepatic autophagy by stress-sensing transcription factor CREBH. FASEB J. 33, 7896–7914 (2019).

    Harnessing accurate mitochondrial DNA base editing mediated by DdCBEs in a predictable manner

    Introduction: Mitochondrial diseases caused by mtDNA mutations have no effective cures. Recently developed DddA-derived cytosine base editors (DdCBEs) have potential therapeutic implications for rescuing mtDNA mutations. However, the performance of DdCBEs depends on designing different targets or improving the combinations of split-DddA halves and orientations, with no established way to predict the results before application.
    Methods: A series of DdCBE pairs covering wide ranges of aC or tC targets was constructed and transfected into Neuro-2a cells. The mutation rates of the targets were compared to identify potential editing rules.
    Results: We found that DdCBE-mediated mtDNA editing is predictable: 1) aC targets have a more concentrated editing window than tC targets, located at 5'C8-11 (G1333) and 5'C10-13 (G1397) for aC targets, versus 5'C4-13 (G1333) and 5'C5-14 (G1397) for tC targets with a 16-bp spacer. 2) G1333-mediated C>T conversion at aC targets occurs in a DddA-half-specific manner, while G1333- and G1397-mediated C>T conversions show distinct DddA-half preferences for tC and aC targets, respectively. 3) The nucleotide adjacent to the 3' end of the aC motif affects mtDNA editing. Finally, guided by these rules, a cell model harboring a pathogenic mtDNA mutation was constructed with high efficiency and no bystander effects.
    Discussion: In summary, this discovery helps us conceive optimal strategies for accurate mtDNA editing, avoiding time- and labor-consuming optimization and screening.