Empowering CAM-Based Methods with Capability to Generate Fine-Grained and High-Faithfulness Explanations
Recently, the explanation of neural network models has garnered considerable
research attention. In computer vision, CAM (Class Activation Map)-based
methods and the LRP (Layer-wise Relevance Propagation) method are two common
explanation methods. However, since most CAM-based methods can only generate
global weights, they can only generate coarse-grained explanations at a deep
layer. LRP and its variants, on the other hand, can generate fine-grained
explanations, but the faithfulness of those explanations is too low. To address
these challenges, in this paper, we propose FG-CAM (Fine-Grained CAM), which
extends CAM-based methods to enable generating fine-grained and
high-faithfulness explanations. FG-CAM uses the relationship between two
adjacent layers of feature maps with resolution differences to gradually
increase the explanation resolution, while finding the contributing pixels and
filtering out the pixels that do not contribute. Our method not only solves the
shortcoming of CAM-based methods without changing their characteristics, but
also generates fine-grained explanations that have higher faithfulness than LRP
and its variants. We also present FG-CAM with denoising, which is a variant of
FG-CAM and is able to generate less noisy explanations with almost no change in
explanation faithfulness. Experimental results show that the performance of
FG-CAM is almost unaffected by the explanation resolution. FG-CAM outperforms
existing CAM-based methods significantly in both shallow and intermediate
layers, and outperforms LRP and its variants significantly in the input layer.
Our code is available at https://github.com/dongmo-qcq/FG-CAM.
Comment: This paper has been accepted by AAAI202
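To make the "global weights" limitation concrete, here is a minimal sketch of the generic Grad-CAM-style combination step that CAM-based methods share: one scalar weight per channel, a weighted sum of the feature maps, then a ReLU. It is not FG-CAM itself; the function name `cam_map` and the toy inputs are assumptions for illustration. Because each weight is applied to a whole channel, the resulting map has only the spatial resolution of the chosen layer.

```python
def cam_map(feature_maps, weights):
    """Combine K feature maps (each H x W) using one global weight per map.

    Illustrative sketch of the generic CAM combination step, not FG-CAM.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    combined = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, weights):
        for i in range(height):
            for j in range(width):
                combined[i][j] += weight * fmap[i][j]
    # ReLU keeps only locations with a positive contribution.
    return [[max(value, 0.0) for value in row] for row in combined]


# Toy example: two 2x2 feature maps with hypothetical channel weights.
maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
explanation = cam_map(maps, [1.0, -0.5])
```

The output has the same 2x2 resolution as the input feature maps, which is why explanations taken from a deep, low-resolution layer are coarse.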
Contextual Dictionary Lookup for Knowledge Graph Completion
Knowledge graph completion (KGC) aims to solve the incompleteness of
knowledge graphs (KGs) by predicting missing links from known triples. A number
of knowledge graph embedding (KGE) models have been proposed to perform KGC by
learning embeddings. Nevertheless, most existing embedding models map each
relation into a unique vector, overlooking the specific fine-grained semantics
of them under different entities. Additionally, the few available fine-grained
semantic models rely on clustering algorithms, resulting in limited performance
and applicability due to the cumbersome two-stage training process. In this
paper, we present a novel method utilizing contextual dictionary lookup,
enabling conventional embedding models to learn fine-grained semantics of
relations in an end-to-end manner. More specifically, we represent each
relation using a dictionary that contains multiple latent semantics. The
composition of a given entity and the dictionary's central semantics serves as
the context for generating a lookup, thus determining the fine-grained
semantics of the relation adaptively. The proposed loss function optimizes both
the central and fine-grained semantics simultaneously to ensure their semantic
consistency. Besides, we introduce two metrics to assess the validity and
accuracy of the dictionary lookup operation. We extend several KGE models with
the method, resulting in substantial performance improvements on widely-used
benchmark datasets.
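The lookup described in this abstract can be sketched roughly as an attention step over a relation's latent semantics. The sketch below is an illustration only: the element-wise product used to compose the entity with the central semantics, the dot-product scoring, and the names `lookup_relation`, `central`, and `dictionary` are all assumptions, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lookup_relation(entity, central, dictionary):
    """Return (lookup weights, fine-grained relation vector).

    entity     : entity embedding
    central    : the relation's central semantics
    dictionary : list of latent semantic vectors for the relation
    """
    # Assumption: the context is the element-wise product of the entity
    # embedding and the central semantics.
    context = [e * c for e, c in zip(entity, central)]
    # Assumption: dictionary entries are scored against the context by dot product.
    weights = softmax([dot(context, semantic) for semantic in dictionary])
    dim = len(central)
    fine_grained = [
        sum(w * semantic[i] for w, semantic in zip(weights, dictionary))
        for i in range(dim)
    ]
    return weights, fine_grained


weights, relation = lookup_relation(
    entity=[1.0, 0.0],
    central=[1.0, 1.0],
    dictionary=[[2.0, 0.0], [0.0, 2.0]],
)
```

In this toy case the entity aligns with the first latent semantic, so that entry receives the larger lookup weight and dominates the resulting relation vector. Because the weights come from a differentiable softmax, such a lookup can be trained end to end with the embedding model, in contrast to a separate clustering stage.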
3D-VLA: A 3D Vision-Language-Action Generative World Model
Recent vision-language-action (VLA) models rely on 2D inputs, lacking
integration with the broader realm of the 3D physical world. Furthermore, they
perform action prediction by learning a direct mapping from perception to
action, neglecting the vast dynamics of the world and the relations between
actions and dynamics. In contrast, human beings are endowed with world models
that depict imagination about future scenarios to plan actions accordingly. To
this end, we propose 3D-VLA by introducing a new family of embodied foundation
models that seamlessly link 3D perception, reasoning, and action through a
generative world model. Specifically, 3D-VLA is built on top of a 3D-based
large language model (LLM), and a set of interaction tokens is introduced to
engage with the embodied environment. Furthermore, to inject generation
abilities into the model, we train a series of embodied diffusion models and
align them into the LLM for predicting the goal images and point clouds. To
train our 3D-VLA, we curate a large-scale 3D embodied instruction dataset by
extracting vast 3D-related information from existing robotics datasets. Our
experiments on held-in datasets demonstrate that 3D-VLA significantly improves
the reasoning, multimodal generation, and planning capabilities in embodied
environments, showcasing its potential in real-world applications.
Comment: Project page: https://vis-www.cs.umass.edu/3dvla
DIAMOND: Taming Sample and Communication Complexities in Decentralized Bilevel Optimization
Decentralized bilevel optimization has received increasing attention recently
due to its foundational role in many emerging multi-agent learning paradigms
(e.g., multi-agent meta-learning and multi-agent reinforcement learning) over
peer-to-peer edge networks. However, to work with the limited computation and
communication capabilities of edge networks, a major challenge in developing
decentralized bilevel optimization techniques is to lower sample and
communication complexities. This motivates us to develop a new decentralized
bilevel optimization algorithm called DIAMOND (decentralized single-timescale stochastic
approximation with momentum and gradient-tracking). The contributions of this
paper are as follows: i) our DIAMOND algorithm adopts a single-loop structure
rather than following the natural double-loop structure of bilevel
optimization, which offers low computation and implementation complexity; ii)
compared to existing approaches, the DIAMOND algorithm does not require any
full gradient evaluations, which further reduces both sample and computational
complexities; iii) through a careful integration of momentum information and
gradient tracking techniques, we show that the DIAMOND algorithm enjoys
in sample and communication complexities for
achieving an ε-stationary solution, both of which are independent of
the dataset sizes and significantly outperform existing works. Extensive
experiments also verify our theoretical findings.
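As background for the gradient-tracking ingredient named in this abstract, here is a minimal sketch of standard decentralized gradient tracking on a toy single-level problem. It is not the DIAMOND algorithm: it omits the bilevel structure, the momentum term, and stochastic gradients. The quadratic local objectives, the mixing matrix, the step size, and the name `gradient_tracking` are all assumptions chosen for illustration.

```python
def gradient_tracking(targets, mixing, step=0.1, iterations=300):
    """Single-loop decentralized gradient tracking on a toy problem.

    Agent i holds the local objective f_i(x) = 0.5 * (x - targets[i])**2,
    so the minimizer of the average objective is the mean of the targets.
    `mixing` is a doubly stochastic matrix describing the peer-to-peer network.
    """
    n = len(targets)

    def grad(i, x):
        return x - targets[i]

    x = [0.0] * n
    # Each tracker starts at the agent's own local gradient.
    y = [grad(i, x[i]) for i in range(n)]
    for _ in range(iterations):
        # Mix with neighbours, then step along the tracked gradient.
        x_new = [
            sum(mixing[i][j] * x[j] for j in range(n)) - step * y[i]
            for i in range(n)
        ]
        # The tracker follows the network-wide average gradient.
        y = [
            sum(mixing[i][j] * y[j] for j in range(n))
            + grad(i, x_new[i])
            - grad(i, x[i])
            for i in range(n)
        ]
        x = x_new
    return x


mixing_matrix = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.25, 0.25, 0.5],
]
solutions = gradient_tracking([1.0, 2.0, 6.0], mixing_matrix)
```

Each iteration uses one round of neighbour communication for the iterates and one for the trackers, with no inner loop, which is the sense in which such schemes are called single-loop. On this toy problem all agents converge to the mean of the targets.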
Regulation of hepatic autophagy by stress-sensing transcription factor CREBH
Autophagy, a lysosomal degradative pathway in response to nutrient limitation, plays an important regulatory role in lipid homeostasis upon energy demands. Here, we demonstrated that the endoplasmic reticulum-tethered, stress-sensing transcription factor cAMP-responsive element-binding protein, hepatic-specific (CREBH) functions as a major transcriptional regulator of hepatic autophagy and lysosomal biogenesis in response to nutritional or circadian signals. CREBH deficiency led to decreased hepatic autophagic activities and increased hepatic lipid accumulation upon starvation. Under unfed or during energy-demanding phases of the circadian cycle, CREBH is activated to drive expression of the genes encoding the key enzymes or regulators in autophagosome formation or autophagic process, including microtubule-associated protein IB-light chain 3, autophagy-related protein (ATG)7, ATG2b, and autophagosome formation Unc-51 like kinase 1, and the genes encoding functions in lysosomal biogenesis and homeostasis. Upon nutrient starvation, CREBH regulates and interacts with peroxisome proliferator-activated receptor α (PPARα) and PPARγ coactivator 1α to synergistically drive expression of the key autophagy genes and transcription factor EB, a master regulator of lysosomal biogenesis. Furthermore, CREBH regulates rhythmic expression of the key autophagy genes in the liver in a circadian-dependent manner. In summary, we identified CREBH as a key transcriptional regulator of hepatic autophagy and lysosomal biogenesis for the purpose of maintaining hepatic lipid homeostasis under nutritional stress or circadian oscillation.
Kim, H., Williams, D., Qiu, Y., Song, Z., Yang, Z., Kimler, V., Goldberg, A., Zhang, R., Yang, Z., Chen, X., Wang, L., Fang, D., Lin, J. D., Zhang, K. Regulation of hepatic autophagy by stress-sensing transcription factor CREBH. FASEB J. 33, 7896-7914 (2019). www.fasebj.org
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/154423/1/fsb2fj201802528r-sup-0001.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/154423/2/fsb2fj201802528r.pd
Harnessing accurate mitochondrial DNA base editing mediated by DdCBEs in a predictable manner
Introduction: Mitochondrial diseases caused by mtDNA mutations have no effective cures. Recently developed DddA-derived cytosine base editors (DdCBEs) have potential therapeutic implications in rescuing mtDNA mutations. However, the performance of DdCBEs relies on designing different targets or improving combinations of split-DddA halves and orientations, and there is little knowledge for predicting the results before application.
Methods: A series of DdCBE pairs for wide ranges of aC or tC targets was constructed and transfected into Neuro-2a cells. The mutation rates of the targets were compared to identify potential editing rules.
Results: DdCBE-mediated mtDNA editing is found to be predictable: 1) aC targets have a more concentrated editing window for mtDNA editing than tC targets, located at 5′C8-11 (G1333) and 5′C10-13 (G1397) for aC targets, versus 5′C4-13 (G1333) and 5′C5-14 (G1397) for tC targets, with a 16 bp spacer. 2) G1333 mediated C>T conversion at aC targets in a DddA-half-specific manner, while G1333- and G1397-mediated C>T conversions show a DddA-half preference for tC and aC targets, respectively. 3) The nucleotide adjacent to the 3′ end of the aC motif affects mtDNA editing. Finally, guided by these rules, a cell model harboring a pathogenic mtDNA mutation was constructed with high efficiency and no bystander effects.
Discussion: In summary, this discovery helps us conceive the optimal strategy for accurate mtDNA editing, avoiding time- and effort-consuming optimization screening.