
    ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

    Full text link
    Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. But both the attention and multi-layer perceptron (MLP) modules in ViTs are not efficient enough due to dense multiplications, resulting in costly training and inference. To this end, we propose to reparameterize the pre-trained ViT with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims for end-to-end inference speedups on GPUs without the need of training from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized by additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized by shift kernels. We utilize TVM to implement and optimize those customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization on (quadratic or linear) attention maintains model accuracy, while inevitably leading to accuracy drops when applied to MLPs. To marry the best of both worlds, we further propose a new mixture-of-experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss. Such a loss helps to train a generic router that assigns a dynamic number of input tokens to different experts according to their latency. In principle, the faster an expert runs, the more input tokens it is assigned. Extensive experiments consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to 5.18× latency reductions on GPUs and 42.9% energy savings, while maintaining accuracy comparable to original or efficient ViTs.
    Comment: Accepted by NeurIPS 202
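The core primitive substitution the abstract describes can be illustrated with a minimal sketch: a weight quantized to a signed power of two turns each multiplication with an integer activation into a bitwise shift, and a dot product into shifts plus additions. This is only an assumption-laden toy (the function names are illustrative, not from the ShiftAddViT codebase, and the real system operates on TVM-compiled GPU kernels, not Python integers):

```python
# Toy sketch of shift reparameterization: weight w ≈ sign * 2**exponent,
# so multiplying an integer activation by w reduces to a bitwise shift.
import math

def quantize_to_power_of_two(w: float) -> tuple[int, int]:
    """Return (sign, exponent) such that w ≈ sign * 2**exponent."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exponent = round(math.log2(abs(w)))
    return sign, exponent

def shift_multiply(x: int, sign: int, exponent: int) -> int:
    """Multiply integer activation x by sign * 2**exponent using shifts only."""
    if sign == 0:
        return 0
    shifted = x << exponent if exponent >= 0 else x >> -exponent
    return shifted if sign > 0 else -shifted

def shift_add_dot(xs, quantized_ws):
    """A dot product becomes a sum of shifted activations: adds + shifts."""
    return sum(shift_multiply(x, s, e) for x, (s, e) in zip(xs, quantized_ws))
```

For example, `shift_multiply(8, 1, -1)` computes 8 × 0.5 as a right shift, yielding 4; the accuracy cost the paper reports for MLPs comes from exactly this rounding of weights to powers of two.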

    Glucose Enhances Leptin Signaling through Modulation of AMPK Activity

    Get PDF
    Leptin exerts its action by binding to and activating the long form of leptin receptors (LEPRb). LEPRb activates JAK2, which subsequently phosphorylates and activates STAT3. The JAK2/STAT3 pathway is required for leptin control of energy balance and body weight. Defects in leptin signaling lead to leptin resistance, a primary risk factor for obesity. Body weight is also regulated by nutrients, including glucose. Defects in glucose sensing also contribute to obesity. Here we report crosstalk between leptin and glucose. Glucose starvation blocked the ability of leptin to stimulate tyrosyl phosphorylation and activation of JAK2 and STAT3 in a variety of cell types. Glucose dose-dependently enhanced leptin signaling. In contrast, glucose did not enhance growth hormone-stimulated phosphorylation of JAK2 and STAT5. Glucose starvation or 2-deoxyglucose-induced inhibition of glycolysis activated AMPK and inhibited leptin signaling; pharmacological inhibition of AMPK restored the ability of leptin to stimulate STAT3 phosphorylation. Conversely, pharmacological activation of AMPK was sufficient to inhibit leptin signaling and to block the ability of glucose to enhance leptin signaling. These results suggest that glucose and/or its metabolites play a permissive role in leptin signaling, and that glucose enhances leptin sensitivity at least in part by attenuating the ability of AMPK to inhibit leptin signaling.

    Uncertainty Management of Dynamic Tariff Method for Congestion Management in Distribution Networks

    Get PDF

    voxel2vec: A Natural Language Processing Approach to Learning Distributed Representations for Scientific Data

    Full text link
    Relationships in scientific data, such as the numerical and spatial distribution relations of features in univariate data, the relations among scalar-value combinations in multivariate data, and the association of volumes in time-varying and ensemble data, are intricate and complex. This paper presents voxel2vec, a novel unsupervised representation learning model, which is used to learn distributed representations of scalar values/scalar-value combinations in a low-dimensional vector space. Its basic assumption is that if two scalar values/scalar-value combinations have similar contexts, they usually have high similarity in terms of features. By representing scalar values/scalar-value combinations as symbols, voxel2vec learns the similarity between them in the context of spatial distribution and then allows us to explore the overall association between volumes by transfer prediction. We demonstrate the usefulness and effectiveness of voxel2vec by comparing it with the isosurface similarity map of univariate data and applying the learned distributed representations to feature classification for multivariate data and to association analysis for time-varying and ensemble data.
    Comment: Accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)
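The word2vec-style assumption the abstract states can be made concrete with a small sketch: treat quantized scalar values as "words" and their spatial neighbors as "context", then collect skip-gram pairs and co-occurrence counts. This is only an illustration of the underlying idea on a 1D field (the function names are hypothetical, and the actual model learns embeddings over 3D volumes, which this sketch does not attempt):

```python
# Illustrative skip-gram context extraction: scalar values are symbols,
# spatial neighbors within `window` positions are their context.
from collections import defaultdict

def skipgram_pairs(field, window=1):
    """Yield (center_value, context_value) pairs from a 1D list of symbols."""
    for i, center in enumerate(field):
        lo, hi = max(0, i - window), min(len(field), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, field[j]

def cooccurrence(field, window=1):
    """Count how often each scalar value appears in each other's context."""
    counts = defaultdict(lambda: defaultdict(int))
    for center, ctx in skipgram_pairs(field, window):
        counts[center][ctx] += 1
    return counts
```

Values whose co-occurrence profiles are similar would, under the paper's assumption, receive nearby embeddings; voxel2vec learns such embeddings directly rather than materializing the count matrix.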

    NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants

    Full text link
    Tiny deep learning has attracted increasing attention, driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices. However, it is still challenging to unleash tiny deep learning's full potential on both large-scale datasets and downstream tasks due to the under-fitting issues caused by the limited model capacity of tiny neural networks (TNNs). To this end, we propose a framework called NetBooster to empower tiny deep learning by augmenting the architectures of TNNs via an expansion-then-contraction strategy. Extensive experiments show that NetBooster consistently outperforms state-of-the-art tiny deep learning solutions.
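The abstract does not spell out the expansion-then-contraction mechanics, so the following is only a generic illustration of the strategy's shape, not NetBooster's method: a single linear layer is expanded into a deeper linear stack (extra capacity during training), then contracted back to one matrix by composing the trained factors. Purely linear expansions collapse exactly; real TNN blocks with nonlinearities would instead need distillation or re-training at the contraction step.

```python
# Generic expansion-then-contraction sketch for a linear layer (hypothetical,
# not the NetBooster algorithm): factor W into W2 @ W1 with a wider hidden
# dimension, then merge the factors back into a single matrix.
import numpy as np

rng = np.random.default_rng(0)

def expand(w, hidden):
    """Split an (out, in) weight into two factors with a wider hidden dim."""
    out_dim, in_dim = w.shape
    w1 = rng.standard_normal((hidden, in_dim)) * 0.1
    # Least-squares second factor so that w2 @ w1 reproduces w at init.
    w2 = w @ np.linalg.pinv(w1)
    return w1, w2

def contract(w1, w2):
    """Merge the (trained) factors back into a single (out, in) matrix."""
    return w2 @ w1

w = rng.standard_normal((4, 3))
w1, w2 = expand(w, hidden=16)
w_merged = contract(w1, w2)
# w1 almost surely has full column rank, so pinv(w1) @ w1 = I and the
# expansion collapses back losslessly:
print(np.allclose(w_merged, w, atol=1e-6))  # prints True
```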

    A Survey of Multimodal Information Fusion for Smart Healthcare: Mapping the Journey from Data to Wisdom

    Full text link
    Multimodal medical data fusion has emerged as a transformative approach in smart healthcare, enabling a comprehensive understanding of patient health and personalized treatment plans. In this paper, a journey from data to information to knowledge to wisdom (DIKW) is explored through multimodal fusion for smart healthcare. We present a comprehensive review of multimodal medical data fusion focused on the integration of various data modalities. The review explores different approaches, such as feature selection, rule-based systems, machine learning, deep learning, and natural language processing, for fusing and analyzing multimodal data. This paper also highlights the challenges associated with multimodal fusion in healthcare. By synthesizing the reviewed frameworks and theories, it proposes a generic framework for multimodal medical data fusion that aligns with the DIKW model. Moreover, it discusses future directions related to the four pillars of healthcare: Predictive, Preventive, Personalized, and Participatory approaches. The components of the comprehensive survey presented in this paper form the foundation for more successful implementation of multimodal fusion in smart healthcare. Our findings can guide researchers and practitioners in leveraging the power of multimodal fusion with state-of-the-art approaches to revolutionize healthcare and improve patient outcomes.
    Comment: This work has been submitted to Elsevier for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.