Serving MoE Models on Resource-constrained Edge Devices via Dynamic Expert Swapping
Mixture of experts (MoE) is a popular technique in deep learning that
improves model capacity with conditionally activated parallel neural network
modules (experts). However, serving MoE models in resource-constrained,
latency-critical edge scenarios is challenging due to the significantly
increased model size and complexity. In this paper, we first analyze the
behavior pattern of MoE models in continuous inference scenarios, which leads
to three key observations about the expert activations, including temporal
locality, exchangeability, and skippable computation. Based on these
observations, we introduce PC-MoE, an inference framework for
resource-constrained continuous MoE model serving. The core of PC-MoE is a new
data structure, Parameter Committee, that intelligently maintains a subset of
important experts in use to reduce resource consumption. The optimal
configuration of Parameter Committee is found offline by a profiling-guided
committee planner, and expert swapping and request handling at runtime are
managed by an adaptive committee scheduler. To evaluate the effectiveness of
PC-MoE, we conduct experiments using state-of-the-art MoE models on common
computer vision and natural language processing tasks. The results demonstrate
that PC-MoE achieves optimal trade-offs between resource consumption and model
accuracy. For instance, on object detection tasks with the Swin-MoE model, our
approach reduces memory usage and latency by 42.34% and 18.63%, respectively,
with only 0.10% accuracy degradation.
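To make the committee idea concrete, below is a minimal Python sketch of a parameter-committee-style expert cache that exploits the temporal locality of expert activations; the class name, the LRU eviction policy, and the reuse fallback standing in for expert exchangeability are our own illustrative assumptions, not PC-MoE's actual design.

    # Illustrative sketch only: a committee that keeps the k most recently
    # useful experts resident, mimicking the temporal-locality observation.
    # ExpertCommittee, load_expert, and the fallback policy are hypothetical.
    from collections import OrderedDict

    class ExpertCommittee:
        def __init__(self, capacity, load_expert):
            self.capacity = capacity        # max experts resident in memory
            self.load_expert = load_expert  # callable: expert_id -> module
            self.resident = OrderedDict()   # expert_id -> module, in LRU order

        def get(self, expert_id):
            """Return a resident expert, swapping in (and evicting) if needed."""
            if expert_id in self.resident:
                self.resident.move_to_end(expert_id)   # mark as recently used
            else:
                if len(self.resident) >= self.capacity:
                    self.resident.popitem(last=False)  # evict least recently used
                self.resident[expert_id] = self.load_expert(expert_id)
            return self.resident[expert_id]

        def route(self, expert_id):
            """Exchangeability: if the expert is absent, reuse a resident one
            instead of paying the swap cost (here: most recently used)."""
            if expert_id in self.resident or not self.resident:
                return self.get(expert_id)
            return next(reversed(self.resident.values()))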
Recursive Generalization Transformer for Image Super-Resolution
Transformer architectures have exhibited remarkable performance in image
super-resolution (SR). Due to the quadratic computational complexity of
self-attention (SA) in Transformers, existing methods tend to adopt SA within a
local region to reduce overhead. However, the local design restricts
exploitation of the global context, which is crucial for accurate image
reconstruction. In this work, we propose the Recursive Generalization
Transformer (RGT) for image SR, which can capture global spatial information
and is suitable for high-resolution images. Specifically, we propose the
recursive-generalization self-attention (RG-SA). It recursively aggregates
input features into representative feature maps, and then utilizes
cross-attention to extract global information. Meanwhile, the channel
dimensions of attention matrices (query, key, and value) are further scaled to
mitigate the redundancy in the channel domain. Furthermore, we combine the
RG-SA with local self-attention to enhance the exploitation of the global
context, and propose the hybrid adaptive integration (HAI) for module
integration. The HAI allows the direct and effective fusion between features at
different levels (local or global). Extensive experiments demonstrate that our
RGT outperforms recent state-of-the-art methods quantitatively and
qualitatively. Code is released at https://github.com/zhengchen1999/RGT.
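As a rough illustration of the RG-SA idea, the following PyTorch sketch recursively aggregates the input into a compact representative map and applies cross-attention from full-resolution queries to it; the reduction depth, channel-scaling factor, and use of nn.MultiheadAttention are assumptions, not the released implementation.

    # A rough sketch of recursive-generalization self-attention (RG-SA).
    # Assumes dim * channel_scale is divisible by num_heads.
    import torch.nn as nn

    class RGSA(nn.Module):
        def __init__(self, dim, num_heads=4, reductions=2, channel_scale=0.5):
            super().__init__()
            red_dim = int(dim * channel_scale)  # scaled channels to cut redundancy
            # Recursive aggregation: stride-2 convs shrink the spatial size.
            self.aggregate = nn.Sequential(
                *[nn.Conv2d(dim, dim, 3, stride=2, padding=1) for _ in range(reductions)])
            self.to_q = nn.Conv2d(dim, red_dim, 1)       # queries: full resolution
            self.to_kv = nn.Conv2d(dim, red_dim * 2, 1)  # keys/values: compact map
            self.attn = nn.MultiheadAttention(red_dim, num_heads, batch_first=True)
            self.proj = nn.Conv2d(red_dim, dim, 1)

        def forward(self, x):                             # x: (B, C, H, W)
            b, c, h, w = x.shape
            rep = self.aggregate(x)                       # representative feature map
            k, v = self.to_kv(rep).chunk(2, dim=1)
            q = self.to_q(x)
            seq = lambda t: t.flatten(2).transpose(1, 2)  # (B, tokens, channels)
            out, _ = self.attn(seq(q), seq(k), seq(v))    # global cross-attention
            out = out.transpose(1, 2).reshape(b, -1, h, w)
            return x + self.proj(out)                     # residual connection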
Cross Aggregation Transformer for Image Restoration
Recently, the Transformer architecture has been introduced into image
restoration to replace the convolutional neural network (CNN), with surprising results.
Considering the high computational complexity of Transformer with global
attention, some methods use the local square window to limit the scope of
self-attention. However, these methods lack direct interaction among different
windows, which limits the establishment of long-range dependencies. To address
the above issue, we propose a new image restoration model, Cross Aggregation
Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention
(Rwin-SA), which utilizes horizontal and vertical rectangle window attention in
different heads in parallel to expand the attention area and aggregate
features across different windows. We also introduce the Axial-Shift operation
for different window interactions. Furthermore, we propose the Locality
Complementary Module to complement the self-attention mechanism, which
incorporates the inductive bias of CNN (e.g., translation invariance and
locality) into Transformer, enabling global-local coupling. Extensive
experiments demonstrate that our CAT outperforms recent state-of-the-art
methods on several image restoration applications. The code and models are
available at https://github.com/zhengchen1999/CAT. (Accepted to NeurIPS 2022.)
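The following PyTorch sketch illustrates the rectangle-window attention idea: channels (and heads) are split so that one half attends within horizontal rectangle windows and the other within vertical ones; the window sizes and the channel split are illustrative assumptions rather than CAT's exact configuration.

    # A rough sketch of rectangle-window self-attention (Rwin-SA). Assumes H, W
    # are divisible by both window sides, and dim/2 by heads/2.
    import torch
    import torch.nn as nn

    def window_attention(x, attn, win_h, win_w):
        """Partition (B, C, H, W) into win_h x win_w windows; self-attend in each."""
        b, c, h, w = x.shape
        x = x.reshape(b, c, h // win_h, win_h, w // win_w, win_w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, win_h * win_w, c)  # window tokens
        x, _ = attn(x, x, x)
        x = x.reshape(b, h // win_h, w // win_w, win_h, win_w, c)
        return x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

    class RwinSA(nn.Module):
        def __init__(self, dim, heads=4, rect=(4, 16)):
            super().__init__()
            self.h_attn = nn.MultiheadAttention(dim // 2, heads // 2, batch_first=True)
            self.v_attn = nn.MultiheadAttention(dim // 2, heads // 2, batch_first=True)
            self.rect = rect

        def forward(self, x):                  # x: (B, C, H, W)
            xh, xv = x.chunk(2, dim=1)         # split channels between orientations
            short, long = self.rect
            out_h = window_attention(xh, self.h_attn, short, long)  # horizontal
            out_v = window_attention(xv, self.v_attn, long, short)  # vertical
            return torch.cat([out_h, out_v], dim=1)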
Natural Language based Context Modeling and Reasoning with LLMs: A Tutorial
Large language models (LLMs) have surged phenomenally since 2018, two decades
after context-awareness was introduced into computing systems. By taking into
account the situations of ubiquitous devices, users, and societies,
context-aware computing has enabled a wide spectrum of innovative
applications, such as assisted living and location-based social network
services. To recognize contexts and make decisions for actions accordingly,
various artificial intelligence technologies, such as ontologies and OWL, have
been adopted as representations for context modeling and reasoning. Recently,
with the rise of LLMs and their improved natural language understanding and
reasoning capabilities, it has become feasible to model contexts using natural
language and perform context reasoning by interacting with LLMs such as ChatGPT
and GPT-4. In this tutorial, we demonstrate the use of texts, prompts, and
autonomous agents (AutoAgents) that enable LLMs to perform context modeling and
reasoning without requiring fine-tuning of the model. We organize and introduce
works in the related field, and name this computing paradigm LLM-driven
Context-aware Computing (LCaC). In the LCaC paradigm, users' requests, sensor
readings, and commands to actuators are all represented as text. Given the
text of a user's request and the sensor data, the AutoAgent models the context
in a prompt and sends it to the LLM for context reasoning. The LLM generates
an action plan and responds to the AutoAgent, which then follows the plan to
foster context-awareness. To prove the concept, we use two
showcases: (1) operating a mobile z-arm in an apartment for assisted living,
and (2) planning a trip and scheduling the itinerary in a context-aware and
personalized manner. (Under review.)
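A minimal sketch of the LCaC loop, assuming a generic llm callable and a one-command-per-line plan format (both our own simplifications): the AutoAgent serializes the request and sensor readings as text, prompts the LLM for context reasoning, and follows the returned action plan.

    # A minimal LCaC-style loop. The llm callable and the plan format are
    # simplifying assumptions; any text-in/text-out LLM interface would do.
    def auto_agent(request, sensors, actuators, llm):
        # Context modeling: represent the request and sensor data as text.
        readings = "\n".join(f"{name}: {value}" for name, value in sensors.items())
        prompt = (
            "You control these actuators: " + ", ".join(actuators) + ".\n"
            "Current sensor readings:\n" + readings + "\n"
            "User request: " + request + "\n"
            "Respond with one command per line, formatted 'actuator: command'."
        )
        # Context reasoning is delegated to the LLM via the prompt.
        plan = [line for line in llm(prompt).splitlines() if ":" in line]
        for line in plan:                       # follow the action plan
            name, command = (part.strip() for part in line.split(":", 1))
            if name in actuators:
                actuators[name](command)        # dispatch to the actuator
        return plan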
Xformer: Hybrid X-Shaped Transformer for Image Denoising
In this paper, we present a hybrid X-shaped vision Transformer, named
Xformer, which performs notably well on image denoising tasks. We explore
strengthening the global representation of tokens from different scopes. In
detail, we adopt two types of Transformer blocks. The spatial-wise Transformer
block performs fine-grained local patch interactions across tokens defined
along the spatial dimension. The channel-wise Transformer block performs direct
global context interactions across tokens defined along the channel dimension. Based on the
concurrent network structure, we design two branches to conduct these two
interaction fashions. Within each branch, we employ an encoder-decoder
architecture to capture multi-scale features. Besides, we propose the
Bidirectional Connection Unit (BCU) to couple the learned representations from
these two branches while providing enhanced information fusion. These joint
designs make our Xformer powerful at global information modeling in
both the spatial and channel dimensions. Extensive experiments show that Xformer,
under comparable model complexity, achieves state-of-the-art performance on
synthetic and real-world image denoising tasks. We also provide code and
models at https://github.com/gladzhang/Xformer. (Accepted to ICLR 2024.)
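To illustrate the two token definitions, here is a PyTorch sketch in which spatial-wise attention treats spatial positions as tokens (global rather than windowed, for brevity) and channel-wise attention treats channels as tokens, with a simple 1x1-convolution exchange standing in for the Bidirectional Connection Unit; all of these simplifications are assumptions, not the paper's exact design.

    # A rough sketch of the two token definitions and a stand-in for the BCU.
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        """Tokens are spatial positions (global here; the paper uses local windows)."""
        def __init__(self, dim, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                       # x: (B, C, H, W)
            b, c, h, w = x.shape
            t = x.flatten(2).transpose(1, 2)        # (B, HW, C)
            out, _ = self.attn(t, t, t)
            return x + out.transpose(1, 2).reshape(b, c, h, w)

    class ChannelAttention(nn.Module):
        """Tokens are channels: the attention matrix is C x C, linear in HW."""
        def __init__(self, dim):
            super().__init__()
            self.qkv = nn.Conv2d(dim, dim * 3, 1)
            self.proj = nn.Conv2d(dim, dim, 1)

        def forward(self, x):                       # x: (B, C, H, W)
            b, c, h, w = x.shape
            q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)          # (B, C, HW)
            attn = (q @ k.transpose(1, 2) / (h * w) ** 0.5).softmax(dim=-1)
            return x + self.proj((attn @ v).reshape(b, c, h, w))

    class BCU(nn.Module):
        """Stand-in for the Bidirectional Connection Unit: each branch receives
        a projected copy of the other branch's features."""
        def __init__(self, dim):
            super().__init__()
            self.s2c = nn.Conv2d(dim, dim, 1)
            self.c2s = nn.Conv2d(dim, dim, 1)

        def forward(self, xs, xc):                  # spatial-, channel-branch feats
            return xs + self.c2s(xc), xc + self.s2c(xs)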
Hierarchical Integration Diffusion Model for Realistic Image Deblurring
Diffusion models (DMs) have recently been introduced in image deblurring and
exhibited promising performance, particularly in terms of details
reconstruction. However, the diffusion model requires a large number of
inference iterations to recover the clean image from pure Gaussian noise, which
consumes massive computational resources. Moreover, the distribution
synthesized by the diffusion model is often misaligned with the target results,
leading to restrictions in distortion-based metrics. To address the above
issues, we propose the Hierarchical Integration Diffusion Model (HI-Diff) for
realistic image deblurring. Specifically, we perform the DM in a highly
compact latent space to generate the prior feature for the deblurring
process. The deblurring process is implemented by a regression-based method to
obtain better distortion accuracy. Meanwhile, the highly compact latent space
ensures the efficiency of the DM. Furthermore, we design the hierarchical
integration module to fuse the prior into the regression-based model from
multiple scales, enabling better generalization in complex blurry scenarios.
Comprehensive experiments on synthetic and real-world blur datasets demonstrate
that our HI-Diff outperforms state-of-the-art methods. Code and trained models
are available at https://github.com/zhengchen1999/HI-Diff.
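As a rough sketch of the hierarchical integration idea, the module below fuses a compact prior (the tokens produced by the latent diffusion model) into the regression network's features at several scales via cross-attention; the per-scale dimensions, token count, and the cross-attention fusion operator are assumptions for illustration.

    # A rough sketch of multi-scale prior fusion via cross-attention.
    import torch.nn as nn

    class HierarchicalIntegration(nn.Module):
        def __init__(self, dims=(64, 128, 256), prior_dim=256, heads=4):
            super().__init__()
            self.fuse = nn.ModuleList(
                nn.MultiheadAttention(d, heads, kdim=prior_dim, vdim=prior_dim,
                                      batch_first=True) for d in dims)

        def forward(self, feats, prior):
            # feats: per-scale (B, C_i, H_i, W_i) features of the regression
            # network; prior: (B, N, prior_dim) compact tokens from the latent DM.
            out = []
            for f, attn in zip(feats, self.fuse):
                b, c, h, w = f.shape
                q = f.flatten(2).transpose(1, 2)     # image features as queries
                fused, _ = attn(q, prior, prior)     # inject the prior at this scale
                out.append(f + fused.transpose(1, 2).reshape(b, c, h, w))
            return out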
- …