4 research outputs found
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
Text-to-image generation (TTI) refers to the use of models that process text
input and generate high-fidelity images from text descriptions. Text-to-image
generation using neural networks can be traced back to the emergence of
Generative Adversarial Networks (GANs), followed by autoregressive
Transformers. Diffusion models are one prominent type of generative model,
synthesizing images by systematically adding noise and then removing it over
repeated steps. Owing to their impressive results on image synthesis,
diffusion models have been cemented as the major image decoder used by
text-to-image models and have brought text-to-image generation to the
forefront of machine-learning (ML) research. In the era of large models,
scaling up model size and integrating large language models have further
improved the performance of TTI models, yielding generations nearly
indistinguishable from real-world images and revolutionizing the way we
retrieve images. Our exploratory study leads us to believe that text-to-image
models can be scaled further by combining innovative model architectures with
prediction-enhancement techniques. We divide this survey into five main
sections, detailing the frameworks of the major literature to examine the
different types of text-to-image generation methods. We then provide a
detailed comparison and critique of these methods and offer possible pathways
of improvement for future work. Finally, we argue that TTI development could
yield impressive productivity improvements for creation, particularly in the
AIGC era, and could be extended to more complex tasks such as video generation
and 3D generation.
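As a rough illustration of the noise-injection process the abstract describes, the following sketch samples a noised image from the closed-form forward process of a DDPM-style diffusion model. The linear beta schedule and toy 8x8 "image" are illustrative assumptions, not taken from the survey itself.

```python
import numpy as np

def forward_diffusion(x0, t, betas):
    """Sample x_t from a clean image x0 via the closed-form forward process:
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]      # product of (1 - beta_s) up to step t
    noise = np.random.randn(*x0.shape)     # fresh Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

# Noise a toy 8x8 image halfway through a 1000-step linear schedule.
betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.random.rand(8, 8)
x_t = forward_diffusion(x0, t=500, betas=betas)
```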
Computing on Large, Sparse Datasets and Error-Prone Fabrics
In this dissertation we study problems arising from two trends: computation on large, sparse datasets and computation on error-prone fabrics.
Dataset sizes grow every year. However, many of these large datasets are sparse, i.e., the majority of their entries are zero, so skipping the zero elements can considerably accelerate computation on them. We focus on accelerating a common kernel for sparse computation, sparse matrix-matrix multiplication (SpMM), and propose a high-performance, scalable systolic accelerator that minimizes the required bandwidth to memory and accelerates this operation 9-30 times over the state of the art.
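To make the zero-skipping idea concrete, here is a plain-Python sketch of an SpMM kernel over a CRS/CSR-stored matrix; it shows why only the stored non-zeros incur work, though it is of course an illustration of the kernel, not the systolic accelerator itself.

```python
def spmm_csr(values, col_idx, row_ptr, B):
    """Multiply a sparse matrix A, stored in CRS/CSR form, by a dense
    matrix B. Only the stored non-zeros of A are visited, so zero
    entries contribute no work."""
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):  # non-zeros of row i
            a, j = values[k], col_idx[k]
            for c in range(n_cols):
                C[i][c] += a * B[j][c]
    return C
```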
We also study the sparse formats used to store sparse datasets. These formats reduce the required bandwidth and storage by storing only the non-zero elements. We modify the popular CRS sparse format and propose the InCRS format, which improves non-regular accesses. We show that this modification reduces the required memory accesses and consequently accelerates SpMM 5-12 times.
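For reference, a minimal sketch of how plain CRS stores only the non-zero elements is shown below; the abstract does not detail InCRS itself, so this illustrates only the baseline format it modifies.

```python
def to_crs(A):
    """Convert a dense matrix (list of rows) to CRS arrays
    (values, col_idx, row_ptr), storing only non-zero elements."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

# A 3x4 matrix with 4 non-zeros stores 4 values, 4 column indices,
# and 4 row pointers instead of 12 dense entries.
A = [[5, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 3, 2, 0]]
print(to_crs(A))  # ([5, 1, 3, 2], [0, 3, 1, 2], [0, 2, 2, 4])
```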
As transistor scaling continues, devices are becoming less reliable, introducing errors into the systems built from them. We provide a framework for comparing the error tolerance of different sparse data formats and choosing the most appropriate format for an arbitrary application. As case studies, we compare the performance of different formats for two machine-learning applications, RBM and PCA, and a set of linear algebra operations.
We also study error-tolerant processors built on error-prone fabrics that allow errors in the architectural state. We formalize the minimal requirements these processors must satisfy to provide potentially useful results: making progress, preventing error effects from accumulating over time, and executing the essential parts of the program.
We propose a framework to model the control flow of these processors, capturing the effects of errors and protection mechanisms, and to verify reliability properties on them. As case studies, we verify these properties on two recent error-tolerant processors, PPU and ERSA, and propose modifications to these designs so that they satisfy the minimal reliability requirements.
Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?
This study examines the impact of optimizing the Stable Diffusion (SD) guided
inference pipeline. We propose optimizing certain denoising steps by limiting
the noise computation to conditional noise and eliminating unconditional noise
computation, thereby reducing the complexity of the target iterations by 50%.
Additionally, we demonstrate that later iterations of SD are less sensitive
to optimization, making them ideal candidates for applying the suggested
optimization. Our experiments show that optimizing the last 20% of the
denoising loop iterations results in an 8.2% reduction in inference time with
almost no perceivable changes to the human eye. Furthermore, we found that by
extending the optimization to 50% of the last iterations, we can reduce
inference time by approximately 20.3%, while still generating visually pleasing
images.
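A minimal sketch of the proposed optimization, assuming a standard classifier-free-guidance denoising loop: the unconditional noise prediction is skipped for the final fraction of iterations, halving the per-step cost there. The `model` and `step` callables are hypothetical stand-ins for the noise predictor and scheduler update, not the actual Stable Diffusion pipeline.

```python
import numpy as np

def selective_guided_denoise(model, step, x, timesteps, cond, uncond,
                             scale=7.5, skip_frac=0.2):
    """Classifier-free guidance loop that drops the unconditional noise
    prediction for the last `skip_frac` of iterations.

    model(x, t, c) predicts noise; step(x, eps, t) applies one scheduler
    update. Both are caller-supplied placeholders in this sketch."""
    n = len(timesteps)
    cutoff = int(n * (1.0 - skip_frac))  # steps before this use full CFG
    for i, t in enumerate(timesteps):
        eps_cond = model(x, t, cond)
        if i < cutoff:
            # Full guidance: two noise predictions per step.
            eps_uncond = model(x, t, uncond)
            eps = eps_uncond + scale * (eps_cond - eps_uncond)
        else:
            # Optimized tail: conditional prediction only.
            eps = eps_cond
        x = step(x, eps, t)
    return x

# Toy usage with dummy stand-ins for the noise predictor and scheduler.
dummy_model = lambda x, t, c: np.zeros_like(x)
dummy_step = lambda x, eps, t: x - eps
out = selective_guided_denoise(dummy_model, dummy_step,
                               np.random.randn(4, 4),
                               timesteps=list(range(50, 0, -1)),
                               cond="prompt", uncond="")
```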