84 research outputs found
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
Influenced by the great success of deep learning via cloud computing and the
rapid development of edge chips, research in artificial intelligence (AI) has
shifted to both of the computing paradigms, i.e., cloud computing and edge
computing. In recent years, we have witnessed significant progress in
developing more advanced AI models on cloud servers that surpass traditional
deep learning models owing to model innovations (e.g., Transformers, Pretrained
families), explosion of training data and soaring computing capabilities.
However, edge computing, especially edge and cloud collaborative computing, are
still in its infancy to announce their success due to the resource-constrained
IoT scenarios with very limited algorithms deployed. In this survey, we conduct
a systematic review for both cloud and edge AI. Specifically, we are the first
to set up the collaborative learning mechanism for cloud and edge modeling with
a thorough review of the architectures that enable such mechanism. We also
discuss potentials and practical experiences of some on-going advanced edge AI
topics including pretraining models, graph neural networks and reinforcement
learning. Finally, we discuss the promising directions and challenges in this
field.Comment: 20 pages, Transactions on Knowledge and Data Engineerin
Towards Scalable, Private and Practical Deep Learning
Deep Learning (DL) models have drastically improved the performance of Artificial Intelligence (AI) tasks such as image recognition, word prediction, translation, among many others, on which traditional Machine Learning (ML) models fall short. However, DL models are costly to design, train, and deploy due to their computing and memory demands. Designing DL models usually requires extensive expertise and significant manual tuning efforts. Even with the latest accelerators such as Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU), training DL models can take prohibitively long time, therefore training large DL models in a distributed manner is a norm. Massive amount of data is made available thanks to the prevalence of mobile and internet-of-things (IoT) devices. However, regulations such as HIPAA and GDPR limit the access and transmission of personal data to protect security and privacy. Therefore, enabling DL model training in a decentralized but private fashion is urgent and critical. Deploying trained DL models in a real world environment usually requires meeting Quality of Service (QoS) standards, which makes adaptability of DL models an important yet challenging matter. In this dissertation, we aim to address the above challenges to make a step towards scalable, private, and practical deep learning. To simplify DL model design, we propose Efficient Progressive Neural-Architecture Search (EPNAS) and FedCust to automatically design model architectures and tune hyperparameters, respectively. To provide efficient and robust distributed training while preserving privacy, we design LEASGD, TiFL, and HDFL. We further conduct a study on the security aspect of distributed learning by focusing on how data heterogeneity affects backdoor attacks and how to mitigate such threats. Finally, we use super resolution (SR) as an example application to explore model adaptability for cross platform deployment and dynamic runtime environment. Specifically, we propose DySR and AdaSR frameworks which enable SR models to meet QoS by dynamically adapting to available resources instantly and seamlessly without excessive memory overheads
HIGH-THROUGHPUT AREA-EFFICIENT INTEGER TRANSFORMS FOR VIDEO CODING
Ph.DDOCTOR OF PHILOSOPH
Full Stack Optimization of Transformer Inference: a Survey
Recent advances in state-of-the-art DNN architecture design have been moving
toward Transformer models. These models achieve superior accuracy across a wide
range of applications. This trend has been consistent over the past several
years since Transformer models were originally introduced. However, the amount
of compute and bandwidth required for inference of recent Transformer models is
growing at a significant rate, and this has made their deployment in
latency-sensitive applications challenging. As such, there has been an
increased focus on making Transformer models more efficient, with methods that
range from changing the architecture design, all the way to developing
dedicated domain-specific accelerators. In this work, we survey different
approaches for efficient Transformer inference, including: (i) analysis and
profiling of the bottlenecks in existing Transformer architectures and their
similarities and differences with previous convolutional models; (ii)
implications of Transformer architecture on hardware, including the impact of
non-linear operations such as Layer Normalization, Softmax, and GELU, as well
as linear operations, on hardware design; (iii) approaches for optimizing a
fixed Transformer architecture; (iv) challenges in finding the right mapping
and scheduling of operations for Transformer models; and (v) approaches for
optimizing Transformer models by adapting the architecture using neural
architecture search. Finally, we perform a case study by applying the surveyed
optimizations on Gemmini, the open-source, full-stack DNN accelerator
generator, and we show how each of these approaches can yield improvements,
compared to previous benchmark results on Gemmini. Among other things, we find
that a full-stack co-design approach with the aforementioned methods can result
in up to 88.7x speedup with a minimal performance degradation for Transformer
inference
Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, evolved as a widely deployed BD operating system. Its new features include
federation structure and many associated frameworks, which provide Hadoop 3.x with the
maturity to serve different markets. This dissertation addresses two leading issues involved in
exploiting BD and large-scale data analytics realm using the Hadoop platform. Namely,
(i)Scalability that directly affects the system performance and overall throughput using
portable Docker containers. (ii) Security that spread the adoption of data protection practices
among practitioners using access controls. An Enhanced Mapreduce Environment (EME),
OPportunistic and Elastic Resource Allocation (OPERA) scheduler, BD Federation Access Broker
(BDFAB), and a Secure Intelligent Transportation System (SITS) of multi-tiers architecture for
data streaming to the cloud computing are the main contribution of this thesis study
Aerospace Medicine and Biology: A continuing bibliography with indexes
This bibliography lists 253 reports, articles, and other documents introduced into the NASA scientific and technical information system in October 1975
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern
computing applications. Accelerating their training is a major challenge and
techniques range from distributed algorithms to low-level circuit design. In
this survey, we describe the problem from a theoretical perspective, followed
by approaches for its parallelization. We present trends in DNN architectures
and the resulting implications on parallelization strategies. We then review
and model the different types of concurrency in DNNs: from the single operator,
through parallelism in network inference and training, to distributed deep
learning. We discuss asynchronous stochastic optimization, distributed system
architectures, communication schemes, and neural architecture search. Based on
those approaches, we extrapolate potential directions for parallelism in deep
learning
Air Force Institute of Technology Contributions to Air Force Research and Development, Calendar Year 1987
From the introduction:The primary mission of the Air Force Institute of Technology (AFIT) is education, but research and consulting are essential integral elements in the process. This report highlights AFIT\u27s contributions to Air Force research and development activities [in 1987]
- …