7 research outputs found

Performance benchmarking, analysis, and optimization of deep learning inference

Author: Li Cheng
Publication date: 01/08/2020

The world sees a proliferation of deep learning (DL) models and their wide adoption in different application domains. This has made the performance benchmarking, understanding, and optimization of DL inference an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible computing system to serve DL models with the desired latency, throughput, and energy requirements while maximizing resource utilization. However, DL faces the following challenges in performance engineering. Benchmarking: While there have been significant efforts to develop benchmark suites that evaluate widely used DL models, developing, maintaining, and running benchmarks takes a non-trivial amount of effort, and DL benchmarking has been hampered in part by the lack of representative and up-to-date benchmarking suites. Performance Understanding: Understanding the performance of DL workloads is challenging because their characteristics depend on the interplay between the models, frameworks, system libraries, and hardware (the HW/SW stack). Existing profiling tools are disjoint, however, and each focuses on profiling within a particular level of the stack. This largely limits the types of analysis that can be performed on model execution. Optimization Advising: The current DL optimization process is manual and ad hoc, requiring a lot of effort and expertise. Existing tools lack the highly desired abilities to characterize ideal performance, identify sources of inefficiency, and quantify the benefits of potential optimizations. Such deficiencies have led to slow DL characterization/optimization cycles that cannot keep up with the fast pace at which new DL innovations are introduced.
Evaluation and Comparison: The current DL landscape is fast-paced and rife with non-uniform models and hardware/software (HW/SW) stacks, but lacks a DL benchmarking platform to facilitate evaluation and comparison of DL innovations, be it models, frameworks, libraries, or hardware. Due to the lack of a benchmarking platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone, stifling the adoption of the innovations. This thesis addresses the above challenges in DL performance engineering. First, we introduce DLBricks, a composable benchmark generation design that reduces the effort of developing, maintaining, and running DL benchmarks. DLBricks decomposes DL models into a set of unique runnable networks and constructs the original model’s performance using the performance of the generated benchmarks. Then, we present XSP, an across-stack profiling design that correlates profiles from different sources to obtain a holistic and hierarchical view of DL model execution. XSP leverages distributed tracing and accurately captures the profiles at each level of the HW/SW stack in spite of profiling overhead. Next, we propose Benanza, a systematic DL benchmarking and analysis design that guides researchers to potential optimization opportunities and assesses hypothetical execution scenarios on GPUs. Finally, we design MLModelScope, a consistent, reproducible, and scalable DL benchmarking platform to facilitate evaluation and comparison of DL innovations. This thesis also briefly discusses TrIMS, TOPS, and CommScope, which were developed based on needs observed during the performance benchmarking and optimization work to solve relevant problems in the DL domain.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Neural network computing using on-chip accelerators

Author: Eldridge Schuyler
Publication date: 05/11/2016

The use of neural networks, machine learning, or artificial intelligence, in its broadest and most controversial sense, has been a tumultuous journey involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can be utilized as an auxiliary component of applications to enhance or enable new types of computation such as approximate computing or automatic parallelization. In our view, machine learning becomes not the underlying application, but a ubiquitous component of applications. This view necessitates a different approach towards the deployment of machine learning computation that spans not only hardware design of accelerator architectures, but also user and supervisor software to enable the safe, simultaneous use of machine learning accelerator resources. In this dissertation, we propose a multi-transaction model of neural network computation to meet the needs of future machine learning applications. We demonstrate that this model, encompassing a backend accelerator for inference and learning that is decoupled from the hardware and software for managing neural network transactions, can be achieved with low overhead and integrated with a modern RISC-V microprocessor. Our extensions span user and supervisor software and data structures and, coupled with our hardware, enable multiple transactions from different address spaces to execute simultaneously, yet safely. Together, our system demonstrates the utility of a multi-transaction model to improve energy efficiency and overall accelerator throughput for machine learning applications.

Boston University Institutional Repository (OpenBU)

Intelligent Software in the Era of Deep Learning

Author: Wang Yuke
Publication venue: eScholarship, University of California
Publication date: 01/01/2024

With the end of Moore’s Law and the rise of compute- and data-intensive deep learning (DL) applications, the focus on arduous new processor design has shifted towards a more effective and agile approach: Intelligent Software to maximize the performance gains of DL hardware like GPUs. There are several highlights of such intelligent software design. First, it would maximize the execution efficiency of existing and emerging DL algorithms on powerful platforms like GPUs. Second, it would promote the adaptiveness of systems to handle a diverse range of inputs. Third, it would maintain sufficient portability and scalability across a diverse range of platforms, such as mobile devices and high-performance clusters. In this thesis, I will first highlight the importance of software innovation to bridge the gap between the increasingly diverse DL applications and the existing powerful DL hardware platforms. The second part of my thesis will recap my research work on DL system software innovation, focusing on 1) Precision Mismatch between DL applications and high-performance GPU units like Tensor Cores (e.g., QGTC [PPoPP ’22] and APNN-TC [SC ’21]), to improve the efficiency of quantized deep learning on powerful GPU platforms, and 2) Computing Pattern Mismatch between the sparse and irregular DL applications, such as Graph Neural Networks, and the dense and regular tailored GPU computing paradigm (e.g., GNNAdvisor [OSDI ’21] and MGG [OSDI ’23]), to highlight system adaptability and scalability. Finally, I will conclude this thesis with my vision and future work for building efficient, scalable, and secure DL systems.

eScholarship - University of California

Building the Hyperconnected Society: Internet of Things Research and Innovation Value Chains, Ecosystems and Markets

Publication venue: 'Informa UK Limited'
Publication date: 28/11/2022

This book aims to provide a broad overview of various topics of the Internet of Things (IoT), ranging from research, innovation and development priorities to enabling technologies, nanoelectronics, cyber-physical systems, architecture, interoperability and industrial applications. All this is happening in a global context, building towards intelligent, interconnected decision making as an essential driver for new growth and co-competition across a wider set of markets. It is intended to be a standalone book in a series that covers the Internet of Things activities of the IERC (Internet of Things European Research Cluster), from research to technological innovation, validation and deployment. The book builds on the ideas put forward by the European Research Cluster on the Internet of Things Strategic Research and Innovation Agenda, and presents global views and state-of-the-art results on the challenges facing the research, innovation, development and deployment of IoT in future years. The concept of IoT could disrupt consumer and industrial product markets, generating new revenues and serving as a growth driver for semiconductor, networking equipment, and service provider end-markets globally. This will create new application and product end-markets, change the value chain of companies that create IoT technology and deploy it in various end sectors, and impact the business models of semiconductor, software, device, communication and service provider stakeholders. The proliferation of intelligent devices at the edge of the network with the introduction of embedded software and app-driven hardware into manufactured devices, and the ability, through embedded software/hardware developments, to monetize those device functions and features by offering novel solutions, could generate completely new types of revenue streams.
Intelligent and IoT devices leverage software, software licensing, entitlement management, and Internet connectivity in ways that address many of the societal challenges that we will face in the next decade.

Directory of Open Access Books (DOAB)

Building the Hyperconnected Society: Internet of Things Research and Innovation Value Chains, Ecosystems and Markets

Publication venue: 'Informa UK Limited'

OAPEN Library

Fuelling the zero-emissions road freight of the future: routing of mobile fuellers

Author: Raeesi Ramin

The future of zero-emissions road freight is closely tied to the sufficient availability of new and clean fuel options such as electricity and hydrogen. In goods distribution using Electric Commercial Vehicles (ECVs) and Hydrogen Fuel Cell Vehicles (HFCVs), a major challenge in the transition period is their limited autonomy, combined with scarce and unevenly distributed refuelling stations. One viable way to facilitate and speed up the adoption of ECVs/HFCVs in logistics, however, is to bring the fuel to the point where it is needed (instead of diverting delivery vehicles to refuelling stations) using "Mobile Fuellers (MFs)". These are mobile battery swapping/recharging vans or mobile hydrogen fuellers that can travel to a running ECV/HFCV, at a rendezvous time and place, to provide the fuel it requires to complete its delivery route. In this presentation, new vehicle routing models will be presented for a third-party company that provides MF services. In the proposed problem variant, the MF provider receives the routing plans of multiple customer companies and has to design routes for a fleet of capacitated MFs that must synchronise their routes with the running vehicles to deliver the required amount of fuel on the fly. This presentation will discuss and compare several mathematical models based on different business models and collaborative logistics scenarios.

Kent Academic Repository

3rd Covenant University International Conference on African Development Issues (CU-ICADI)

Author: Atayero A. A.

Covenant University Repository