Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Specialized hardware accelerators have been extensively used for Deep Neural
Networks (DNNs) to provide power/performance benefits. These accelerators
contain specialized hardware that supports DNN operators, and scratchpad memory
for storing the tensor operands. Often, the size of the scratchpad is
insufficient to store all the tensors needed for the computation, and
additional data accesses are needed to move tensors back and forth between host
memory and the scratchpad during the computation, incurring significant
power/performance overhead. The volume of these additional data accesses
depends on the operator schedule and the memory allocation (specific locations
selected for the tensors in the
scratchpad). We propose an optimization framework, named COSMA, for mapping
DNNs to an accelerator that finds the optimal operator schedule, memory
allocation and tensor replacement that minimizes the additional data accesses.
COSMA provides an Integer Linear Programming (ILP) formulation to generate the
optimal solution for mapping a DNN to the accelerator for a given scratchpad
size. We demonstrate that, using an off-the-shelf ILP solver, COSMA obtains the
optimal solution in seconds for a wide range of state-of-the-art DNNs for
different applications. Further, it outperforms existing methods, reducing
non-compulsory data accesses by 84% on average. We further propose a
divide-and-conquer heuristic that scales to certain complex DNNs generated by
Neural Architecture Search; this heuristic reduces data accesses by 85% on
average compared with other works.
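To see why the operator schedule changes off-chip traffic, the sketch below enumerates every valid schedule of a tiny, hypothetical DAG and counts the non-compulsory traffic (spill stores of tensors that are still needed, plus their reloads) for a fixed scratchpad capacity. It is not COSMA: it substitutes exhaustive search for the ILP, uses a fixed farthest-next-use (Belady-style) eviction rule instead of optimizing tensor replacement, and ignores memory allocation (tensor placement and fragmentation), so its numbers are only illustrative. The graph, tensor sizes, and function names are all assumptions.

```python
import math
from itertools import permutations

# Hypothetical toy DAG: operator -> (input tensors, output tensor)
OPS = {
    "a": (["x"], "t_a"),
    "b": (["x"], "t_b"),
    "c": (["t_a"], "t_c"),
    "d": (["t_b", "t_c"], "y"),
}
SIZE = {"x": 2, "t_a": 3, "t_b": 3, "t_c": 1, "y": 1}  # arbitrary units
GRAPH_INPUTS = {"x"}  # tensors that start in host memory


def is_topological(order):
    available = set(GRAPH_INPUTS)
    for op in order:
        inputs, output = OPS[op]
        if not set(inputs) <= available:
            return False
        available.add(output)
    return True


def next_use(order, tensor, step):
    """Index of the first operator after `step` that reads `tensor` (inf if none)."""
    for j in range(step + 1, len(order)):
        if tensor in OPS[order[j]][0]:
            return j
    return math.inf


def non_compulsory_traffic(order, capacity):
    # First loads of graph inputs and final stores of outputs are compulsory
    # and are not counted here.
    resident = set()             # tensors currently in the scratchpad
    on_host = set(GRAPH_INPUTS)  # tensors with a valid copy in host memory
    seen = set()                 # tensors that have been in the scratchpad before
    traffic = 0
    for step, op in enumerate(order):
        inputs, output = OPS[op]
        pinned = set(inputs) | {output}  # operands of the running operator
        for t in list(inputs) + [output]:
            if t in resident:
                continue
            while sum(SIZE[r] for r in resident) + SIZE[t] > capacity:
                victims = resident - pinned
                if not victims:
                    raise ValueError(f"capacity {capacity} too small for {op}")
                # Evict the tensor whose next use is farthest away (dead tensors first).
                victim = max(victims, key=lambda r: (next_use(order, r, step), r))
                if next_use(order, victim, step) < math.inf and victim not in on_host:
                    traffic += SIZE[victim]  # spill store: still needed later
                    on_host.add(victim)
                resident.remove(victim)
            if t != output and t in seen:
                traffic += SIZE[t]  # reload of a previously evicted tensor
            resident.add(t)
            seen.add(t)
    return traffic


def best_schedule(capacity):
    schedules = [p for p in permutations(OPS) if is_topological(p)]
    return min(schedules, key=lambda p: non_compulsory_traffic(p, capacity))


for order in permutations(OPS):
    if is_topological(order):
        print(order, non_compulsory_traffic(order, capacity=6))
print("best:", best_schedule(6))
```

With a capacity of 6 units, the three valid schedules cost 12, 0, and 6 units respectively: running `c` right after `a` lets `t_a` die before the large `t_b` is produced, so no spills are needed.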
Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification
Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and
contain specialized semi-programmable accelerators in addition to programmable
processors. In contrast to the pre-accelerator era, when the ISA played an
important role in verification by enabling a clean separation of concerns
between software and hardware, verification of these "accelerator-rich" SoCs
presents new challenges. From the perspective of hardware designers, there is a
lack of a common framework for the formal functional specification of
accelerator behavior. From the perspective of software developers, there exists
no unified framework for reasoning about software/hardware interactions of
programs that interact with accelerators. This paper addresses these challenges
by providing a formal specification and high-level abstraction for accelerator
functional behavior. It formalizes the concept of an Instruction-Level
Abstraction (ILA), developed informally in our previous work, and shows its
application in modeling and verification of accelerators. This formal ILA
extends the familiar notion of instructions to accelerators and provides a
uniform, modular, and hierarchical abstraction for modeling software-visible
behavior of both accelerators and programmable processors. We demonstrate the
applicability of the ILA through several case studies of accelerators (for
image processing, machine learning, and cryptography), and a general-purpose
processor (RISC-V). We show how the ILA model facilitates equivalence checking
between two ILAs, and between an ILA and its hardware finite-state machine
(FSM) implementation. Further, this equivalence checking supports accelerator
upgrades using the notion of ILA compatibility, similar to processor upgrades
using ISA compatibility.
Comment: 24 pages, 3 figures, 3 tables
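The idea of treating each accelerator command as an "instruction" with a decode condition and a state update can be sketched in a few lines. This is a toy model, assuming a single 4-bit accumulator as the entire software-visible state; the classes `Instruction` and `ILA`, the commands, and the bounded exhaustive comparison in `find_mismatch` are illustrative inventions, not the paper's formalism or tooling, which uses formal equivalence checking rather than enumeration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]  # software-visible architectural state
Inp = Dict[str, int]    # command/data presented at the accelerator interface


@dataclass(frozen=True)
class Instruction:
    name: str
    decode: Callable[[State, Inp], bool]   # does this command trigger the instruction?
    update: Callable[[State, Inp], State]  # next software-visible state


@dataclass
class ILA:
    name: str
    instructions: List[Instruction]

    def step(self, state: State, inp: Inp) -> State:
        fired = [i for i in self.instructions if i.decode(state, inp)]
        if len(fired) > 1:
            raise ValueError(f"{self.name}: ambiguous decode for {inp}")
        # No matching instruction: the visible state is left unchanged.
        return fired[0].update(state, inp) if fired else dict(state)


def find_mismatch(a: ILA, b: ILA, states, inputs) -> Optional[Tuple[State, Inp]]:
    """Bounded check: compare both ILAs instruction-by-instruction on every (state, input)."""
    for s, x in product(states, inputs):
        if a.step(s, x) != b.step(s, x):
            return s, x  # counterexample
    return None


MASK = 0xF  # hypothetical 4-bit accumulator


def cmd(c: int) -> Callable[[State, Inp], bool]:
    return lambda s, x: x["cmd"] == c


SPEC = ILA("spec", [
    Instruction("CLEAR", cmd(0), lambda s, x: {"acc": 0}),
    Instruction("ADD", cmd(1), lambda s, x: {"acc": (s["acc"] + x["data"]) & MASK}),
    Instruction("XOR", cmd(2), lambda s, x: {"acc": s["acc"] ^ x["data"]}),
])

# Written differently, but intended to match SPEC on every instruction.
IMPL = ILA("impl", [
    Instruction("CLEAR", cmd(0), lambda s, x: {"acc": s["acc"] ^ s["acc"]}),
    Instruction("ADD", cmd(1), lambda s, x: {"acc": (s["acc"] + x["data"]) % 16}),
    Instruction("XOR", cmd(2), lambda s, x: {"acc": (s["acc"] | x["data"]) & ~(s["acc"] & x["data"])}),
])

# Buggy variant: ADD saturates instead of wrapping around.
BUGGY = ILA("buggy", [
    SPEC.instructions[0],
    Instruction("ADD", cmd(1), lambda s, x: {"acc": min(s["acc"] + x["data"], MASK)}),
    SPEC.instructions[2],
])

STATES = [{"acc": v} for v in range(16)]
INPUTS = [{"cmd": c, "data": d} for c in range(3) for d in range(16)]

print(find_mismatch(SPEC, IMPL, STATES, INPUTS))   # None
print(find_mismatch(SPEC, BUGGY, STATES, INPUTS))  # ({'acc': 1}, {'cmd': 1, 'data': 15})
```

Because the comparison is per instruction rather than per cycle, a mismatch points directly at the offending command (`ADD` here), which mirrors how an instruction-level specification separates concerns.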
Security Verification of Low-Trust Architectures
From the viewpoint of software, low-trust architectures operate on
always-encrypted data, and they shrink the hardware that must be trusted to a
small, software-free enclave component. In this paper, we perform a complete
formal verification of a specific low-trust architecture, the Sequestered
Encryption (SE) architecture, to show that the design is secure against direct
data disclosures and digital side channels for all possible programs. We first
define the security requirements of the ISA of the SE low-trust architecture.
Looking upwards, this ISA serves as an abstraction of the hardware for the
software, and is used to show that no program composed of these instructions
can leak information, including through digital side channels. Looking
downwards, this ISA is a specification for the hardware, and is used to define
the proof obligations for any RTL implementation arising from the ISA-level
security requirements. These cover both functional and digital side-channel
leakage. Next, we show how these proof obligations can be successfully
discharged using commercial formal verification tools. We demonstrate the
efficacy of our RTL security verification technique for seven different correct
and buggy implementations of the SE architecture.
Comment: 19 pages with appendi
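The paper verifies the SE design formally, for all possible programs, using commercial tools. As a much weaker illustration of the security property itself, the sketch below checks individual programs by self-composition: it runs the same program on every combination of small secret values with the public inputs fixed, and reports a leak if the attacker-observable trace (program counter, instruction latency, public registers) ever differs. The toy ISA, its `s*`/`p*` register-naming convention, and the early-exit multiplier are assumptions for illustration, not the SE ISA.

```python
from itertools import product

# Hypothetical toy ISA: registers named "s*" hold secret (encrypted) values,
# registers named "p*" hold public values. Each instruction is (op, dst, a, b).


def run(program, regs):
    """Execute `program` and return what an attacker can observe at each step."""
    regs = dict(regs)
    trace = []
    pc = 0
    while pc < len(program):
        op, dst, a, b = program[pc]
        next_pc, cycles = pc + 1, 1
        if op == "add":
            regs[dst] = (regs[a] + regs[b]) & 0xFF
        elif op == "mul":
            x, y = regs[a], regs[b]
            regs[dst] = (x * y) & 0xFF
            cycles = 1 if y == 0 else 4  # toy early-exit multiplier
        elif op == "beqz":
            if regs[a] == 0:
                next_pc = b  # here b is an absolute branch target
        else:
            raise ValueError(f"unknown op {op}")
        public = tuple(sorted((r, v) for r, v in regs.items() if r.startswith("p")))
        trace.append((pc, cycles, public))  # program counter, latency, public registers
        pc = next_pc
    return tuple(trace)


def leaks(program, publics, secret_names, values=range(4)):
    """Self-composition check: vary only the secrets and compare observable traces."""
    traces = {
        run(program, {**publics, **dict(zip(secret_names, combo))})
        for combo in product(values, repeat=len(secret_names))
    }
    return len(traces) > 1


PUB = {"p0": 1, "p1": 0}
SECRETS = ["s0", "s1"]
SAFE = [("add", "s2", "s0", "s1"), ("add", "p1", "p0", "p0")]
TIMING = [("mul", "s2", "s0", "s1")]  # latency depends on s1
BRANCH = [("beqz", None, "s0", 2), ("add", "s1", "s1", "s1"), ("add", "p0", "p0", "p0")]
DISCLOSE = [("add", "p0", "s0", "p0")]  # secret flows into a public register

for name, prog in [("SAFE", SAFE), ("TIMING", TIMING), ("BRANCH", BRANCH), ("DISCLOSE", DISCLOSE)]:
    print(name, leaks(prog, PUB, SECRETS))
```

The three leaking programs correspond to the categories the abstract names: a direct data disclosure (`DISCLOSE`) and two digital side channels, one through control flow (`BRANCH`) and one through data-dependent latency (`TIMING`).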
Exploring state-of-the-art advances in targeted nanomedicines for managing acute and chronic inflammatory lung diseases
Diagnosis and treatment of lung diseases pose serious challenges. Current diagnostic and therapeutic methods show poor efficacy against drug-resistant bacterial infections, while chemotherapy causes toxicity and delivers drugs nonspecifically. There is demand for advanced treatments for lung diseases that improve drug bioavailability via the nasal passages, where mucosal formation interferes with drug penetration to targeted sites. Nanotechnology confers several advantages here: different nanoparticles, or combinations of them, are currently used to enhance targeted drug delivery. Nanomedicine, the combination of nanoparticles and therapeutic agents, delivers drugs to targeted sites and thereby increases their bioavailability at those sites. Thus, nanotechnology is superior to conventional chemotherapeutic strategies. Here, the authors review the latest advancements in nanomedicine-based drug-delivery methods for managing acute and chronic inflammatory lung diseases.
Reducing the environmental impact of surgery on a global scale: systematic review and co-prioritization with healthcare workers in 132 countries
Abstract
Background
Healthcare cannot achieve net-zero carbon without addressing operating theatres. The aim of this study was to prioritize feasible interventions to reduce the environmental impact of operating theatres.
Methods
This study adopted a four-phase Delphi consensus co-prioritization methodology. In phase 1, a systematic review of published interventions and global consultation of perioperative healthcare professionals were used to longlist interventions. In phase 2, iterative thematic analysis consolidated comparable interventions into a shortlist. In phase 3, the shortlist was co-prioritized based on patient and clinician views on acceptability, feasibility, and safety. In phase 4, ranked lists of interventions were presented by their relevance to high-income countries and low–middle-income countries.
Results
In phase 1, 43 interventions were identified, which had low uptake in practice according to 3042 professionals globally. In phase 2, a shortlist of 15 intervention domains was generated. In phase 3, interventions were deemed acceptable to more than 90 per cent of patients, except for reducing general anaesthesia (84 per cent) and re-sterilization of ‘single-use’ consumables (86 per cent). In phase 4, the top three shortlisted interventions for high-income countries were introducing recycling, reducing use of anaesthetic gases, and appropriate clinical waste processing; for low–middle-income countries, they were introducing reusable surgical devices, reducing use of consumables, and reducing the use of general anaesthesia.
Conclusion
This is a step toward environmentally sustainable operating environments, with actionable interventions applicable to both high-income and low–middle-income countries.
Code Generation for Dual-Load-Execute Architectures
This paper studies the problem of register allocation and scheduling for Dual-Load-Execute (DLE) architectures. These are architectures that can execute an ALU instruction and two memory transfer operations (load/store) in a single instruction cycle. DLE architectures are extensively used in the design of Digital Signal Processors (DSPs) such as the Motorola 56000, Analog Devices ADSP-2100, and NEC µPD77016. This work proves the existence of an efficient O(n) expression tree code generation algorithm for DLE architectures that have homogeneous register sets. The algorithm is an extension of the Sethi-Ullman algorithm and produces guaranteed optimal code for a large number of the expression trees in a program. Experimental results, using the NEC µPD77016 as the target processor, show the efficacy of the approach.
1 Introduction
Digital Signal Processors (DSPs) are receiving increased attention recently due to their role in the design of modern embedded systems like video cards, ce..
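The DLE algorithm is described as an extension of Sethi-Ullman. For reference, the sketch below shows only the classic Sethi-Ullman idea on a plain register machine: label each node, in one post-order pass, with the number of registers needed to evaluate it, then evaluate the costlier subtree first. It does not model the dual load/store slots of DLE machines, assumes enough registers that no spills are needed, and simplifies the original labelling by loading every leaf into a register; the `LOAD`/`ADD`/`MUL` three-address mnemonics are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    op: str                        # operator name, or variable name for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    need: int = 0                  # Sethi-Ullman label: registers needed


def label(n: Node) -> int:
    """Post-order labelling; each node is visited once, so this is O(n)."""
    if n.left is None:             # leaf: loaded into a register
        n.need = 1
    else:
        l, r = label(n.left), label(n.right)
        n.need = max(l, r) if l != r else l + 1
    return n.need


def gen(n: Node, base: int, out: List[str]) -> None:
    """Emit code leaving n's value in r{base}, using only r{base}..r{base + n.need - 1}."""
    if n.left is None:
        out.append(f"LOAD r{base}, {n.op}")
    elif n.left.need >= n.right.need:
        gen(n.left, base, out)     # costlier subtree first; result stays in r{base}
        gen(n.right, base + 1, out)
        out.append(f"{n.op} r{base}, r{base}, r{base + 1}")
    else:
        gen(n.right, base, out)    # right subtree is costlier: evaluate it first
        gen(n.left, base + 1, out)
        out.append(f"{n.op} r{base}, r{base + 1}, r{base}")  # operand order stays left, right


tree = Node("ADD", Node("a"), Node("MUL", Node("b"), Node("c")))  # a + b * c
print(label(tree))  # 2
code: List[str] = []
gen(tree, 0, code)
print("\n".join(code))
```

For `a + b * c` this uses two registers: the multiplication is evaluated first into `r0`, then `a` is loaded into `r1`. Evaluating `a` first would have needed a third register.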