7,671 research outputs found
Latch-based RISC-V core with popcount instruction for CNN acceleration
Energy-efficiency is essential for vast majority of mobile and embedded battery-powered systems. Internet-of-Things paradigm combines requirements for high computational capabilities, extreme energy-efficiency and low-cost. Increasing manufacturing process variations pose formidable challenges for deep-submicron integrated circuit designs. The effects of variation are further exacerbated by lowered voltages in energy-efficient designs. Compared to traditional flip-flop-based design, latch-based design offers area, energy-efficiency and variation tolerance benefits at the cost of increased timing behavior complexity. A method for converting flip-flop-based processor core to latch-based core at register-transfer-level is presented in this work.
Convolutional neural networks have enabled image recognition in the field of computer vision at unprecedented accuracy. Performance and memory requirements of canonical convolutional neural networks have been out of reach for low-cost IoT devices. In collaboration with Tampere University, a custom popcount instruction was added to the cores for accelerating IoT optimized vehicle classification convolutional neural network.
This work compares simulation results from synthesized flip-flop-based and latch-based versions of a SCR1 RISC-V processor core and the effects of custom instruction for CNN acceleration. The latch core achieved roughly 50\% smaller energy per operation than the flip-flop core and 2.1x speedup was observed in the execution of the CNN when using the custom instruction
Handling the complexity of routing problem in modern VLSI design
In VLSI physical design, the routing task consists of using over-the-cell metal wires to connect pins and ports of circuit gates and blocks. Traditionally, VLSI routing is an important design step in the sense that the quality of routing solution has great impact on various design metrics such as circuit timing, power consumption, chip reliability and manufacturability etc. As the advancing VLSI design enters the nanometer era, the routing success (routability issue) has been arising as one of the most critical problems in back-end design. In one aspect, the degree of design complexity is increasing dramatically as more and more modules are integrated into the chip. Much higher chip density leads to higher routing demands and potentially more risks in routing failure. In another aspect, with decreasing design feature size, there are more complex design rules imposed to ensure manufacturability. These design rules are hard to satisfy and they usually create more barriers for achieving routing closure (i.e., generate DRC free routing solution) and thus affect chip time to market (TTM) plan.
In general, the behavior and performance of routing are affected by three consecutive phases: placement phase, global routing phase and detailed routing phase in a typical VLSI physical design flow. Traditional CAD tools handle each of the three phases independently and the global picture of the routability issue is neglected. Different from conventional approaches which propose tools and algorithms for one particular design phase, this thesis investigates the routability issue from all three phases and proposes a series of systematic solutions to build a more generic flow and improve quality of results (QoR). For the placement phase, we will introduce a mixed-sized placement refinement tool for alleviating congestion after placement. The tool shifts and relocates modules based on a global routing estimation. For the global routing phase, a very fast and effective global router is developed. Its performance surpasses many peer works as verified by ISPD 2008 global routing contest results. In the detailed routing phase, a tool is proposed to perform detailed routing using regular routing patterns based on a correct-by-construction methodology to improve routability as well as satisfy most design rules. Finally, the tool which integrates global routing and detailed routing is developed to remedy the inconsistency between global routing and detailed routing.
To verify the algorithms we proposed, three sets of testcases derived from ISPD98 and ISPD05/06 placement benchmark suites are proposed. The results indicate that our proposed methods construct an integrated and systematic flow for routability improvement which is better than conventional methods
Spiking Neural Networks for Inference and Learning: A Memristor-based Design Perspective
On metrics of density and power efficiency, neuromorphic technologies have
the potential to surpass mainstream computing technologies in tasks where
real-time functionality, adaptability, and autonomy are essential. While
algorithmic advances in neuromorphic computing are proceeding successfully, the
potential of memristors to improve neuromorphic computing have not yet born
fruit, primarily because they are often used as a drop-in replacement to
conventional memory. However, interdisciplinary approaches anchored in machine
learning theory suggest that multifactor plasticity rules matching neural and
synaptic dynamics to the device capabilities can take better advantage of
memristor dynamics and its stochasticity. Furthermore, such plasticity rules
generally show much higher performance than that of classical Spike Time
Dependent Plasticity (STDP) rules. This chapter reviews the recent development
in learning with spiking neural network models and their possible
implementation with memristor-based hardware
Edge Generation Scheduling for DAG Tasks using Deep Reinforcement Learning
Directed acyclic graph (DAG) tasks are currently adopted in the real-time
domain to model complex applications from the automotive, avionics, and
industrial domain that implement their functionalities through chains of
intercommunicating tasks. This paper studies the problem of scheduling
real-time DAG tasks by presenting a novel schedulability test based on the
concept of trivial schedulability. Using this schedulability test, we propose a
new DAG scheduling framework (edge generation scheduling -- EGS) that attempts
to minimize the DAG width by iteratively generating edges while guaranteeing
the deadline constraint. We study how to efficiently solve the problem of
generating edges by developing a deep reinforcement learning algorithm combined
with a graph representation neural network to learn an efficient edge
generation policy for EGS. We evaluate the effectiveness of the proposed
algorithm by comparing it with state-of-the-art DAG scheduling heuristics and
an optimal mixed-integer linear programming baseline. Experimental results show
that the proposed algorithm outperforms the state-of-the-art by requiring fewer
processors to schedule the same DAG tasks.Comment: Under revie
Robust Architectures for Embedded Wireless Network Control and Actuation
Networked Cyber-Physical Systems are fundamentally constrained by the tight coupling and closed-loop control of physical processes. To address actuation in such closed-loop wireless control systems there is a strong need to re-think the communication architectures and protocols for reliability, coordination and control. We introduce the Embedded Virtual Machine (EVM), a programming abstraction where controller tasks with their control and timing properties are maintained across physical node boundaries and functionality is capable of migrating to the most competent set of physical controllers. In the context of process and discrete control, an EVM is the distributed runtime system that dynamically selects primary-backup sets of controllers given spatial and temporal constraints of the underlying wireless network. EVM-based algorithms allow network control algorithms to operate seamlessly over less reliable wireless networks with topological changes. They introduce new capabilities such as predictable outcomes during sensor/actuator failure, adaptation to mode changes and runtime optimization of resource consumption. An automated design flow from Simulink to platform-independent domain specific languages, and subsequently, to platform-dependent code generation is presented. Through case studies in discrete and process control we demonstrate the capabilities of EVM-based wireless network control systems
A methodology for the design of dynamic accuracy operators by runtime back bias
Mobile and IoT applications must balance increasing processing demands with limited power and cost budgets. Approximate computing achieves this goal leveraging the error tolerance features common in many emerging applications to reduce power consumption. In particular, adequate (i.e., energy/quality-configurable) hardware operators are key components in an error tolerant system. Existing implementations of these operators require significant architectural modifications, hence they are often design-specific and tend to have large overheads compared to accurate units. In this paper, we propose a methodology to design adequate data-path operators in an automatic way, which uses threshold voltage scaling as a knob to dynamically control the power/accuracy tradeoff. The method overcomes the limitations of previous solutions based on supply voltage scaling, in that it introduces lower overheads and it allows fine-grain regulation of this tradeoff. We demonstrate our approach on a state-of-the-art 28nm FDSOI technology, exploiting the strong effect of back biasing on threshold voltage. Results show a power consumption reduction of as much as 39% compared to solutions based only on supply voltage scaling, at iso-accuracy
- …