1,427 research outputs found
Voltage, throughput, power, reliability, and multicore scaling
This article studies the interplay between the performance, energy, and reliability (PER) of parallel-computing systems. It describes methods supporting the meaningful cross-platform analysis of this interplay. These methods lead to the PER software tool, which helps designers analyze, compare, and explore these properties
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
Performance analysis of multicore processors using multi-scaling techniques
Integrating more cores per chip enables more programs to run simultaneously, and more easily switch from one program to another, and the system performance will be improved significantly. However, this current trend of central processing unit (CPU) performance cannot be maintained since the budget of power per chip has not risen while the consumption of power per core has slowly reduced. Generally, the processor’s maximum performance is proportional to the product of the number of their cores and the frequency they are running at. However, this is usually limited by constraints of power. In this study, first, the voltage/frequency adjustment of the running cores has been analyzed for several programs to improve the processor’s performance within the constraint of power. Second, the impact of dynamically scaling the number of running cores is summarized for additional performance improvements of the active programs and applications. Finally, it has been verified that scaling the number of the running cores and their voltage/frequency simultaneously can improve the processor’s performance for a higher power dissipation or under power constraints. The performance analysis and improvements are obtained in a real-time simulation on a Linux operating system using a GEM5 simulator. Results indicated that performance improvement was attained at 59.98%, 33.33%, and 66.65% for the three scenarios, respectively
Photonic integration enabling new multiplexing concepts in optical board-to-board and rack-to-rack interconnects
New broadband applications are causing the datacenters to proliferate, raising the bar for higher interconnection speeds. So far, optical board-to-board and rack-to-rack interconnects relied primarily on low-cost commodity optical components assembled in a single package. Although this concept proved successful in the first generations of optical-interconnect modules, scalability is a daunting issue as signaling rates extend beyond 25 Gb/s. In this paper we present our work towards the development of two technology platforms for migration beyond Infiniband enhanced data rate (EDR), introducing new concepts in board-to-board and rack-to-rack interconnects.
The first platform is developed in the framework of MIRAGE European project and relies on proven VCSEL technology, exploiting the inherent cost, yield, reliability and power consumption advantages of VCSELs. Wavelength multiplexing, PAM-4 modulation and multi-core fiber (MCF) multiplexing are introduced by combining VCSELs with integrated Si and glass photonics as well as BiCMOS electronics. An in-plane MCF-to-SOI interface is demonstrated, allowing coupling from the MCF cores to 340x400 nm Si waveguides. Development of a low-power VCSEL driver with integrated feed-forward equalizer is reported, allowing PAM-4 modulation of a bandwidth-limited VCSEL beyond 25 Gbaud.
The second platform, developed within the frames of the European project PHOXTROT, considers the use of modulation formats of increased complexity in the context of optical interconnects. Powered by the evolution of DSP technology and towards an integration path between inter and intra datacenter traffic, this platform investigates optical interconnection system concepts capable to support 16QAM 40GBd data traffic, exploiting the advancements of silicon and polymer technologies
Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins
Modern large-scale computing systems (data centers, supercomputers, cloud and
edge setups and high-end cyber-physical systems) employ heterogeneous
architectures that consist of multicore CPUs, general-purpose many-core GPUs,
and programmable FPGAs. The effective utilization of these architectures poses
several challenges, among which a primary one is power consumption. Voltage
reduction is one of the most efficient methods to reduce power consumption of a
chip. With the galloping adoption of hardware accelerators (i.e., GPUs and
FPGAs) in large datacenters and other large-scale computing infrastructures, a
comprehensive evaluation of the safe voltage reduction levels for each
different chip can be employed for efficient reduction of the total power. We
present a survey of recent studies in voltage margins reduction at the system
level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands
inserted by the silicon vendors can be exploited in all devices for significant
power savings. On average, voltage reduction can reach 12% in multicore CPUs,
20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials
Reliabilit
Evaluating Asymmetric Multicore Systems-on-Chip using Iso-Metrics
The end of Dennard scaling has pushed power consumption into a first order
concern for current systems, on par with performance. As a result,
near-threshold voltage computing (NTVC) has been proposed as a potential means
to tackle the limited cooling capacity of CMOS technology. Hardware operating
in NTV consumes significantly less power, at the cost of lower frequency, and
thus reduced performance, as well as increased error rates. In this paper, we
investigate if a low-power systems-on-chip, consisting of ARM's asymmetric
big.LITTLE technology, can be an alternative to conventional high performance
multicore processors in terms of power/energy in an unreliable scenario. For
our study, we use the Conjugate Gradient solver, an algorithm representative of
the computations performed by a large range of scientific and engineering
codes.Comment: Presented at HiPEAC EEHCO '15, 6 page
- …