40,768 research outputs found
Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add
The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need a retailoring for the mobile market that they are entering now. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially when considering active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing the timing. We evaluate the proposed techniques using both synthetic and âreal-worldâ application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming active VFU operating at the peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when evaluating all techniques together, using âreal-worldâ benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion.The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TTIN2015-65316-P.
The work of I. Ratkovic was supported by a FPU research grant from the Spanish MECD.Peer ReviewedPostprint (author's final draft
Architecture and Design of Medical Processor Units for Medical Networks
This paper introduces analogical and deductive methodologies for the design
medical processor units (MPUs). From the study of evolution of numerous earlier
processors, we derive the basis for the architecture of MPUs. These specialized
processors perform unique medical functions encoded as medical operational
codes (mopcs). From a pragmatic perspective, MPUs function very close to CPUs.
Both processors have unique operation codes that command the hardware to
perform a distinct chain of subprocesses upon operands and generate a specific
result unique to the opcode and the operand(s). In medical environments, MPU
decodes the mopcs and executes a series of medical sub-processes and sends out
secondary commands to the medical machine. Whereas operands in a typical
computer system are numerical and logical entities, the operands in medical
machine are objects such as such as patients, blood samples, tissues, operating
rooms, medical staff, medical bills, patient payments, etc. We follow the
functional overlap between the two processes and evolve the design of medical
computer systems and networks.Comment: 17 page
Measuring Improvement when Using HUB Formats to Implement Floating-Point Systems under Round-to-Nearest
MEC bajo TIN2013-42253-PThis paper analyzes the benefits of using HUB
formats to implement floating-point arithmetic under round-tonearest
mode from a quantitative point of view. Using HUB
formats to represent numbers allows the removal of the rounding
logic of arithmetic units, including sticky-bit computation. This
is shown for floating-point adders, multipliers, and converters.
Experimental analysis demonstrates that HUB formats and the
corresponding arithmetic units maintain the same accuracy as
conventional ones. On the other hand, the implementation of
these units, based on basic architectures, shows that HUB formats
simultaneously improve area, speed, and power consumption.
Specifically, based on data obtained from the synthesis, a HUB
single-precision adder is about 14% faster but consumes 38% less
area and 26% less power than the conventional adder. Similarly, a
HUB single-precision multiplier is 17% faster, uses 22% less area,
and consumes slightly less power than conventional multiplier. At
the same speed, the adder and multiplier achieve area and power
reductions of up to 50% and 40%, respectively
Configurable 3D-integrated focal-plane sensor-processor array architecture
A mixed-signal Cellular Visual Microprocessor architecture with digital processors is
described. An ASIC implementation is also demonstrated. The architecture is composed of a
regular sensor readout circuit array, prepared for 3D face-to-face type integration, and one or
several cascaded array of mainly identical (SIMD) processing elements. The individual array
elements derived from the same general HDL description and could be of different in size, aspect
ratio, and computing resources
Self-testing and repairing computer Patent
Self testing and repairing computer comprising control and diagnostic unit and rollback points for error correctio
DFT and BIST of a multichip module for high-energy physics experiments
Engineers at Politecnico di Torino designed a multichip module for high-energy physics experiments conducted on the Large Hadron Collider. An array of these MCMs handles multichannel data acquisition and signal processing. Testing the MCM from board to die level required a combination of DFT strategie
Adder Based Residue to Binary Number Converters for (2n - 1; 2n; 2n + 1)
Copyright © 2002 IEEEBased on an algorithm derived from the new Chinese remainder theorem I, we present three new residue-to-binary converters for the residue number system (2n-1, 2n, 2n+1) designed using 2n-bit or n-bit adders with improvements on speed, area, or dynamic range compared with various previous converters. The 2n-bit adder based converter is faster and requires about half the hardware required by previous methods. For n-bit adder-based implementations, one new converter is twice as fast as the previous method using a similar amount of hardware, whereas another new converter achieves improvement in either speed, area, or dynamic range compared with previous convertersYuke Wang, Xiaoyu Song, Mostapha Aboulhamid and Hong She
- âŠ