Architecture and Design of Medical Processor Units for Medical Networks
This paper introduces analogical and deductive methodologies for the design of
medical processor units (MPUs). From a study of the evolution of numerous earlier
processors, we derive the basis for the architecture of MPUs. These specialized
processors perform unique medical functions encoded as medical operational
codes (mopcs). From a pragmatic perspective, MPUs function much like CPUs.
Both processors have unique operation codes that command the hardware to
perform a distinct chain of sub-processes upon operands and generate a specific
result unique to the opcode and the operand(s). In medical environments, the MPU
decodes the mopcs, executes a series of medical sub-processes, and sends
secondary commands to the medical machine. Whereas operands in a typical
computer system are numerical and logical entities, the operands in a medical
machine are objects such as patients, blood samples, tissues, operating
rooms, medical staff, medical bills, patient payments, etc. We build on the
functional overlap between the two processors to evolve the design of medical
computer systems and networks.
Comment: 17 pages
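The CPU analogy in this abstract (an opcode selects a fixed chain of sub-processes that act on operands and emit results) can be sketched as a decode-and-dispatch table. This is a hypothetical illustration, not the paper's design: the mopc names `ADMIT` and `DRAW_BLOOD`, their sub-process chains, and the `MedicalOperand` type are invented here, and each sub-process simply returns a secondary command as a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MedicalOperand:
    """Hypothetical operand: a non-numeric object such as a patient or a sample."""
    kind: str
    ident: str


# A sub-process takes an operand and returns a secondary command (a string here).
SubProcess = Callable[[MedicalOperand], str]

# Hypothetical mopc table: each mopc maps to a fixed chain of sub-processes,
# much as a CPU opcode maps to a fixed sequence of micro-operations.
MOPC_TABLE: Dict[str, List[SubProcess]] = {
    "ADMIT": [
        lambda op: f"VERIFY_ID {op.ident}",
        lambda op: f"ASSIGN_ROOM {op.ident}",
        lambda op: f"OPEN_BILL {op.ident}",
    ],
    "DRAW_BLOOD": [
        lambda op: f"LABEL_TUBE {op.ident}",
        lambda op: f"SCHEDULE_LAB {op.ident}",
    ],
}


def execute(mopc: str, operand: MedicalOperand) -> List[str]:
    """Decode a mopc, run its sub-process chain, and collect secondary commands."""
    try:
        chain = MOPC_TABLE[mopc]  # decode step
    except KeyError:
        raise ValueError(f"unknown mopc: {mopc}") from None
    return [step(operand) for step in chain]  # execute step


if __name__ == "__main__":
    patient = MedicalOperand(kind="patient", ident="P-001")
    for command in execute("ADMIT", patient):
        print(command)
```

Keeping the mopc-to-chain mapping in a table rather than in branching code loosely mirrors a CPU's microcode store: supporting a new medical operation means adding one table entry.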
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.
Comment: ISCA 202
Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults
Voltage underscaling below the nominal level is an effective solution for
improving energy efficiency in digital circuits, e.g., Field Programmable Gate
Arrays (FPGAs). However, further undervolting below a safe voltage level and
without accompanying frequency scaling leads to timing-related faults,
potentially undermining the energy savings. Through experimental voltage
underscaling studies on commercial FPGAs, we observed that the rate of these
faults increases exponentially in on-chip memories, or Block RAMs (BRAMs). To
mitigate these faults, we evaluated the efficiency of the built-in
Error-Correction Code (ECC) and observed that more than 90% of the faults are
correctable and a further 7% are detectable (but not correctable). This
efficiency results from the single-bit nature of these faults, which are
effectively covered by the Single-Error Correction and Double-Error Detection
(SECDED) design of the built-in ECC. Finally, motivated by the above
experimental observations, we evaluated an FPGA-based Neural Network (NN)
accelerator under low-voltage operation, leveraging the built-in ECC to
mitigate undervolting faults and thus prevent significant NN accuracy loss. As
a result, we achieve a 40% BRAM power saving by undervolting below the
minimum safe voltage level, with negligible NN accuracy loss, thanks to the
substantial fault coverage of the built-in ECC.
Comment: 6 pages, 2 figures
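The fault-coverage argument in this abstract rests on how a SECDED code behaves: any single flipped bit is located and corrected, while any two flipped bits are flagged but cannot be corrected. The sketch below demonstrates this with a textbook extended Hamming (8,4) code on a 4-bit data word. It is an illustrative assumption only: the built-in BRAM ECC on commercial FPGAs protects much wider words, and its exact encoding is not described in this abstract.

```python
from typing import List, Tuple

DATA_POSITIONS = (3, 5, 6, 7)   # data bits of the Hamming(7,4) part
PARITY_POSITIONS = (1, 2, 4)    # Hamming parity bits; index 0 is overall parity


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected 4 bits")
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # makes the total number of 1s even
    return code


def decode(code: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)  # do not mutate the caller's word
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the positions of all set bits
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Single-bit error: the syndrome names its position (0 = overall parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detect, do not correct.
        return "uncorrectable", []
    return status, [code[i] for i in DATA_POSITIONS]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    one_flip = word.copy()
    one_flip[6] ^= 1
    two_flips = word.copy()
    two_flips[2] ^= 1
    two_flips[5] ^= 1
    print(decode(word))       # ('ok', [1, 0, 1, 1])
    print(decode(one_flip))   # ('corrected', [1, 0, 1, 1])
    print(decode(two_flips))  # ('uncorrectable', [])
```

A triple-bit error would be miscorrected by this scheme, which is why the SECDED guarantee is stated only for one and two flipped bits.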
The cosmological simulation code GADGET-2
We discuss the cosmological simulation code GADGET-2, a new massively
parallel TreeSPH code, capable of following a collisionless fluid with the
N-body method, and an ideal gas by means of smoothed particle hydrodynamics
(SPH). Our implementation of SPH manifestly conserves energy and entropy in
regions free of dissipation, while allowing for fully adaptive smoothing
lengths. Gravitational forces are computed with a hierarchical multipole
expansion, which can optionally be applied in the form of a TreePM algorithm,
where only short-range forces are computed with the 'tree' method, while
long-range forces are determined with Fourier techniques. Time integration is
based on a quasi-symplectic scheme where long-range and short-range forces can
be integrated with different timesteps. Individual and adaptive short-range
timesteps may also be employed. The domain decomposition used in the
parallelisation algorithm is based on a space-filling curve, resulting in high
flexibility and tree force errors that do not depend on the way the domains are
cut. The code is efficient in terms of memory consumption and required
communication bandwidth. It has been used to compute the first cosmological
N-body simulation with more than 10^10 dark matter particles, reaching a
homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has
also been used to carry out very large cosmological SPH simulations that
account for radiative cooling and star formation, reaching total particle
numbers of more than 250 million. We present the algorithms used by the code
and discuss their accuracy and performance using a number of test problems.
GADGET-2 is publicly released to the research community.
Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code
available at http://www.mpa-garching.mpg.de/gadge
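The TreePM split described in the GADGET-2 abstract divides every pairwise interaction into a short-range part, summed with the tree, and a smooth long-range part, computed with Fourier techniques. A standard way to do this, assumed here rather than taken from the abstract, is a Gaussian split with scale r_s: the short-range potential is the Newtonian one multiplied by erfc(r / (2 r_s)), and the long-range remainder, multiplied by erf(r / (2 r_s)), corresponds in Fourier space to damping the potential by exp(-k^2 r_s^2). The sketch checks that the two parts sum back to the Newtonian potential and shows how quickly the short-range part decays, which is what lets a tree walk be truncated at a few r_s. G = m = 1 and r_s = 1 are arbitrary units.

```python
import math


def newtonian(r: float, G: float = 1.0, m: float = 1.0) -> float:
    """Full pairwise Newtonian potential -G m / r."""
    return -G * m / r


def short_range(r: float, r_s: float, G: float = 1.0, m: float = 1.0) -> float:
    """Short-range part (tree): Newtonian potential damped by erfc(r / (2 r_s))."""
    return -G * m / r * math.erfc(r / (2.0 * r_s))


def long_range(r: float, r_s: float, G: float = 1.0, m: float = 1.0) -> float:
    """Long-range part (mesh): the smooth remainder, weighted by erf(r / (2 r_s))."""
    return -G * m / r * math.erf(r / (2.0 * r_s))


if __name__ == "__main__":
    r_s = 1.0
    for r in (0.1, 1.0, 4.0, 10.0):
        split_error = short_range(r, r_s) + long_range(r, r_s) - newtonian(r)
        short_fraction = short_range(r, r_s) / newtonian(r)  # equals erfc(r / (2 r_s))
        print(f"r={r:5.1f}  split error={split_error:+.1e}  short fraction={short_fraction:.2e}")
```

Because erf + erfc = 1, the split is exact by construction; the only design freedom is r_s, which trades tree work (larger r_s) against mesh resolution (smaller r_s).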
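Integrating long-range and short-range forces with different timesteps, as the GADGET-2 abstract describes, can be illustrated with a nested kick-drift-kick leapfrog: half-step kicks from the slowly varying long-range force bracket several short-range kick-drift-kick substeps. This is a generic multiple-timestep sketch on a hypothetical one-dimensional toy problem (a soft spring stands in for the long-range force and a stiff spring for the short-range force, unit mass). It omits cosmological expansion and individual particle timesteps and is not taken from the GADGET-2 source.

```python
from typing import Callable, Tuple

Force = Callable[[float], float]


def mts_leapfrog_step(x: float, v: float, dt_long: float, n_sub: int,
                      f_long: Force, f_short: Force) -> Tuple[float, float]:
    """One long step of a nested kick-drift-kick leapfrog (unit mass)."""
    v += 0.5 * dt_long * f_long(x)        # opening long-range half kick
    dt_short = dt_long / n_sub
    for _ in range(n_sub):                # short-range KDK substeps
        v += 0.5 * dt_short * f_short(x)
        x += dt_short * v                 # drift
        v += 0.5 * dt_short * f_short(x)
    v += 0.5 * dt_long * f_long(x)        # closing long-range half kick
    return x, v


def energy(x: float, v: float, k_long: float, k_short: float) -> float:
    """Total energy of the toy two-spring oscillator."""
    return 0.5 * v * v + 0.5 * (k_long + k_short) * x * x


if __name__ == "__main__":
    k_long, k_short = 1.0, 100.0          # hypothetical soft and stiff springs
    f_long = lambda x: -k_long * x
    f_short = lambda x: -k_short * x
    x, v = 1.0, 0.0
    e0 = energy(x, v, k_long, k_short)
    for _ in range(1000):
        x, v = mts_leapfrog_step(x, v, 0.05, 10, f_long, f_short)
    drift = abs(energy(x, v, k_long, k_short) - e0) / e0
    print(f"relative energy error after 1000 long steps: {drift:.2e}")
```

Every kick and drift is time-symmetric, so stepping forward with `dt_long` and then back with `-dt_long` recovers the starting state up to rounding. The outer step should stay well below the half-period of the fast oscillation; otherwise multiple-timestep schemes can become resonantly unstable.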
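A domain decomposition based on a space-filling curve, as in the GADGET-2 abstract, proceeds in three steps: map each particle position to a one-dimensional key that preserves spatial locality, sort by key, and cut the sorted sequence into contiguous, equally loaded segments, one per processor. The sketch below uses a Morton (Z-order) key for brevity; the abstract does not name the curve, and a Peano-Hilbert curve, which has better locality, is a common alternative. The unit box, 10 bits per dimension, and balancing by particle count rather than by work are all assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spread_bits(n: int, bits: int) -> int:
    """Insert two zero bits between each of the low `bits` bits of n."""
    out = 0
    for i in range(bits):
        out |= ((n >> i) & 1) << (3 * i)
    return out


def morton_key(p: Point, bits: int = 10) -> int:
    """Interleave quantized x, y, z coordinates in [0, 1) into one Z-order key."""
    scale = 1 << bits
    ix, iy, iz = (min(int(c * scale), scale - 1) for c in p)
    return spread_bits(ix, bits) | (spread_bits(iy, bits) << 1) | (spread_bits(iz, bits) << 2)


def decompose(points: Sequence[Point], n_domains: int) -> List[List[int]]:
    """Assign particle indices to n_domains contiguous segments of the curve."""
    order = sorted(range(len(points)), key=lambda i: morton_key(points[i]))
    domains = []
    for d in range(n_domains):
        start = d * len(order) // n_domains
        stop = (d + 1) * len(order) // n_domains
        domains.append(order[start:stop])
    return domains


if __name__ == "__main__":
    import random
    random.seed(0)
    pts = [(random.random(), random.random(), random.random()) for _ in range(1000)]
    for d, members in enumerate(decompose(pts, 4)):
        print(f"domain {d}: {len(members)} particles")  # 250 each
```

Because each domain is a contiguous key range, moving a cut point shifts particles only between neighbouring domains, which is one reason such decompositions are flexible for load balancing.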