4,986 research outputs found

Architecture and Design of Medical Processor Units for Medical Networks

Author: Ahamed Syed V.
Rahman Syed Shawon M.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 13/04/2011

This paper introduces analogical and deductive methodologies for the design of medical processor units (MPUs). From the study of the evolution of numerous earlier processors, we derive the basis for the architecture of MPUs. These specialized processors perform unique medical functions encoded as medical operational codes (mopcs). From a pragmatic perspective, MPUs function much like CPUs. Both processors have unique operation codes that command the hardware to perform a distinct chain of subprocesses upon operands and generate a specific result unique to the opcode and the operand(s). In medical environments, an MPU decodes the mopcs, executes a series of medical sub-processes, and sends out secondary commands to the medical machine. Whereas operands in a typical computer system are numerical and logical entities, the operands in a medical machine are objects such as patients, blood samples, tissues, operating rooms, medical staff, medical bills, patient payments, etc. We follow the functional overlap between the two processes and evolve the design of medical computer systems and networks. Comment: 17 pages
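
The decode-and-execute behaviour described in this abstract can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `Patient` class, the mopc `BLOOD_TEST`, the sub-process functions, and the command strings are assumptions chosen only to show an operation code selecting a chain of sub-processes that act on an object operand and emit secondary commands.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical object operand (the abstract lists patients as operands)."""
    name: str
    log: list = field(default_factory=list)


def verify_identity(patient):
    # Sub-process with no secondary command: only records a step.
    patient.log.append("identity verified")
    return None


def draw_sample(patient):
    # Sub-process that also emits a secondary command for a machine.
    patient.log.append("sample drawn")
    return "LAB: analyze sample for " + patient.name


# Hypothetical mopc table: each code maps to an ordered chain of sub-processes.
MOPC_TABLE = {
    "BLOOD_TEST": [verify_identity, draw_sample],
}


def execute(mopc, operand):
    """Decode a mopc, run its sub-processes on the operand, collect commands."""
    commands = []
    for step in MOPC_TABLE[mopc]:
        command = step(operand)
        if command is not None:
            commands.append(command)
    return commands


if __name__ == "__main__":
    p = Patient("A. Example")
    print(execute("BLOOD_TEST", p))
    print(p.log)
```

A dispatch table is used because it mirrors how a CPU maps an opcode to a fixed micro-sequence; the paper's actual architecture may differ.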

arXiv.org e-Print Archive

CiteSeerX

Crossref

MLPerf Inference Benchmark

Demand for machine-learning (ML) hardware and software systems is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make it challenging to assess ML-system performance in an architecture-neutral, representative, and reproducible manner. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability. Comment: ISCA 202

arXiv.org e-Print Archive

Crossref

Evaluating Built-in ECC of FPGA on-chip Memories for the Mitigation of Undervolting Faults

Author: Kestelman Adrian Cristal
Salami Behzad
Unsal Osman S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Voltage underscaling below the nominal level is an effective solution for improving energy efficiency in digital circuits, e.g., Field Programmable Gate Arrays (FPGAs). However, further undervolting below a safe voltage level without accompanying frequency scaling leads to timing-related faults, potentially undermining the energy savings. Through experimental voltage underscaling studies on commercial FPGAs, we observed that the rate of these faults increases exponentially for on-chip memories, or Block RAMs (BRAMs). To mitigate these faults, we evaluated the efficiency of the built-in Error-Correction Code (ECC) and observed that more than 90% of the faults are correctable and a further 7% are detectable (but not correctable). This efficiency results from the single-bit nature of these faults, which are effectively covered by the Single-Error Correction and Double-Error Detection (SECDED) design of the built-in ECC. Finally, motivated by the above experimental observations, we evaluated an FPGA-based Neural Network (NN) accelerator under low-voltage operation, with the built-in ECC leveraged to mitigate undervolting faults and thus prevent significant NN accuracy loss. As a result, we achieve a 40% BRAM power saving through undervolting below the minimum safe voltage level, with a negligible NN accuracy loss, thanks to the substantial fault coverage of the built-in ECC. Comment: 6 pages, 2 figures
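
The SECDED behaviour mentioned in this abstract (single-bit faults corrected, double-bit faults detected but not corrected) can be illustrated with a toy code. The sketch below is an assumption for illustration only: it uses an extended Hamming code on 4 data bits (8-bit codeword), not the wider code built into FPGA BRAMs, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'detected'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of positions holding a 1
    parity_error = sum(code) % 2        # 1 if an odd number of bits flipped
    if syndrome == 0 and parity_error == 0:
        status = "ok"
    elif parity_error == 1:
        # Single-bit fault: the syndrome is the faulty position
        # (0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: double-bit fault, uncorrectable.
        return "detected", None
    d = [code[3], code[5], code[6], code[7]]
    return status, sum(bit << i for i, bit in enumerate(d))


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                        # inject one undervolting-style bit flip
    print(decode(word))
```

Flipping any one bit of a codeword yields `'corrected'` with the original data; flipping any two yields `'detected'`, which mirrors the correctable and detectable-only fault classes reported in the abstract.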

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

The cosmological simulation code GADGET-2

Author: Abel
Appel
Ascasibar
Bagla
Balsara
Barnes
Bate
Bode
Bonnell
Boss
Bryan
Burkert
Cen
Couchman
Cox
Cuadra
Davé
Dehnen
Di Matteo
Dolag
Dubinski
Duncan
Efstathiou
Evrard
Frenk
Fryxell
Fukushige
Gao
Gingold
Gnedin
Hairer
Heitmann
Hernquist
Hockney
Hut
Jenkins
Jernigan
Jubelgas
Kang
Katz
Kay
Klein
Klypin
Knebe
Kravtsov
Linder
Lucy
Makino
Marri
Monaghan
Motl
Navarro
Norman
O'Shea
Owen
Pen
Poludnenko
Power
Quilis
Quinn
Rasio
Refregier
Saha
Salmon
Scannapieco
Serna
Sommer-Larsen
Springel
Stadel
Steinmetz
Teyssier
Tissera
Tormen
Tornatore
Van Den Bosch
Volker Springel
Wadsley
Warren
White
Whitehurst
Xu
Yepes
Yoshida
Publication venue: 'Wiley'
Publication date: 01/01/2005

We discuss the cosmological simulation code GADGET-2, a new massively parallel TreeSPH code, capable of following a collisionless fluid with the N-body method, and an ideal gas by means of smoothed particle hydrodynamics (SPH). Our implementation of SPH manifestly conserves energy and entropy in regions free of dissipation, while allowing for fully adaptive smoothing lengths. Gravitational forces are computed with a hierarchical multipole expansion, which can optionally be applied in the form of a TreePM algorithm, where only short-range forces are computed with the tree method while long-range forces are determined with Fourier techniques. Time integration is based on a quasi-symplectic scheme where long-range and short-range forces can be integrated with different timesteps. Individual and adaptive short-range timesteps may also be employed. The domain decomposition used in the parallelisation algorithm is based on a space-filling curve, resulting in high flexibility and tree force errors that do not depend on the way the domains are cut. The code is efficient in terms of memory consumption and required communication bandwidth. It has been used to compute the first cosmological N-body simulation with more than 10^10 dark matter particles, reaching a homogeneous spatial dynamic range of 10^5 per dimension in a 3D box. It has also been used to carry out very large cosmological SPH simulations that account for radiative cooling and star formation, reaching total particle numbers of more than 250 million. We present the algorithms used by the code and discuss their accuracy and performance using a number of test problems. GADGET-2 is publicly released to the research community. Comment: submitted to MNRAS, 31 pages, 20 figures (reduced resolution), code available at http://www.mpa-garching.mpg.de/gadge
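
The idea of a domain decomposition along a space-filling curve can be sketched as follows. This is not GADGET-2's code: the abstract does not name the curve, so a Morton (Z-order) curve is assumed here purely for simplicity, domains are balanced by particle count rather than by computational work, and the names `morton_key` and `decompose` are hypothetical.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three integer cell coordinates (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def decompose(positions, box_size, n_domains, bits=10):
    """Split particle indices into n_domains contiguous segments of the curve."""
    cells = 1 << bits

    def key_of(p):
        # Map each coordinate in [0, box_size) to an integer cell index.
        ix, iy, iz = (min(int(c / box_size * cells), cells - 1) for c in p)
        return morton_key(ix, iy, iz, bits)

    # Order particles along the curve, then cut the ordered list into
    # near-equal pieces; each piece is one domain (e.g. one processor).
    order = sorted(range(len(positions)), key=lambda i: key_of(positions[i]))
    n = len(order)
    return [order[d * n // n_domains:(d + 1) * n // n_domains]
            for d in range(n_domains)]


if __name__ == "__main__":
    # One particle at the centre of each octant of a unit box.
    pts = [(x, y, z) for z in (0.25, 0.75)
           for y in (0.25, 0.75) for x in (0.25, 0.75)]
    for domain in decompose(pts, box_size=1.0, n_domains=2):
        print([pts[i] for i in domain])
```

Because particles that are close on the curve tend to be close in space, each contiguous segment forms a reasonably compact region, and changing the number of domains only moves the cut points along the same ordering.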

arXiv.org e-Print Archive