Terrestrial Cosmic Ray Induced Soft Errors and Large-Scale FPGA Systems in the Cloud
Radiation from outer space can cause soft errors in microelectronic devices deployed at terrestrial altitudes. Cosmic rays entering the Earth's atmosphere create a complex cascade of secondary particles, and at ground level the particles most likely to cause soft errors in microelectronics are neutrons. SRAM-based FPGAs are susceptible to these terrestrial cosmic ray induced soft errors. For a single device at terrestrial altitudes, such soft errors occur infrequently; when many FPGAs are deployed in a large-scale system, however, their impact on reliability can be significant. This study examines terrestrial cosmic ray induced soft errors and the effects they can have on large-scale deployments of FPGAs in cloud computing. Fifteen data-center-like designs were tested for sensitivity through fault injection. Sensitivities ranged from less than 1% to about 12% of randomly injected faults resulting in unacceptable behavior. A hypothetical but realistic large-scale FPGA system, with 100,000 nodes deployed at high altitude and running the most sensitive design, would experience the dominant failure mode of silent data corruption every 3.8 hours on average. Such a system would maintain a reliability level above 0.99 for only about two minutes. Some soft error detection and recovery approaches are discussed.
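Under a standard exponential failure model, the "about two minutes above 0.99 reliability" figure follows directly from the 3.8-hour MTTF. A minimal sketch (the exponential model is a common reliability assumption, not stated explicitly in the abstract):

```python
import math

# Figures from the study: most sensitive design on a hypothetical
# 100,000-node, high-altitude deployment.
mttf_hours = 3.8  # mean time to silent data corruption, system-wide

def reliability(t_hours, mttf=mttf_hours):
    """Exponential reliability model: R(t) = exp(-t / MTTF)."""
    return math.exp(-t_hours / mttf)

# Time until reliability falls below a target level R: t = -MTTF * ln(R)
t_hours = -mttf_hours * math.log(0.99)
print(f"R(t) stays above 0.99 for {t_hours * 60:.1f} minutes")  # ~2.3 minutes
```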
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique under both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by the FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.
Comment: To appear at the DSN 2020 conference.
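As a rough illustration of the GOPs/W power-efficiency metric, a sketch with hypothetical throughput and power numbers (only the "more than 3X" overall gain is reported in the abstract; the specific GOPs and watt values below are invented):

```python
# Hypothetical numbers chosen for illustration; only the >3X overall
# power-efficiency gain is reported in the abstract.
gops = 400.0              # sustained CNN throughput, GOPs
power_nominal_w = 20.0    # board power at nominal voltage
power_undervolt_w = 6.0   # board power after undervolting

eff_nominal = gops / power_nominal_w      # 20.0 GOPs/W
eff_undervolt = gops / power_undervolt_w  # ~66.7 GOPs/W
gain = eff_undervolt / eff_nominal        # ~3.3x, i.e. "more than 3X"
```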
Towards Quantum Belief Propagation for LDPC Decoding in Wireless Networks
We present Quantum Belief Propagation (QBP), a Quantum Annealing (QA) based
decoder design for Low Density Parity Check (LDPC) error control codes, which
have found many useful applications in Wi-Fi, satellite communications, mobile
cellular systems, and data storage systems. QBP reduces the LDPC decoding to a
discrete optimization problem, then embeds that reduced design onto quantum
annealing hardware. QBP's embedding design can support LDPC codes of block
length up to 420 bits on real state-of-the-art QA hardware with 2,048 qubits.
We evaluate performance on real quantum annealer hardware, performing
sensitivity analyses on a variety of parameter settings. Our design achieves a
bit error rate of in 20 s and a 1,500 byte frame error rate of
in 50 s at SNR 9 dB over a Gaussian noise wireless channel.
Further experiments measure performance over real-world wireless channels,
requiring 30 s to achieve a 1,500 byte 99.99% frame delivery rate at
SNR 15-20 dB. QBP achieves a performance improvement over an FPGA based soft
belief propagation LDPC decoder, by reaching a bit error rate of and
a frame error rate of at an SNR 2.5--3.5 dB lower. In terms of
limitations, QBP currently cannot realize practical protocol-sized
( Wi-Fi, WiMax) LDPC codes on current QA processors. Our
further studies in this work present future cost, throughput, and QA hardware
trend considerations.
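The reduction of LDPC decoding to a discrete optimization problem can be sketched with a brute-force energy minimizer over a toy parity-check code (the matrix and penalty weight here are illustrative, not the paper's embedding; a real QA formulation would express this energy as an Ising/QUBO model and embed it on annealing hardware):

```python
from itertools import product

# Toy parity-check matrix H for a length-6 code. Illustrative only; the
# paper's design targets real LDPC codes on QA hardware.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def energy(c, r, penalty=2.0):
    """Weighted count of unsatisfied parity checks plus Hamming distance to r."""
    unsat = sum(sum(h * ci for h, ci in zip(row, c)) % 2 for row in H)
    dist = sum(ci != ri for ci, ri in zip(c, r))
    return penalty * unsat + dist

received = [1, 0, 1, 1, 0, 0]  # hard-decision channel output, one bit flipped
best = min(product([0, 1], repeat=6), key=lambda c: energy(c, received))
print(best)  # decoded codeword: (1, 0, 1, 1, 1, 0)
```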
Belle II Technical Design Report
The Belle detector at the KEKB electron-positron collider has collected
almost 1 billion Y(4S) events in its decade of operation. Super-KEKB, an
upgrade of KEKB, is under construction to increase the luminosity by two orders
of magnitude after a three-year shutdown, with an ultimate goal of 8E35 /cm^2/s
luminosity. To exploit the increased luminosity, an upgrade of the Belle
detector has been proposed. A new international collaboration, Belle II, is
being formed. The Technical Design Report presents the physics motivation, basic
methods of the accelerator upgrade, as well as key improvements of the
detector.
Comment: Edited by Z. Doležal and S. Un
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in the literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to the
fault-tolerant nature of stochastic architectures, we also consider a
quasi-synchronous implementation which yields a 33% reduction in energy
consumption w.r.t. the binary radix implementation without any compromise on
performance.
Comment: 11 pages, 12 figures
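The core idea of conventional (unipolar) stochastic computing, which the integral form generalizes to integer-valued streams, can be sketched as multiplication via a single AND gate per bit. The values, stream length, and seed below are illustrative:

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible

def stream(p, n):
    """Unipolar stochastic stream: each bit is 1 with probability p in [0, 1]."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b, n=100_000):
    """Multiply two values in [0, 1] using one AND gate per bit position."""
    sa, sb = stream(a, n), stream(b, n)
    return sum(x & y for x, y in zip(sa, sb)) / n

est = sc_multiply(0.5, 0.8)  # converges to 0.4 as n grows
```

The long streams needed for accurate estimates are exactly the latency drawback the paper addresses; integral stochastic computing shortens them by letting each stream element carry an integer rather than a single bit.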
New Design Techniques for Dynamic Reconfigurable Architectures
The abstract is in the attachment.
Single Event Effects Assessment of UltraScale+ MPSoC Systems under Atmospheric Radiation
The AMD UltraScale+ XCZU9EG device is a Multi-Processor System-on-Chip
(MPSoC) with embedded Programmable Logic (PL) that excels in many Edge (e.g.,
automotive or avionics) and Cloud (e.g., data centres) terrestrial
applications. However, it incorporates a large amount of SRAM cells, making the
device vulnerable to Neutron-induced Single Event Upsets (NSEUs) or otherwise
soft errors. Semiconductor vendors incorporate soft error mitigation mechanisms
to recover memory upsets (i.e., faults) before they propagate to the
application output and become an error. But how effective are the MPSoC's
mitigation schemes? Can they effectively recover upsets in high-altitude or
large-scale applications under different workloads? This article answers these
research questions through a study that combines accelerated neutron radiation
testing with dependability analysis. We test the device on a broad
range of workloads, like multi-threaded software used for pose estimation and
weather prediction or a software/hardware (SW/HW) co-design image
classification application running on the AMD Deep Learning Processing Unit
(DPU). Assuming a one-node MPSoC system in New York City (NYC) at 40k feet, all
tested software applications achieve a Mean Time To Failure (MTTF) greater than
148 months, which shows that upsets are effectively recovered in the processing
system of the MPSoC. However, the SW/HW co-design (i.e., DPU) in the same
one-node system at 40k feet has an MTTF = 4 months due to the high failure rate
of its PL accelerator, which emphasises that some MPSoC workloads may require
additional NSEU mitigation schemes. Nevertheless, we show that the MTTF of the
DPU can increase to 87 months without any overhead if one disregards the
failure rate of tolerable errors since they do not affect the correctness of
the classification output.
Comment: This manuscript is under review at IEEE Transactions on Reliability.
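The relationship between the two DPU MTTF figures can be made concrete with a constant-failure-rate sketch (the 4- and 87-month figures are from the abstract; treating the failure rates as simply additive is an assumption):

```python
# MTTF figures reported for the DPU in a one-node system at 40k feet.
mttf_all_months = 4.0        # counting every observed failure
mttf_critical_months = 87.0  # counting only errors that change the output

# Constant-rate assumption: rate = 1 / MTTF, and tolerable errors account
# for the difference between the two rates.
rate_all = 1.0 / mttf_all_months
rate_critical = 1.0 / mttf_critical_months
tolerable_fraction = 1.0 - rate_critical / rate_all  # ~0.95
```

Under this reading, roughly 95% of the DPU's observed errors were tolerable, which is why disregarding them raises the MTTF from 4 to 87 months without any mitigation overhead.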
METICULOUS: An FPGA-based Main Memory Emulator for System Software Studies
Due to the scaling problem of DRAM technology, non-volatile memory
devices, which are based on different principles of operation than DRAM, are now
being intensively developed to expand the main memory of computers.
Disaggregated memory is also drawing attention as an emerging technology to
scale up the main memory. Although system software studies need to discuss
management mechanisms for the new main memory designs incorporating such
emerging memory systems, there are no feasible memory emulation mechanisms that
efficiently work for large-scale, privileged programs such as operating systems
and hypervisors. In this paper, we propose an FPGA-based main memory emulator
for system software studies on new main memory systems. It can emulate the main
memory incorporating multiple memory regions with different performance
characteristics. For the address region of each memory device, it emulates the
latency, bandwidth, and bit-flip error rate of read and write operations. The
emulator is implemented as a hardware module of an off-the-shelf FPGA
System-on-Chip board. Any privileged or unprivileged software program running
on its 64-bit CPU cores can access emulated main memory devices at a practical
speed through exactly the same interface as normal DRAM main memory. We
confirmed that the emulator transparently worked
for CPU cores and successfully changed the performance of a memory region
according to given emulation parameters; for example, the latencies measured by
CPU cores were exactly proportional to the latencies inserted by the emulator,
with a minimum overhead of approximately 240 ns. As a preliminary use
case, we confirmed that the emulator allows us to change the bandwidth limit
and the inserted latency individually for unmodified software programs, making
discussions on latency sensitivity much easier.
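The proportionality result can be captured by a simple additive latency model (the ~240 ns overhead is from the text; the base DRAM latency and the inserted delays are hypothetical):

```python
# Minimal model of the emulator's latency behavior: observed latency =
# native DRAM latency + fixed emulator overhead + inserted delay.
base_ns = 100.0      # hypothetical native DRAM access latency
overhead_ns = 240.0  # minimum emulator overhead reported in the study

def observed_latency(inserted_ns):
    """Latency a CPU core would measure for a given inserted delay."""
    return base_ns + overhead_ns + inserted_ns

# Measured latencies grow one-for-one with the inserted delay:
deltas = [observed_latency(d) - observed_latency(0) for d in (100, 200, 400)]
print(deltas)  # [100.0, 200.0, 400.0]
```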