5 research outputs found

Advances in Architectures and Tools for FPGAs and their Impact on the Design of Complex Systems for Particle Physics

Author: Bhattacharyya Shuvra
Compton Katherine
Farmahini-Farahani Amin
Gregerson Anthony
Plishker William
Schulte Michael
Xie Zaipeng
Publication venue: CERN
Publication date: 01/01/2009

The continual improvement of semiconductor technology has provided rapid advancements in device frequency and density. Designers of electronics systems for high-energy physics (HEP) have benefited from these advancements, transitioning many designs from fixed-function ASICs to more flexible FPGA-based platforms. Today’s FPGA devices provide a significantly higher amount of resources than those available during the initial Large Hadron Collider design phase. To take advantage of the capabilities of future FPGAs in the next generation of HEP experiments, designers must not only anticipate further improvements in FPGA hardware, but must also adopt design tools and methodologies that can scale along with that hardware. In this paper, we outline the major trends in FPGA hardware, describe the design challenges these trends will present to developers of HEP electronics, and discuss a range of techniques that can be adopted to overcome these challenges.

CERN Document Server

MODULAR DESIGN OF HIGH-THROUGHPUT, LOW-LATENCY SORTING UNITS

Author: Farmahini-Farahani Amin
Publication date: 20/05/2012

High-throughput and low-latency sorting is a key requirement in many applications that deal with large amounts of data. Searching and high-energy physics systems require a considerable number of sorting units. The particle detectors in CERN's Large Hadron Collider require hundreds of fast sorting units. To provide the performance and flexibility needed in high-energy physics experiments, these sorting units are often implemented using high-end FPGA devices. This thesis presents efficient techniques for designing high-throughput, low-latency sorting units. Our sorting architectures utilize modular design techniques that hierarchically construct large sorting units from smaller building blocks. The sorting units are optimized for situations in which only the M largest numbers from N inputs are needed, since this situation commonly occurs in many applications for scientific computing, data mining, network processing, digital signal processing, and high-energy physics. We utilize our proposed techniques to design parameterized, pipelined, and modular sorting units. A detailed analysis of these sorting units indicates that as the number of inputs increases their resource requirements scale linearly, their latencies scale logarithmically, and their frequencies remain almost constant. When synthesized to a 65-nm TSMC technology, a single pipelined 256-to-4 sorting unit with 19 stages can perform more than 2.7 billion sorts per second with a latency of about 7 ns per sort. When implemented on a Virtex-5 FPGA, the same sorting unit can perform roughly 200 million sorts per second with a latency of about 95 ns per sort. We also propose iterative sorting techniques, in which a small sorting unit is used several times to find the largest values.

Minds@University of Wisconsin

Scalable Architecture for on-Chip Neural Network Training using Swarm Intelligence

Author: Amin Farmahini-Farahani
Saeed Safari
Sied Mehdi Fakhraie
Publication date: 01/01/2008

This paper presents a novel architecture for on-chip neural network training using particle swarm optimization (PSO). PSO is an evolutionary optimization algorithm with a growing field of applications which has been recently used to train neural networks. The architecture exploits the PSO algorithm to evolve network weights as well as a method called layer partitioning to implement neural networks. In the proposed method, a neural network is partitioned into groups of neurons and the groups are sequentially mapped to available functional units. Thus, the architecture is reconfigurable for training and implementing different multilayer feedforward neural networks without the need for modifying the architecture. The implementation is intended for real-time applications regarding hardware cost and speed. The results show that the proposed system provides a trade-off between resource requirements and speed.

CiteSeerX

Crossref

Analog computing using graphene-based metalines

Author: Aieta
Amin Khavasi
Cheng
Farmahini-Farahani
Kamalodin Arik
Kildishev
Lu
Monticone
Pors
Rejaei
Sajjad AbdollahRamezani
Silva
Vakil
Wang
Zahra Kavehvash
Publication venue: The Optical Society

Crossref

The gem5 Simulator: Version 20.0+

Author: Ahmad Abdul
Akram Ayaz
Ali Syed
Alian Mohammad
Amslinger Rico
Andreozzi Matteo
Armejach Adria
Asmussen Nils
Beckmann Brad
Bharadwaj Srikant
Black Gabe
Bloom Gedare
Bruce Bobby
Castrillon Jeronimo
Chen Lizhong
Derumigny Nicolas
Diestelhorst Stephan
Elsasser Wendy
Escuin Carlos
Fariborz Marjan
Farmahini-Farahani Amin
Fotouhi Pouya
Gambord Ryan
Gandhi Jayneel
Gope Dibakar
Grass Thomas
Gutierrez Anthony
Hanindhito Bagus
Hansson Andreas
Haria Swapnil
Harris Austin
Hayes Timothy
Herrera Adrian
Horsnell Matthew
Jafri Raza
Jagtap Radhika
Jang Hanhwi
Jeyapaul Reiley
Jones Timothy
Jung Matthias
Kannoth Subash
Khaleghzadeh Hamidreza
Kodama Yuetsu
Krishna Tushar
Lowe-Power Jason
Marinelli Tommaso
Menard Christian
Mondelli Andrea
Moreto Miquel
Mück Tiago
Naji Omar
Nathella Krishnendra
Nguyen Hoa
Nikoleris Nikos
Olson Lena
Orr Marc
Pham Binh
Prieto Pablo
Reddy Trivikram
Rodrigues Carvalho Daniel
Roelke Alec
Samani Mahyar
Sandberg Andreas
Setoain Javier
Shingarov Boris
Sinclair Matthew
Ta Tuan
Thakur Rahul
Travaglini Giacomo
Upton Michael
Vaish Nilay
Vougioukas Ilias
Wang William
Wang Zhengrong
Wehn Norbert
Weis Christian
Wood David
Yoon Hongil
Zulian Éder
Publication venue: HAL CCSD
Publication date: 29/09/2020

The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm®, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7000 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.

HAL-CentraleSupelec

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1