
    Microarchitectural design-space exploration of an in-order RISC-V processor in a 22nm CMOS technology

    The purpose of this paper is to explore the trade-offs between IPC and maximum clock frequency in an in-order processor design. This work evaluates the impact of different pipeline optimizations on performance and frequency. We target an ASIC implementation using an advanced synthesis tool-flow with modern technology libraries, which allows us to analyze the processor’s critical paths in a representative environment. In this paper, we analyze and modify Riscy, an in-order processor, taking into account the consequences of targeting an ASIC implementation. We achieve a frequency of 1.3 GHz and a score of 2.03 CoreMark/MHz on the EEMBC CoreMark benchmark.
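    The trade-off explored here boils down to effective throughput = per-clock efficiency × clock frequency. The C++ sketch below is illustrative only: the first design point is the one reported above, while the other two are hypothetical.

```cpp
#include <cstdio>

int main() {
    // Effective CoreMark score = (score per MHz) * (frequency in MHz).
    // Only the first design point comes from the abstract; the others are made up.
    struct DesignPoint { const char* name; double coremark_per_mhz; double freq_mhz; };
    const DesignPoint points[] = {
        {"reported Riscy variant",        2.03, 1300.0},
        {"hypothetical deeper pipeline",  1.80, 1500.0},
        {"hypothetical shorter pipeline", 2.20, 1000.0},
    };
    for (const auto& p : points) {
        std::printf("%-32s %7.0f CoreMark (%.2f CM/MHz @ %.0f MHz)\n",
                    p.name, p.coremark_per_mhz * p.freq_mhz,
                    p.coremark_per_mhz, p.freq_mhz);
    }
    return 0;
}
```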

    AxleDB: A novel programmable query processing platform on FPGA

    With the rise of Big Data, providing high-performance query processing capabilities through the acceleration of database analytics has gained significant attention. Leveraging Field Programmable Gate Array (FPGA) technology, this approach can lead to clear benefits. In this work, we present the design and implementation of AxleDB, an FPGA-based platform that enables fast query processing for database systems by melding novel database-specific accelerators with commercial off-the-shelf (COTS) storage over modern interfaces, in a novel, unified, and programmable environment. AxleDB can perform a large subset of SQL queries through an instruction set that maps compute-intensive database operations, such as filter, arithmetic, aggregate, group by, table join, and sort, onto specialized high-throughput accelerators. To minimize the number of SSD I/O operations required, AxleDB also supports hardware MinMax indexing for databases. We evaluated AxleDB with five decision-support queries from the TPC-H benchmark suite and achieved speedups of 1.8× to 34.2× and energy-efficiency gains of 2.8× to 62.1× compared to the state-of-the-art DBMSs PostgreSQL and MonetDB. The research leading to these results has received funding from the European Union Seventh Framework Program (FP7, AXLE project, grant agreement number 318633), the Ministry of Economy and Competitiveness of Spain (contract number TIN2015-65316-p), the Turkish Ministry of Development TAM Project (number 2007K120610), and Bogazici University Scientific Projects (number 7060).
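    The MinMax indexing mentioned above can be pictured as a zone-map-style filter: each storage block carries the minimum and maximum value of a column, and blocks whose range cannot satisfy the predicate are never read from the SSD. The C++ sketch below only illustrates that idea in software; it is not AxleDB's hardware implementation, and the block layout is assumed.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Per-block summary of one column: the minimum and maximum value it contains.
struct BlockIndex { int64_t min_val; int64_t max_val; uint32_t block_id; };

// Return the ids of blocks that may hold rows with values in [lo, hi];
// every other block can be skipped without issuing any SSD read.
std::vector<uint32_t> candidateBlocks(const std::vector<BlockIndex>& index,
                                      int64_t lo, int64_t hi) {
    std::vector<uint32_t> out;
    for (const auto& b : index)
        if (b.max_val >= lo && b.min_val <= hi)   // the two ranges overlap
            out.push_back(b.block_id);
    return out;
}

int main() {
    std::vector<BlockIndex> index = {{0, 99, 0}, {100, 199, 1}, {200, 299, 2}};
    for (uint32_t id : candidateBlocks(index, 150, 180))
        std::printf("read block %u\n", id);       // only block 1 is touched
    return 0;
}
```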

    Accelerating Hash-Based Query Processing Operations on FPGAs by a Hash Table Caching Technique

    Extracting valuable information from the rapidly growing field of Big Data faces serious performance constraints, especially in software-based database management systems (DBMS). In a query processing system, hash-based computational primitives such as the hash join and the group-by are the most time-consuming operations, as they frequently need to access the hash table in high-latency off-chip memory and to traverse the whole table. In addition, hash collisions are an inherent issue of hash tables and can severely degrade overall performance. To alleviate this problem, we present a novel, purely hardware-based hash engine implemented on an FPGA. To mitigate the high memory access latencies and to resolve hash collisions faster, we follow a novel design point based on caching the hash table entries in the fast on-chip Block RAMs of the FPGA. Faster access to the corresponding hash table entries from the cache leads to improved overall performance. We evaluate the proposed approach by running the hash-based table join and group-by operations of five TPC-H benchmark queries. The results show 2.9× to 4.4× speedups over a cache-less FPGA-based baseline. The research leading to these results has received funding from the European Union’s Seventh Framework Program (FP7/2007-2013) for the Advanced Analytics for Extremely Large European Databases (AXLE) project under grant agreement number 318633, and from the Ministry of Economy and Competitiveness of Spain under contract number TIN2015-65316-p.
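    The caching idea can be modelled in software as a small direct-mapped cache of hash-table entries placed in front of a large table. In the sketch below, the std::vector stands in for the on-chip Block RAM and the std::unordered_map for off-chip memory; the sizes, mapping policy, and interfaces are assumptions, not the paper's hardware engine.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <unordered_map>
#include <vector>

struct Entry { int64_t key; int64_t payload; bool valid = false; };

// A direct-mapped cache of hash-table entries in front of a large table.
// Cache hits avoid the (slow) backing-table lookup entirely.
class CachedHashTable {
public:
    explicit CachedHashTable(size_t cacheSlots) : cache_(cacheSlots) {}

    void insert(int64_t key, int64_t payload) {             // build phase
        table_[key] = payload;
        slot(key) = {key, payload, true};
    }

    std::optional<int64_t> probe(int64_t key) {              // probe phase
        Entry& line = slot(key);
        if (line.valid && line.key == key) { ++hits_; return line.payload; }
        ++misses_;                                            // would cost an off-chip access
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        line = {key, it->second, true};                       // refill the cache slot
        return it->second;
    }

    void printStats() const { std::printf("cache hits=%zu misses=%zu\n", hits_, misses_); }

private:
    Entry& slot(int64_t key) { return cache_[static_cast<uint64_t>(key) % cache_.size()]; }
    std::vector<Entry> cache_;                    // models FPGA Block RAM
    std::unordered_map<int64_t, int64_t> table_;  // models off-chip memory
    size_t hits_ = 0, misses_ = 0;
};

int main() {
    CachedHashTable ht(1024);
    for (int64_t k = 0; k < 100; ++k) ht.insert(k, k * 10);  // build side
    for (int64_t k = 0; k < 200; ++k) ht.probe(k % 100);     // probe side: all hits here
    ht.printStats();
    return 0;
}
```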

    From Plasma to Beefarm: Design experience of an FPGA-based multicore prototype

    In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of recent years in both the FPGA and computer architecture communities. We discuss various design trade-offs and demonstrate, through experimental results, superior scalability compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports a run-time and compiler infrastructure, and on the executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons, and future trends of using hardware-based emulation for research.

    Functional verification of a RISC-V vector accelerator

    We present the functional verification efforts for an academic RISC-V based vector accelerator, successfully taped out in the context of the European Processor Initiative. For our novel RISC-V based decoupled vector accelerator, we built a verification infrastructure consisting of a UVM environment that performs step-by-step co-simulation of all vector instructions, using the Spike instruction set simulator as the reference model. Furthermore, to validate this complex design, which connects to a scalar core through a custom interface, we provided automated constrained-random test generation, simulation, error reporting, and CI/CD infrastructure. We found 3005 errors during this process and reached 95.79% functional coverage. This research has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 (European Processor Initiative) and Specific Grant Agreement No 101036168 (EPI SGA2). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, the Netherlands, Portugal, Spain, Sweden, and Switzerland. The EPI-SGA2 project, PCI2022-132935, is also co-funded by MCIN/AEI/10.13039/501100011033 and by the EU NextGenerationEU/PRTR.
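    The UVM environment itself is written in SystemVerilog; the C++ sketch below only illustrates the lock-step comparison idea behind step-by-step co-simulation, with a hypothetical retirement record and toy models standing in for the DUT monitor and the Spike reference.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>

// One architectural update, as observed when an instruction retires.
struct RetiredInsn { uint64_t pc; uint32_t rd; uint64_t rd_value; };

// A "model" is anything that can be advanced by one instruction.
using StepFn = std::function<RetiredInsn()>;

// Advance the DUT and the reference model in lock step and count divergences;
// in the real flow the reference side is played by the Spike ISS.
int cosimulate(StepFn dutStep, StepFn refStep, size_t numInsns) {
    int errors = 0;
    for (size_t i = 0; i < numInsns; ++i) {
        RetiredInsn d = dutStep();
        RetiredInsn r = refStep();
        if (d.pc != r.pc || d.rd != r.rd || d.rd_value != r.rd_value) {
            ++errors;
            std::printf("mismatch at instruction %zu: pc 0x%llx vs 0x%llx, "
                        "rd x%u=0x%llx vs x%u=0x%llx\n", i,
                        (unsigned long long)d.pc, (unsigned long long)r.pc,
                        d.rd, (unsigned long long)d.rd_value,
                        r.rd, (unsigned long long)r.rd_value);
        }
    }
    return errors;   // fed into the regression's error report
}

int main() {
    uint64_t dutPc = 0x1000, refPc = 0x1000;
    // Two toy models: the "DUT" injects a wrong destination value at pc 0x1008.
    StepFn dut = [&] { RetiredInsn x{dutPc, 1, dutPc == 0x1008 ? 99ull : 0ull}; dutPc += 4; return x; };
    StepFn ref = [&] { RetiredInsn x{refPc, 1, 0ull}; refPc += 4; return x; };
    std::printf("total mismatches: %d\n", cosimulate(dut, ref, 8));
    return 0;
}
```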

    DVINO: A RISC-V vector processor implemented in 65nm technology

    This paper describes the design, verification, implementation, and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux, jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processing unit as well as a Phase-Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from the architectural level through logic synthesis and physical design in 65 nm CMOS technology. The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020, with a grant covering 50% of the total eligible cost. The authors are part of RedRISCV, which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) of Mexico, and by a CONACyT scholarship for the Center for Research in Computing (CIC-IPN). Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa†. Affiliations: ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain; †Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain; §Universitat Politecnica de Catalunya (UPC), Barcelona, Spain.

    Profiling Transactional Memory applications on an atomic block basis: A Haskell case study

    In many recent works in Transactional Memory (TM), researchers have profiled TM applications based on execution data using lumped per-transaction averages. This approach both omits meaningful profiling information that can be extracted from the transactional program and hides potentially useful clues for discovering the bottlenecks of the TM system. In this study, we propose partitioning transactional programs into executions of their atomic blocks (ABs) and observing both the individual properties of these ABs and their effects on the overall execution of a Software Transactional Memory (STM) benchmark in Haskell. Profiling at the AB level and focusing on the relationships between ABs helps to (i) characterize transactional programs with per-AB statistics, (ii) examine the conflict relationships that arise between the ABs, and thus (iii) identify and classify the shared data that often cause conflicts. Through experimentation, we show that the AB behavior in most of the Haskell STM benchmark applications is quite heterogeneous, demonstrating the need for such a fine-grained, per-AB profiling framework.
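    A minimal sketch of what per-AB aggregation could look like, assuming an instrumented runtime that logs one event per atomic-block execution (the event format and field names below are hypothetical): raw events are folded into per-AB commit/abort counts, an AB-to-AB conflict matrix, and a tally of the shared variables involved in conflicts.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One raw event: which atomic block (AB) ran, whether it committed, and,
// on abort, which AB won the conflict and on which shared variable.
struct AbEvent {
    int abId;
    bool committed;
    int winnerAbId;          // -1 when the execution committed
    std::string variable;    // empty when the execution committed
};

int main() {
    std::vector<AbEvent> log = {
        {0, true, -1, ""},
        {1, false, 0, "accountBalance"}, {1, true, -1, ""},
        {2, false, 0, "accountBalance"}, {2, true, -1, ""},
    };

    std::map<int, int> commits, aborts;
    std::map<std::pair<int, int>, int> conflicts;   // (aborted AB, winning AB)
    std::map<std::string, int> hotVariables;

    for (const auto& e : log) {
        if (e.committed) {
            ++commits[e.abId];
        } else {
            ++aborts[e.abId];
            ++conflicts[{e.abId, e.winnerAbId}];
            ++hotVariables[e.variable];
        }
    }

    for (const auto& [ab, n] : commits)
        std::printf("AB %d: %d commits, %d aborts\n", ab, n, aborts[ab]);
    for (const auto& [pair, n] : conflicts)
        std::printf("AB %d aborted by AB %d: %d times\n", pair.first, pair.second, n);
    for (const auto& [var, n] : hotVariables)
        std::printf("conflicting shared variable '%s': %d aborts\n", var.c_str(), n);
    return 0;
}
```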

    Dissecting transactional executions in Haskell

    In this paper, we present a Haskell Transactional Memory benchmark in order to provide a comprehensive application suite for use by Software Transactional Memory (STM) researchers. We develop a framework to profile the execution of the benchmark applications and to collect detailed runtime data on their transactional behavior. Using a composite of the collected raw data, we propose new transactional performance metrics. We analyze key statistics related to critical regions, transactional log-keeping, and overall transactional overhead, and draw conclusions from our extensive analysis of the set of applications. The results advance our understanding of the different characteristics of applications under the transactional management of the purely functional programming language Haskell.
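    As an example of metrics that can be derived from such raw data, a commit ratio and a wasted-work fraction can be computed from per-attempt records; these particular formulas and the sample numbers are illustrative, not necessarily the metrics proposed in the paper.

```cpp
#include <cstdio>
#include <vector>

// One attempted execution of a transaction: whether it committed and how
// long it ran. Aborted attempts represent work that was thrown away.
struct Attempt { bool committed; double microseconds; };

int main() {
    // Hypothetical raw data as a profiler might record it.
    std::vector<Attempt> attempts = {
        {false, 12.0}, {true, 10.5}, {true, 8.0},
        {false, 20.0}, {false, 18.0}, {true, 22.0},
    };

    int commits = 0;
    double usefulTime = 0.0, wastedTime = 0.0;
    for (const auto& a : attempts) {
        if (a.committed) { ++commits; usefulTime += a.microseconds; }
        else             { wastedTime += a.microseconds; }
    }

    double commitRatio    = static_cast<double>(commits) / attempts.size();
    double wastedFraction = wastedTime / (usefulTime + wastedTime);
    std::printf("commit ratio: %.2f, wasted-work fraction: %.2f\n",
                commitRatio, wastedFraction);
    return 0;
}
```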