47 research outputs found
GMT: Enabling easy development and efficient execution of irregular applications on commodity clusters
In this poster we introduce GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT only requires a cluster of x86 nodes supporting MPI. GMT integrates the Partitioned Global Address Space (PGAS) locality-aware global data model with a fork/join control model common in single-node multithreaded environments. GMT supports lightweight software multithreading to tolerate the latencies of accessing data on remote nodes, and is built around data aggregation to maximize network bandwidth utilization.
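The PGAS-plus-fork/join model described above can be sketched in a few lines. This is an illustrative Python toy, not the GMT API: the names `GlobalArray`, `owner`, and `parallel_for` are invented for exposition, and real GMT runs across MPI ranks rather than threads in one process.

```python
# Minimal sketch of a PGAS data model combined with fork/join control.
# All names are illustrative; this is NOT the actual GMT interface.
from concurrent.futures import ThreadPoolExecutor

NODES = 4  # pretend cluster nodes, each owning one partition

class GlobalArray:
    """A globally addressable array partitioned across nodes."""
    def __init__(self, size):
        self.size = size
        self.chunk = (size + NODES - 1) // NODES
        # partition i holds indices [i*chunk, (i+1)*chunk)
        self.partitions = [[0] * max(0, min(self.chunk, size - i * self.chunk))
                           for i in range(NODES)]

    def owner(self, idx):
        # locality-aware: the owning node is derivable from the index
        return idx // self.chunk

    def get(self, idx):
        p = self.owner(idx)
        return self.partitions[p][idx - p * self.chunk]

    def put(self, idx, val):
        p = self.owner(idx)
        self.partitions[p][idx - p * self.chunk] = val

def parallel_for(n, body, workers=NODES):
    """Fork/join: spawn tasks over the iteration space, join on exit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(n)))  # implicit join when the pool drains

ga = GlobalArray(10)
parallel_for(10, lambda i: ga.put(i, i * i))
print([ga.get(i) for i in range(10)])  # squares 0..81
```

In the real system, a `get` on a remotely owned index would become an aggregated network request rather than a local list access; the sketch only shows how global indices map onto node-local partitions.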
High level synthesis of RDF queries for graph analytics
In this paper we present a set of techniques that enable the synthesis of efficient custom accelerators for memory-intensive, irregular applications. To address the challenges of irregular applications (large memory footprint, unpredictable fine-grained data accesses, and high synchronization intensity), and to exploit their opportunities (thread-level parallelism, memory-level parallelism), we propose a novel accelerator design that employs an adaptive Distributed Controller (DC) architecture and a Memory Interface Controller (MIC) that supports concurrent and atomic memory operations on a multi-ported/multi-banked shared memory. Among the multitude of algorithms that may benefit from our solution, we focus on the acceleration of graph analytics applications and, in particular, on the synthesis of SPARQL queries on Resource Description Framework (RDF) databases. We achieve this objective by incorporating the synthesis techniques into Bambu, an open-source high-level synthesis tool, and interfacing it with GEMS, the Graph database Engine for Multithreaded Systems. The GEMS front-end generates optimized C implementations of the input queries, modeled as graph pattern matching algorithms, which are then automatically synthesized by Bambu. We validate our approach by synthesizing several SPARQL queries from the Lehigh University Benchmark (LUBM).
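The idea of compiling a SPARQL query down to a graph pattern matching kernel can be illustrated with a toy example. This is an exploratory sketch, not GEMS or Bambu output: the triple data and the `match`/`query` helpers are invented, and a real front-end would emit optimized C rather than Python.

```python
# Illustrative sketch: a SPARQL-style query executed as graph pattern
# matching over an RDF triple store (NOT actual GEMS/Bambu code).
triples = [
    ("alice", "rdf:type",    "ub:GraduateStudent"),
    ("alice", "ub:memberOf", "dept0"),
    ("bob",   "rdf:type",    "ub:GraduateStudent"),
    ("bob",   "ub:memberOf", "dept1"),
]

def match(pattern, binding):
    """Yield extended variable bindings for one triple pattern.
    Components starting with '?' are variables."""
    for t in triples:
        b = dict(binding)
        ok = True
        for p, v in zip(pattern, t):
            if p.startswith("?"):
                if p in b and b[p] != v:
                    ok = False
                    break
                b[p] = v
            elif p != v:
                ok = False
                break
        if ok:
            yield b

def query(patterns):
    """Nested-loop join of triple patterns: the shape of the kernel a
    high-level synthesis flow would turn into hardware."""
    bindings = [{}]
    for pat in patterns:
        bindings = [b2 for b in bindings for b2 in match(pat, b)]
    return bindings

# LUBM-flavored query: graduate students who are members of dept0
res = query([("?x", "rdf:type",    "ub:GraduateStudent"),
             ("?x", "ub:memberOf", "dept0")])
print([b["?x"] for b in res])  # ['alice']
```

Each triple pattern adds one loop level; the unpredictable, fine-grained accesses into the triple store are exactly the irregular memory behavior the DC/MIC design targets.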
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems
Breadth-first probabilistic traversals (BPTs) are used in many network science and graph machine learning applications. In this paper, we are motivated by the application of BPTs in stochastic diffusion-based graph problems such as influence maximization. These applications rely heavily on BPTs to implement a Monte Carlo sampling step for their approximations. Given the large sampling complexity, the stochasticity of the diffusion process, and the inherent irregularity of real-world graph topologies, efficiently parallelizing these BPTs remains significantly challenging.
In this paper, we present a new algorithm to fuse a massive number of concurrently executing BPTs with random starts on the input graph. Our algorithm fuses BPTs by combining separate traversals into a unified frontier on distributed multi-GPU systems. To show the general applicability of the fused-BPT technique, we have incorporated it into two state-of-the-art parallel influence maximization implementations (gIM and Ripples). Our experiments on up to 4K nodes of the OLCF Frontier supercomputer show strong scaling behavior, and that fused BPTs can improve the performance of these implementations by up to 34x (for gIM) and ~360x (for Ripples).
Comment: 12 pages, 11 figures
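The core fusion idea, combining many probabilistic traversals into one unified frontier, can be sketched on a single machine. This is a simplified illustration under assumed semantics (independent edge activation with probability p per sample), not the paper's distributed multi-GPU implementation; the function name `fused_bpt` is invented.

```python
# Simplified sketch of fused BPTs: many probabilistic BFS samples advance
# together, each vertex carrying a bitmask of the samples that reached it,
# so all samples share one "unified frontier" per level.
import random

def fused_bpt(adj, roots, p, seed=0):
    """adj: {u: [v, ...]} adjacency; roots: start vertex of each sample;
    p: edge activation probability. Returns per-vertex sample bitmasks."""
    rng = random.Random(seed)
    visited = {u: 0 for u in adj}
    frontier = {}
    for s, r in enumerate(roots):
        visited[r] |= 1 << s
        frontier[r] = frontier.get(r, 0) | (1 << s)
    while frontier:
        nxt = {}
        for u, mask in frontier.items():
            for v in adj[u]:
                live = 0  # samples whose coin flip activates edge (u, v)
                m, s = mask, 0
                while m:
                    if (m & 1) and rng.random() < p:
                        live |= 1 << s
                    m >>= 1
                    s += 1
                new = live & ~visited[v]  # samples reaching v for the first time
                if new:
                    visited[v] |= new
                    nxt[v] = nxt.get(v, 0) | new
        frontier = nxt
    return visited

adj = {0: [1], 1: [2], 2: []}
vis = fused_bpt(adj, roots=[0, 0], p=1.0)
# with p=1.0 both samples deterministically reach every vertex
print(vis)  # {0: 3, 1: 3, 2: 3}
```

The payoff of fusion is that each edge of the frontier is scanned once per level for all live samples instead of once per sample, which is what makes the Monte Carlo step amenable to GPU-wide parallelism.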
The Future is Big Graphs! A Community View on Graph Processing Systems
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
Comment: 12 pages, 3 figures, collaboration between the large-scale systems and data management communities, work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems, to be published in the Communications of the ACM
Hardware Acceleration of Complex Machine Learning Models through Modern High-Level Synthesis
Machine learning algorithms continue to receive significant attention from industry and research. As the models increase in complexity and accuracy, their computational and memory demands also grow, pushing for more powerful, heterogeneous architectures; custom FPGA/ASIC accelerators are often the best solution to efficiently process large amounts of data close to the sensors in large-scale scientific experiments. Previous works exploited high-level synthesis to help design dedicated compute units for machine learning inference, proposing frameworks that translate high-level models into annotated C/C++. Our proposal, instead, integrates HLS in a compiler-based tool flow with multiple levels of abstraction, enabling analysis, optimization, and design space exploration along the whole process. Such an approach will also make it possible to explore models beyond multi-layer perceptrons and convolutional neural networks (which are often the main target of "classic" HLS frameworks), for example to address the different challenges posed by sparse and graph-based neural networks.
Exploring efficient hardware support for applications with irregular memory patterns on multinode manycore architectures
With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming available for analysis. Often the data collected have complex, graph-based structures, which makes them difficult to process with traditional tools. Moreover, the irregularities in the data sets, and in the analysis algorithms, hamper performance scaling on large distributed high-performance systems, which are optimized for locality exploitation and regular data structures. In this paper we present an approach to system design that enables efficient execution of applications with irregular memory patterns on a distributed many-core architecture based on off-the-shelf cores. We introduce a set of hardware and software components that provide a distributed global address space and fine-grained synchronization, and that transparently hide the latencies of remote accesses with multithreading. An FPGA prototype has been implemented to explore the design with a set of typical irregular kernels. We finally present an analytical model that highlights the benefits of the approach and helps identify the bottlenecks in the prototype. The experimental evaluation on graph-based applications demonstrates the scalability of the architecture for different configurations of the whole system.
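The latency-hiding mechanism described above, switching among lightweight threads while remote accesses are in flight, can be mimicked with coroutines. This is a toy model with invented names (`worker`, `run`, `REMOTE_LATENCY`), not the paper's hardware design; it only shows why multithreading keeps a core busy instead of stalling on remote memory.

```python
# Toy sketch of latency tolerance via software multithreading: each
# lightweight thread yields while its remote load is "in flight", and a
# round-robin scheduler switches to ready work instead of stalling.
from collections import deque

REMOTE_LATENCY = 3  # cycles a remote load takes to complete (assumed)

def worker(tid, addr, memory, results):
    # issue a remote load, then yield once per latency cycle
    for _ in range(REMOTE_LATENCY):
        yield  # switch out instead of stalling the core
    results[tid] = memory[addr] * 2  # compute on the returned value

def run(threads):
    """Round-robin scheduler: one generator step per 'cycle'."""
    ready = deque(threads)
    cycles = 0
    while ready:
        t = ready.popleft()
        try:
            next(t)          # advance one step
            ready.append(t)  # still waiting: re-queue, run someone else
        except StopIteration:
            pass             # thread finished
        cycles += 1
    return cycles

memory = {i: i + 10 for i in range(4)}
results = {}
cycles = run([worker(t, t, memory, results) for t in range(4)])
print(results, cycles)
```

With four threads, every cycle advances some thread, so the remote latencies of the four loads overlap instead of adding up; with a single thread the core would sit idle for each load in turn.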