
Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory

Author: Chen Feng
Li Dong
Liu Jialin
Liu Wei
Wu Kai
Publication date: 09/05/2017

HPC applications place high demands on I/O performance and storage capability. Emerging non-volatile memory (NVM) technologies offer low latency, high bandwidth, and persistence for HPC applications. However, the existing I/O stack is designed and optimized under the assumption of disk-based storage. To use NVM effectively, we must re-examine the existing high-performance computing (HPC) I/O subsystem and properly integrate NVM into it. When NVM serves as fast storage, the previous assumption that storage (e.g., a hard drive) is slow no longer holds. The performance problems caused by slow storage may be mitigated, while the existing mechanisms for narrowing the performance gap between storage and CPU may become unnecessary and introduce large overhead. Fully understanding the impact of introducing NVM into the HPC software stack therefore demands a thorough performance study. In this paper, we analyze and model the performance of I/O-intensive HPC applications with NVM used as a block device. We study performance from three perspectives: (1) the impact of NVM on the performance of the traditional page cache; (2) a performance comparison between MPI individual I/O and POSIX I/O; and (3) the impact of NVM on the performance of collective I/O. We reveal the diminishing effect of the page cache, a minor performance difference between MPI individual I/O and POSIX I/O, and a performance disadvantage of collective I/O on NVM caused by unnecessary data shuffling. We also model the performance of MPI collective I/O and study the complex interaction among data shuffling, storage performance, and I/O access patterns. Comment: 10 pages
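A toy cost model can make the data-shuffling argument in the abstract above concrete. The sketch below is not the authors' model: it assumes a simple two-phase collective I/O scheme (shuffle data to aggregators, then issue large contiguous writes) and uses invented bandwidth numbers for a disk-like and an NVM-like device. The names `IOConfig` and `small_io_penalty` and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IOConfig:
    """Hypothetical parameters for a toy I/O cost model (all values illustrative)."""
    total_bytes: float       # total data written by all MPI processes, in bytes
    n_procs: int             # number of MPI processes
    n_aggregators: int       # aggregators used by two-phase collective I/O
    net_bw: float            # per-aggregator network bandwidth for shuffling, bytes/s
    storage_bw: float        # aggregate storage bandwidth for large contiguous writes, bytes/s
    small_io_penalty: float  # assumed slowdown of small, non-contiguous writes (>= 1)


def independent_io_time(c: IOConfig) -> float:
    # Each process writes its own non-contiguous pieces directly to storage;
    # the penalty factor lowers the effective bandwidth seen by small requests.
    return c.total_bytes / (c.storage_bw / c.small_io_penalty)


def collective_io_time(c: IOConfig) -> float:
    # Phase 1 (shuffle): data not already held by an aggregator crosses the network.
    moved = c.total_bytes * (1 - c.n_aggregators / c.n_procs)
    shuffle = moved / (c.n_aggregators * c.net_bw)
    # Phase 2 (write): aggregators issue large contiguous writes at full bandwidth.
    write = c.total_bytes / c.storage_bw
    return shuffle + write


common = dict(total_bytes=64e9, n_procs=256, n_aggregators=16, net_bw=5e9)
disk = IOConfig(**common, storage_bw=2e9, small_io_penalty=10.0)   # hypothetical HDD-like
nvm = IOConfig(**common, storage_bw=40e9, small_io_penalty=1.1)    # hypothetical NVM-like

for name, cfg in [("disk", disk), ("nvm", nvm)]:
    print(f"{name}: independent {independent_io_time(cfg):.2f} s, "
          f"collective {collective_io_time(cfg):.2f} s")
```

With these made-up numbers, collective I/O is roughly 10x faster than independent I/O in the disk scenario but slower in the NVM scenario, because the fixed shuffle cost is no longer hidden behind slow storage. Real behavior depends on the MPI-IO implementation, aggregator placement, and access pattern.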

arXiv.org e-Print Archive

Crossref

Performance evaluation of an open distributed platform for realistic traffic generation

Author: Avallone Stefano
D. Emma
Pescapè Antonio
Ventre Giorgio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

Network researchers have dedicated a notable part of their efforts to traffic modeling and to the implementation of efficient traffic generators. We feel that there is a strong demand for traffic generators capable of reproducing realistic traffic patterns according to theoretical models while also achieving high performance. This work presents an open distributed platform for traffic generation, called Distributed Internet Traffic Generator (D-ITG), capable of producing traffic at the network, transport, and application layers at packet level and of accurately replicating appropriate stochastic processes for both the inter-departure time (IDT) and packet size (PS) random variables. We implemented two different versions of our distributed generator. In the first, a log server is in charge of recording the information transmitted by senders and receivers, and these communications are based on either TCP or UDP. In the second, senders and receivers use the MPI library. This work presents a complete performance comparison among the centralized version and the two distributed versions of D-ITG.
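The abstract above describes packet-level generation in which inter-departure time (IDT) and packet size (PS) are drawn from separate stochastic processes. A minimal sketch of that idea follows; it is not D-ITG's code or command-line interface. It assumes exponential IDTs (a Poisson packet process) and uniformly distributed sizes purely for illustration, and `make_schedule` and `rate_pps` are hypothetical names.

```python
import random
from typing import Callable, List, Tuple


def make_schedule(
    n_packets: int,
    sample_idt: Callable[[], float],
    sample_size: Callable[[], int],
) -> List[Tuple[float, int]]:
    """Return (departure_time_s, payload_bytes) pairs for n_packets packets.

    IDT and PS are drawn independently from the supplied samplers, so any
    stochastic model can be plugged in for either random variable.
    """
    schedule = []
    t = 0.0
    for _ in range(n_packets):
        t += sample_idt()                    # inter-departure time, seconds
        schedule.append((t, sample_size()))  # packet size, bytes
    return schedule


rng = random.Random(42)  # fixed seed so the generated flow is reproducible
rate_pps = 1000.0        # hypothetical mean rate: 1000 packets/s
flow = make_schedule(
    n_packets=10_000,
    sample_idt=lambda: rng.expovariate(rate_pps),  # exponential IDT -> Poisson process
    sample_size=lambda: rng.randint(64, 1500),     # uniform PS in [64, 1500] bytes
)
mean_idt = flow[-1][0] / len(flow)
print(f"mean IDT = {mean_idt * 1e3:.3f} ms (target {1e3 / rate_pps:.3f} ms)")
```

Keeping the two samplers as independent callables mirrors the separation of IDT and PS in the abstract: swapping in a Pareto IDT or a constant packet size only changes one argument. A sender would then sleep until each departure time and transmit a payload of the sampled size.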

Archivio della ricerca - Università degli studi di Napoli Federico II

PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

Author: Day Judy
Lenhart Suzanne
Peterson Gregory D.
Ponce Eduardo
Stephenson Brittany
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/07/2018

The current landscape of scientific research relies heavily on modeling and simulation, which typically involve complex execution flows and parameterization. Execution flows are not necessarily straightforward, since they may require multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources, at the expense of users having to handle the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on a keyword-value pair syntax, relieving the user of the overhead of writing complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command-line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework runs as user processes and can be used on single-node, multi-node, and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiplication using OpenMP are presented as a parameter study and a performance study, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies while increasing resource utilization. Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, US
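The abstract above says workflows are defined with keyword-value parameter files and that each parameter set combines environment variables and command-line arguments. The sketch below illustrates the general idea of expanding such a file into a Cartesian product of job configurations. It is not PaPaS's file format or API: the `key = v1, v2` syntax, the `./matmul` binary, and the helpers `parse_params`, `expand`, and `build_jobs` are all hypothetical.

```python
import itertools
import shlex
from typing import Dict, Iterator, List, Set, Tuple


def parse_params(text: str) -> Dict[str, List[str]]:
    """Parse hypothetical 'key = v1, v2, ...' lines; '#' starts a comment."""
    params: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, values = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = values', got: {raw!r}")
        params[key.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return params


def expand(params: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one parameter set per combination (Cartesian product) of values."""
    keys = list(params)
    for combo in itertools.product(*(params[k] for k in keys)):
        yield dict(zip(keys, combo))


def build_jobs(
    params: Dict[str, List[str]], env_keys: Set[str], template: str
) -> List[Tuple[Dict[str, str], List[str]]]:
    """Split each parameter set into environment variables and an argv list."""
    jobs = []
    for pset in expand(params):
        env = {k: v for k, v in pset.items() if k in env_keys}
        argv = shlex.split(template.format(**pset))  # unused keys are ignored
        jobs.append((env, argv))
    return jobs


spec = """
# hypothetical sweep for an OpenMP matrix multiply
OMP_NUM_THREADS = 1, 2, 4
N = 512, 1024
"""
jobs = build_jobs(parse_params(spec), env_keys={"OMP_NUM_THREADS"},
                  template="./matmul -n {N}")
for env, argv in jobs:
    print(env, argv)
```

Each `(env, argv)` pair could then be launched locally with `subprocess.run(argv, env={**os.environ, **env})` or handed to a batch scheduler; distributing those launches across nodes is the part a framework like PaPaS takes off the user's hands.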

arXiv.org e-Print Archive

Crossref

A case study for NoC based homogeneous MPSoC architectures

Author: Casu Mario Roberto
Macchiarulo Luca
Ruo Roch Massimo
Tota Sergio Vincenzo
Zamboni Maurizio
Publication venue: IEEE
Publication date: 01/01/2009

The many-core design paradigm requires flexible and modular hardware and software components to provide the scalability needed by next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to account for all the interactions between the different components of the design. In this paper, a complete design methodology that tackles system-level modeling, hardware architecture, and programming model at once has been successfully used to implement a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, was prototyped on a field-programmable gate array (FPGA) and then laid out in 90-nm technology. Post-layout results show very low power and area, together with a clock frequency of 500 MHz. Results show that an array of small and simple processors outperforms a single high-end general-purpose processor.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Parallel 3-D marine controlled-source electromagnetic modelling using high-order tetrahedral Nédélec elements

Author: Castillo Reyes Octavio
Cela Espín José M.
de la Puente Álvarez Josep
García-Castillo Luis Emilio
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2019

We present a parallel and high-order Nédélec finite element solution for the marine controlled-source electromagnetic (CSEM) forward problem in 3-D media with isotropic conductivity. Our parallel Python code is implemented on unstructured tetrahedral meshes, which support multiple-scale structures and bathymetry for general marine 3-D CSEM modelling applications. Based on a primary/secondary field approach, we solve the diffusive form of Maxwell’s equations in the low-frequency domain. We investigate the accuracy and performance advantages of our new high-order algorithm against a low-order implementation proposed in our previous work. The numerical precision of our high-order method has been successfully verified by comparisons against previously published results that are relevant in terms of scale and geological properties. A convergence study confirms that high-order polynomials offer a better trade-off between accuracy and computation time. However, as our tests reveal, the optimum choice of polynomial order depends on both the input model and the required accuracy. We also extend our adaptive-meshing strategy to high-order tetrahedral elements. Using meshes adapted to both the physical parameters and the high-order schemes, we achieve a significant reduction in computational cost without sacrificing modelling accuracy. Furthermore, we demonstrate the excellent performance and quasi-linear scaling of our implementation on a state-of-the-art high-performance computing architecture. This project has received funding from the European Union's Horizon 2020 programme under the Marie Sklodowska-Curie grant agreement No. 777778. Furthermore, the research leading to these results has received funding from the European Union's Horizon 2020 programme under the ChEESE Project (https://cheese-coe.eu/ ), grant agreement No. 823844.
In addition, the authors would like to thank the Ministerio de Educación y Ciencia (Spain) for its support under Projects TEC2016-80386-P and TIN2016-80957-P. The authors would like to thank the Editors-in-Chief and both reviewers, Dr. Martin Cuma and Dr. Raphael Rochlitz, for their valuable comments and suggestions, which helped to improve the quality of the manuscript. This work benefited from the valuable suggestions, comments, and proofreading of Dr. Otilio Rojas (BSC). Last but not least, Octavio Castillo-Reyes thanks Natalia Gutierrez (BSC) for her support in CSEM modeling with BSIT. Peer Reviewed. Postprint (author's final draft)
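The primary/secondary field approach mentioned in the Castillo Reyes et al. abstract above can be summarized in three equations. This is a standard textbook form written under stated assumptions, not necessarily the paper's exact formulation: an e^{-iωt} time dependence, displacement currents neglected (the diffusive, low-frequency regime), constant magnetic permeability μ, a source current density J_s, and a primary field E_p computed in a simpler background conductivity σ_p. The secondary field E_s is the unknown that H(curl)-conforming Nédélec elements would discretize; with another time convention the signs of the iωμ terms flip.

```latex
% Assumed e^{-i\omega t} convention; displacement currents neglected
\begin{aligned}
\nabla \times \nabla \times \mathbf{E} - i\omega\mu\sigma\,\mathbf{E}
  &= i\omega\mu\,\mathbf{J}_s
  && \text{(total field, conductivity } \sigma)\\
\nabla \times \nabla \times \mathbf{E}_p - i\omega\mu\sigma_p\,\mathbf{E}_p
  &= i\omega\mu\,\mathbf{J}_s
  && \text{(primary field, background } \sigma_p)\\
\nabla \times \nabla \times \mathbf{E}_s - i\omega\mu\sigma\,\mathbf{E}_s
  &= i\omega\mu\,(\sigma - \sigma_p)\,\mathbf{E}_p
  && (\mathbf{E}_s = \mathbf{E} - \mathbf{E}_p)
\end{aligned}
```

The third line follows by subtracting the second from the first and writing σE − σ_pE_p = σE_s + (σ − σ_p)E_p. The source singularity is thus moved into E_p, and the secondary problem is driven only where the conductivity differs from the background.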

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC