2,277 research outputs found

Developing Efficient Discrete Simulations on Multicore and GPU Architectures

Author: Cagigas Muñiz Daniel
Díaz del Río Fernando
Guisado Lízar José Luís
Jiménez-Morales Francisco de Paula
López-Torres Manuel Ramón
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We evaluated the performance on two hardware platforms that represent different target market segments: a high-end platform for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on Amazon Web Server (AWS) Cloud; and a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify some important issues that degrade performance on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases.
Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842

idUS. Depósito de Investigación Universidad de Sevilla

A non-hybrid method for the PDF equations of turbulent flows on unstructured grids

Author: Abe
Arnold
Bacon
Bakosi
Cassiani
Chorin
Colucci
Courant
Craft
Delarue
Dopazo
Dreeben
Durbin
Entacher
Fox
Gardiner
Gicquel
Grigoryev
Hanjalić
Haworth
Heinz
Iwamoto
J. Bakosi
Jenny
Jones
Karatzas
Karnik
Kloeden
Lai
Langevin
Launder
Lavertu
Liu
Lundgren
Löhner
Madnia
Mascagni
Meroney
Moser
Muradoglu
P. Franzese
Pavageau
Pope
Rembold
Reynolds
Rodi
Rotta
Saad
Sawford
Sheikhi
Singhal
Speziale
Subramaniam
Tang
Taylor
van Driest
van Kampen
van Slooten
Villermaux
Wacławczyk
Warhaft
Whizman
Xu
Z. Boybeyi
Zhang
Publication venue: 'Elsevier BV'
Publication date: 02/06/2010

In probability density function (PDF) methods of turbulent flows, the joint PDF of several flow variables is computed by numerically integrating a system of stochastic differential equations for Lagrangian particles. A set of parallel algorithms is proposed to provide an efficient solution of the PDF transport equation, modeling the joint PDF of turbulent velocity, frequency and concentration of a passive scalar in geometrically complex configurations. An unstructured Eulerian grid is employed to extract Eulerian statistics, to solve for quantities represented at fixed locations of the domain (e.g. the mean pressure) and to track particles. All three aspects regarding the grid make use of the finite element method (FEM) employing the simplest linear FEM shape functions. To model the small-scale mixing of the transported scalar, the interaction by exchange with the conditional mean model is adopted. An adaptive algorithm that computes the velocity-conditioned scalar mean is proposed that homogenizes the statistical error over the sample space with no assumption on the shape of the underlying velocity PDF. Compared to other hybrid particle-in-cell approaches for the PDF equations, the current methodology is consistent without the need for consistency conditions. The algorithm is tested by computing the dispersion of passive scalars released from concentrated sources in two different turbulent flows: the fully developed turbulent channel flow and a street canyon (or cavity) flow. Algorithmic details on estimating conditional and unconditional statistics, particle tracking and particle-number control are presented in detail. Relevant aspects of performance and parallelism on cache-based shared memory machines are discussed.
Comment: Accepted in Journal of Computational Physics, Feb. 20, 200

arXiv.org e-Print Archive

Crossref

Efficient instruction level simulation of computers

Author: Campbell William B.
Fujimoto Richard M.
Publication venue: University of Utah
Publication date: 01/01/1987

Journal Article
A technique for creating efficient, yet highly accurate, instruction level simulation models of computers is described. In contrast to traditional approaches that use a software interpreter, this technique employs direct execution of application programs on the host computer. An assembly language program for the machine to be modeled is decompiled to a high level language, instrumented, and then recompiled and executed on the host computer. A prototype implementation modeling the Motorola MC68010 microprocessor is described, and the efficiency and accuracy of this prototype are reported. It is demonstrated that the direct execution technique can be used to produce accurate simulation models which are orders of magnitude faster than traditional, register transfer level simulators.

The University of Utah: J. Willard Marriott Digital Library

Automated and accurate cache behavior analysis for codes with irregular access patterns

Author: Andrade Diego
Arenaz Silva Manuel
Doallo Ramón
Fraguela Basilio B.
Touriño Juan
Publication venue: 'Wiley'
Publication date: 01/01/2007

This is the peer reviewed version of the following article: Andrade, D., Arenaz, M., Fraguela, B. B., Touriño, J. and Doallo, R. (2007), Automated and accurate cache behavior analysis for codes with irregular access patterns. Concurrency Computat.: Pract. Exper., 19: 2407-2423. doi:10.1002/cpe.1173, which has been published in final form at https://doi.org/10.1002/cpe.1173. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
[Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal base for such tools if its traditional limitations in accuracy and scope of application can be overcome. While there has been extensive research on the modeling of codes with regular access patterns, less attention has been paid to codes with irregular patterns due to the increased difficulty in analyzing them. Nevertheless, many important applications exhibit this kind of pattern, and their lack of locality makes them more cache-demanding, which makes their study more relevant. The focus of this paper is the automation of the Probabilistic Miss Equations (PME) model, an analytical model of the cache behavior that provides fast and accurate predictions for codes with irregular access patterns. The information requirements of the PME model are defined and its integration in the XARK compiler, a research compiler oriented to automatic kernel recognition in scientific codes, is described. We show how to exploit the powerful information-gathering capabilities provided by this compiler to allow the automated modeling of loop-oriented scientific codes. Experimental results that validate the correctness of the automated PME model are also presented.
Ministerio de Educación y Ciencia; TIN2004-07797-C02
Xunta de Galicia; PGIDIT03TIC10502PR
Xunta de Galicia; PGIDT05PXIC10504P

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Chaos in computer performance

Author: Cook M.
Daniel Gracia Pérez
Hugues Berry
Kantz H.
Kilian J.
Kulkarni P.
Moore G. E.
Olivier Temam
Stephenson M.
Wolfram S.
Publication venue: 'AIP Publishing'
Publication date: 14/12/2005

Modern computer microprocessors are composed of hundreds of millions of transistors that interact through intricate protocols. Their performance during program execution may be highly variable and present aperiodic oscillations. In this paper, we apply current nonlinear time series analysis techniques to the performance of modern microprocessors during the execution of prototypical programs. Our results provide evidence strongly supporting that the high variability of the performance dynamics during the execution of several programs displays low-dimensional deterministic chaos, with sensitivity to initial conditions comparable to textbook models. Taken together, these results show that the instantaneous performance of modern microprocessors constitutes a complex (or at least complicated) system and would benefit from analysis with modern tools of nonlinear and complexity science.

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Systematic analysis of the cache behavior of irregular codes

Author: Andrade Diego
Publication date: 01/01/2007

[Abstract] The performance of memory hierarchies, in which the cache plays a fundamental role, is critical in current general-purpose computers and in embedded systems, owing to the growing problem of the memory system bottleneck. Unfortunately, cache behavior is highly unstable and difficult to predict. This is especially true in the presence of irregular access patterns, which exhibit little locality. Such patterns are very common, for example, in applications in which some references are affected by conditional statements or in which the compressed storage of sparse matrices gives rise to indirections. Nevertheless, cache behavior in the presence of irregular access patterns has not been widely studied. In this thesis we present extensions of a systematic analytical modeling technique based on PMEs (Probabilistic Miss Equations) that allow the automatic analysis of cache behavior for codes that include conditional statements whose truth value may not be determinable at compile time and for codes with irregular references due to indirections, respectively. The model generates very accurate predictions despite the irregularity and has a low computational cost, being the first model with these two characteristics that is able to automatically analyze this class of codes. These properties make the model suitable as a guide for compiler optimizations. The extension of the model for irregular codes with indirections has been integrated into the XARK compiler, a compiler oriented to the automatic recognition of kernels in scientific applications. We show how to exploit the powerful information-extraction capabilities of this compiler to enable the automatic modeling of loop-based scientific codes.

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas