Parallel Implementations of Cellular Automata for Traffic Models
The Biham-Middleton-Levine (BML) traffic model is a simple two-dimensional,
discrete Cellular Automaton (CA) that has been used to study self-organization
and phase transitions arising in traffic flows. From the computational point of
view, the BML model exhibits the usual features of discrete CA: the state of
the automaton is updated according to simple rules that depend on the state of
each cell and its neighbors. In this paper we study the impact of various
optimizations for speeding up CA computations by using the BML model as a case
study. In particular, we describe and analyze the impact of several parallel
implementations that rely on CPU features, such as multiple cores or SIMD
instructions, and on GPUs. Experimental evaluation provides quantitative
measures of the payoff of each technique in terms of speedup with respect to a
plain serial implementation. Our findings show that the performance gap between
CPU and GPU implementations of the BML traffic model can be reduced by clever
exploitation of all CPU features.
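The synchronous update rule summarized above is easy to state concretely. Below is a minimal NumPy sketch of one BML half-step, assuming the usual convention of alternating east-moving and south-moving sub-steps on a periodic lattice; the cell encoding and function names are illustrative, not the paper's implementation.

```python
import numpy as np

EMPTY, EAST, SOUTH = 0, 1, 2

def bml_step(grid, move_east):
    """One synchronous half-step of the BML model on a periodic lattice.
    East-movers advance on one half-step, south-movers on the other.
    A car moves only if the cell ahead is empty in the CURRENT state."""
    kind = EAST if move_east else SOUTH
    axis = 1 if move_east else 0
    ahead = np.roll(grid, -1, axis=axis)          # cell each car wants to enter
    can_move = (grid == kind) & (ahead == EMPTY)  # blocked cars stay put
    new = grid.copy()
    new[can_move] = EMPTY                         # vacate source cells
    new[np.roll(can_move, 1, axis=axis)] = kind   # occupy target cells
    return new
```

Because the move decision is taken against the current configuration, the update is fully synchronous, which is what makes the model amenable to SIMD and GPU implementations.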
A Cascade Neural Network Architecture investigating Surface Plasmon Polaritons propagation for thin metals in OpenMP
Surface plasmon polaritons (SPPs) confined along metal-dielectric interface
have attracted considerable interest in the area of ultracompact photonic
circuits, photovoltaic devices and other applications due to their strong field
confinement and enhancement. This paper investigates a novel cascade neural
network (NN) architecture to capture the dependence of SPP propagation on metal
thickness. Additionally, a novel training procedure for the proposed cascade
NN has been developed using an OpenMP-based framework, thus greatly reducing
training time. The performed experiments confirm the effectiveness of the
proposed NN architecture for the problem at hand.
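As a rough illustration of the cascade idea (the abstract does not give the paper's actual architecture, so this is a generic sketch, not the proposed network): in a cascade-forward structure, each layer receives the original input concatenated with the outputs of all preceding layers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cascade_forward(x, layers):
    """Forward pass of a generic cascade network: every layer sees the
    original input plus every previous layer's output (illustrative only).
    `layers` is a list of (W, b) pairs with compatible shapes."""
    features = x
    for W, b in layers:
        h = relu(features @ W + b)
        features = np.concatenate([features, h])  # cascade the features
    return features
```

Because each layer's loss gradient is independent given the accumulated features, the per-layer training passes are a natural target for the OpenMP-style parallelisation the abstract mentions.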
OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF
The actor model of computation has been designed for seamless support of
concurrency and distribution. However, it remains unspecific about data-parallel
program flows, while the available processing power of modern many-core
hardware such as graphics processing units (GPUs) and coprocessors increases the
relevance of data parallelism for general-purpose computation.
In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework
(CAF). This offers a high-level interface for accessing any OpenCL device
without leaving the actor paradigm. The new type of actor is integrated into
the runtime environment of CAF and gives rise to transparent message passing in
distributed systems on heterogeneous hardware. Following the actor logic in
CAF, OpenCL kernels can be composed while encapsulated in C++ actors and hence
operate in a multi-stage fashion on data resident on the GPU. Developers are
thus enabled to build complex data-parallel programs from primitives without
leaving the actor paradigm or sacrificing performance. Our evaluations on
commodity GPUs, an Nvidia Tesla, and an Intel Phi reveal the expected linear
scaling behavior when offloading larger workloads. For sub-second tasks, the
efficiency of offloading was found to differ widely between devices. Moreover,
our findings indicate a negligible overhead over programming with the native
OpenCL API.
Verification of loop parallelisations
Writing correct parallel programs becomes more and more difficult as the complexity and heterogeneity of processors increase. This issue is addressed by parallelising compilers, and various compiler directives can be used to tell these compilers where to parallelise. This paper addresses the correctness of such compiler directives for loop parallelisation. Specifically, we propose a technique based on separation logic to verify whether a loop can be parallelised. Our approach requires each loop iteration to be specified with the locations that are read and written in that iteration. If the specifications are correct, they can be used to draw conclusions about loop (in)dependences. Moreover, they also reveal where synchronisation is needed in the parallelised program. The loop iteration specifications can be verified using permission-based separation logic and integrate seamlessly with functional behaviour specifications. We formally prove the correctness of our approach and discuss automated tool support for our technique. Additionally, we discuss how the loop iteration contracts can be compiled into specifications for the code produced by the parallelising compiler.
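The dependence reasoning that the paper performs with separation-logic contracts can be illustrated with a much cruder set-based sketch: if each iteration's contract lists the locations it reads and writes, a loop is parallelisable (ignoring reductions and synchronisation placement) when no iteration writes a location another iteration touches. The contract representation below is a hypothetical simplification, not the paper's notation.

```python
def parallelisable(contracts):
    """Set-based loop (in)dependence check. `contracts` is a list of
    (reads, writes) pairs of location sets, one per iteration.
    Two iterations conflict if one writes a location the other
    reads or writes; any conflict forbids naive parallelisation."""
    for i, (r1, w1) in enumerate(contracts):
        for r2, w2 in contracts[i + 1:]:
            if w1 & (r2 | w2) or w2 & r1:
                return False
    return True
```

For example, `a[i] = b[i]` yields disjoint per-iteration write sets and is accepted, while `a[i] = a[i-1]` makes iteration i read a location written by iteration i-1 and is rejected.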
Improving Scalability and Maintenance of Software for High-Performance Scientific Computing by Combining MDE and Frameworks
In recent years, numerical simulation has attracted increasing interest within industry and among academics. Paradoxically, the development and maintenance of high-performance scientific computing software has become more complex due to the diversification of hardware architectures and their related programming languages and libraries. In this paper, we share our experience in using model-driven development for numerical simulation software. Our approach, called MDE4HPC, proposes to tackle development complexity by using a domain-specific modelling language to describe abstract views of the software. We present and analyse the results obtained when deriving this abstract model to target Arcane, a development framework for 2D and 3D numerical simulation software.
Optimistic Parallelism on GPUs
We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable exploitation of data parallelism, the latter three phases represent the overhead costs of using speculation. We perform the misspeculation check on the GPU to minimize its cost. We perform result committing and misspeculation recovery on the CPU to reduce the result-copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API for programmers to give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an NVIDIA Tesla C1060 hosted in an Intel Xeon E5540 machine.
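The five phases can be emulated sequentially for a toy loop where iteration i computes `a[dst[i]] = a[src[i]] + 1`. The phase structure follows the abstract, but the single-chunk scheduling and the prefix-based recovery below are deliberate simplifications, not the paper's GPU/CPU split.

```python
def speculative_loop(a, src, dst):
    """Toy emulation of speculative loop parallelization.
    Phases: schedule (one chunk), speculative computation against the
    original array, misspeculation check (did an iteration read a
    location written by an earlier one?), commit of the safe prefix,
    and sequential re-execution (recovery) of the rest."""
    n = len(src)
    spec = [a[src[i]] + 1 for i in range(n)]      # speculative computation
    written, first_bad = set(), n
    for i in range(n):                            # misspeculation check
        if src[i] in written:
            first_bad = i
            break
        written.add(dst[i])
    for i in range(first_bad):                    # commit safe prefix
        a[dst[i]] = spec[i]
    for i in range(first_bad, n):                 # misspeculation recovery
        a[dst[i]] = a[src[i]] + 1
    return a
```

When no iteration reads a previously written location, every speculative result commits and the recovery loop is empty, which is the case the GPU phases are designed to make cheap.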
Towards High Performance Relativistic Electronic Structure Modelling: The EXP-T Program Package
Modern challenges arising in the fields of theoretical and experimental
physics require new powerful tools for high-precision electronic structure
modelling; one of the most promising tools is the relativistic Fock space
coupled cluster method (FS-RCC). Here we present a new extensible
implementation of the FS-RCC method designed for modern parallel computers. The
underlying theoretical model, algorithms and data structures are discussed. The
performance and scaling features of the implementation are analyzed. The
software developed allows a completely new level of accuracy to be achieved in
the prediction of properties of atoms and molecules containing heavy and
superheavy nuclei.
Determination of the Smile Line in Patients Treated at the Cátedra Prostodoncia IV B, FOUNC
Fil: D'itria, José Antonio. Universidad Nacional de Córdoba. Facultad de Odontología. Cátedra de Prostodoncia B; Argentina. Fil: Elizondo Cassab, Elby Elizabeth. Universidad Nacional de Córdoba. Facultad de Odontología. Cátedra de Prostodoncia B; Argentina. Fil: Rugani, Nelson L. J. Universidad Nacional de Córdoba. Facultad de Odontología. Cátedra de Prostodoncia IV B; Argentina. Fil: Sánchez Dagum, Mercedes. Universidad Nacional de Córdoba. Facultad de Odontología. Cátedra de Odontología Preventiva y Comunitaria I; Argentina.
Objective: (1) to analyse the smile line clinically and photographically in partially
edentulous patients; (2) to evaluate the relationship between the smile line and the factors sex and
age. Methodology: observational, descriptive, cross-sectional study. Adult patients aged 20-75
years, with at least one anterior tooth present in the mouth, treated at the Cátedra de
Prostodoncia IV B between March and November 2015 (n = 100), were evaluated with written informed
consent, under CIEIS FO UNC regulations. The following variables were assessed: sex, age, and
smile-line classification (Tjan et al.). Photographic records were taken with a Canon D50 camera on
a photographic tripod, with the patient seated facing forward, both in a predetermined position.
The descriptive analysis will use measures of central tendency (means, medians, percentages, and
standard deviations). For the inferential analysis, bivariate analysis, the chi-squared test, and a
linear trend test for proportions will be used. p < 0.05 will be considered significant.
The Network Analysis of Urban Streets: A Primal Approach
The network metaphor in the analysis of urban and territorial cases has a
long tradition especially in transportation/land-use planning and economic
geography. More recently, urban design has brought its contribution by means of
the "space syntax" methodology. All these approaches, though under different
terms like accessibility, proximity, integration, connectivity, cost or effort,
focus on the idea that some places (or streets) are more important than others
because they are more central. The study of centrality in complex systems,
however, originated in other scientific areas, namely in structural
sociology, well before its use in urban studies; moreover, as a structural
property of the system, centrality has never been extensively investigated
metrically in geographic networks as it has been topologically in a wide range
of other relational networks like social, biological or technological. After
two previous works on some structural properties of the dual and primal graph
representations of urban street networks (Porta et al. cond-mat/0411241;
Crucitti et al. physics/0504163), in this paper we provide an in-depth
investigation of centrality in the primal approach as compared to the dual one,
with a special focus on potentials for urban design. (Related to the paper "The
Network Analysis of Urban Streets: A Dual Approach", cond-mat/041124.)
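In the primal approach the abstract advocates, centrality is computed on a metric graph: nodes are intersections and edges are street segments weighted by their length. A minimal sketch of metric closeness centrality, one of the standard centrality indices in this line of work (the adjacency-list encoding is illustrative):

```python
import heapq

def closeness(adj):
    """Metric closeness centrality on a primal street graph.
    `adj` maps each node to a list of (neighbor, segment_length) pairs.
    closeness(v) = (n - 1) / sum of shortest-path distances from v."""
    def dijkstra(s):
        dist = {s: 0.0}
        pq = [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue                      # stale queue entry
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float('inf')):
                    dist[v] = nd
                    heapq.heappush(pq, (nd, v))
        return dist
    n = len(adj)
    return {v: (n - 1) / sum(dijkstra(v).values()) for v in adj}
```

Using metric edge weights rather than topological hop counts is precisely what distinguishes the primal analysis from the dual, space-syntax-style one.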
TomograPy: A Fast, Instrument-Independent, Solar Tomography Software
Solar tomography has progressed rapidly in recent years thanks to the
development of robust algorithms and the availability of more powerful
computers. It can today provide crucial insight for solving issues related to
the line-of-sight integration present in the data of solar imagers and
coronagraphs. However, there remain challenges such as the increase of the
available volume of data, the handling of the temporal evolution of the
observed structures, and the heterogeneity of the data in multi-spacecraft
studies.
We present TomograPy (http://nbarbey.github.com/TomograPy/), a generic,
open-source software package freely available on the Python Package Index. It
performs fast tomographic inversions that scale linearly with the number of
measurements, with the length of the reconstruction cube (not the number of
voxels), and with the number of cores, and it can use data from different
sources with a variety of physical models. For performance, TomograPy uses a
parallelized-projection algorithm. It relies on the World Coordinate System
standard to manage various data sources. A variety of inversion algorithms are
provided to perform the tomographic-map estimation. A test suite is provided
along with the code to ensure software quality. Since it makes use of the
Siddon algorithm, TomograPy is restricted to rectangular-parallelepiped voxels,
but the spherical geometry of the corona can be handled through proper use of
priors.
We describe the main features of the code and show three practical examples
of multi-spacecraft tomographic inversions using STEREO/EUVI and STEREO/COR1
data. Static and smoothly varying temporal evolution models are presented.
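The linear-scaling claim follows from applying the projector matrix-free: a Siddon-type ray tracer yields, for each line of sight, the voxels it crosses and the intersection lengths, so the forward projection and its transpose each cost time proportional to the number of measurements times the ray length, independent of the total voxel count. A schematic sketch, assuming a hypothetical (voxel_index, length) ray representation:

```python
def project(x, rays):
    """Matrix-free forward projection: each ray is a list of
    (voxel_index, intersection_length) pairs, as a Siddon-type
    ray tracer would produce; x is the flattened voxel cube."""
    return [sum(x[v] * l for v, l in ray) for ray in rays]

def backproject(y, rays, nvox):
    """Transpose operator, as used inside iterative inversion schemes."""
    x = [0.0] * nvox
    for yi, ray in zip(y, rays):
        for v, l in ray:
            x[v] += yi * l
    return x
```

Since the rays are independent, both loops parallelise trivially over measurements, which is the structure the parallelized-projection algorithm exploits.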