
144 research outputs found

Supernode Transformation On Parallel Systems With Distributed Memory – An Analytical Approach

Author: Chen Yong
Publication venue: Scholar Commons
Publication date: 21/03/2017

Supernode transformation, or tiling, is a technique that partitions algorithms to improve data locality and parallelism by balancing computation and inter-processor communication costs to achieve the shortest running time. It groups multiple iterations of nested loops into supernodes that are assigned to processors for parallel execution. A supernode transformation can be described by supernode size and shape. This research focuses on supernode transformation on multi-processor architectures with distributed memory, including computer cluster systems and General-Purpose Graphics Processing Units (GPGPUs). The research involves supernode scheduling, supernode mapping to processors, and finding the optimal supernode size to achieve the shortest total running time. The algorithms considered are two nested loops with regular data dependencies. The Longest Common Subsequence problem is used as an illustration. A novel mathematical model for the total running time is established as a function of the supernode size, algorithm parameters such as the problem size and the data dependence, the computation time of each loop iteration, architecture parameters such as the number of processors, and the communication cost. The optimal supernode size is derived from this closed-form model. The model and the optimal supernode size provide better results than previous research and are verified by simulations on multi-processor systems including computer cluster systems and GPGPUs.
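As a rough illustration of the idea in this abstract, the sketch below groups the iterations of the Longest Common Subsequence loop nest into square tiles (supernodes) and visits the tiles along anti-diagonal wavefronts. It is not the author's model or scheduler: the function name `lcs_length_tiled`, the `tile` parameter, and the square tile shape are assumptions made for illustration, and the tiles run sequentially here rather than on separate processors.

```python
def lcs_length_tiled(a: str, b: str, tile: int = 4) -> int:
    """LCS length, computed tile by tile along anti-diagonal wavefronts.

    `tile` is a hypothetical supernode edge length chosen for illustration.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j]; row 0 and column 0
    # are the base case and stay at zero.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    tiles_i = (n + tile - 1) // tile  # number of tile rows
    tiles_j = (m + tile - 1) // tile  # number of tile columns

    def compute_tile(ti: int, tj: int) -> None:
        # All iterations of one supernode are executed together.
        for i in range(ti * tile + 1, min((ti + 1) * tile, n) + 1):
            for j in range(tj * tile + 1, min((tj + 1) * tile, m) + 1):
                if a[i - 1] == b[j - 1]:
                    table[i][j] = table[i - 1][j - 1] + 1
                else:
                    table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Tile (ti, tj) depends only on tiles (ti-1, tj), (ti, tj-1) and
    # (ti-1, tj-1), which all lie on earlier wavefronts. Tiles that share
    # the same ti + tj are therefore independent of one another.
    for wave in range(tiles_i + tiles_j - 1):
        first = max(0, wave - tiles_j + 1)
        last = min(wave, tiles_i - 1)
        for ti in range(first, last + 1):
            compute_tile(ti, wave - ti)
    return table[n][m]
```

On a distributed-memory system, the tiles of one wavefront could be assigned to different processors, with tile boundaries exchanged between wavefronts. Larger tiles mean fewer messages but less parallelism, which is the trade-off that an optimal supernode size is meant to balance.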

Scholar Commons - Santa Clara University


N-Dimensional Perfect Pipelining

Author: Kim Ki-Chang
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 01/01/1992

In this paper, we introduce a technique to parallelize nested loops at the fine grain level. It is a generalization of Perfect Pipelining, which was developed to parallelize a single-nested loop at the fine grain level. Previous techniques that can parallelize nested loops, e.g. DOACROSS or the Wavefront method, mostly belong to the coarse grain approach. We explain our method, contrast it with the coarse grain techniques, and show the benefits of parallelizing nested loops at the fine grain level.

eScholarship - University of California

Algorithms for network expansion

Author: Lanfear T. A.
Publication venue: Department of Electrical Engineering, Imperial College London
Publication date: 01/01/1985

Imperial Users only

Spiral - Imperial College Digital Repository

Graph Sphere: From Nodes to Supernodes in Graphical Models

Author: Beskos Alexandros
Boom Willem van den
De Iorio Maria
Jasra Ajay
Publication date: 18/10/2023

High-dimensional data analysis typically focuses on low-dimensional structure, often to aid interpretation and computational efficiency. Graphical models provide a powerful methodology for learning the conditional independence structure in multivariate data by representing variables as nodes and dependencies as edges. Inference is often focused on individual edges in the latent graph. Nonetheless, there is increasing interest in determining more complex structures, such as communities of nodes, for multiple reasons, including more effective information retrieval and better interpretability. In this work, we propose a multilayer graphical model where we first cluster nodes and then, at the second layer, investigate the relationships among groups of nodes. Specifically, nodes are partitioned into "supernodes" with a data-coherent size-biased tessellation prior which combines ideas from Bayesian nonparametrics and Voronoi tessellations. This construct also accounts for dependence of nodes within supernodes. At the second layer, the dependence structure among supernodes is modelled through a Gaussian graphical model, where the focus of inference is on "superedges". We provide theoretical justification for our modelling choices. We design tailored Markov chain Monte Carlo schemes, which also enable parallel computations. We demonstrate the effectiveness of our approach for large-scale structure learning in simulations and a transcriptomics application. Comment: 71 pages, 18 figures

arXiv.org e-Print Archive

Beyond Reuse Distance Analysis: Dynamic Analysis for Characterization of Data Locality Potential

Author: Elango Venmugil
Fauzia Naznin
Pouchet Louis-Noël
Ramanujam J.
Rastello Fabrice
Ravishankar Mahesh
Rountev Atanas
Sadayappan P.
Publication date: 01/12/2013

Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality. Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence preserving transformations that change the execution schedule of the operations in the computation. In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence preserving transformations. The execution trace of a code is analyzed to extract a computational directed acyclic graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. 
It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance. Comment: Transactions on Architecture and Code Optimization (2014)
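The conventional reuse distance analysis that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not the authors' tool: the names `reuse_distances` and `lru_misses` are hypothetical, each trace entry is assumed to stand for one cache line, and the quadratic-time list scan is chosen for clarity rather than speed.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """For each access, count the distinct other addresses touched since the
    previous access to the same address (None for a first access)."""
    stack = OrderedDict()  # addresses ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Addresses after `addr` in LRU order were touched more recently.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_misses(distances, cache_size):
    """Misses in a fully associative LRU cache holding `cache_size` lines:
    first accesses plus accesses whose reuse distance is >= cache_size."""
    return sum(1 for d in distances if d is None or d >= cache_size)


trace = ["a", "b", "c", "a", "b", "d", "a"]
dists = reuse_distances(trace)
for size in (1, 2, 3):
    print(size, lru_misses(dists, size))
```

One pass over the trace yields the distances, and the miss count for any cache size then follows from a simple threshold. The result describes only this particular access order, which is the limitation the article sets out to overcome.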

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Automatic parallelisation for a class of URE problems

Author: Chen Xian
Publication venue: Newcastle University
Publication date: 01/01/1995

PhD Thesis. This thesis deals with the methodology and software of automatic parallelisation for numerical supercomputing and supercomputers. Basically, we focus on the problem of Uniform Recurrence Equations (URE), which exists widely in numerical computations. We propose a complete methodology for the automatic generation of parallel programs for regular array designs. The methodology starts with an introduction of a set of canonical dependencies which generates a general modelling of the various URE problems. Based on these canonical dependencies, partitioning and mapping methods are developed which give the foundation of the universal design process. Using the theoretical results, we propose the structures of parallel programs and eventually automatically generate parallel codes which run correctly and efficiently on a transputer array. The achievements presented in this thesis can be regarded as significant progress in the area of automatic generation of parallel codes and regular (systolic) array design. This methodology is integrated and self-contained, and may be the only practical working package in this area. The Research Committee of University of Newcastle upon Tyne: CVCP Overseas Research Students Awards Scheme

Newcastle University eTheses

Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices

Author: Buttari Alfredo
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2013

International audience. The advent of multicore processors represents a disruptive event in the history of computer science as conventional parallel programming paradigms are proving incapable of fully exploiting their potential for concurrent computations. The need for different or new programming models clearly arises from recent studies which identify fine-granularity and dynamic execution as the keys to achieving high efficiency on multicore systems. This work presents an approach to the parallelization of the multifrontal method for the QR factorization of sparse matrices specifically designed for multicore based systems. High efficiency is achieved through a fine-grained partitioning of data and a dynamic scheduling of computational tasks relying on a dataflow parallel programming model. Experimental results show that an implementation of the proposed approach achieves higher performance and better scalability than existing equivalent software.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

A geometric approach to the structure of complex networks

Author: García Pérez Guillermo
Publication venue: 'Edicions de la Universitat de Barcelona'
Publication date: 01/01/2018

[eng] Complex networks are mathematical representations of the interaction patterns of complex systems. During the last 20 years of Network Science, it has been recognized that networks from utterly different domains exhibit certain universal properties. In particular, real complex networks present heterogeneous, and usually scale-free, degree distributions, a large number of triangles (that is, a high clustering coefficient), a very short diameter, and a clear community structure. Among the vast set of models proposed to explain the structure of real networks, geometric models have proven to be particularly promising. This thesis is developed in the framework of hidden metric spaces, in which the high level of clustering observed in real networks emerges from underlying geometric spaces encoding the similarity between nodes. Besides providing an intuitive explanation of the observed clustering coefficient, geometric models succeed at reproducing the structure of complex networks with high accuracy. Furthermore, they can be used to obtain embeddings of networks, that is, maps of real systems enabling their geometric analysis and efficient navigation. This work introduces the main concepts in the hidden metric spaces approach and presents a thorough description of the main models and embedding procedures. We generalize these models to generate networks with soft communities, that is, with correlated positions of nodes in the underlying metric space. We also explore one of the models in higher similarity-space dimensions, and show that the maximum clustering coefficient attainable decreases with the dimension, which allows us to conclude that real-world networks must have low-dimensional similarity spaces as a consequence of their high clustering coefficient. The thesis also includes a detailed geometric analysis of the international trade system.
After reconstructing a yearly sequence of world trade networks covering 14 decades, we embed them into hyperbolic space to obtain a series of maps, which we named The World Trade Atlas 1870-2013. In these maps, the likelihood for two countries to be connected by a significant trade channel depends on the distance between them in the underlying space, which encodes the different factors influencing trade interactions. Our analysis of the networks and their maps reveals that the world is being shaped by three different forces acting simultaneously: globalization, localization and hierarchization. The hidden metric spaces approach can be exploited beyond network metrics. We show that similarity space defines a notion of scale in real-world networks. We present a Geometric Renormalization Group transformation that unveils a previously unknown self-similarity of real networks. Remarkably, the phenomenon is explained by the congruency of real systems with our model. This renormalization transformation provides us with two immediate applications: a method to construct high-fidelity smaller-scale replicas of real networks and a multiscale navigation protocol in hyperbolic space that outperforms single-scale versions. The geometric origin of real networks is not restricted to their binary structure, but it affects their weighted organization as well. We provide empirical evidence for this claim and propose a geometric model with the capability to reproduce the weighted features of real systems from many different domains. We also present a method to infer the level of coupling of real networks with the underlying metric space, which is generally found to be high in real systems.
[cat, in English translation] Complex networks represent the interaction patterns of complex systems. It has been repeatedly observed that networks from very different domains share certain properties, such as heterogeneity in the number of neighbours or high clustering (a high presence of triangles), among others. Although many models have been proposed to explain this universality, geometric models have proven to be particularly promising. This thesis is developed in the context of hidden metric spaces, in which the nature of clustering is explained geometrically in terms of similarity between nodes. Models based on this assumption can not only reproduce the structure of real networks with high accuracy, but also make it possible to obtain maps of real networks. In this work, we introduce the basic concepts of hidden metric spaces, their main models and the methods for obtaining maps. We also generalize these models to the regime with geometric correlations between nodes, and explore the question of the dimension of the similarity space. Our analysis allows us to conclude that the similarity space of real networks must have low dimensionality. We include a detailed geometric analysis of the evolution of the international trade system, based on the hyperbolic-space maps of the corresponding networks over 14 decades. In these maps, the proximity between countries represents the probability of their interacting commercially. The analysis shows that the world evolves according to three forces acting simultaneously: globalization, localization and hierarchization. Similarity spaces define a notion of scale in real networks. We propose a renormalization transformation that reveals a previously unknown self-similarity of real systems. In addition, we propose two applications of this transformation: a method for obtaining reduced versions of real networks and a multiscale method for navigating them. Finally, we show that the weighted structures of real systems also have a geometric origin and propose a model capable of reproducing them accurately. We develop a method to infer the level of coupling of real networks with the underlying metric spaces and find that it is generally high.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Diposit Digital de la Universitat de Barcelona