212 research outputs found

Matrix transpose on meshes with buses

Author: Békési József
Galambos Gábor
Publication date: 01/01/2016

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

A fast parallel algorithm for special linear systems of equations using processor arrays with reconfigurable bus systems

Author: Chaudhari N. S.
Fehr Elfriede
Wankar Rajeev
Publication date: 01/01/1999

This report presents a parallel algorithm for solving dense Symmetric Positive Definite (SPD) systems of equations Ax = b on Processor Arrays with Reconfigurable Bus Systems (PARBS). Its core contribution is the parallelisation of the algorithm by Delosme & Ipson [8]. To suit PARBS, many of the procedures in [8] are handled in a slightly different way. The parallel time and processor complexity of each step of the algorithm are calculated. The overall parallel time complexity is O(n) using 2n × 2n × 5n processing elements.

Institutional Repository of the Freie Universität Berlin

Efficient parallel computation on multiprocessors with optical interconnection networks

Author: He Min
Publication venue: LSU Digital Commons
Publication date: 01/01/2002

This dissertation studies optical interconnection networks: their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model, the Linear Array with Reconfigurable Pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher-dimensional LARPBS and provide a set of basic computation operations. We then study two groups of parallel computation problems on both one-dimensional and multi-dimensional LARPBSs: parallel comparison problems, including sorting, merging, and selection; and Boolean matrix multiplication, transitive closure, and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at our disposal, we study the sorting problem for higher-dimensional LARPBSs and obtain the following results:
• An optimal basic Columnsort algorithm on a 2D LARPBS.
• Two optimal two-way merge sort algorithms on a 2D LARPBS.
• An optimal multi-way merge sorting algorithm on a 2D LARPBS.
• An optimal generalized column sort algorithm on a 2D LARPBS.
• An optimal generalized column sort algorithm on a 3D LARPBS.
• An optimal 5-phase sorting algorithm on a 3D LARPBS.
Results for selection problems are as follows:
• A constant time maximum-finding algorithm on an LARPBS.
• An optimal maximum-finding algorithm on an LARPBS.
• An O((log log n)^2) time parallel selection algorithm on an LARPBS.
• An O(k(log log n)^2) time parallel multi-selection algorithm on an LARPBS.
While studying the computation and communication properties of the LARPBS model, we find that Boolean matrix multiplication and its applications to graph problems form another set of problems that can be solved efficiently on the LARPBS. The results we have obtained in this area are:
• A constant time Boolean matrix multiplication algorithm.
• An O(log n)-time transitive closure algorithm.
• An O(log n)-time connected components algorithm.
• An O(log n)-time strongly connected components algorithm.
The results provided in this dissertation show the strong computation and communication power of optical interconnection networks.
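The O(log n) bound for transitive closure is consistent with the standard repeated-squaring construction: if one Boolean matrix product takes constant time, roughly log2(n) squarings of (I + A) are enough. The following is a minimal sequential Python sketch of that construction only, not the dissertation's LARPBS algorithm. The function names are illustrative assumptions, and the result is the reflexive-transitive closure.

```python
import math


def bool_matmul(a, b):
    """Boolean product of two square matrices (OR of ANDs)."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(adj):
    """Reflexive-transitive closure by repeated squaring of (I + A)."""
    n = len(adj)
    # Start from I + A: every vertex reaches itself and its direct successors.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    # After s squarings, reach covers all paths of length <= 2**s,
    # so ceil(log2(n)) squarings cover every simple path (length <= n - 1).
    squarings = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(squarings):
        reach = bool_matmul(reach, reach)
    return reach


# Directed chain 0 -> 1 -> 2 -> 3 (illustrative input).
chain = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
closure = transitive_closure(chain)
```

Here the loop runs only ceil(log2(n)) times; the sequential cost sits entirely inside `bool_matmul`, which is the step the abstract reports as constant time on the LARPBS.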

Louisiana State University

Performance Analysis of Mesh-based NoC’s on Routing Algorithms

Author: D’Souza Allbright
R Anala M.
Subrahmanya Amit N.
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2018

The advent of Systems-on-Chip (SoCs) has brought about a need to increase the scale of multi-core chip networks. Bus-based communication has proved limited in performance and scalability; the alternative to both bus-based and Point-to-Point (P2P) communication systems is a communication infrastructure called Network-on-Chip (NoC). The performance of a NoC depends on factors such as network topology, routing strategy, switching technique, and traffic pattern. This paper compiles a comparative analysis of different NoC infrastructures, classified by routing algorithm, switching technique, and traffic pattern. The goal is to show how different combinations of these three factors perform as the size of the mesh network varies, using NOXIM, an open-source SystemC simulator of mesh-based NoCs. The analysis provides tenable evidence highlighting the novelty of the XY routing algorithm.
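For readers unfamiliar with it, XY routing is a deterministic dimension-order scheme on a 2D mesh: a packet first travels along the X dimension until it reaches the destination column, then along the Y dimension. The following is a minimal Python sketch, assuming nodes are addressed by (x, y) coordinates; the function name is an illustrative assumption and is unrelated to NOXIM's own code.

```python
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst under XY routing."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


# Route across a small mesh (illustrative coordinates).
route = xy_route((0, 0), (2, 1))
hops = len(route) - 1
```

Because the X phase always completes before the Y phase begins, the hop count equals the Manhattan distance between source and destination.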

IAES journal

Crossref

ZENODO

Institute of Advanced Engineering and Science

The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows

Author: Kevin Carlberg
Charbel Farhat
Julien Cortial
David Amsallem
Publication venue: 'Elsevier BV'
Publication date: 05/07/2012

The Gauss-Newton with approximated tensors (GNAT) method is a nonlinear model reduction method that operates on fully discretized computational models. It achieves dimension reduction by a Petrov-Galerkin projection associated with residual minimization; it delivers computational efficiency by a hyper-reduction procedure based on the 'gappy POD' technique. Originally presented in Ref. [1], where it was applied to implicit nonlinear structural-dynamics models, this method is further developed here and applied to the solution of a benchmark turbulent viscous flow problem. To begin, this paper develops global state-space error bounds that justify the method's design and highlight its advantages in terms of minimizing components of these error bounds. Next, the paper introduces a 'sample mesh' concept that enables a distributed, computationally efficient implementation of the GNAT method in finite-volume-based computational-fluid-dynamics (CFD) codes. The suitability of GNAT for parameterized problems is highlighted with the solution of an academic problem featuring moving discontinuities. Finally, the capability of this method to reduce by orders of magnitude the core-hours required for large-scale CFD computations, while preserving accuracy, is demonstrated with the simulation of turbulent flow over the Ahmed body. For an instance of this benchmark problem with over 17 million degrees of freedom, GNAT outperforms several other nonlinear model-reduction methods, reduces the required computational resources by more than two orders of magnitude, and delivers a solution that differs by less than 1% from its high-dimensional counterpart.

arXiv.org e-Print Archive

Crossref

A hierarchical parallel implementation model for algebra-based CFD simulations on hybrid supercomputers

Author: Álvarez Farré Xavier
Publication venue: Universitat Politècnica de Catalunya
Publication date: 05/09/2022

Continuous enhancement in hardware technologies enables scientific computing to advance incessantly and reach further aims. Since the start of the global race for exascale high-performance computing (HPC), massively parallel devices of various architectures have been incorporated into the newest supercomputers, leading to an increasing hybridization of HPC systems. In this context of accelerated innovation, software portability and efficiency become crucial. Traditionally, scientific computing software development is based on calculations in iterative stencil loops (ISL) over a discretized geometry, the mesh. Despite being intuitive and versatile, the interdependency between algorithms and their computational implementations in stencil applications usually results in a large number of subroutines and introduces an inevitable complexity when it comes to portability and sustainability. An alternative is to break the interdependency between algorithm and implementation and cast the calculations into a minimalist set of kernels. The portable implementation model that is the object of this thesis is not restricted to a particular numerical method or problem. However, owing to the CTTC's long tradition in computational fluid dynamics (CFD) and without loss of generality, this work is targeted at solving transient CFD simulations. By casting discrete operators and mesh functions into (sparse) matrices and vectors, it is shown that all the calculations in a typical CFD algorithm boil down to the following basic linear algebra subroutines: the sparse matrix-vector product, the linear combination of vectors, and the dot product. The proposed formulation eases the deployment of scientific computing software in massively parallel hybrid computing systems and is demonstrated in the large-scale direct numerical simulation of transient turbulent flows.
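To make the three kernels concrete, the following is a minimal pure-Python sketch of a sparse matrix-vector product, a linear combination of vectors, and a dot product, combined into one explicit diffusion-like update. The CSR storage layout, the function names, the small 1D operator, and the time step are all illustrative assumptions; the abstract does not specify a storage format or an API.

```python
def spmv(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x with A stored in CSR form."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def axpy(alpha, x, y):
    """Linear combination of vectors: alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical 3x3 discrete operator [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2, -1, -1, 2, -1, -1, 2]

u = [1, 2, 3]
dt = 0.5  # illustrative time step

# One explicit step u_new = u - dt * (A u), built only from the kernels above.
au = spmv(row_ptr, col_idx, values, u)
u_new = axpy(-dt, au, u)
norm_sq = dot(u_new, u_new)
```

The update step never touches the mesh directly: once the operator is assembled as a matrix, porting the solver to a new device amounts to porting these three functions.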

Tesis Doctorals en Xarxa

UPCommons. Portal del coneixement obert de la UPC

Design Space Exploration for MPSoC Architectures

Author: Latif Khalid
Publication venue: Turku Centre for Computer Science
Publication date: 20/12/2013

Multiprocessor system-on-chip (MPSoC) designs utilize the available technology and communication architectures to meet the requirements of upcoming applications. In MPSoC, the communication platform is both the key enabler and the key differentiator for realizing efficient MPSoCs. It provides product differentiation to meet a diverse, multi-dimensional set of design constraints, including performance, power, energy, reconfigurability, scalability, cost, reliability and time-to-market. The communication resources of a single interconnection platform cannot be fully utilized by all kinds of applications; for example, providing higher communication bandwidth for applications that are computation intensive but not data intensive is often unfeasible in practical implementations. This thesis aims to perform architecture-level design space exploration towards efficient and scalable resource utilization for MPSoC communication architectures. To meet the performance requirements within the design constraints, careful selection of the MPSoC communication platform and resource-aware partitioning and mapping of the application play an important role. To enhance the utilization of communication resources, a variety of techniques can be used, such as resource sharing, multicast to avoid re-transmission of identical data, and adaptive routing. For implementation, these techniques should be customized according to the platform architecture. To address the resource utilization of MPSoC communication platforms, a variety of architectures with different design parameters and performance levels are selected, namely Segmented bus (SegBus), Network-on-Chip (NoC) and Three-Dimensional NoC (3D-NoC). Average packet latency and power consumption are the evaluation parameters for the proposed techniques. In conventional computing architectures, a fault in one component makes the connected fault-free components inoperative. A resource sharing approach can utilize the fault-free components to retain system performance by reducing the impact of faults. Design space exploration also helps narrow down the selection of an MPSoC architecture that can meet the performance requirements within the design constraints.

UTUPub