Search CORE

6,347 research outputs found

A parallel interaction potential approach coupled with the immersed boundary method for fully resolved simulations of deformable interfaces and membranes

Author: de Tullio Marco D
Lohse Detlef
Meschini Valentina
Ostilla-Monico Rodolfo
Querzoli Giorgio
Spandan Vamsi
Verzicco Roberto
Publication venue: 'Elsevier BV'
Publication date: 01/12/2016

In this paper we show and discuss the use of a versatile interaction potential approach coupled with an immersed boundary method to simulate a variety of flows involving deformable bodies. In particular, we focus on two kinds of problems, namely (i) deformation of liquid-liquid interfaces and (ii) flow in the left ventricle of the heart with either a mechanical or a natural valve. Both examples have in common the two-way interaction of the flow with a deformable interface or a membrane. The interaction potential approach (de Tullio & Pascazio, Jou. Comp. Phys., 2016; Tanaka, Wada and Nakamura, Computational Biomechanics, 2016) with minor modifications can be used to capture the deformation dynamics in both classes of problems. We show that the approach can be used to replicate the deformation dynamics of liquid-liquid interfaces through the use of ad-hoc elastic constants. The results from our simulations agree very well with previous studies on the deformation of drops in standard flow configurations, such as a drop deforming in a shear flow or a cross flow. We show that the same potential approach can also be used to study the flow in the left ventricle of the heart. The flow imposed into the ventricle interacts dynamically with the mitral valve (mechanical or natural) and the ventricle, which are simulated using the same model. Results from these simulations are compared with ad-hoc in-house experimental measurements. Finally, a parallelisation scheme is presented, as parallelisation is unavoidable when studying large scale problems involving several thousands of simultaneously deforming bodies on hundreds of distributed memory computing processors.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Cagliari

University of Houston Institutional Repository (UHIR)

University of Twente Research Information

MPG.PuRe

Scalability of broadcast performance in wireless network-on-chip

Author: Abadal Cavallé Sergi
Alarcón Cot Eduardo José
Cabellos Aparicio Alberto
González Colás Antonio María
Lee Heekwan
Mestres Sugrañes Albert
Nemirovsky Mario
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC. Peer Reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management

Author: Braak T.D. ter
Burgess S.T.
Hurskainen H.
Kerkhoff H.G.
Vermeulen B.
Zhang Xiao
Publication venue: IEEE Circuits & Systems Society
Publication date: 01/01/2010

This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line self-testing techniques, by using a new IIP Dependability Manager. Based on the Dependability Manager results, faulty tiles are electronically excluded and replaced by fault-free spare tiles via on-line resource management. This integrated approach enables fast electronic fault detection, diagnosis and repair, and hence a high system availability. The dependability application runs in parallel with the actual application, resulting in a very dependable system. All parts have been verified by simulation.

University of Twente Research Information

The DUNE-ALUGrid Module

Author: Alkämper Martin
Dedner Andreas
Klöfkorn Robert
Nolte Martin
Publication date: 15/08/2015

In this paper we present the new DUNE-ALUGrid module. This module contains a major overhaul of the sources from the ALUGrid library and the binding to the DUNE software framework. The main changes include user-defined load balancing, parallel grid construction, and a redesign of the 2d grid, which can now also be used for parallel computations. In addition, many improvements have been introduced into the code to increase the parallel efficiency and to decrease the memory footprint. The original ALUGrid library is widely used within the DUNE community due to its good parallel performance for problems requiring local adaptivity and dynamic load balancing. Therefore, this new model will benefit a number of DUNE users. In addition, we have added features to increase the range of problems for which the grid manager can be used, for example, introducing a 3d tetrahedral grid using a parallel newest vertex bisection algorithm for conforming grid refinement. In this paper we will discuss the new features and the extensions to the DUNE interface, and explain with various examples how the code is used in parallel environments. Comment: 25 pages, 11 figures

arXiv.org e-Print Archive

UiS Brage

Scalable dimensioning of resilient Lambda Grids

Author: De Leenheer Marc
De Turck Filip
Demeester Piet
Dhoedt Bart
Thysebaert Pieter
Volckaert Bruno
Publication date: 01/01/2007


CiteSeerX

Ghent University Academic Bibliography

Scalable parallel communications

Author: Foudriat E. C.
Khanna S.
Maly K.
Mukkamala R.
Overstreet C. M.
Sekhar Y. S.
Zubair M.

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high-bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near-linear speed-ups.

NASA Technical Reports Server

A strategy for mapping unstructured mesh computational mechanics programs onto distributed memory parallel architectures

Author: McManus Kevin
Publication venue: University of Greenwich
Publication date: 22/02/1996

The motivation of this thesis was to develop strategies that would enable unstructured mesh based computational mechanics codes to exploit the computational advantages offered by distributed memory parallel processors. Strategies that successfully map structured mesh codes onto parallel machines have been developed over the previous decade and used to build a toolkit for automation of the parallelisation process. Extension of the capabilities of this toolkit to include unstructured mesh codes requires new strategies to be developed. This thesis examines the method of parallelisation by geometric domain decomposition using the single program, multiple data programming paradigm with explicit message passing. This technique involves splitting (decomposing) the problem definition into P parts that may be distributed over P processors in a parallel machine. Each processor runs the same program and operates only on its part of the problem. Messages passed between the processors allow data exchange to maintain consistency with the original algorithm. The strategies developed to parallelise unstructured mesh codes should meet a number of requirements: the algorithms are faithfully reproduced in parallel; the code is largely unaltered in the parallel version; the parallel efficiency is maximised; the techniques should scale to highly parallel systems; and the parallelisation process should become automated. Techniques and strategies that meet these requirements are developed and tested in this dissertation using a state-of-the-art integrated computational fluid dynamics and solid mechanics code. The results presented demonstrate the importance of the problem partition in the definition of inter-processor communication and hence parallel performance. The classical measure of partition quality based on the number of cut edges in the mesh partition can be inadequate for real parallel machines.
Consideration of the topology of the parallel machine in the mesh partition is demonstrated to be a more significant factor than the number of cut edges in the achieved parallel efficiency. It is shown to be advantageous to allow an increase in the volume of communication in order to achieve an efficient mapping dominated by localised communications. The limitation to parallel performance resulting from communication startup latency is clearly revealed, together with strategies to minimise the effect. The generic application of the techniques to other unstructured mesh codes is discussed in the context of automation of the parallelisation process. Automation of parallelisation based on the developed strategies is presented as possible through the use of run-time inspector loops to accurately determine the dependencies that define the necessary inter-processor communication.
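The contrast this abstract draws between cut edges and localised communication can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the 4x4 structured grid standing in for a mesh, the helper names `partition_metrics` and `grid_edges`, and the use of "number of neighbouring processor pairs" as a rough proxy for message startup cost.

```python
from collections import defaultdict

def partition_metrics(edges, part):
    """Return (cut_edges, messages) for a partitioned mesh graph.

    edges: iterable of (u, v) node pairs.
    part:  dict mapping node -> processor id.
    messages counts directed processor-to-processor exchanges, an assumed
    proxy for the number of message startups per halo update.
    """
    cut_edges = 0
    neighbours = defaultdict(set)  # processor -> processors it exchanges data with
    for u, v in edges:
        pu, pv = part[u], part[v]
        if pu != pv:
            cut_edges += 1
            neighbours[pu].add(pv)
            neighbours[pv].add(pu)
    messages = sum(len(s) for s in neighbours.values())
    return cut_edges, messages

def grid_edges(n):
    """Edges of an n x n grid with nodes numbered row by row (hypothetical mesh)."""
    edges = []
    for r in range(n):
        for c in range(n):
            node = r * n + c
            if c + 1 < n:
                edges.append((node, node + 1))   # horizontal neighbour
            if r + 1 < n:
                edges.append((node, node + n))   # vertical neighbour
    return edges

n = 4
edges = grid_edges(n)
# Four vertical strips: processor id is the column index.
strips = {r * n + c: c for r in range(n) for c in range(n)}
# Four 2x2 blocks: processor id is the block index.
blocks = {r * n + c: (r // 2) * 2 + (c // 2) for r in range(n) for c in range(n)}

print("strips:", partition_metrics(edges, strips))  # (12, 6)
print("blocks:", partition_metrics(edges, blocks))  # (8, 8)
```

In this toy case the strip partition cuts more edges (12 against 8) yet needs fewer exchanges (6 against 8), and each strip talks only to its immediate neighbours, so it maps directly onto a chain of processors. Under the assumed cost model, where startup latency outweighs data volume, the partition with the larger edge cut could still be the cheaper one.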

Greenwich Academic Literature Archive

Data recovery in wormhole routing networks in hypercubes and meshes

Author: Alowayed Mohammad S.
Publication date: 01/12/1997

SHAREOK repository

Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

Author: Carroll Chester C.
Saha Aindam
Youngblood John N.

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in the development of a specialized computer architecture for the algorithmic execution of an avionics system guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
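As a rough illustration of allocation driven by critical path analysis, the sketch below ranks tasks by the longest remaining path to the end of the task graph and greedily places the highest-ranked ready task on the earliest-free processing element. This is a generic list-scheduling heuristic, not the optimal algorithm the abstract refers to; the task names, durations, and function names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def predecessors(duration, succ):
    """Invert the successor map: task -> set of tasks it depends on."""
    preds = {t: set() for t in duration}
    for t, followers in succ.items():
        for s in followers:
            preds[s].add(t)
    return preds

def bottom_levels(duration, succ):
    """Longest path length from each task to the end of the graph (inclusive)."""
    order = list(TopologicalSorter(predecessors(duration, succ)).static_order())
    level = {}
    for t in reversed(order):  # successors are handled before their predecessors
        level[t] = duration[t] + max((level[s] for s in succ.get(t, ())), default=0)
    return level

def list_schedule(duration, succ, n_pe):
    """Greedy allocation: the ready task with the longest remaining path goes first."""
    level = bottom_levels(duration, succ)
    preds = predecessors(duration, succ)
    finish, placement = {}, {}
    pe_free = [0] * n_pe                     # time at which each element becomes idle
    remaining = set(duration)
    while remaining:
        ready = [t for t in remaining if all(p in finish for p in preds[t])]
        task = max(ready, key=lambda t: (level[t], t))
        pe = min(range(n_pe), key=lambda i: pe_free[i])
        start = max([pe_free[pe]] + [finish[p] for p in preds[task]])
        finish[task] = start + duration[task]
        pe_free[pe] = finish[task]
        placement[task] = (pe, start)
        remaining.remove(task)
    return placement, max(finish.values())

# Hypothetical task graph: A feeds B and C, which both feed D.
duration = {"A": 2, "B": 3, "C": 1, "D": 2}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

print(bottom_levels(duration, succ))          # critical path length is level["A"]
placement, makespan = list_schedule(duration, succ, n_pe=2)
print(placement, makespan)
```

For this small graph the makespan on two processing elements equals the critical path length (7 time units), which is the lower bound for any schedule; a heuristic of this kind does not guarantee that in general.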

NASA Technical Reports Server

A survey on scheduling and mapping techniques in 3D Network-on-chip

Author: Ghose Manojit
Kaur Simran Preet
Pathak Ananya
Patole Rutuja
Publication date: 04/11/2022

Networks-on-Chip (NoCs) have been widely employed in the design of multiprocessor systems-on-chip (MPSoCs) as a scalable communication solution. NoCs enable communication between on-chip Intellectual Property (IP) cores and allow those cores to achieve higher performance by outsourcing their communication tasks. Mapping and scheduling methodologies are key elements in assigning application tasks, allocating the tasks to the IPs, and organising communication among them to achieve some specified objectives. The goal of this paper is to present a detailed review of the state of the art in mapping and scheduling of applications on 3D NoCs, classifying the works along several dimensions and giving some potential research directions.

arXiv.org e-Print Archive