21 research outputs found

Time Lower Bounds for Parallel Network Computations

Author: Zago Nicola
Publication date: 29/01/2015

Directed Acyclic Graphs (DAGs) are a suitable way to describe computations, since they express precedence constraints among operations. Besides representing the execution of an algorithm, a DAG can effectively represent the execution of a parallel network. This latter kind of DAG has a regular structure, consisting of the repetition over time of the original network; these common representations suggest a possible uniform approach to the study of the execution of algorithms and the emulation of networks. In both parallel computing and computational complexity, DAGs have been extensively employed to study algorithmic features such as lower bounds on the execution/emulation time of algorithms/networks, the minimum amount of memory needed to compute an algorithm, and the minimum I/O complexity of an algorithm given a certain number of fast memory cells. The techniques developed differ considerably in their assumptions; one of the most fundamental differences is that some allow recomputation of intermediate results, while others disallow it and require intermediate results to be stored in memory for later use. In present-day computations the trade-off between recomputing and storing data matters in both parallel and local processing: in the former, computing the same data at several points of the network can increase the bandwidth and reduce the latency with which the data can be accessed, while in the latter, recomputing data (possibly loading fewer data or using data already present in memory) can avoid the latency of a memory access to reload them. So far there is no universal technique that yields a tight lower bound for every execution of an algorithm or emulation of a network on every network; the known results derive from several theorems.
Moreover, there are many cases for which no tight result is known; among these are emulations of extensively studied networks, such as multidimensional arrays. The first part of our thesis starts from this state of the art: we propose a survey of several known lower bound techniques involving DAGs, followed by original theorems which clarify or solve open problems. In particular, our survey considers lower bound techniques for the execution of algorithms and the emulation of networks on parallel networks, showing their principles and their limits. In the discussion we show relationships among the theorems, proving that none of them is better than the others in general: there are counter-examples in which each theorem gives better bounds than the others. We also exhibit examples where no bound among the considered techniques is tight. Moreover, we generalize some theorems originally suited to network emulations, adapting them to the execution of general DAGs on parallel networks and showing examples of their application. We also consider theorems for determining minimum I/O complexity, presenting similarities and differences with emulation theorems. One of the main results of the thesis is a new general technique which provides almost tight lower bounds (up to a logarithmic factor) for a class of network emulations that includes multidimensional arrays. We improve on the best previously known results, which have a polynomial gap between the lower bound and the actual emulation time. Our theorem considers emulations with recomputation, giving results valid in the most general context. Finally, we consider the role of recomputation in performance, trying to understand when it gives a real advantage over storing intermediate results in memory.
In particular, we introduce the problem in simple networks, showing a class of them in which recomputation cannot improve I/O performance, and ending with butterfly DAGs, where recomputation can save a number of I/O accesses at least as large as the fast memory available during the computation. The approach used highlights the difficulty of exploiting recomputation in executions of algorithms whose DAG representation exhibits a high bisection bandwidth.
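
The regular structure mentioned in this abstract (a network repeated over time) can be sketched in a few lines of Python. This is only an illustration, not the construction used in the thesis: the function name `emulation_dag`, the adjacency-dict input, and the convention that every node also passes its own state to the next step (the self edges) are assumptions made here.

```python
def emulation_dag(adjacency, steps):
    """Unroll `steps` synchronous steps of a network into a DAG.

    adjacency: dict mapping each network node to a list of its neighbours.
    A DAG vertex is a pair (node, t). There is an edge (u, t) -> (v, t + 1)
    when v == u (assumed: a node keeps its own state) or v is a neighbour of u.
    """
    vertices = [(v, t) for t in range(steps + 1) for v in adjacency]
    edges = []
    for t in range(steps):
        for u in adjacency:
            edges.append(((u, t), (u, t + 1)))      # state carried over
            for v in adjacency[u]:
                edges.append(((u, t), (v, t + 1)))  # message to a neighbour
    return vertices, edges


# A 3-node linear array 0 - 1 - 2, unrolled for 2 steps.
path = {0: [1], 1: [0, 2], 2: [1]}
vertices, edges = emulation_dag(path, steps=2)
print(len(vertices), len(edges))  # 9 14
```

Each time layer is a copy of the network, so every edge goes from layer t to layer t + 1 and the result is acyclic by construction.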

Archivio istituzionale della ricerca - Università di Padova

Work-preserving real-time emulation of meshes on butterfly networks

Author: Achilles Alf-Christian
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/1991

The emulation of a guest network G on a host network H is work-preserving and real-time if the inefficiency, that is, the ratio W_G/W_H of the amounts of work done in both networks, and the slowdown of the emulation are O(1). In this thesis we show that an infinite number of meshes can be emulated on a butterfly in a work-preserving real-time manner, despite the fact that any emulation of an s × s-node mesh in a butterfly with load 1 has a dilation of Ω(log s). The recursive embedding of a mesh in a butterfly presented by Koch et al. (STOC 1989), which forms the basis for our work, is corrected and generalized by relaxing unnecessary constraints. An algorithm determining the parameter for each stage of the recursion is described, and a rigorous analysis of the resulting emulation shows that it is work-preserving and real-time for an infinite number of meshes. Data obtained from simulated embeddings suggest possible improvements to achieve a truly work-preserving emulation of the class of meshes on the class of butterflies.

Digital Commons @ New Jersey Institute of Technology (NJIT)

Aspects of k-k-Routing in Meshes and OTIS Networks

Author: Osterloh Andre
Publication date: 27/10/2003

Efficient data transport in parallel computers built on sparse interconnection networks is crucial for their performance. A basic transport problem in such a computer is the k-k routing problem. In this thesis, aspects of the k-k routing problem on r-dimensional meshes and OTIS-G networks are discussed. The first oblivious routing algorithms for these networks are presented that solve the k-k routing problem in asymptotically optimal running time with constant buffer size. Furthermore, other aspects of the k-k routing problem for OTIS-G networks are analysed. In particular, lower bounds on the running time based on the diameter and bisection width of OTIS-G networks are given, and the k-k sorting problem on the OTIS-Mesh is considered: an algorithm is presented that solves it with a running time close to the bisection and diameter bounds. Based on OTIS-G networks, a new class of networks, called Extended OTIS-G networks, is introduced, which have smaller diameters than OTIS-G networks.
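
To make the notion of diameter-based and bisection-based lower bounds concrete, here is a small Python sketch for the plain r-dimensional mesh. It uses the standard textbook counting argument, not the OTIS-G bounds derived in the thesis. The function name `mesh_kk_routing_bounds`, the restriction to an even side length n, and the model in which each directed link carries at most one packet per step are assumptions of this sketch.

```python
def mesh_kk_routing_bounds(n, r, k):
    """Lower bounds (in steps) for k-k routing on an r-dimensional mesh
    with side length n, i.e. n**r nodes, each sending and receiving k packets.

    Assumed model: every directed link carries at most one packet per step.
    Returns (diameter_bound, bisection_bound).
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even side length")
    # A packet may have to travel between opposite corners of the mesh.
    diameter_bound = r * (n - 1)
    # Cutting the mesh in the middle of one dimension severs n**(r-1) links.
    bisection_width = n ** (r - 1)
    # Worst case: every packet of one half is destined for the other half.
    crossing_packets = k * n ** r // 2
    bisection_bound = crossing_packets // bisection_width  # equals k * n / 2
    return diameter_bound, bisection_bound


print(mesh_kk_routing_bounds(8, 2, 4))  # (14, 16)
```

For small k the diameter bound dominates, while for larger k the bisection bound k·n/2 takes over, which is why both appear in running-time lower bounds for this problem.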

Digitale Bibliothek Thüringen

A Lower Bound Technique for Communication in BSP

Author: Bilardi Gianfranco
Scquizzato Michele
Silvestri Francesco
Publication date: 25/11/2017

Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM
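
As a rough illustration of "the number of permutations that a DAG can realize when viewed as a switching network", the following Python sketch counts, by brute force, the permutations realized by a small network of 2x2 switches. It is not the paper's definition of switching potential or its lower bound technique. The function name `realizable_permutations`, the stage/wire-pair encoding, and the two 4-wire example networks are assumptions chosen for this sketch.

```python
from itertools import product


def realizable_permutations(n_wires, stages):
    """Return the set of permutations a network of 2x2 switches can realize.

    stages: list of stages; each stage is a list of (i, j) wire pairs, one
    per switch. A switch either passes its two wires straight or crosses them.
    """
    switches = [pair for stage in stages for pair in stage]
    seen = set()
    # Try every straight/crossed setting of every switch.
    for setting in product((False, True), repeat=len(switches)):
        wires = list(range(n_wires))
        for (i, j), crossed in zip(switches, setting):
            if crossed:
                wires[i], wires[j] = wires[j], wires[i]
        seen.add(tuple(wires))
    return seen


butterfly4 = [[(0, 2), (1, 3)], [(0, 1), (2, 3)]]
benes4 = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 1), (2, 3)]]
print(len(realizable_permutations(4, butterfly4)))  # 16
print(len(realizable_permutations(4, benes4)))      # 24
```

The 4-wire butterfly realizes 16 of the 24 possible permutations, while the three-stage Beneš-style network realizes all of them. Exhaustive enumeration is exponential in the number of switches, so it is useful only for toy sizes.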

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Delay analysis for wireless applications using a multiservice multiqueue processor sharing model

Author: Wang Y
Publication venue: RMIT University
Publication date: 01/01/2008

The ongoing development of wireless networks supporting multimedia applications requires service providers to meet complex Quality of Service (QoS) requirements efficiently. The wide range of new applications in these networks significantly increases the difficulty of designing and dimensioning networks to meet QoS requirements. Medium Access Control (MAC) protocols affect the QoS achieved by wireless networks, so research on their analysis and performance evaluation is important for efficient protocol design. As wireless networks feature scarce resources that are simultaneously shared by all users, processor sharing (PS) models were proposed for modelling resource sharing mechanisms in such systems. In this thesis, multi-priority MAC protocols are proposed for handling the various service traffic types. Then, an investigation of multiservice multiqueue PS models is undertaken to analyse the delay for some recently proposed wireless applications. We start with an introduction to the MAC protocols for wireless networks specified in IEEE standards, and then review scheduling algorithms that were proposed to work with the underlying MAC protocols to achieve QoS goals cooperatively. An overview of the relevant literature is given on PS models for performance analysis and evaluation of scheduling algorithms. We propose a multiservice multiqueue PS model using a scheduling scheme in multimedia wireless networks, with a comprehensive description of the analytical solution. Firstly, we describe the existing multiqueue processor sharing (MPS) model, which uses a fixed service quantum at each queue, and correct a subtle incongruity in previous solutions presented in the literature. Secondly, a new scheduling framework is proposed to extend the previous MPS model to a general case. This newly proposed analytical approach is based on the idea that the service quantum arranged by a MAC scheduling controller to service data units can be priority-based.
We obtain a closed-form expression for the mean delay of each service class in this model. In summary, our new approach simplifies MAC protocols for multimedia applications into an analytical model that includes more complex and realistic traffic models without compromising details of the protocol, and it significantly reduces the number of MAC headers, thereby decreasing the overall average delay. We then apply the MPS model to two wireless applications: Push to Talk (PTT) service over GPRS/GSM networks and Worldwide Interoperability for Microwave Access (WiMAX) networks. We investigate the uplink delay of PTT over traditional GPRS/GSM networks and the uplink delay for a WiMAX Subscriber Station scheduler under priority-based fair scheduling. MAC structures capable of supporting dynamically varying traffic are studied for these networks, with particular attention to implementation issues. The model provides useful insights into the dynamic performance behaviour of GPRS/GSM and WiMAX networks with respect to various system parameters and comprehensive traffic conditions. We then evaluate the model under several practical traffic scenarios. By modelling the operation of wireless access systems under a variety of multimedia traffic, our analytical approaches provide practical analysis guidelines for wireless network dimensioning.

RMIT Research Repository

Design and Optimization of Networks-on-Chip for Future Heterogeneous Systems-on-Chip

Author: Yoon Young Jin
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2017

Due to the tight power budget and reduced time-to-market, Systems-on-Chip (SoC) have emerged as a power-efficient solution that provides the functionality required by target applications in embedded systems. To support a diverse set of applications such as real-time video/audio processing and sensor signal processing, SoCs consist of multiple heterogeneous components, such as software processors, digital signal processors, and application-specific hardware accelerators. These components offer different flexibility, power, and performance values, so that SoCs can be designed by mixing and matching them. With the increased number of heterogeneous cores, however, the traditional interconnects in an SoC exhibit excessive power dissipation and poor performance scalability. As an alternative, Networks-on-Chip (NoC) have been proposed. NoCs provide modularity at design time because communications among the cores are isolated from their computations via standard interfaces. NoCs also exploit communication parallelism at run time because multiple data can be transferred simultaneously. In order to construct an efficient NoC, the communication behaviors of the various heterogeneous components in an SoC must be considered together with the large number of NoC design parameters. Therefore, providing an efficient NoC design and optimization framework is critical to reduce the design cycle and address the complexity of future heterogeneous SoCs. This is the thesis of my dissertation. Existing design automation tools for NoCs support very limited degrees of automation that cannot satisfy the requirements of future heterogeneous SoCs. First, these tools only support a limited number of NoC design parameters. Second, they do not provide an integrated environment for software-hardware co-development. Thus, I propose FINDNOC, an integrated framework for the generation, optimization, and validation of NoCs for future heterogeneous SoCs.
The proposed framework supports software-hardware co-development, an incremental NoC design-decision model, SystemC-based NoC customization and generation, and fast system prototyping with FPGA emulations. Virtual channels (VC) and multiple physical (MP) networks are the two main alternative methods to provide better performance, support quality-of-service, and avoid protocol deadlocks in packet-switched NoC design. To examine the effect of using VCs and MPs with other NoC architectural parameters, I completed a comprehensive comparative analysis that combines an analytical model, synthesis-based designs for both FPGAs and standard-cell libraries, and system-level simulations. Based on the results of this analysis, I developed VENTTI, a design and simulation environment that combines a virtual platform (VP), a NoC synthesis tool, and four NoC models characterized at different abstraction levels. VENTTI facilitates an incremental decision-making process with four NoC abstraction models associated with different NoC parameters. The selected NoC parameters can be validated by running simulations with the corresponding model instantiated in the VP. I augmented this framework to complete FINDNOC by implementing ICON, a NoC generation and customization tool that dynamically combines and customizes synthesizable SystemC components from a predesigned library. Thanks to its flexibility and automatic network interface generation capabilities, ICON can generate a rich variety of NoCs that can then be integrated into any Embedded Scalable Platform (ESP) architecture for fast prototyping with FPGA emulations. I designed FINDNOC in a modular way that makes it easy to augment with new capabilities.
This, combined with the continuous progress of the ESP design methodology, will provide a seamless SoC integration framework, where the hardware accelerators, software applications, and NoCs can be designed, validated, and integrated simultaneously, in order to reduce the design cycle of future SoC platforms.

Columbia University Academic Commons

Parallel and Distributed Computing

Publication venue: IntechOpen
Publication date: 20/04/2021

The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers with particular interests in any of these topics in parallel and distributed computing.

Directory of Open Access Books (DOAB)

Energy-efficient Transitional Near-* Computing

Author: Graubner Pablo Karl
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2018

Studies have shown that communication networks, devices accessing the Internet, and data centers account for 4.6% of worldwide electricity consumption. Although data centers, core network equipment, and mobile devices are becoming more energy-efficient, the amount of data being processed, transferred, and stored is increasing vastly. Recent computing paradigms, such as fog and edge computing, try to improve this situation by processing data near the user, the network, the devices, and the data itself. In this thesis, these trends are summarized under the new term near-* or near-everything computing. Furthermore, a novel paradigm designed to increase the energy efficiency of near-* computing is proposed: transitional computing. It transfers multi-mechanism transitions, a recently developed paradigm for a highly adaptable future Internet, from the field of communication systems to computing systems. Moreover, three types of novel transitions are introduced to achieve gains in energy efficiency in near-* environments, which span private Infrastructure-as-a-Service (IaaS) clouds, Software-defined Wireless Networks (SDWNs) at the edge of the network, and Disruption-Tolerant Information-Centric Networks (DTN-ICNs) involving mobile devices, sensors, and edge devices, as well as programmable components on a mobile System-on-a-Chip (SoC). Finally, the novel idea of transitional near-* computing for emergency response applications is presented to assist rescuers and affected persons during an emergency event or a disaster, even though connections to cloud services and social networks might be disrupted by network outages, and the network bandwidth and battery power of mobile devices might be limited.

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg