
    Time Lower Bounds for Parallel Network Computations

    Directed Acyclic Graphs (DAGs) are a suitable way to describe computations, since they express precedence constraints among operations. Besides representing the execution of an algorithm, a DAG can also effectively represent the execution of a parallel network. The latter kind of DAG has a regular structure, consisting of the repetition over time of the original network; these common representations suggest a possible uniform approach to the study of algorithm execution and network emulation. In both parallel computing and computational complexity, DAGs have been extensively employed to study algorithmic properties, such as lower bounds on the execution/emulation time of algorithms/networks, the minimum amount of memory needed to compute an algorithm, or the minimum I/O complexity of an algorithm given a certain number of fast memory cells. The techniques developed differ considerably in their assumptions; one of the most fundamental differences is that some of them allow recomputation of intermediate results, while others disallow it, requiring intermediate results to be stored in memory for later use. In modern computations the trade-off between recomputing and storing data is important in both parallel and local processing. In the former, computing the same data at several points of the network can increase the bandwidth and reduce the latency with which data can be accessed. In the latter, recomputing data (possibly loading fewer inputs, or using data already in memory) can avoid paying the latency of reloading them from memory. So far, no universal technique exists that can predict a tight lower bound for every algorithm execution or network emulation on every network, and the known results derive from several different theorems.
Conversely, in many cases no tight result is known at all; among them are emulations of extensively studied networks, such as multidimensional arrays. The first part of our thesis starts from this state of the art: we present a survey of several known lower bound techniques involving DAGs, followed by original theorems that clarify or solve open problems. In particular, our survey considers lower bound techniques for the execution of algorithms and the emulation of networks on parallel networks, showing their principles and their limits. In the discussion we show relationships among the theorems, proving that none of them is better than the others in general: for each theorem there are counter-examples in which it gives better bounds than the others. We also exhibit examples where none of the considered techniques gives a tight bound. Moreover, we generalize some theorems originally suited for network emulations, adapting them to the execution of general DAGs on parallel networks, and show examples of their application. We also consider theorems for determining the minimum I/O complexity, presenting similarities and differences with the emulation theorems. One of the main results of the thesis is a new general technique that provides almost tight lower bounds (up to a logarithmic factor) for a class of network emulations that includes multidimensional arrays. This improves on the previously best-known results, which leave a polynomial gap between the lower bound and the actual emulation time. Our theorem considers emulations with recomputation, so its results hold in the most general setting. Finally, we consider the role of recomputation in performance, trying to understand when it gives a real advantage over storing intermediate results in memory.
In particular, we introduce the problem on simple networks, showing a class of them in which recomputation cannot improve I/O performance, and conclude with butterfly DAGs, where recomputation can save a number of I/O accesses at least as large as the fast memory available during the computation. The approach used highlights the difficulty of exploiting recomputation in the execution of algorithms whose DAG representation exhibits a high bisection bandwidth.
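The abstract notes that the DAG of a parallel network's execution is the original network repeated over time. What follows is a minimal sketch of that construction. It assumes synchronous steps in which each processor updates its state from its own state and its neighbours' states at the previous step. The helper names `unroll_network` and `linear_array` are illustrative and are not taken from the thesis. The sketch builds the unrolled DAG as a predecessor map:

```python
def unroll_network(adj, steps):
    """Return the DAG of `steps` synchronous steps of a network, as a
    predecessor map.

    adj maps each processor to the list of its neighbours (undirected
    links listed in both directions). DAG node (v, t) is the state of
    processor v after step t; it depends on v's own previous state and on
    the previous states of its neighbours.
    """
    preds = {(v, 0): [] for v in adj}  # time-0 states are the DAG inputs
    for t in range(1, steps + 1):
        for v, nbrs in adj.items():
            preds[(v, t)] = [(v, t - 1)] + [(u, t - 1) for u in nbrs]
    return preds


def linear_array(n):
    """Hypothetical example network: a linear array of n processors."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


dag = unroll_network(linear_array(3), steps=2)
num_edges = sum(len(p) for p in dag.values())
print(len(dag), num_edges)  # prints: 9 14
```

Every edge goes from step t-1 to step t, so the result is acyclic by construction. This time-layered DAG is the kind of regular structure to which the emulation lower bound techniques surveyed above are applied.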

    Work-preserving real-time emulation of meshes on butterfly networks

    The emulation of a guest network G on a host network H is work-preserving and real-time if the inefficiency, that is, the ratio W_G/W_H of the amounts of work done in the two networks, and the slowdown of the emulation are both O(1). In this thesis we show that infinitely many meshes can be emulated on a butterfly in a work-preserving real-time manner, despite the fact that any emulation of an s × s-node mesh in a butterfly with load 1 has a dilation of Ω(log s). The recursive embedding of a mesh in a butterfly presented by Koch et al. (STOC 1989), which forms the basis for our work, is corrected and generalized by relaxing unnecessary constraints. An algorithm that determines the parameters for each stage of the recursion is described, and a rigorous analysis of the resulting emulation shows that it is work-preserving and real-time for infinitely many meshes. Data obtained from simulated embeddings suggest possible improvements toward a truly work-preserving emulation of the class of meshes on the class of butterflies.
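The negative result above is phrased in terms of the load and dilation of a mapping from mesh nodes to butterfly nodes. This is a small sketch of how these two quantities can be measured. It assumes the usual definitions: load is the largest number of guest nodes mapped to one host node, and dilation is the largest host distance between the images of adjacent guest nodes. It also assumes the common unwrapped butterfly layout with nodes (level, row). The mapping `phi` in the usage line is a hypothetical hand-picked embedding of the 2 × 2 mesh. It is not the recursive embedding of Koch et al.

```python
from collections import deque
from itertools import product


def butterfly(d):
    """Adjacency sets of the d-dimensional (unwrapped) butterfly.

    Nodes are (level, row) with 0 <= level <= d and 0 <= row < 2**d;
    (l, w) is linked to (l + 1, w) and to (l + 1, w XOR 2**l).
    """
    adj = {(l, w): set() for l in range(d + 1) for w in range(2 ** d)}
    for l in range(d):
        for w in range(2 ** d):
            for w2 in (w, w ^ (1 << l)):
                adj[(l, w)].add((l + 1, w2))
                adj[(l + 1, w2)].add((l, w))
    return adj


def mesh_edges(s):
    """Edge list of the s x s mesh with nodes (i, j)."""
    edges = []
    for i, j in product(range(s), repeat=2):
        if i + 1 < s:
            edges.append(((i, j), (i + 1, j)))
        if j + 1 < s:
            edges.append(((i, j), (i, j + 1)))
    return edges


def bfs_dist(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def load_and_dilation(guest_edges, host_adj, phi):
    """Load and dilation of the embedding phi: guest node -> host node.

    Assumes phi covers every guest node and the host is connected.
    """
    counts = {}
    for h in phi.values():
        counts[h] = counts.get(h, 0) + 1
    load = max(counts.values())
    dists = {}  # one BFS per distinct source image
    dilation = 0
    for a, b in guest_edges:
        src = phi[a]
        if src not in dists:
            dists[src] = bfs_dist(host_adj, src)
        dilation = max(dilation, dists[src][phi[b]])
    return load, dilation


# Hypothetical embedding of the 2 x 2 mesh (a 4-cycle) into the
# 1-dimensional butterfly, which is itself a 4-cycle.
phi = {(0, 0): (0, 0), (0, 1): (1, 0), (1, 1): (0, 1), (1, 0): (1, 1)}
print(load_and_dilation(mesh_edges(2), butterfly(1), phi))  # (1, 1)
```

The code runs one breadth-first search per distinct source image to get host distances. That is adequate for small instances like this one, but large meshes would call for precomputed butterfly distances.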

    Aspects of k-k-Routing in Meshes and OTIS Networks

    Efficient data transport in parallel computers built on sparse interconnection networks is crucial for their performance. A basic transport problem in such a computer is the k-k routing problem. In this thesis, aspects of the k-k routing problem on r-dimensional meshes and OTIS-G networks are discussed. The first oblivious routing algorithms for these networks are presented that solve the k-k routing problem in asymptotically optimal running time with constant buffer size. Furthermore, other aspects of the k-k routing problem on OTIS-G networks are analysed. In particular, lower bounds for the problem based on the diameter and bisection width of OTIS-G networks are given, and the k-k sorting problem on the OTIS-Mesh is considered: an algorithm is presented that solves it in a running time close to the bisection and diameter bounds. Based on OTIS-G networks, a new class of networks, called Extended OTIS-G networks, is introduced, which have smaller diameters than OTIS-G networks.
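The diameter-based and bisection-based lower bounds mentioned above can be illustrated on small instances. This sketch does not reproduce the thesis's own bounds. It assumes:

- the standard OTIS-G construction, where node (g, p) is processor p of group g, electronic links copy the factor graph G inside each group, and an optical link joins (g, p) and (p, g) for g ≠ p;
- full-duplex links that carry one packet per direction per step;
- a bisection width supplied by the caller.

`path_graph` is a hypothetical factor graph chosen for illustration.

```python
import math
from collections import deque


def path_graph(n):
    """Hypothetical factor graph: a path on nodes 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def otis(factor_adj):
    """Adjacency sets of OTIS-G for a factor graph G on nodes 0..n-1.

    Node (g, p) is processor p in group g. Electronic links copy G inside
    each group; an optical link joins (g, p) and (p, g) whenever g != p.
    """
    n = len(factor_adj)
    adj = {(g, p): set() for g in range(n) for p in range(n)}
    for g in range(n):
        for p in range(n):
            for q in factor_adj[p]:
                adj[(g, p)].add((g, q))  # electronic link
            if g != p:
                adj[(g, p)].add((p, g))  # optical link
    return adj


def diameter(adj):
    """Largest BFS distance over all node pairs (graph assumed connected)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        best = max(best, max(dist.values()))
    return best


def kk_lower_bound(k, num_nodes, diam, bisection_width):
    """max(diameter bound, bisection bound) on k-k routing steps.

    Bisection bound: if the num_nodes/2 nodes on one side each send k
    packets to the other side, k*num_nodes/2 packets must cross
    bisection_width links, each moving one packet per direction per step.
    """
    return max(diam, math.ceil(k * num_nodes / (2 * bisection_width)))


print(diameter(otis(path_graph(3))))  # 5
```

Suppose the factor graph has diameter D. Routing through a single optical link then gives paths of length at most 2D + 1, and the path on three nodes attains this value (5).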

    A Lower Bound Technique for Communication in BSP

    Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT) and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models other than BSP, such as the I/O model and the LPRAM.
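The switching potential defined above can be counted by brute force for very small switching networks. The sketch below assumes networks built from 2 × 2 switches, each either straight or crossed, applied stage by stage. `BUTTERFLY_4` and `BENES_4` are hypothetical 4-input examples. The paper's lower bound formula is not reproduced here; only the switching potential and its base-2 logarithm are computed.

```python
import math
from itertools import product


def switching_potential(n, stages):
    """Number of distinct permutations realized by a 2 x 2 switch network.

    stages: list of stages, each a list of disjoint (i, j) wire pairs; every
    pair is one switch that is either straight (keep) or crossed (swap).
    All 2**(number of switches) settings are enumerated, so this only
    suits tiny networks.
    """
    switches = [pair for stage in stages for pair in stage]
    realized = set()
    for setting in product((False, True), repeat=len(switches)):
        wires = list(range(n))  # wires[pos] = input now on position pos
        for (i, j), crossed in zip(switches, setting):
            if crossed:
                wires[i], wires[j] = wires[j], wires[i]
        realized.add(tuple(wires))
    return len(realized)


# Hypothetical 4-input networks: a butterfly, and a Benes network obtained
# by appending a mirrored copy of its first stage.
BUTTERFLY_4 = [[(0, 2), (1, 3)], [(0, 1), (2, 3)]]
BENES_4 = BUTTERFLY_4 + [[(0, 2), (1, 3)]]

for name, net in (("butterfly", BUTTERFLY_4), ("Benes", BENES_4)):
    sp = switching_potential(4, net)
    print(name, sp, round(math.log2(sp), 3))
```

The 4-input butterfly realizes 16 of the 4! = 24 permutations, one per switch setting. Adding the mirrored third stage yields a Beneš network that realizes all 24, as any permutation network must.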

    Delay analysis for wireless applications using a multiservice multiqueue processor sharing model

    The ongoing development of wireless networks supporting multimedia applications requires service providers to meet complex Quality of Service (QoS) requirements efficiently. The wide range of new applications in these networks significantly increases the difficulty of designing and dimensioning networks that meet QoS requirements. Medium Access Control (MAC) protocols affect the QoS achieved by wireless networks, so analysis and performance evaluation are important for efficient protocol design. Since wireless networks feature scarce resources that are shared simultaneously by all users, processor sharing (PS) models have been proposed for modelling the resource-sharing mechanisms in such systems. In this thesis, multi-priority MAC protocols are proposed for handling the various service traffic types. Then, multiservice multiqueue PS models are investigated to analyse the delay of some recently proposed wireless applications. We start with an introduction to the MAC protocols for wireless networks specified in IEEE standards. We then review scheduling algorithms that were proposed to work with the underlying MAC protocols to achieve QoS goals cooperatively. An overview of the relevant literature on PS models for the performance analysis and evaluation of scheduling algorithms is given. We propose a multiservice multiqueue PS model using a scheduling scheme in multimedia wireless networks, with a comprehensive description of its analytical solution. First, we describe the existing multiqueue processor sharing (MPS) model, which uses a fixed service quantum at each queue, and correct a subtle inconsistency in previous solutions presented in the literature. Second, a new scheduling framework is proposed that extends the previous MPS model to the general case. This new analytical approach is based on the idea that the service quantum assigned by a MAC scheduling controller to serve data units can be priority-based.
We obtain a closed-form expression for the mean delay of each service class in this model. In summary, our new approach abstracts MAC protocols for multimedia applications into an analytical model. This model accommodates more complex and realistic traffic models without compromising the details of the protocol. It also significantly reduces the number of MAC headers, thereby decreasing the overall average delay. Using the studied multiservice multiqueue PS models, we apply the MPS model to two wireless applications: the Push to Talk (PTT) service over GPRS/GSM networks and Worldwide Interoperability for Microwave Access (WiMAX) networks. We investigate the uplink delay of PTT over traditional GPRS/GSM networks and the uplink delay of the WiMAX Subscriber Station scheduler under priority-based fair scheduling. MAC structures capable of supporting dynamically varying traffic are studied for these networks, with particular attention to implementation issues. The model provides useful insights into the dynamic performance behaviour of GPRS/GSM and WiMAX networks with respect to various system parameters and comprehensive traffic conditions. We then evaluate the model under several practical traffic scenarios. By modelling the operation of wireless access systems under a variety of multimedia traffic, our analytical approaches provide practical guidelines for wireless network dimensioning.
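As a baseline for the closed-form per-class mean delays discussed above, the sketch below uses the classical single-server egalitarian M/G/1 processor sharing queue. In that queue, a class-i job of mean size E[S_i] has mean sojourn time E[S_i]/(1 - ρ), where ρ is the total load. This model is an assumption chosen for illustration. It is not the thesis's multiqueue (MPS) model with fixed or priority-based service quanta, and the function and parameter names are hypothetical.

```python
def ps_mean_delays(arrival_rates, mean_sizes, capacity=1.0):
    """Per-class mean sojourn times in an egalitarian M/G/1-PS queue.

    Class i has Poisson arrivals at rate arrival_rates[i] and mean service
    requirement mean_sizes[i] (in work units); the server processes
    `capacity` work units per time unit, shared equally by all jobs present.
    Mean sojourn time of class i: (mean_sizes[i] / capacity) / (1 - rho).
    """
    rho = sum(lam * s for lam, s in zip(arrival_rates, mean_sizes)) / capacity
    if rho >= 1:
        raise ValueError("unstable queue: total load rho must be below 1")
    return [s / capacity / (1 - rho) for s in mean_sizes]


# Two hypothetical service classes, total load rho = 0.75.
print(ps_mean_delays([0.25, 0.25], [1.0, 2.0]))  # [4.0, 8.0]
```

Egalitarian PS is insensitive to the service time distribution beyond its mean, so only the mean sizes enter the formula. Every class sees the same slowdown factor 1/(1 - ρ).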

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works, ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers with particular interests in each of these topics in parallel and distributed computing.

    Energy-efficient Transitional Near-* Computing

    Studies have shown that communication networks, devices accessing the Internet, and data centers account for 4.6% of worldwide electricity consumption. Although data centers, core network equipment, and mobile devices are becoming more energy-efficient, the amount of data being processed, transferred, and stored is increasing vastly. Recent computing paradigms, such as fog and edge computing, try to improve this situation by processing data near the user, the network, the devices, and the data itself. In this thesis, these trends are summarized under the new term near-* or near-everything computing. Furthermore, a novel paradigm designed to increase the energy efficiency of near-* computing is proposed: transitional computing. It transfers multi-mechanism transitions, a recently developed paradigm for a highly adaptable future Internet, from the field of communication systems to computing systems. Moreover, three types of novel transitions are introduced to achieve energy-efficiency gains in near-* environments. These environments span private Infrastructure-as-a-Service (IaaS) clouds, Software-defined Wireless Networks (SDWNs) at the edge of the network, Disruption-Tolerant Information-Centric Networks (DTN-ICNs) involving mobile devices, sensors, and edge devices, and programmable components on a mobile System-on-a-Chip (SoC). Finally, the novel idea of transitional near-* computing for emergency response applications is presented. Its purpose is to assist rescuers and affected persons during an emergency event or a disaster, even when connections to cloud services and social networks are disrupted by network outages and when network bandwidth and the battery power of mobile devices are limited.