15 research outputs found

    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple, architecture-independent model and a good programming environment. Theoreticians of the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large scale general purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable, transportable algorithms for three commonly required types of computation: balanced tree computations, Fast Fourier Transforms, and matrix multiplications.
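    The first of the three bulk synchronous patterns named above, a balanced tree computation, can be sketched as a pairwise reduction: each superstep halves the number of active values, so n inputs reduce in ceil(log2 n) supersteps. This is a minimal illustration of the pattern, not the thesis's BSP implementation; the function name and serial execution are assumptions.

```python
def tree_reduce(values, op):
    """Reduce `values` to one result by combining disjoint pairs per superstep.

    Each pass models one BSP superstep; the pairwise combinations within a
    pass are independent and could run in parallel on separate processors.
    """
    vals = list(values)
    while len(vals) > 1:
        vals = [op(vals[i], vals[i + 1]) if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
    return vals[0]

print(tree_reduce(range(1, 9), lambda a, b: a + b))  # 36, in 3 supersteps
```

    With an associative operator the result matches the sequential reduction, which is what makes the tree schedule transportable across machines.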

    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Simulators are very important in computer architecture research as they enable the exploration of new architectures to obtain detailed performance evaluation without building costly physical hardware. Simulation is even more critical to the study of future many-core architectures as it provides the opportunity to assess currently non-existing computer systems. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. A novel non-uniform fat-mesh network structure, similar in spirit to the fat-tree, is proposed. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. A hybrid network consisting of one packet switching plane and multiple circuit switching planes is constructed as the second enhancement. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts symbolic expressions of benchmarks' communication patterns can be used to facilitate circuit establishment.
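    The traffic imbalance that motivates the fat-mesh can be reproduced with a small counting experiment: under personalized all-to-all traffic with dimension-order (XY) routing, links near the centre of a mesh carry far more messages than links near the periphery. The XY routing choice below is a common textbook assumption, not necessarily the thesis's exact routing model.

```python
from collections import Counter

def xy_route(src, dst):
    """Yield the directed links of a dimension-order path: X first, then Y."""
    (sx, sy), (dx, dy) = src, dst
    x, y = sx, sy
    while x != dx:
        nx = x + (1 if dx > x else -1)
        yield ((x, y), (nx, y))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        yield ((x, y), (x, ny))
        y = ny

def link_loads(n):
    """Messages per directed link for all-to-all traffic on an n x n mesh."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    loads = Counter()
    for s in nodes:
        for d in nodes:
            if s != d:
                loads.update(xy_route(s, d))
    return loads

loads = link_loads(4)
# Eastbound link mid-row vs. the same row's edge link:
print(loads[((1, 0), (2, 0))], loads[((0, 0), (1, 0))])  # 16 12
```

    The central link's higher count is exactly the non-uniformity that extra fat-mesh links near the centre are meant to absorb.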

    Some Theoretical Results of Hypercube for Parallel Architecture

    This paper surveys some theoretical results on the hypercube for the design of VLSI architectures. Parallel computers, including the hypercube multiprocessor, will become a leading technology that supports efficient computation for large uncertain systems.
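    Two of the textbook hypercube properties such surveys rely on are easy to state concretely: each node of a d-dimensional hypercube has exactly d neighbours (one per flipped label bit), and the minimal routing distance between two nodes equals the Hamming distance of their labels. A small sketch, independent of the paper's specific results:

```python
def neighbors(node, d):
    """All nodes one hop away on a d-dimensional hypercube (one bit flip each)."""
    return [node ^ (1 << i) for i in range(d)]

def hamming(a, b):
    """Minimal hop count between two hypercube nodes = differing label bits."""
    return bin(a ^ b).count("1")

d = 4                                  # a 16-node hypercube
print(sorted(neighbors(0b0000, d)))    # [1, 2, 4, 8]
print(hamming(0b0000, 0b1111))         # 4 == diameter of the 4-cube
```

    The logarithmic diameter and degree are what make the hypercube attractive as an interconnection topology despite its wiring cost in VLSI layouts.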

    Uncertainty Quantification for Numerical Models with Two Regions of Solution

    Complex numerical models and simulators are essential for representing real life physical systems so that we can make predictions and get a better understanding of the systems themselves. For certain models, the outputs can behave very differently for some input parameters as compared with others, and hence, we end up with distinct bounded regions in the input space. The aim of this thesis is to develop methods for uncertainty quantification for such models. Emulators act as `black box' functions to statistically represent the relationships between complex simulator inputs and outputs. It is important not to assume continuity across the output space as there may be discontinuities between the distinct regions. Therefore, it is not possible to use one single Gaussian process emulator (GP) for the entire model. Further, model outputs can take any form and can be either qualitative or quantitative. For example, there may be computer code for a complex model that fails to run for certain input values. In such an example, the output data would correspond to separate binary outcomes of either `runs' or `fails to run'. Classification methods can be used to split the input space into separate regions according to their associated outputs. Existing classification methods include logistic regression, which models the probability of being classified into one of two regions. However, classification predictions are often drawn from independent Bernoulli distributions (0 represents one region and 1 the other); the independent draws lose any distance relationship between nearby inputs and so can result in many misclassifications. The first section of this thesis presents a new method for classification, where the model outputs are given distinct classifying labels, which are modelled using a latent Gaussian process. The latent variable is estimated using MCMC sampling, a unique likelihood and distinct prior specifications.
The classifier is then verified by calculating a misclassification rate across the input space. By modelling the labels using a latent GP, the major problems associated with logistic regression are avoided. The novel method is applied to a range of examples, including a motivating example which models the hormones associated with the reproductive system in mammals. The two labelled outputs are high and low rates of reproduction. The remainder of this thesis looks into developing a correlated Bernoulli process to solve the independent drawing problems found when using logistic regression. When simulating chains or fields of 0s and 1s, it is hard to control the `stickiness' of like symbols. Presented here is a novel approach to a correlated Bernoulli process that creates chains of 0s and 1s in which like symbols cluster together. The structure is borrowed from de Bruijn graphs: directed graphs in which, given a set of symbols V and a `word' length m, the nodes consist of all possible sequences over V of length m. De Bruijn graphs are a generalisation of Markov chains, where the `word' length controls the number of past states on which each state depends, increasing correlation over a wider area. A de Bruijn process is defined, along with its run length properties and inference. Ways of extending this process to higher dimensions are also presented.
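    The clustering idea behind the correlated Bernoulli process can be sketched by letting each new symbol depend on the previous m symbols, i.e. the current de Bruijn `word'. The transition rule below (repeat the last symbol with probability `sticky` whenever the whole word agrees, otherwise draw fairly) is a hypothetical illustration of the stickiness mechanism, not the thesis's likelihood or inference scheme; the function name and parameters are invented for the example.

```python
import random

def de_bruijn_chain(length, m, sticky, seed=0):
    """Generate a binary chain whose next symbol depends on the last m symbols."""
    rng = random.Random(seed)
    chain = [rng.randint(0, 1) for _ in range(m)]     # initial word of length m
    while len(chain) < length:
        word = chain[-m:]
        # If the current word is all 0s or all 1s, encourage the run to continue.
        p_same = sticky if len(set(word)) == 1 else 0.5
        last = chain[-1]
        chain.append(last if rng.random() < p_same else 1 - last)
    return chain

print("".join(map(str, de_bruijn_chain(40, m=3, sticky=0.9))))
```

    Raising `sticky` above 0.5 lengthens runs of like symbols, which is exactly the control an independent Bernoulli draw cannot provide.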

    Multiple Bus Networks for Binary-Tree Algorithms

    Multiple bus networks (MBNs) connect processors via buses. This dissertation addresses issues related to running binary-tree algorithms on MBNs. These algorithms are of a fundamental nature, and reduce inputs at the leaves of a binary tree to a result at the root. We study the relationships between running time, degree (maximum number of connections per processor) and loading (maximum number of connections per bus). We also investigate fault-tolerance, meshes enhanced with MBNs, and VLSI layouts for binary-tree MBNs. We prove that the loading of optimal-time, degree-2, binary-tree MBNs is non-constant. In establishing this result, we derive three loading lower bounds, Ω(√n), Ω(n^(2/3)) and Ω(n/log n), each tighter than the previous one. We also show that if the degree is increased to 3, then the loading can be a constant. A constant-loading degree-2 MBN exists if the algorithm is allowed to run slower than the optimal. We introduce a new enhanced mesh architecture (employing binary-tree MBNs) that captures features of all existing enhanced meshes. This architecture is more flexible, allowing all existing enhanced mesh results to be ported to a more implementable platform. We present two methods for imparting tolerance to bus and processor faults in binary-tree MBNs. One of the methods is general, and can be used with any MBN and for both processor and bus faults. A key feature of this method is that it permits the network designer to designate a set of buses as unimportant and consider all faulty buses as unimportant. This minimizes the impact of faulty elements on the MBN. The second method is specific to bus faults in binary-tree MBNs, whose features it exploits to produce faster solutions. We also derive a series of results that distill the lower bound on the perimeter layout area of optimal-time, binary-tree MBNs to a single conjecture.
Based on this, we believe that optimal-time, binary-tree MBNs require no less area than a balanced tree topology, even though such MBNs can reuse buses over various steps of the algorithm.
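    The two cost measures studied above are easy to compute once an MBN is described as a mapping from processors to the buses they attach to: degree is the maximum number of attachments per processor, loading the maximum per bus. The four-leaf example network (one bus per reduction step, with processors 0 and 2 reused up the tree) is illustrative only, not a construction from the dissertation.

```python
from collections import Counter

def degree_and_loading(attach):
    """Degree and loading of an MBN given as {processor: set of bus ids}."""
    degree = max(len(buses) for buses in attach.values())
    bus_counts = Counter(b for buses in attach.values() for b in buses)
    loading = max(bus_counts.values())
    return degree, loading

# 4 leaves: buses b0, b1 carry the first reduction step, b2 the second.
attach = {0: {"b0", "b2"}, 1: {"b0"}, 2: {"b1", "b2"}, 3: {"b1"}}
print(degree_and_loading(attach))  # (2, 2)
```

    The trade-offs in the dissertation live in exactly this space: pinning one of time, degree or loading down forces the others up.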

    Control of sectioned on-chip communication


    The Connected Caribbean

    The modern-day Caribbean is a stunningly diverse but also intricately interconnected geo-cultural region, resulting partly from the islands’ shared colonial histories and an increasingly globalizing economy. Perhaps more importantly, before the encounter between the New and Old World took place, the indigenous societies and cultures of the pre-colonial Caribbean were already united in diversity. This work seeks to study the patterns of this pre-colonial homogeneity and diversity and uncover some of their underlying processes and dynamics. In contrast to earlier studies of its kind, this study adopts an archaeological network approach, in part derived from the network sciences. In archaeology, network approaches can be used to explore the complex relations between objects, sites or other archaeological features, and as such represent a powerful new tool for studying material culture systems. Archaeological research in general aims to uncover the social relations and human interactions underlying these material culture systems. Therefore, the interdependencies between social networks and material culture systems are another major focus of this study. This approach and theoretical framework are tested in four case studies dealing with lithic distribution networks, site assemblages as ego-networks, indigenous political networks, and the analysis of artefact styles in 2-mode networks. These were selected for their pertinence to key research themes in Caribbean archaeology, in particular the current debates about the nature of ties and interactions between culturally different communities in the region, and the structure and dynamics of pre-colonial socio-political organisation. The outcomes of these case studies show that archaeological network approaches can provide surprising new insights into longstanding questions about the patterns of pre-colonial connectivity in the region.
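    The 2-mode network analysis mentioned in the fourth case study can be sketched as a bipartite site-by-style incidence structure projected onto the site mode, giving a site-site network weighted by shared artefact styles. The site and style names below are invented purely for illustration; the actual data and analysis are those of the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 2-mode data: which artefact styles occur at which sites.
incidence = {
    "SiteA": {"style1", "style2"},
    "SiteB": {"style2", "style3"},
    "SiteC": {"style1", "style2", "style3"},
}

# One-mode projection: connect two sites by the number of styles they share.
projection = Counter()
for a, b in combinations(sorted(incidence), 2):
    shared = incidence[a] & incidence[b]
    if shared:
        projection[(a, b)] = len(shared)

print(dict(projection))
# {('SiteA', 'SiteB'): 1, ('SiteA', 'SiteC'): 2, ('SiteB', 'SiteC'): 2}
```

    Edge weights in such projections are one common proxy for the intensity of past interaction between communities, which is what the case study interrogates.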

    Reliability analysis of flood defence structures and systems in Europe
