66 research outputs found

    Analysis of randomized load distribution for reproduction trees in linear arrays and rings

    High performance computing requires high-quality load distribution of the processes of a parallel application over the processors of a parallel computer at runtime, such that both the maximum load and the dilation are minimized. This paper analyzes the performance of a simple randomized load distribution algorithm that dynamically supports tree-structured parallel computations on two simple static networks, namely linear arrays and rings. The algorithm spreads newly created tree nodes to neighboring processors, which in effect provides a randomized dilation-1 tree embedding in a static network. We develop linear systems of equations that characterize the expected loads on all processors, and find their closed-form solutions under the reproduction tree model, which can generate trees of arbitrary size and shape. The main contribution of the paper is to show that this simple randomized algorithm generates high-quality dynamic tree embeddings even in very simple and sparse networks such as linear arrays and rings. In particular, we prove that as the tree size becomes large, the asymptotic performance ratio of such a randomized dilation-1 tree embedding is N/(N−1) in linear arrays and is optimal in rings.
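    The spreading rule summarized above can be illustrated with a small Monte Carlo sketch: each newly created tree node is placed on a randomly chosen neighbor of its parent's processor, a dilation-1 placement on a ring. The fixed depth and branching factor, and the use of the maximum-to-average load ratio as the quality metric, are illustrative assumptions, not the paper's reproduction tree model or its analytical solution.

```python
import random

def simulate_ring_embedding(num_procs, depth, branching, trials=100):
    """Monte Carlo sketch: every newly created tree node is placed on a
    randomly chosen neighbor of its parent's processor (a dilation-1
    embedding on a ring of num_procs processors).  Returns the average
    ratio of maximum load to mean load over the given number of trials."""
    ratios = []
    for _ in range(trials):
        load = [0] * num_procs
        frontier = [(0, depth)]   # (processor, remaining depth); root on processor 0
        load[0] += 1
        while frontier:
            proc, d = frontier.pop()
            if d == 0:
                continue
            for _ in range(branching):
                child_proc = (proc + random.choice([-1, 1])) % num_procs
                load[child_proc] += 1
                frontier.append((child_proc, d - 1))
        mean_load = sum(load) / num_procs
        ratios.append(max(load) / mean_load)
    return sum(ratios) / trials

if __name__ == "__main__":
    # For large trees the ratio should approach the optimal value 1 on a ring
    # (the paper proves N/(N-1) for linear arrays, which are not simulated here).
    print(simulate_ring_embedding(num_procs=8, depth=12, branching=2))
```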

    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple, architecture-independent model and a good programming environment. Theoreticians in the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks, and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large-scale general-purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In the embeddings, the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable, transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms, and matrix multiplications.
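    As a rough illustration of the bulk-synchronous, balanced-tree style of computation mentioned in point 3, the sketch below combines p partial values with a binary-tree reduction pattern in about log2(p) rounds. It is a plain sequential simulation of the communication pattern, offered only as an illustrative assumption, not an implementation on any of the models or networks considered in the thesis.

```python
def tree_reduce(partials):
    """Balanced-tree reduction sketch: p "processors" each hold one partial
    value; in each round, processor i adds in the value held by a partner
    processor, halving the number of active processors per round."""
    vals = list(partials)
    p = len(vals)
    step = 1
    while step < p:
        # one superstep: pairs (i, i + step) communicate, then combine
        for i in range(0, p - step, 2 * step):
            vals[i] += vals[i + step]
        step *= 2
    return vals[0]

# Example: reducing the partial sums held by 8 processors.
print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))   # 36
```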

    Time series prediction using supervised learning and tools from chaos theory

    A thesis submitted to the Faculty of Science and Computing, University of Luton, in partial fulfilment of the requirements for the degree of Doctor of Philosophy. In this work, methods for performing time series prediction on complex real-world time series are examined. In particular, series exhibiting non-linear or chaotic behaviour are selected for analysis. A range of methodologies based on Takens' embedding theorem are considered and compared with more conventional methods. A novel combination of methods for determining the optimal embedding parameters is employed and tried out with multivariate financial time series data and with a complex series derived from an experiment in biotechnology. The results show that this combination of techniques provides accurate results while dramatically reducing the time required to produce predictions and analyses, and eliminating a range of parameters that had hitherto been fixed empirically. The architecture and methodology of the prediction software developed are described, along with design decisions and their justification. Sensitivity analyses are employed to justify the use of this combination of methods, and comparisons are made with more conventional predictive techniques and with trivial predictors, showing the superiority of the results generated by the work detailed in this thesis.
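    For readers unfamiliar with the embedding-based approach, the following minimal sketch builds a Takens-style delay embedding and uses a trivial nearest-neighbour (analogue) predictor on it. The delay, embedding dimension and neighbour count are arbitrary illustrative parameters; they are not the values, nor the parameter-selection methods, developed in the thesis.

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Build delay vectors (s_j, s_{j+tau}, ..., s_{j+(dim-1)*tau}) as in
    Takens' embedding theorem; one row per reconstructed state."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (dim, tau)")
    return np.column_stack([series[i * tau : i * tau + n] for i in range(dim)])

def predict_next(series, dim=3, tau=1, k=3):
    """Analogue predictor: find the k past delay vectors closest to the most
    recent one and average their one-step-ahead continuations."""
    emb = delay_embed(series, dim, tau)
    query, history = emb[-1], emb[:-1]
    targets = np.asarray(series)[(dim - 1) * tau + 1 :]  # continuation of each past state
    dists = np.linalg.norm(history - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return targets[nearest].mean()
```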

    A task communication language and compiler for the NuMesh

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (leaves 71-72). By Jason Steven Cornez. M.Eng.

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Pattern Recognition

    A wealth of advanced pattern recognition algorithms is emerging from the interdiscipline between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which the computer can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.

    Development of a Navier-Stokes algorithm for parallel-processing supercomputers

    An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.
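    The multigrid acceleration mentioned above can be illustrated on a model problem. The sketch below is a standard V-cycle for the one-dimensional Poisson equation with weighted-Jacobi smoothing (the grid size is assumed to be a power of two); it only shows the smooth/restrict/correct/prolong pattern and is not the embedded-mesh Navier-Stokes solver described in the paper.

```python
import numpy as np

def v_cycle(u, f, h, pre=3, post=3):
    """One multigrid V-cycle for -u'' = f on a uniform 1D grid with
    homogeneous Dirichlet boundaries (weighted-Jacobi smoother)."""
    n = len(u) - 1                      # number of intervals, assumed a power of two

    def smooth(v, rhs, sweeps):
        for _ in range(sweeps):
            v[1:-1] = (2/3) * 0.5 * (v[:-2] + v[2:] + h*h*rhs[1:-1]) + (1/3) * v[1:-1]
        return v

    u = smooth(u, f, pre)
    if n <= 2:
        return u
    # residual of -u'' = f
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / (h*h)
    # restrict the residual (full weighting), solve the coarse error equation
    # recursively, then prolong the correction and add it back.
    rc = np.zeros(n//2 + 1)
    rc[1:-1] = 0.25*r[1:-3:2] + 0.5*r[2:-2:2] + 0.25*r[3:-1:2]
    ec = v_cycle(np.zeros_like(rc), rc, 2*h, pre, post)
    e = np.zeros_like(u)
    e[::2] = ec
    e[1:-1:2] = 0.5*(ec[:-1] + ec[1:])
    return smooth(u + e, f, post)
```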

    Structural issues and energy efficiency in data centers

    International Mention in the doctoral degree. With the rise of cloud computing, data centers have come to play a central role in today's Internet. Despite this relevance, they are probably still far from their zenith, given the ever-increasing demand for content to be stored in and distributed by the cloud, the need for computing power, and the larger and larger amounts of data being analyzed by top companies such as Google, Microsoft or Amazon. However, this comes at a price: data centers are very expensive to build, and they consume huge amounts of power, which also makes them very expensive to maintain. For this reason, cutting down the cost of building data centers and increasing their energy efficiency (and hence reducing their carbon footprint) has been one of the hottest research topics in recent years. In this thesis we propose different techniques that can have an impact on both the building and the maintenance costs of data centers of any size, from small-scale to large flagship data centers. The first part of the thesis is devoted to structural issues. We start by analyzing the bisection (band)width of a topology, of product graphs in particular, a useful parameter for comparing and choosing among different data center topologies. In the same part we describe the problem of deploying the servers in a data center as a Multidimensional Arrangement Problem (MAP) and propose a heuristic to reduce the deployment and wiring costs. We target energy efficiency in data centers in the second part of the thesis. We first propose a method to reduce the energy consumption of the data center network: rate adaptation. Rate adaptation is based on the idea of energy proportionality and aims to make network devices consume power proportionally to the load on their links. Our analysis proves that rate adaptation alone may achieve average energy savings of 30-40% and up to 60%, depending on the network topology. We continue by characterizing the power requirements of a data center server given that, in order to properly increase the energy efficiency of a data center, we first need to understand how energy is being consumed. We present an exhaustive empirical characterization of the power requirements of multiple components of data center servers, namely the CPU, the disks, and the network card. To do so, we devise different experiments to stress these components, taking into account the multiple available frequencies as well as the fact that we are working with multicore servers. In these experiments, we measure their energy consumption and identify their optimal operational points. Our study proves that the curve defining the minimal power consumption of the CPU, as a function of the load in Active Cycles Per Second (ACPS), is neither concave nor purely convex; it does, however, have a superlinear dependence on the load. We also validate the accuracy of the model derived from our characterization by running different Hadoop applications in diverse scenarios, obtaining an error below 4.1% on average. The last topic we study is the Virtual Machine Assignment problem (VMA), i.e., optimizing how virtual machines (VMs) are assigned to physical machines (PMs) in data centers. Our optimization target is to minimize the power consumed by all the PMs, considering that power consumption depends superlinearly on the load.
We study four variants of the VMA problem, depending on whether the number of PMs and their capacity are bounded or not. We study their complexity and perform an offline and online analysis of these problems. The online analysis is complemented with simulations showing that the online algorithms we propose consume substantially less power than other state-of-the-art assignment algorithms. Official Doctoral Programme in Telematics Engineering (Programa Oficial de Doctorado en Ingeniería Telemática). Committee: President: Joerg Widmer; Secretary: José Manuel Moya Fernández; Member: Shmuel Zak.
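    To make the superlinear power dependence and the VMA objective concrete, the sketch below assumes a hypothetical power curve P(load) = P_idle + c * load^alpha with alpha > 1 (the constants are illustrative, not the thesis' measured CPU characterization) and a simple greedy placement that puts each VM on the physical machine with the smallest marginal power increase. The thesis' actual offline and online algorithms are not reproduced here.

```python
def power(load, p_idle=50.0, coeff=80.0, alpha=1.4):
    """Hypothetical superlinear server power model (watts) for a normalized
    load in [0, 1]; a switched-off machine (load 0) draws no power."""
    return p_idle + coeff * load ** alpha if load > 0 else 0.0

def assign_vms(vm_loads, num_pms, capacity=1.0):
    """Greedy VMA sketch: place each VM (largest first) on the PM whose total
    power increases the least, respecting the capacity bound."""
    pm_loads = [0.0] * num_pms
    placement = []
    for vm in sorted(vm_loads, reverse=True):
        best, best_delta = None, float("inf")
        for i, cur in enumerate(pm_loads):
            if cur + vm > capacity:
                continue
            delta = power(cur + vm) - power(cur)
            if delta < best_delta:
                best, best_delta = i, delta
        if best is None:
            raise ValueError("VM does not fit on any PM")
        placement.append(best)
        pm_loads[best] += vm
    return placement, sum(power(load) for load in pm_loads)

# Example: ten VMs, four available PMs.
print(assign_vms([0.3, 0.25, 0.2, 0.2, 0.15, 0.1, 0.1, 0.1, 0.05, 0.05], num_pms=4))
```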

    A distributed multi-threaded data partitioner with space-filling curve orders

    The problem discussed in this thesis is distributed data partitioning and data re-ordering on many-core architectures. We present an extensive literature survey, with examples from various application domains: scientific computing, databases, and large-scale graph processing. We propose a low-overhead partitioning framework based on geometry that can be used to partition multi-dimensional data where the number of dimensions is >= 2. The partitioner linearly orders items with good spatial locality. Partial output is stored on each process in the communication group. Space-filling curves are used to permute the data; Morton order is the default curve. For dimensions <= 3, there are options to generate Hilbert-like curves. The two metrics used to determine partitioning overheads are memory consumption and execution time, although these two factors are dependent on each other. The focus of this thesis is to reduce partitioning overheads as much as possible. We describe several optimizations to this end: incremental adjustments to partitions, careful dynamic memory management, and the use of multi-threading and multi-processing to advantage. The quality of the partitions is an important criterion for evaluating a partitioner. We have used graph partitioners as baseline implementations against which our partitions are compared. The degree and edge-cuts of our partitions are comparable to graph partitions for regular grids; for irregular meshes, there is still room for improvement. No comparisons have been made for evaluating partitions of datasets without edges. We have deployed these partitions in two large applications: atmosphere simulation in 2D and adaptive mesh refinement in 3D. An adaptive mesh refinement benchmark was built as part of the framework, which later became a test case for evaluating partitions and load-balancing schemes. The performance of this benchmark is discussed in detail in the last chapter.
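    The default Morton (Z-order) linearization used by the partitioner can be sketched in a few lines: the bits of the integer coordinates are interleaved so that sorting by the resulting key tends to keep spatially nearby points adjacent, and cutting the sorted list into equal-sized chunks then yields spatially compact partitions. The bit width and the 2D example below are illustrative assumptions, not the framework's API.

```python
def morton_key(coords, bits=16):
    """Interleave the bits of non-negative integer coordinates to obtain a
    Morton (Z-order) key; works for any number of dimensions."""
    key = 0
    dims = len(coords)
    for bit in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * dims + d)
    return key

# Order the points of a small 2D grid along the Z-order curve and cut the
# ordered list into four equal-sized, spatially compact partitions.
points = [(x, y) for x in range(8) for y in range(8)]
ordered = sorted(points, key=morton_key)
chunk = len(ordered) // 4
partitions = [ordered[i*chunk:(i+1)*chunk] for i in range(4)]
```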