Deterministic Computations on a PRAM with Static Processor and Memory Faults.
We consider a Parallel Random Access Machine (PRAM) in which some processors
and memory cells are faulty. The faults considered are static, i.e., once the
machine starts to operate, the operational/faulty status of PRAM components
does not change. We develop a deterministic simulation of a fully operational
PRAM on a similar faulty machine which has constant fractions of faults among
processors and memory cells. The simulating PRAM has processors and
memory cells, and simulates a PRAM with processors and a constant fraction
of memory cells. The simulation is in two phases: it starts with
preprocessing, which is followed by the simulation proper performed in a
step-by-step fashion. Preprocessing is performed in time . The slowdown of the step-by-step part of the simulation is
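A toy Python sketch of the two-phase structure (preprocessing, then step-by-step simulation) is shown below. It is not the paper's simulation: it assumes, hypothetically, that the fault status of every processor and memory cell is known in advance, whereas a real faulty PRAM has to discover it, and the names `preprocess`, `simulate_step`, `cell_ok` and `proc_ok` are illustrative only. Preprocessing maps logical addresses onto operational cells. In each simulated step, every operational processor then emulates ⌈n_virtual / p⌉ virtual processors, which is the slowdown the function returns.

```python
import math

def preprocess(cell_ok, proc_ok, needed_cells):
    """Map logical addresses onto operational cells and list live processors.

    Hypothetical toy: the fault pattern is handed to us here."""
    good_cells = [i for i, ok in enumerate(cell_ok) if ok]
    if len(good_cells) < needed_cells:
        raise ValueError("not enough operational memory cells")
    workers = [p for p, ok in enumerate(proc_ok) if ok]
    if not workers:
        raise ValueError("no operational processors")
    # logical address i lives in physical cell good_cells[i]
    return good_cells[:needed_cells], workers

def simulate_step(memory, addr_map, workers, n_virtual, op):
    """Simulate one synchronous step in which virtual processor v sets
    logical cell v to op(v, old_value). Returns the slowdown of the step."""
    p = len(workers)
    rounds = math.ceil(n_virtual / p)
    # PRAM semantics: every read of the step happens before any write.
    snapshot = [memory[addr_map[v]] for v in range(n_virtual)]
    writes = {}
    for r in range(rounds):      # in round r, worker k emulates virtual processor r*p + k
        for k in range(p):
            v = r * p + k
            if v < n_virtual:
                writes[addr_map[v]] = op(v, snapshot[v])
    for cell, value in writes.items():
        memory[cell] = value
    return rounds

# Usage: 8 cells (2 faulty), 3 processors (1 faulty), 4 virtual processors.
cell_ok = [True, False, True, True, False, True, True, True]
proc_ok = [True, False, True]
addr_map, workers = preprocess(cell_ok, proc_ok, needed_cells=4)
memory = [0] * len(cell_ok)
slowdown = simulate_step(memory, addr_map, workers, 4, lambda v, x: x + v)
```

The snapshot-then-write structure preserves synchronous PRAM semantics: no simulated processor observes a write made in the same step.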
The Parallel Persistent Memory Model
We consider a parallel computational model that consists of processors,
each with a fast local ephemeral memory of limited size, and sharing a large
persistent memory. The model allows for each processor to fault with bounded
probability, and possibly restart. On faulting, all processor state and local
ephemeral memory are lost, but the persistent memory remains. This model is
motivated by upcoming non-volatile memories that are as fast as existing random
access memory, are accessible at the granularity of cache lines, and have the
capability of surviving power outages. It is further motivated by the
observation that in large parallel systems, failure of processors and their
caches is not unusual.
Within the model, we develop a framework for designing locality-efficient
parallel algorithms that are resilient to failures. There are several
challenges, including the need to recover from failures, the desire to do this
in an asynchronous setting (i.e., not blocking other processors when one
fails), and the need for synchronization primitives that are robust to
failures. We describe approaches to solve these challenges based on breaking
computations into what we call capsules, which have certain properties, and
developing a work-stealing scheduler that functions properly within the context
of failures. The scheduler guarantees a time bound of in expectation, where and are the work and
depth of the computation (in the absence of failures), is the average
number of processors available during the computation, and is the
probability that a capsule fails. Within the model and using the proposed
methods, we develop efficient algorithms for parallel sorting and other
primitives.
Comment: This paper is the full version of a paper at SPAA 2018 with the same name.
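The capsule idea can be illustrated with a small single-processor Python simulation. This sketch rests on a stated simplification and is not the paper's construction. It assumes, hypothetically, that a capsule's outputs and the advance of a persistent restart point `pc` commit together, so a capsule interrupted by a fault can simply be re-run from the start. `run_capsules`, `Fault` and `make_add_capsule` are illustrative names. Faults are injected with probability `f` per capsule attempt.

```python
import random

class Fault(Exception):
    """Models a processor fault: all ephemeral state is discarded."""

def run_capsules(capsules, pm, f, rng):
    """Run capsules in order from the persistent restart point pm["pc"].

    Simplifying assumption (hypothetical): a capsule's writes and the
    increment of pm["pc"] commit together, so re-running a capsule after
    a fault is harmless."""
    assert 0 <= f < 1, "capsule failure probability must be below 1"
    attempts = 0
    while pm["pc"] < len(capsules):
        attempts += 1
        ephemeral = {}                    # fast local memory, lost on a fault
        try:
            writes = capsules[pm["pc"]](pm, ephemeral)
            if rng.random() < f:
                raise Fault()             # fail before committing anything
        except Fault:
            continue                      # restart the same capsule
        pm.update(writes)                 # commit the capsule's outputs...
        pm["pc"] += 1                     # ...and advance the restart point
    return attempts

def make_add_capsule(k):
    """Capsule k adds xs[k] into the persistent accumulator."""
    def capsule(pm, ephemeral):
        ephemeral["tmp"] = pm["acc"] + pm["xs"][k]
        return {"acc": ephemeral["tmp"]}
    return capsule

pm = {"pc": 0, "acc": 0, "xs": [3, 1, 4, 1, 5]}
attempts = run_capsules([make_add_capsule(k) for k in range(5)], pm,
                        f=0.3, rng=random.Random(1))
```

Whatever the fault pattern, the accumulator ends at 3 + 1 + 4 + 1 + 5 = 14. Faults only increase the number of attempts.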
Õptimal Vertex Fault-Tolerant Spanners in Õptimal Time: Sequential, Distributed and Parallel
We (nearly) settle the time complexity for computing vertex fault-tolerant
(VFT) spanners with optimal sparsity (up to polylogarithmic factors). VFT
spanners are sparse subgraphs that preserve distance information, up to a small
multiplicative stretch, in the presence of vertex failures. These structures
were introduced by [Chechik et al., STOC 2009] and have received a lot of
attention since then. We provide algorithms for computing nearly optimal
-VFT spanners for any -vertex -edge graph, with near optimal running
time in several computational models:
- A randomized sequential algorithm with a runtime of
(i.e., independent of the number of faults ). The state-of-the-art time
bound is by [Bodwin, Dinitz and
Robelle, SODA 2021].
- A distributed CONGEST algorithm of rounds, improving
upon [Dinitz and Robelle, PODC 2020], which obtained FT spanners with
near-optimal sparsity in rounds.
- A PRAM (CRCW) algorithm with work and
depth. Prior bounds implied by [Dinitz and Krauthgamer, PODC 2011] obtained
sub-optimal FT spanners using work and
depth.
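To make concrete what a VFT spanner must guarantee, the following Python sketch implements the simple fault-tolerant greedy rule for unweighted graphs, assuming stretch 2k − 1 (a common choice, not stated in the abstract). The rule scans the edges and keeps (u, v) whenever some set F of at most f other vertices would leave u and v more than 2k − 1 hops apart in the spanner built so far. This is not the near-optimal-time algorithm described above. It enumerates every fault set and is exponential in f, so it only serves to illustrate the definition on tiny inputs. Function names are illustrative.

```python
from collections import deque
from itertools import combinations

def hop_dist_within(adj, s, target, removed, limit):
    """BFS hop distance from s to target avoiding `removed`; None if > limit."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if x == target:
            return dist[x]
        if dist[x] == limit:
            continue                      # do not explore past the stretch bound
        for y in adj[x]:
            if y not in dist and y not in removed:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None

def ft_greedy_spanner(n, edges, k, f):
    """Keep edge (u, v) if some F (|F| <= f, u and v not in F) makes
    dist_{H-F}(u, v) exceed 2k - 1 in the current spanner H.
    Brute force over all F, so only usable on tiny graphs."""
    stretch = 2 * k - 1
    adj = {v: set() for v in range(n)}
    spanner = []
    for u, v in edges:
        others = [x for x in range(n) if x not in (u, v)]
        needed = any(
            hop_dist_within(adj, u, v, set(F), stretch) is None
            for size in range(f + 1)
            for F in combinations(others, size)
        )
        if needed:
            spanner.append((u, v))
            adj[u].add(v)
            adj[v].add(u)
    return spanner

K4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(len(ft_greedy_spanner(4, K4, k=2, f=0)))  # 3
print(len(ft_greedy_spanner(4, K4, k=2, f=1)))  # 5
```

On the complete graph K4 with k = 2, the rule keeps a 3-edge star for f = 0. For f = 1 it keeps 5 edges, because the star alone does not survive the loss of its centre.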
An immediate corollary provides the first nearly-optimal PRAM algorithm for
computing nearly optimal -vertex connectivity certificates
using polylogarithmic depth and near-linear work. This improves the
state-of-the-art parallel bounds of depth and
work, by [Karger and Motwani, STOC'93].
Comment: STOC 202
Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphic Processing Units (GPUs) have made these an accessible source of parallel processing power in general purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits.
Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree-parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case.
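The following minimal Python sketch illustrates the low-degree idea, assuming p = ⌈log2 n⌉ workers. It splits the input into p chunks, runs the unmodified sequential `sorted` on each chunk concurrently, and finishes with a single p-way merge. `low_degree_sort` is a hypothetical name. The thread pool only shows the program structure: CPython's global interpreter lock means it will not deliver real speedups for this CPU-bound work.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor

def low_degree_sort(xs):
    """Sort using p = ceil(log2 n) workers: split, sort chunks, p-way merge."""
    n = len(xs)
    if n < 2:
        return list(xs)
    p = max(1, math.ceil(math.log2(n)))
    size = math.ceil(n / p)
    chunks = [xs[i:i + size] for i in range(0, n, size)]
    # Each chunk is handled by the unchanged sequential routine `sorted`.
    with ThreadPoolExecutor(max_workers=p) as pool:
        runs = list(pool.map(sorted, chunks))
    # Combine the sorted runs with a single p-way merge.
    return list(heapq.merge(*runs))

print(low_degree_sort([5, 3, 9, 1, 1, 0, 7]))  # [0, 1, 1, 3, 5, 7, 9]
```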
Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache. We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting,
we introduce the minimum cache usage problem, an extension to classical sequential paging in which algorithms must account for the amount of cache they use.
This cache-aware model seeks algorithms with good performance in terms of faults and the amount of cache used, and has applications in energy efficient caching and in shared cache scenarios.
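The effect of partitioning a shared cache can be reproduced with a small Python fault counter. The sketch compares one LRU cache of size k shared by all threads with a static per-thread split of the same k slots. Because the thesis argues for partitions that adapt dynamically, the fixed split here is a simplifying assumption. All names are illustrative.

```python
from collections import OrderedDict

def lru_faults(requests, capacity):
    """Count page faults of an LRU cache holding `capacity` pages."""
    cache = OrderedDict()
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)           # mark as most recently used
        else:
            faults += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict the least recently used page
            cache[page] = True
    return faults

def partitioned_faults(requests, parts):
    """requests: list of (thread, page); parts: thread -> its private share.
    Disjoint partitions make each thread's faults independent of interleaving."""
    per_thread = {t: [] for t in parts}
    for t, page in requests:
        per_thread[t].append(page)
    return sum(lru_faults(seq, parts[t]) for t, seq in per_thread.items())

# Thread A loops over two pages while thread B streams fresh pages.
reqs = []
for i in range(6):
    reqs.append(("A", i % 2))
    reqs.append(("B", i))
shared = lru_faults(reqs, 3)                         # pages are (thread, page) pairs
part = partitioned_faults(reqs, {"A": 2, "B": 1})
```

With k = 3, the shared LRU cache faults on all 12 requests because B's streaming keeps evicting A's working set. Giving A two slots and B one slot reduces this to 8 faults.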
The wide availability of GPUs has added to the parallel power of multi-cores; however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and GPUs, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms.
Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model, that allows for constant time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be
implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming.
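Python's arbitrary-precision integers can stand in for an ultra-wide word to illustrate bit-parallel algorithms of this kind. The sketch below decides subset-sum reachability with one wide shift-and-OR per input value, where bit i of the word records whether some subset sums to i. On real hardware, each of these operations costs time proportional to the word width divided by the machine word size, whereas the Ultra-Wide Word model as described above charges constant time. The function names are illustrative, and this example is not taken from the thesis.

```python
def subset_sums(values, limit):
    """Return a word whose bit i is set iff some subset of `values` sums to i,
    for 0 <= i <= limit. Each value costs one wide shift, AND and OR."""
    mask = (1 << (limit + 1)) - 1     # the "word" is limit + 1 bits wide
    word = 1                          # only the empty sum 0 is reachable
    for v in values:
        word |= (word << v) & mask    # add v to every reachable sum at once
    return word

def reachable(word, target):
    return (word >> target) & 1 == 1

w = subset_sums([3, 5, 7], limit=20)
print([i for i in range(21) if reachable(w, i)])  # [0, 3, 5, 7, 8, 10, 12, 15]
```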
Aspects of practical implementations of PRAM algorithms
The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple, architecture-independent model and a good programming environment. Theoreticians in the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large-scale general-purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review, we present the following new results:
1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio.
2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the 2-dimensional mesh. In the embeddings, the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks.
3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms and matrix multiplications.
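The cost accounting behind bulk synchronous algorithmics can be written down directly. In Valiant's BSP model, a superstep costs w + h·g + l. Here w is the largest local work of any processor, h is the largest number of messages any processor sends or receives, g is the per-message gap and l is the barrier latency. A program's cost is the sum over its supersteps. The Python sketch below encodes that formula. The parameter values and the three-superstep tree summation used to exercise it are hypothetical, not figures from the thesis.

```python
def superstep_cost(work, sent, received, g, l):
    """Cost of one BSP superstep, w + h*g + l, from per-processor lists."""
    w = max(work)                          # slowest processor's local work
    h = max(max(sent), max(received))      # the superstep's h-relation
    return w + h * g + l

def bsp_cost(supersteps, g, l):
    """Total cost: sum of superstep costs; each step is (work, sent, received)."""
    return sum(superstep_cost(*s, g=g, l=l) for s in supersteps)

# Hypothetical tree summation of 1024 values on p = 4 processors.
steps = [
    ([256, 256, 256, 256], [0, 1, 0, 1], [1, 0, 1, 0]),  # local sums; odd ranks send to even
    ([1, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0]),          # combine pairs; rank 2 sends to 0
    ([1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]),          # rank 0 adds the last partial sum
]
print(bsp_cost(steps, g=4, l=100))  # 566
```

With g = 4 and l = 100, the three supersteps cost 360, 105 and 101.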
Undirected -Shortest Paths via Minor-Aggregates: Near-Optimal Deterministic Parallel & Distributed Algorithms
This paper presents near-optimal deterministic parallel and distributed
algorithms for computing -approximate single-source shortest
paths in any undirected weighted graph.
At a high level, we deterministically reduce this and other shortest-path
problems to Minor-Aggregations. A Minor-Aggregation computes an
aggregate (e.g., max or sum) of node-values for every connected component of
some subgraph.
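The Minor-Aggregation primitive itself is easy to state sequentially. The Python sketch below, with the hypothetical name `minor_aggregate`, uses union-find to identify the connected components of the chosen subgraph and folds a binary operator such as `max` over each component's node values. The paper's contribution is performing this step efficiently in PRAM and CONGEST settings, which this centralized sketch does not attempt.

```python
from functools import reduce

def minor_aggregate(n, subgraph_edges, values, op):
    """Fold `op` over the node values of every connected component of
    (V, subgraph_edges); return a dict node -> its component's aggregate."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in subgraph_edges:             # union the endpoints of each edge
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(values[x])
    agg = {root: reduce(op, vals) for root, vals in groups.items()}
    return {x: agg[find(x)] for x in range(n)}

print(minor_aggregate(5, [(0, 1), (1, 2), (3, 4)], [5, 2, 9, 1, 7], max))
# {0: 9, 1: 9, 2: 9, 3: 7, 4: 7}
```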
Our reduction immediately implies:
- Optimal deterministic parallel (PRAM) algorithms with depth
and near-linear work.
- Universally optimal deterministic distributed (CONGEST) algorithms, whenever
deterministic Minor-Aggregate algorithms exist. For example, an optimal
-round deterministic CONGEST algorithm for
excluded-minor networks.
Several novel tools developed for the above results are interesting in their
own right:
- A local iterative approach for reducing shortest path computations "up to
distance " to computing low-diameter decompositions "up to distance
". Compared to the recursive vertex-reduction approach of [Li20],
our approach is simpler, suitable for distributed algorithms, and eliminates
many derandomization barriers.
- A simple graph-based -competitive -oblivious routing
based on low-diameter decompositions that can be evaluated in near-linear work.
The previous such routing [ZGY+20] was -competitive and required
more work.
- A deterministic algorithm to round any fractional single-source transshipment
flow into an integral tree solution.
- The first distributed algorithms for computing Eulerian orientations.
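Eulerian orientations have a simple sequential construction when every vertex has even degree: repeatedly walk along unused edges, orienting each edge in the direction of travel. The Python sketch below (hypothetical name `eulerian_orientation`) relies on the fact that in an even-degree graph such a walk can only get stuck at its starting vertex. Each walk is therefore closed and leaves every vertex with equal in-degree and out-degree. It is a centralized illustration, not the distributed algorithm referred to above.

```python
from collections import defaultdict

def eulerian_orientation(n, edges):
    """Orient every edge of a graph whose degrees are all even so that each
    vertex has in-degree == out-degree. Returns a list of (tail, head)."""
    incident = defaultdict(list)            # vertex -> ids of incident edges
    for i, (u, v) in enumerate(edges):
        incident[u].append(i)
        incident[v].append(i)
    if any(len(incident[x]) % 2 for x in range(n)):
        raise ValueError("every vertex must have even degree")
    used = [False] * len(edges)
    ptr = [0] * n                           # next incident edge to inspect
    oriented = []
    for start in range(n):
        x = start
        while True:
            # skip edges already oriented by earlier walks
            while ptr[x] < len(incident[x]) and used[incident[x][ptr[x]]]:
                ptr[x] += 1
            if ptr[x] == len(incident[x]):
                break                       # with even degrees this is x == start
            i = incident[x][ptr[x]]
            used[i] = True
            u, v = edges[i]
            y = v if x == u else u
            oriented.append((x, y))         # orient along the direction of travel
            x = y
    return oriented

# Two triangles sharing vertex 0.
E = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
print(eulerian_orientation(5, E))
# [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
```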
Resilience of an embedded architecture using hardware redundancy
In the last decade, embedded systems, with billions of units manufactured every year, have displaced general-purpose computing systems from their dominant position in the market. Embedded systems appear in contexts where continuous operation is of utmost importance and where the consequences of failure can be profound.
Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, radiation-hardened components are expensive, lag behind commercial components with regard to performance and do not provide 100% fault elimination. Without fault-tolerance mechanisms, many of these faults can become errors at the application or system level, which, in turn, can result in catastrophic failures.
In this work, we study the concepts of fault tolerance and dependability and
extend them by providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault-tolerant techniques and 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce an algorithm for fault tolerance and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new system element called the syndrome, which is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators.
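Triple modular redundancy (TMR) is a standard hardware-redundancy technique that illustrates the kind of fault masking discussed above, although the abstract does not say that the thesis uses it. In the Python sketch below, three copies of a word are combined by a bitwise 2-of-3 majority, so any fault that flips a given bit in only one copy is masked. `tmr_vote` and `disagreeing_replicas` are illustrative names. The second helper is a hypothetical diagnostic and is not the thesis's syndrome element.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)

def disagreeing_replicas(a: int, b: int, c: int) -> list:
    """Hypothetical diagnostic: names of replicas that differ from the vote."""
    voted = tmr_vote(a, b, c)
    return [name for name, x in (("a", a), ("b", b), ("c", c)) if x != voted]

good = 0b10110010
print(tmr_vote(good, good ^ 0b1000, good) == good)   # True: one flipped bit is masked
print(disagreeing_replicas(good, good ^ 0b1000, good))  # ['b']
```

Because the vote is taken bit by bit, two copies may both be corrupted and still be outvoted, provided no single bit position is flipped in more than one copy.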
Tight bounds for parallel randomized load balancing
Given a distributed system of n balls and n bins, how evenly can we distribute the balls to the bins while minimizing communication? The fastest non-adaptive and symmetric algorithm achieving a constant maximum bin load requires Θ(log log n) rounds, and any such algorithm running for r ∈ O(1) rounds incurs a bin load of Ω((log n/log log n)^(1/r)). In this work, we explore the fundamental limits of the general problem. We present a simple adaptive symmetric algorithm that achieves a bin load of 2 in log* n + O(1) communication rounds using O(n) messages in total. Our main result, however, is a matching lower bound of (1 − o(1)) log* n on the time complexity of symmetric algorithms that guarantee small bin loads. The essential preconditions of the proof are (i) a limit of O(n) on the total number of messages sent by the algorithm and (ii) anonymity of bins, i.e., the port numberings of balls need not be globally consistent. To show that our technique indeed yields tight bounds, we provide, for each assumption, an algorithm that violates it and achieves a constant maximum bin load in constant time.
German Research Foundation (DFG, reference number Le 3107/1-1); Society of Swiss Friends of the Weizmann Institute of Science; Swiss National Fun
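To illustrate the problem setting, the following Python simulation runs a naive multi-round adaptive allocation. In each round, every unplaced ball sends one request to a uniformly random bin, and a bin accepts requests until it holds `capacity` balls. This is not the algorithm from the paper and makes no claim to its log* n + O(1) round bound. Names and parameters are illustrative. With capacity 2, the bins offer 2n slots for n balls, so at least half of the bins always have room and the process finishes with probability 1. `max_rounds` is only a safety cap.

```python
import random

def place_balls(n, capacity=2, seed=0, max_rounds=1000):
    """Naive multi-round allocation of n balls into n bins.
    Returns (bin loads, rounds used, balls still unplaced)."""
    rng = random.Random(seed)
    load = [0] * n
    unplaced = list(range(n))
    rounds = 0
    while unplaced and rounds < max_rounds:
        rounds += 1
        requests = {}
        for ball in unplaced:                       # one request per unplaced ball
            requests.setdefault(rng.randrange(n), []).append(ball)
        still = []
        for bin_id, balls in requests.items():
            free = capacity - load[bin_id]          # remaining room in this bin
            accepted = balls[:max(free, 0)]
            load[bin_id] += len(accepted)
            still.extend(balls[len(accepted):])     # rejected balls retry next round
        unplaced = still
    return load, rounds, unplaced

load, rounds, left = place_balls(10_000)
```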
Turku Centre for Computer Science – Annual Report 2013
Due to a major reform of the organization and responsibilities of TUCS, its role, activities, and even structures were under reconsideration in 2013. The traditional pillar of collaboration at TUCS, doctoral training, was reorganized due to changes at both universities in line with the renewed national system for doctoral education. Computer Science and Engineering and Information Systems Science are now accompanied by Mathematics and Statistics in newly established doctoral programs at both the University of Turku and Åbo Akademi University. Moreover, both universities granted sufficient resources to their respective programmes for doctoral training in these fields, so that joint activities at TUCS can continue. This reorganization has the potential to prove a success in terms of scientific profile as well as the quality and quantity of scientific and educational results.
International activities that have been characteristic of TUCS since its inception remain strong. TUCS’ participation in European collaboration through the EIT ICT Labs Master’s and Doctoral School is now more active than ever. The new double degree programs at MSc and PhD level between the University of Turku and Fudan University in Shanghai, P.R. China, were successfully set up and are
now running for their first year. The joint students will add to the already international atmosphere of the ICT House.
The four new thematic research programmes set up according to the decision by the TUCS Board have now established themselves, and a number of events and other activities took place in 2013. The TUCS Distinguished Lecture Series managed to gather a large audience with its several prominent speakers. The development of these and other research centre activities continues, and
new practices and structures will be initiated to support the tradition of close academic collaboration.
The TUCS slogan "Where Academic Tradition Meets the Exciting Future" has proven true throughout these changes. Despite the dark clouds on the national and European economic sky, science and higher education in the field have managed to retain all the key ingredients for success. Indeed, the future of ICT and Mathematics in Turku seems exciting.