7 research outputs found

    Parallel Restructuring and Evaluation of Expressions

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Joint Services Electronics Program / N00014-84-C-0149. National Science Foundation / CCR-87-0380.

    Adversarial Analyses of Window Backoff Strategies for Simple Multiple-Access Channels

    Backoff strategies have typically been analyzed by making statistical assumptions on the distribution of problem inputs. Although these analyses have provided valuable insights into the efficacy of various backoff strategies, they leave open the question as to which backoff algorithms perform best in the worst case or on inputs, such as bursty inputs, that are not covered by the statistical models. This paper analyzes randomized backoff strategies using worst-case assumptions on the inputs. Specifically, we analyze algorithms for simple multiple-access channels, where the only feedback from each attempt to send a packet is a single bit indicating whether the transmission succeeded or the packet collided with another packet. We analyze a class of strategies, called window strategies, where each packet partitions time into a sequence (W_1, W_2, ...) of windows. Within each window, the packet makes an access attempt during a single randomly selected slot. If its transmission is unsuccessful, it waits for its slot in the next window before retrying. We use delay-sequence arguments to show that for the batch problem, in which n packets all arrive at time 0, if every window has size W = Θ(n), then with high probability, all packets successfully transmit with makespan n lg lg n ± O(n). We use this result to analyze window backoff strategies with varying window sizes. Specifically, we show that the familiar binary exponential backoff algorithm, where W_k = Θ(2^k), has makespan Θ(n lg n), and that more generally, for any constant r > 1, the r-exponential backoff algorithm in which W_k = Θ(r^k) has makespan Θ(n lg^(lg r) n). We also show that for any constant r > 1, the r-polynomial backoff algorithm, in which W_k = Θ(k^r), has makespan Θ((n/lg n)^(1+1/r)). All of these batch strategies are monotonic, in the sense that the window size monotonically increases over time. We exhibit a monotonic backoff algorithm that achieves makespan Θ(n lg lg n/lg lg lg n). We prove that this algorithm, whose backoff is superpolynomial and subexponential, is optimal over all monotonic backoff schemes. In addition, we exhibit a simple backoff/backon algorithm, having window sizes that vary nonmonotonically according to a "sawtooth" pattern, that achieves the optimal makespan of Θ(n). We study the online setting using an adversarial queueing model. We define a (λ,T)-stream to be an input stream of packets in which at most n = λT packets arrive during any time interval of size T. In this model, to evaluate a given backoff algorithm (which does not know λ or T), we analyze the worst-case behavior of the algorithm over the class of (λ,T)-streams. Our results for the online setting focus on exponential backoff. We show that for any arrival rate λ, there exists a sufficiently large interval size T such that the throughput goes to 0 for some (λ,T)-stream. Moreover, there exists a sufficiently large constant c such that for any interval size T, if λ ≥ c lg lg n/lg n, the system is unstable in the sense that the arrival rate exceeds the throughput in the worst case. If, on the other hand, we have λ ≤ c/lg n for a sufficiently small constant c, then the system is stable. Surprisingly, the algorithms that guarantee smaller makespans in the batch setting require lower arrival rates to achieve stability than does exponential backoff, but when they are stable, they have better response times. Singapore-MIT Alliance (SMA).
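    The batch setting described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function name `simulate_batch`, the `window_size` callback, the `max_windows` guard, and the fixed random seed are all assumptions made for illustration. Because every packet arrives at time 0, all packets share the same window boundaries, so the sketch processes one window at a time.

```python
import random


def simulate_batch(n, window_size, seed=0, max_windows=10_000):
    """Simulate n packets that all arrive at time 0 under a window strategy.

    window_size(k) returns the number of slots in the k-th window (k = 1, 2, ...).
    Returns the makespan: the 1-based index of the slot in which the last
    packet transmits successfully.
    """
    rng = random.Random(seed)
    active = n      # packets that have not yet succeeded
    start = 0       # number of slots that precede the current window
    makespan = 0
    k = 0
    while active > 0:
        k += 1
        if k > max_windows:
            # Guard (hypothetical) against window sequences that never succeed.
            raise RuntimeError("no progress within max_windows windows")
        w = window_size(k)
        # Each active packet picks one slot of the window uniformly at random.
        counts = {}
        for _ in range(active):
            slot = rng.randrange(w)
            counts[slot] = counts.get(slot, 0) + 1
        # A slot chosen by exactly one packet is a successful transmission;
        # a slot chosen by two or more packets is a collision.
        for slot, c in counts.items():
            if c == 1:
                active -= 1
                makespan = max(makespan, start + slot + 1)
        start += w
    return makespan


if __name__ == "__main__":
    n = 1000
    print("fixed window W = n:", simulate_batch(n, lambda k: n))
    print("binary exponential W_k = 2^k:", simulate_batch(n, lambda k: 2 ** k))
    print("quadratic W_k = k^2:", simulate_batch(n, lambda k: k ** 2))
```

    A single run says nothing about the asymptotic bounds quoted in the abstract; averaging over many seeds and several values of n would be needed even for a rough empirical comparison.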

    Designing Practical Efficient Algorithms for Symmetric Multiprocessors

    Symmetric multiprocessors (SMPs) dominate the high-end server market and are currently the primary candidate for constructing large scale multiprocessor systems. Yet, the design of efficient parallel algorithms for this platform currently poses several challenges. In this paper, we present a computational model for designing efficient algorithms for symmetric multiprocessors. We then use this model to create efficient solutions to two widely different types of problems - linked list prefix computations and generalized sorting. Our novel algorithm for prefix computations builds upon the sparse ruling set approach of Reid-Miller and Blelloch. Besides being somewhat simpler and requiring nearly half the number of memory accesses, we can bound our complexity with high probability instead of merely on average. Our algorithm for generalized sorting is a modification of our algorithm for sorting by regular sampling on distributed memory architectures. The algorithm is a stable sort which appears to be asymptotically faster than any of the published algorithms for SMPs. Both of our algorithms were implemented in C using POSIX threads and run on three symmetric multiprocessors - the DEC AlphaServer, the Silicon Graphics Power Challenge, and the HP-Convex Exemplar. We ran our code for each algorithm using a variety of benchmarks which we identified to examine the dependence of our algorithm on memory access patterns. In spite of the fact that the processors must compete for access to main memory, both algorithms still yielded scalable performance up to 16 processors, which was the largest platform available to us. For some problems, our prefix computation algorithm actually matched or exceeded the performance of the best sequential solution using only a single thread. Similarly, our generalized sorting algorithm always beat the performance of sequential merge sort by at least an order of magnitude, even with a single thread. (Also cross-referenced as UMIACS-TR-98-44.)

    Automatic Methods for Hiding Latency in Parallel and Distributed Computation

    In this paper we describe methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks. For example, given any dataflow type of algorithm that runs in T steps on an n-node ring with unit link delays, we show how to run the algorithm in O(T) steps on any n-node bounded-degree connected network with average link delay O(1). This is a significant improvement over prior approaches to latency hiding, which require slowdowns proportional to the maximum link delay. In the case when the network has average link delay d_ave, our simulation runs in O(√(d_ave) T) steps using n/√(d_ave) processors, thereby preserving efficiency. We also show how to efficiently simulate an n × n array with unit link delays using slowdown Õ(d_ave^(2/3)) on a two-dimensional array with average link delay d_ave. Last, we present results for the case in which large local databases are involved in the computation.

    Data Oblivious Algorithms for Multicores

    As secure processors such as Intel SGX (with hyperthreading) become widely adopted, there is a growing appetite for private analytics on big data. Most prior works on data-oblivious algorithms adopt the classical PRAM model to capture parallelism. However, it is widely understood that PRAM does not best capture realistic multicore processors, nor does it reflect parallel programming models adopted in practice. In this paper, we initiate the study of parallel data-oblivious algorithms on realistic multicores, best captured by the binary fork-join model of computation. We first show that data-oblivious sorting can be accomplished by a binary fork-join algorithm with optimal total work and optimal (cache-oblivious) cache complexity, and in O(log n log log n) span (i.e., parallel time) that matches the best-known insecure algorithm. Using our sorting algorithm as a core primitive, we show how to data-obliviously simulate general PRAM algorithms in the binary fork-join model with non-trivial efficiency. We also present results for several applications including list ranking, Euler tour, tree contraction, connected components, and minimum spanning forest. For a subset of these applications, our data-oblivious algorithms asymptotically outperform the best known insecure algorithms. For other applications, we show data-oblivious algorithms whose performance bounds match the best known insecure algorithms. Complementing these asymptotically efficient results, we present a practical variant of our sorting algorithm that is self-contained and potentially implementable. It has optimal caching cost, and it is only a log log n factor off from optimal work and about a log n factor off in terms of span; moreover, it achieves small constant factors in its bounds.

    Parallel Computation on Hypercube-Like Machines.

    The hypercube interconnection network has been recognized to be very suitable for a parallel computing architecture due to its attractive topological properties. Recently, several modified hypercubes have been proposed to improve the performance of a hypercube. This dissertation deals with two modified hypercubes, the X-hypercube and the Z-cube. The X-hypercube is a variant of the hypercube, with the same amount of hardware but a diameter of only ⌈(n + 1)/2⌉ in a hypercube of dimension n. The Z-cube has only 75 percent of the edges of a hypercube with the same number of vertices and the same diameter as the hypercube. In this dissertation, we investigate some topological properties and the effectiveness of the X-hypercube and the Z-cube in their combinatorial and computational aspects. We give the optimal or nearly optimal data communication algorithms including routing, broadcasting, and census function for the X-hypercube and the Z-cube. We also give the optimal embedding algorithms between the X-hypercube and the hypercube. It is shown that the average distance between vertices in an X-hypercube is roughly 13/16 of that in a hypercube. This implies that an X-hypercube achieves better average communication performance than a hypercube. In addition, a set of fundamental SIMD algorithms for an X-hypercube is given. Our results indicate that the X-hypercube makes an improvement in performance over the hypercube, but not as much as the reduction in diameter, and the Z-cube is a good alternative for the hypercube as far as the VLSI implementation is of major concern.

    Philetas of Cos : the poetical fragments

    The greatest impediment in our effort to reconstruct the history of Greek literature of the 4th c. B.C. is the almost complete loss of important poets such as Antimachus of Colophon, a loss which leaves us in the dark as to the conditions that led to the 3rd c. B.C. renaissance. In the times around 300 B.C. leading figures were active in the SE Aegean, the most prominent of whom was Philetas of Cos. Ptolemy I entrusted him with the tutorship of his son Ptolemy II. Philetas was highly esteemed by his compatriots, who honoured him with a statue, and by the avant-garde among Hellenistic poets including Callimachus and Theocritus. He wrote hexameters (Hermes), narrative elegy (Demeter), Epigrams and Paegnia and perhaps a Telephus. His Ataktoi Glossai, the first ever collection of recondite dialect vocables, became instantly renowned. But his poetry did not survive long and is now almost entirely lost; no more than 50 lines survive along with 31 second hand entries of his Atakta mainly from Athenaeus. These were last published and studied by G. Kuchenmüller in a Berlin 1928 thesis written in Latin, a work nowadays not easily accessible. This new approach to the scanty poetical remains of Philetas brings the study of this key figure up to date and takes into consideration material published since the twenties (including two fragments, three important testimonies, Hellenistic fragments which have become available from papyri, verse-inscriptions and inscriptions from Cos). Evidence from various sources is adduced to reconstruct Philetas' poems (particularly his "Coan" Demeter, to which most of the surviving fragments are attributed) and the key epigram fr. 27 is newly interpreted to show Philetas as a Callimachean before Callimachus. A detailed commentary elucidates the wide range of Philetas' sources of inspiration and the largely neglected influence of his work, often followed up to Imperial times. A list of Alleged Testimonia and another of Alleged Ascriptions are provided to discuss pseudo-Philetan references and material.