2,625 research outputs found

    Engineering Aggregation Operators for Relational In-Memory Database Systems

    In this thesis we study the design and implementation of aggregation operators in the context of relational in-memory database systems. In particular, we identify and address the following challenges: cache efficiency, CPU-friendliness, parallelism within and across processors, robust handling of skewed data, adaptive processing, processing with constrained memory, and integration with modern database architectures. Our resulting algorithm outperforms the state of the art by up to 3.7x.

    A taxonomy of parallel sorting

    TR 84-601. In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the earliest sorting networks to the shared memory algorithms and the VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes: the odd-even and the bitonic merge. Sorting algorithms have been derived from these merging algorithms for parallel computers where processors communicate through interconnection networks such as the perfect shuffle, the mesh, and a number of other sparse networks. After describing the network sorting algorithms, we show that, with a shared memory model of parallel computation, faster algorithms have been derived from parallel enumeration sorting schemes, where keys are first ranked and then rearranged according to their rank.
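
    The rank-then-rearrange idea behind enumeration sorting can be illustrated with a minimal sequential sketch. This is not code from the report: the function name `enumeration_sort` and the tie-breaking rule (equal keys ordered by original position) are assumptions made for illustration. Each rank is computed independently of the others, which is what makes the scheme attractive on a shared memory parallel machine.

```python
def enumeration_sort(keys):
    """Sort by first ranking every key, then moving each key to its rank.

    Hypothetical helper for illustration. The rank of keys[i] is the number
    of keys that must precede it; ties are broken by original index so that
    all ranks are distinct.
    """
    n = len(keys)
    # Ranking phase: each rank depends only on the input, so on a parallel
    # machine every iteration of this comprehension could run concurrently.
    ranks = [
        sum(
            1
            for j in range(n)
            if keys[j] < keys[i] or (keys[j] == keys[i] and j < i)
        )
        for i in range(n)
    ]
    # Rearrangement phase: write each key directly to its final position.
    out = [None] * n
    for i, r in enumerate(ranks):
        out[r] = keys[i]
    return out
```

    The sequential version does quadratic work in the ranking phase; the sketch only shows the structure of the two phases, not a parallel implementation.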

    Parallel algorithms and architectures for VLSI pattern generation


    Parallel RAM from Cyclic Circuits

    Known simulations of random access machines (RAMs) or parallel RAMs (PRAMs) by Boolean circuits incur significant polynomial blowup, due to the need to repeatedly simulate accesses to a large main memory. Consider two modifications to Boolean circuits: (1) remove the restriction that circuit graphs are acyclic and (2) enhance AND gates such that they output zero eagerly. If an AND gate has a zero input, it 'short circuits' and outputs zero without waiting for its second input. We call this the cyclic circuit model. Note that circuits in this model remain combinational, as they do not allow wire values to change over time. We simulate a bounded-word-size PRAM via a cyclic circuit, and the blowup from the simulation is only polylogarithmic. Consider a PRAM program $P$ that on a length $n$ input uses an arbitrary number of processors to manipulate words of size $\Theta(\log n)$ bits and then halts within $W(n)$ work. We construct a size-$O(W(n) \cdot \log^4 n)$ cyclic circuit that simulates $P$. Suppose that on a particular input, $P$ halts in time $T$; our circuit computes the same output within $T \cdot O(\log^3 n)$ gate delay. This implies theoretical feasibility of powerful parallel machines. Cyclic circuits can be implemented in hardware, and our circuit achieves performance within polylog factors of PRAM. Our simulated PRAM synchronizes processors by simply leveraging logical dependencies between wires.
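
    The eager AND gate described above can be modelled with a toy sketch. This is not the paper's construction: the names `eager_and` and `settle`, the use of `None` for a wire that has not yet been determined, and the fixed-point loop are all assumptions made for illustration. The sketch shows how a cycle of AND gates can still settle to definite values once one input is zero, and how each wire is assigned at most once.

```python
def eager_and(a, b):
    """AND gate that outputs 0 as soon as either input is 0.

    Inputs are 0, 1, or None (not yet determined). Hypothetical model.
    """
    if a == 0 or b == 0:
        return 0          # short circuit: no need to wait for the other input
    if a is None or b is None:
        return None       # cannot decide yet
    return 1


def settle(gates, inputs):
    """Propagate values until no wire changes.

    gates maps an output wire name to its pair of input wire names and may
    contain cycles. A wire is written at most once, so values never change
    over time once determined.
    """
    wires = dict(inputs)
    for w in gates:
        wires.setdefault(w, None)
    changed = True
    while changed:
        changed = False
        for w, (a, b) in gates.items():
            if wires[w] is None:
                v = eager_and(wires[a], wires[b])
                if v is not None:
                    wires[w] = v
                    changed = True
    return wires


# Two gates feeding each other: x = AND(a, y), y = AND(x, b).
CYCLE = {"x": ("a", "y"), "y": ("x", "b")}
```

    In this toy model, `settle(CYCLE, {"a": 0, "b": 1})` resolves both `x` and `y` to 0 despite the cycle, whereas with `a = 1` and `b = 1` neither wire is ever determined.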

    Experimental Progress in Computation by Self-Assembly of DNA Tilings

    Approaches to DNA-based computing by self-assembly require the use of DNA nanostructures, called tiles, that have efficient chemistries, expressive computational power, and convenient input and output (I/O) mechanisms. We have designed two new classes of DNA tiles, TAO and TAE, both of which contain three double helices linked by strand exchange. Structural analysis of a TAO molecule has shown that the molecule assembles efficiently from its four component strands. Here we demonstrate a novel method for I/O whereby multiple tiles assemble around a single-stranded (input) scaffold strand. Computation by tiling theoretically results in the formation of structures that contain single-stranded (output) reporter strands, which can then be isolated for subsequent steps of computation if necessary. We illustrate the advantages of TAO and TAE designs by detailing two examples of massively parallel arithmetic: construction of complete XOR and addition tables by linear assemblies of DNA tiles. The three-helix structures provide flexibility for topological routing of strands in the computation, allowing the implementation of string tile models.

    An Introduction to Quantum Computing for Non-Physicists

    Richard Feynman's observation that quantum mechanical effects could not be simulated efficiently on a computer led to speculation that computation in general could be done more efficiently if it used quantum effects. This speculation appeared justified when Peter Shor described a polynomial time quantum algorithm for factoring integers. In quantum systems, the computational space increases exponentially with the size of the system, which enables exponential parallelism. This parallelism could lead to exponentially faster quantum algorithms than possible classically. The catch is that accessing the results, which requires measurement, proves tricky and requires new non-traditional programming techniques. The aim of this paper is to guide computer scientists and other non-physicists through the conceptual and notational barriers that separate quantum computing from conventional computing. We introduce basic principles of quantum mechanics to explain where the power of quantum computers comes from and why it is difficult to harness. We describe quantum cryptography, teleportation, and dense coding. Various approaches to harnessing the power of quantum parallelism are explained, including Shor's algorithm, Grover's algorithm, and Hogg's algorithms. We conclude with a discussion of quantum error correction. Comment: 45 pages. To appear in ACM Computing Surveys. LaTeX file. Exposition improved throughout thanks to reviewers' comments.