37 research outputs found

    Highly Parallel Processing of Relational Databases (Thesis)

    VLSI Algorithms for Relational Database Operations

    A survey of network-based hardware accelerators

    Many practical data-processing algorithms fail to execute efficiently on general-purpose CPUs (Central Processing Units) due to the sequential nature of their operations and to memory bandwidth limitations. To achieve the desired performance levels, reconfigurable (FPGA (Field-Programmable Gate Array)-based) hardware accelerators are frequently explored, since they allow the processing units’ architectures to be better adapted to the specific problem/algorithm requirements. In particular, network-based data-processing algorithms are very well suited to implementation in reconfigurable hardware, because their many data-independent operations can easily and naturally be executed in parallel over as many processing blocks as required and technically feasible. GPUs (Graphics Processing Units) have also demonstrated good results in this area, but they tend to consume significantly more power than FPGAs, which can be a limiting factor in embedded applications. Moreover, GPUs employ a Single Instruction, Multiple Threads (SIMT) execution model and are therefore optimized for SIMD (Single Instruction, Multiple Data) operations, whereas FPGAs allow fully custom datapaths to be built, eliminating much of the control overhead. This review paper analyzes, compares, and discusses different approaches to implementing network-based hardware accelerators in FPGAs and programmable SoCs (Systems-on-Chip). The performed analysis and the derived recommendations should be useful to hardware designers of future network-based hardware accelerators.
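
    To illustrate why network-based algorithms map so naturally onto parallel hardware, the sketch below (not taken from the surveyed accelerators) simulates an odd-even transposition sorting network in Python: every comparator within a stage touches a disjoint pair of elements, so on an FPGA each stage could be realised as a row of independent compare-and-swap blocks.

```python
# Minimal sketch: an odd-even transposition sorting network, a classic
# "network-based" algorithm. All comparators in one stage touch disjoint
# element pairs, so each stage could be a row of parallel hardware blocks.

def odd_even_transposition_sort(values):
    """Sort a list with a fixed, data-independent comparator schedule."""
    data = list(values)
    n = len(data)
    for stage in range(n):                   # n stages suffice for n elements
        start = stage % 2                    # even stages: pairs (0,1),(2,3),...
        for i in range(start, n - 1, 2):     # comparators in a stage are independent
            if data[i] > data[i + 1]:        # compare-and-swap block
                data[i], data[i + 1] = data[i + 1], data[i]
    return data

print(odd_even_transposition_sort([7, 3, 9, 1, 4, 8, 2, 6]))
```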

    An Experimental Analysis of Parallel Sorting Algorithms

    S-HARP: A parallel dynamic spectral partitioner

    Sample sort on meshes

    This paper provides an overview of lower and upper bounds for mesh-connected processor networks. Most attention is devoted to routing and sorting problems, but other problems are mentioned as well. Results from 1977 to 1995 are covered. We provide numerous results, references and open problems. The text is completed with an index. This is a worked-out version of the author's contribution to a joint paper with Grammatikakis, Hsu and Kraetzl on multicomputer routing, submitted to JPDC.
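
    As a point of reference for the sorting results surveyed here, the following sketch shows the splitter-based idea behind sample sort in its simplest serial form; the bucket count and oversampling factor are illustrative choices, and the mesh-specific routing of buckets to processors studied in the paper is not modelled.

```python
import random

def sample_sort(data, buckets=4, oversample=3):
    """Generic serial sample sort sketch: pick splitters from a random sample,
    partition into buckets, sort each bucket, concatenate. On a mesh, each
    bucket would instead be routed to its own processor region."""
    if len(data) <= buckets:
        return sorted(data)
    sample = sorted(random.sample(data, min(len(data), buckets * oversample)))
    splitters = [sample[i * len(sample) // buckets] for i in range(1, buckets)]
    bins = [[] for _ in range(buckets)]
    for x in data:
        b = sum(s <= x for s in splitters)   # index of the bucket x falls into
        bins[b].append(x)
    return [x for bucket in bins for x in sorted(bucket)]

print(sample_sort([random.randint(0, 99) for _ in range(20)]))
```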

    An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor

    Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and have thus evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment that enables heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture, known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses both the software stack and the target hardware architecture in detail and evaluates the scalability of the proposed framework on a cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results obtained with an incomplete development branch of the compiler. It is shown that the benchmarks generally scale well on the LE1 architecture up to eight cores, at which point the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (9x for the bitonic sort benchmark with 8 dual-issue cores), with further improvements from compiler optimisations (14x for bitonic sort with the same configuration).
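
    The work-item coalescing mentioned above can be pictured with the schematic Python illustration below; it is not the LE1 tool flow itself. Instead of dispatching one thread per OpenCL work-item, a coalescing compiler wraps the kernel body in a loop so that one core executes an entire work-group serially, removing per-work-item scheduling overhead.

```python
def vector_add_kernel(gid, a, b, out):
    """Body of a trivial OpenCL-style kernel; gid plays the role of get_global_id(0)."""
    out[gid] = a[gid] + b[gid]

def run_coalesced(kernel, global_size, group_size, *args):
    """Execute the NDRange by looping over work-items group by group, the way a
    coalescing compiler would serialise a work-group onto a single core."""
    for group_start in range(0, global_size, group_size):
        for local_id in range(group_size):        # the inserted work-item loop
            kernel(group_start + local_id, *args)

a = list(range(8))
b = [10 * x for x in a]
out = [0] * 8
run_coalesced(vector_add_kernel, 8, 4, a, b, out)
print(out)   # [0, 11, 22, 33, 44, 55, 66, 77]
```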

    Particle based modeling and simulation of natural phenomena

    Ph.D. thesis by Serkan Bayraktar, Department of Computer Engineering and Institute of Engineering and Science, Bilkent University, Ankara, 2010. Includes bibliographical references (leaves 92-108). This thesis is about modeling and simulation of fluids and cloth-like deformable objects within the physically-based simulation paradigm. Simulated objects are modeled with particles, and their interactions with each other and with the environment are defined by particle-to-particle forces. We propose several improvements over existing particle simulation techniques. Neighbor search algorithms are crucial for the performance and robustness of a particle system. We present a sorting-based neighbor search method which operates on a uniform grid and can easily be parallelized. We improve upon existing fluid surface generation methods so that our method captures surface details better, since we consider the relative position of fluid particles to the fluid surface. We investigate several alternative particle interaction schemes (i.e., Smoothed Particle Hydrodynamics, the Discrete Element Method, and the Lennard-Jones potential) for the purpose of defining fluid-fluid, fluid-cloth, and fluid-boundary interaction forces. We also propose a practical way to simulate knitwear and its interaction with fluids. We employ capillary pressure–based forces to simulate the absorption of fluid particles by knitwear. We also propose a method to simulate the flow of miscible fluids. Our particle simulation system is implemented to exploit the parallel computing capabilities of commodity computers. Specifically, we implemented the proposed methods on multicore CPUs and programmable graphics boards. The experiments show that our method is computationally efficient and produces realistic results.
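
    A minimal sketch of the sorting-based, uniform-grid neighbor search idea is given below. It is a serial 2D toy version, assuming only that particles are hashed to grid cells, sorted by cell id, and queried via the 3x3 block of surrounding cells; the thesis's parallel CPU/GPU implementations and SPH-specific details are not reproduced.

```python
from bisect import bisect_left, bisect_right

def cell_id(p, h):
    """Uniform-grid cell containing 2D point p, with grid spacing h."""
    return (int(p[0] // h), int(p[1] // h))

def build_grid(points, h):
    """Sort particle indices by cell id; the sorted cell-id list acts as the grid."""
    order = sorted(range(len(points)), key=lambda i: cell_id(points[i], h))
    return order, [cell_id(points[i], h) for i in order]

def neighbors(i, points, h, order, cells):
    """Candidate neighbours of particle i: all particles in the 3x3 block of
    cells around it, each cell found by binary search in the sorted cell ids."""
    cx, cy = cell_id(points[i], h)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            key = (cx + dx, cy + dy)
            lo, hi = bisect_left(cells, key), bisect_right(cells, key)
            found.extend(order[lo:hi])
    return [j for j in found if j != i]

pts = [(0.10, 0.10), (0.20, 0.15), (0.90, 0.90), (0.25, 0.30)]
order, cells = build_grid(pts, h=0.5)
print(neighbors(0, pts, 0.5, order, cells))   # candidate neighbours of particle 0
```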

    SAT and CP: Parallelisation and Applications

    This thesis is concerned with the parallelisation of solvers which search for either an arbitrary or an optimum solution to a problem stated in some formal way. We discuss the parallelisation of two solvers and their applications in three chapters. In the first chapter, we consider SAT, the decision problem of propositional logic, and algorithms for showing the satisfiability or unsatisfiability of propositional formulas. We sketch some proof-theoretic foundations which are related to the strength of different algorithmic approaches. Furthermore, we discuss details of the implementation of SAT solvers and show how to improve upon existing sequential solvers. Lastly, we discuss the parallelisation of these solvers with a focus on clause exchange, the communication of intermediate results within a parallel solver. The second chapter is concerned with Constraint Programming (CP) with learning. Contrary to classical Constraint Programming techniques, this incorporates learning mechanisms as they are used in the field of SAT solving. We present results from parallelising CHUFFED, a learning CP solver. As this is both a kind of CP and SAT solver, it is not clear which parallelisation approaches work best for it. In the final chapter, we discuss sorting networks, which are data-oblivious sorting algorithms, i.e., the comparisons they perform do not depend on the input data. This independence from the input data lends them to parallel implementation. We consider the question of how many parallel sorting steps are needed to sort some inputs, and present both lower and upper bounds for several cases.
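
    The notion of "parallel sorting steps" can be made concrete with the sketch below. It does not reproduce the thesis's bound proofs; it only shows that, because the comparators of a sorting network are fixed in advance, comparators touching disjoint wires can run in the same step, and a greedy ASAP layering counts the resulting steps for the naive bubble-sort network.

```python
def parallel_depth(comparators):
    """Greedy ASAP layering: each comparator runs one step after the later of
    its two wires last became available; return the number of parallel steps."""
    ready = {}
    depth = 0
    for a, b in comparators:
        step = max(ready.get(a, 0), ready.get(b, 0)) + 1
        ready[a] = ready[b] = step
        depth = max(depth, step)
    return depth

def bubble_network(n):
    """Comparators of the naive bubble-sort network on n wires."""
    return [(i, i + 1) for k in range(n - 1) for i in range(n - 1 - k)]

n = 8
comps = bubble_network(n)
print(len(comps), "comparators,", parallel_depth(comps), "parallel steps")
# 28 sequential comparators collapse to 13 parallel steps (2n - 3) for n = 8
```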

    Extracting the Structure and Conformations of Biological Entities from Large Datasets

    In biology, structure determines function, which often proceeds via changes in conformation. Efficient means for determining structure exist, but mapping conformations continues to present a serious challenge. Single-particle approaches, such as cryogenic electron microscopy (cryo-EM) and emerging diffract & destroy X-ray techniques, are, in principle, ideally positioned to overcome these challenges. But the algorithmic ability to extract information from large heterogeneous datasets consisting of unsorted snapshots - each emanating from an unknown orientation of an object in an unknown conformation - remains elusive. It is the objective of this thesis to describe and validate a powerful suite of manifold-based algorithms able to extract structural and conformational information from large datasets. These computationally efficient algorithms offer a new approach to determining the structure and conformations of viruses and macromolecules. After an introduction, we demonstrate a distributed, exact k-Nearest Neighbor Graph (k-NNG) construction method, in order to establish a firm algorithmic basis for manifold-based analysis. The proposed algorithm uses Graphics Processing Units (GPUs), exploits multiple levels of parallelism in a distributed computational environment, and scales to different cluster sizes, with each compute node in the cluster containing multiple GPUs. Next, we present applications of manifold-based analysis in determining structure and conformational variability. Using the Diffusion Map algorithm, a new approach is presented which is capable of determining the structure of symmetric objects, such as viruses, to 1/100th of the object diameter, using low-signal diffraction snapshots. This is demonstrated by means of a successful 3D reconstruction of the Satellite Tobacco Necrosis Virus (STNV) to atomic resolution from simulated diffraction snapshots with and without noise. We next present a new approach for determining discrete conformational changes of the enzyme Adenylate Kinase (ADK) from very large datasets of up to 20 million snapshots, each with ~10^4 pixels. This exceeds by an order of magnitude the largest dataset previously analyzed. Finally, we present a theoretical framework and an algorithmic pipeline for capturing continuous conformational changes of the ribosome from ultralow-signal (-12 dB) experimental cryo-EM data. Our analysis shows a smooth, concerted change in molecular structure in two-dimensional projection, which might be indicative of the way the ribosome functions as a molecular machine. The thesis ends with a summary and future prospects.
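
    The k-NNG construction at the base of the pipeline can be summarised by the brute-force sketch below. It is a single-process toy version assuming Euclidean distances on synthetic data; the distributed, multi-GPU partitioning of the distance computation described in the thesis is not shown.

```python
import numpy as np

def knn_graph(points, k):
    """Exact k-nearest-neighbour graph by brute force: full pairwise squared
    distances, then the k smallest entries per row (excluding the point itself).
    A thesis-scale version would distribute these distance blocks across GPUs."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]      # indices of the k nearest per point

rng = np.random.default_rng(0)
snapshots = rng.normal(size=(100, 16))        # toy stand-in for image snapshots
print(knn_graph(snapshots, k=5)[:3])          # neighbours of the first 3 points
```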