    Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been the focus of research in scalable massively parallel multi-core data processing, as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather, they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine, as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four. (Comment: VLDB 2012)
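
    The core idea is easy to sketch: each worker sorts a private run of R and of S in local memory, then merge-joins its R-run against every S-run, so no global merge into one complete sort order is ever needed. Below is a minimal, illustrative C++ sketch of that two-phase pattern, assuming unsigned 64-bit join keys; it is not the authors' implementation, and names such as merge_join_runs are hypothetical.

```cpp
// Minimal sketch of the MPSM idea (hypothetical names, not the authors' code):
// each worker sorts a private run of R and of S, then merge-joins its R-run
// against every S-run independently, so no global merge step is needed and
// all sorting touches only worker-local memory.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using Run = std::vector<std::uint64_t>;

// Count matches between one sorted R-run and one sorted S-run.
std::size_t merge_join_runs(const Run& r, const Run& s) {
    std::size_t matches = 0, i = 0, j = 0;
    while (i < r.size() && j < s.size()) {
        if (r[i] < s[j])      ++i;
        else if (r[i] > s[j]) ++j;
        else { ++matches; ++i; }  // simplification: assumes unique keys in S
    }
    return matches;
}

int main() {
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<Run> r_runs(workers), s_runs(workers);
    // ... fill r_runs[w] / s_runs[w] with each worker's local tuples ...

    // Phase 1: sort all runs in parallel, each worker on its own partition.
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            std::sort(r_runs[w].begin(), r_runs[w].end());
            std::sort(s_runs[w].begin(), s_runs[w].end());
        });
    for (auto& t : pool) t.join();
    pool.clear();

    // Phase 2: each worker joins its R-run against every S-run; the runs are
    // never merged into a single global sort order.
    std::vector<std::size_t> counts(workers, 0);
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            for (const auto& s : s_runs)
                counts[w] += merge_join_runs(r_runs[w], s);
        });
    for (auto& t : pool) t.join();
}
```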

    State of the Art in Parallel Computing with R

    R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.

    Parallel Arbitrary-precision Integer Arithmetic

    Arbitrary-precision integer arithmetic computations are driven by applications in solving systems of polynomial equations and public-key cryptography. Such computations arise when high precision is required (with large input values that fit into multiple machine words), or to avoid coefficient overflow due to intermediate expression swell. Meanwhile, the growing demand for faster computation, alongside recent advances in hardware technology, has led to the development of a vast array of many-core and multi-core processors, accelerators, programming models, and language extensions (e.g. CUDA, OpenCL, and OpenACC for GPUs, and OpenMP and Cilk for multi-core CPUs). The massive computational power of parallel processors makes them attractive targets for carrying out arbitrary-precision integer arithmetic. At the same time, developing parallel algorithms, and then implementing and optimizing them as multi-threaded parallel programs, imposes a set of challenges. This work explains the current state of research on parallel arbitrary-precision integer arithmetic on GPUs and CPUs, and proposes a number of solutions for some of the challenging problems related to this subject.
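
    As a concrete reminder of what "multiple machine words" means in practice, the sketch below implements schoolbook multi-word addition with carry propagation, the basic building block that parallel big-integer schemes try to speed up. It is an illustrative baseline only, not drawn from the work described above.

```cpp
// Schoolbook multi-word addition: a big integer is a vector of 64-bit "limbs"
// (least significant first), and addition propagates a carry across limbs.
// This sequential loop is the baseline whose carry chain parallel schemes
// (e.g. carry-save representations on GPUs) try to break up.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using BigInt = std::vector<std::uint64_t>;  // limbs, least significant first

BigInt add(const BigInt& a, const BigInt& b) {
    const std::size_t n = std::max(a.size(), b.size());
    BigInt sum(n + 1, 0);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint64_t x  = i < a.size() ? a[i] : 0;
        const std::uint64_t y  = i < b.size() ? b[i] : 0;
        const std::uint64_t s  = x + y;        // wraps modulo 2^64 on overflow
        const std::uint64_t c1 = s < x;        // carry out of x + y
        const std::uint64_t s2 = s + carry;
        const std::uint64_t c2 = s2 < s;       // carry out of adding old carry
        sum[i] = s2;
        carry  = c1 | c2;                      // at most one carry per limb
    }
    sum[n] = carry;                            // possible top limb
    return sum;
}
```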

    Application Partitioning and Mapping Techniques for Heterogeneous Parallel Platforms

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016. Parallelism has become one of the most widely used paradigms for improving performance. Legacy source code needs to be rewritten so that it can take advantage of multi-core and many-core computing devices, such as GPGPUs, FPGAs, DSPs, or specific accelerators. However, this forces software developers to adapt applications and coding mechanisms in order to exploit the available computing devices. It is a time-consuming and error-prone task that usually results in expensive and sub-optimal parallel software. In this work, we describe a parallel programming model, a set of annotation techniques, and a static scheduling algorithm for parallel applications. Their purpose is to simplify the task of transforming sequential legacy code into parallel code capable of making full use of several different computing devices, with the objective of increasing performance, lowering energy consumption, and increasing the productivity of the developer. The work presented in this paper has been partially supported by the EU under the COST programme Action IC1305, 'Network for Sustainable Ultrascale Computing (NESUS)'. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n. 609666 and from the Spanish Ministry of Economics and Competitiveness under grant TIN2013-41350-P.
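
    The abstract does not specify the scheduling algorithm itself, but the general shape of static scheduling onto heterogeneous devices can be illustrated with a toy greedy earliest-finish-time assignment; the device names and cost estimates below are invented for the example and are not the authors' method.

```cpp
// Toy static scheduler: greedily assign each task to the device that would
// finish it earliest, given per-device cost estimates. A stand-in sketch for
// heterogeneous scheduling in general; all names and numbers are hypothetical.
#include <cstdio>
#include <string>
#include <vector>

struct Device { std::string name; double busy_until = 0.0; };

int main() {
    std::vector<Device> devices = {{"cpu"}, {"gpu"}, {"fpga"}};
    // cost[t][d] = estimated runtime of task t on device d (made-up numbers).
    std::vector<std::vector<double>> cost = {
        {4.0, 1.0, 2.5},   // task 0: GPU-friendly
        {2.0, 6.0, 3.0},   // task 1: CPU-friendly
        {5.0, 2.0, 1.0},   // task 2: FPGA-friendly
    };
    for (std::size_t t = 0; t < cost.size(); ++t) {
        std::size_t best = 0;
        double best_finish = 1e300;
        for (std::size_t d = 0; d < devices.size(); ++d) {
            const double finish = devices[d].busy_until + cost[t][d];
            if (finish < best_finish) { best_finish = finish; best = d; }
        }
        devices[best].busy_until = best_finish;  // device is occupied until then
        std::printf("task %zu -> %s (finishes at %.1f)\n",
                    t, devices[best].name.c_str(), best_finish);
    }
}
```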

    Performance analysis of a scalable hardware FPGA Skein implementation

    Hashing functions are a key cryptographic primitive used in many everyday applications, such as authentication, ensuring data integrity, as well as digital signatures. The current hashing standard is defined by the National Institute of Standards and Technology (NIST) as the Secure Hash Standard (SHS), and includes SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512. SHS's level of security is waning as technology and analysis techniques continue to develop over time. As a result, after the 2005 Cryptographic Hash Workshop, NIST called for the creation of a new cryptographic hash algorithm to replace SHS. The new candidate algorithms were submitted on October 31st, 2008, and fourteen of them have advanced to round two of the competition. The competition is expected to produce a final replacement for the SHS standard by 2012. Multi-core processors and parallel programming are the dominant force in computing, and some of the new hashing algorithms attempt to take advantage of these resources by offering parallel tree-hashing variants. Tree-hashing allows multiple parts of the data on the same level of a tree to be operated on simultaneously, resulting in the potential to reduce the execution time complexity for hashing from O(n) to O(log n). Designs for tree-hashing require that the scalability and parallelism of the algorithms be researched on all platforms, including multi-core processors (CPUs), graphics processors (GPUs), as well as custom hardware (ASICs and FPGAs). Skein, the hashing function that this work has focused on, offers a tree-hashing mode with different options for the maximum tree height and leaf node size, as well as the node fan-out. This research focuses on creating and analyzing the performance of scalable hardware designs for Skein's tree-hashing mode. Different ideas and approaches on how to modify sequential hashing cores, and how to create scalable control logic for high-speed and low-area parallel hashing hardware, are presented and analyzed. Equations were created to help understand the expected performance and potential bottlenecks of Skein in FPGAs. The equations are intended to assist the decision-making process during the design phase, as well as potentially provide insight into design considerations for other tree-hashing schemes in FPGAs. The results are also compared to current sequential designs of Skein, providing a complete analysis of the performance of Skein in an FPGA.
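
    The source of the O(n)-to-O(log n) improvement is the shape of the computation: leaves are hashed independently, then pairs of child digests are combined level by level, so the critical path is the tree height. The sketch below shows that generic structure with a toy placeholder mixer; it is not Skein's compression function, and it ignores Skein's actual tree parameters (height, leaf size, fan-out).

```cpp
// Generic binary tree hash over leaf digests: each level halves the number of
// nodes, so the tree has O(log n) levels and, with enough parallel workers,
// an O(log n) critical path. node_hash is a toy mixer, NOT Skein.
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <utility>
#include <vector>

using Digest = std::uint64_t;

Digest node_hash(Digest left, Digest right) {
    Digest h = 1469598103934665603ULL;         // FNV-style toy mix, demo only
    for (Digest v : {left, right}) { h ^= v; h *= 1099511628211ULL; }
    return h;
}

Digest tree_hash(std::vector<Digest> level) {  // input: per-leaf digests
    if (level.empty()) return 0;
    while (level.size() > 1) {                 // one pass per tree level
        std::vector<Digest> next;
        for (std::size_t i = 0; i < level.size(); i += 2) {
            // An unpaired last node is promoted by pairing it with itself.
            const Digest right = (i + 1 < level.size()) ? level[i + 1] : level[i];
            next.push_back(node_hash(level[i], right));
        }
        level = std::move(next);
    }
    return level.front();
}
```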

    The common data acquisition platform in the Helmholtz Association

    Various centres of the German Helmholtz Association (HGF) started in 2012 to develop a modular data acquisition (DAQ) platform, covering the entire range from detector readout to data transfer into parallel computing environments. This platform integrates generic hardware components like the multi-purpose HGF-Advanced Mezzanine Card or a smart scientific camera framework, adding user value with Linux drivers and board support packages. Technically the scope comprises the DAQ chain from FPGA modules to computing servers, notably frontend-electronics interfaces, microcontrollers and GPUs with their software, plus high-performance data transmission links. The core idea is a generic and component-based approach, enabling the implementation of specific experiment requirements with low effort. This so-called DTS platform will support standards like MTCA.4 in hard- and software to ensure compatibility with commercial components. Its capability to deploy on other crate standards or FPGA boards with PCI Express or Ethernet interfaces remains an essential feature. Competences of the participating centres are coordinated in order to provide a solid technological basis for both research topics in the Helmholtz Programme "Matter and Technology": "Detector Technology and Systems" and "Accelerator Research and Development". The DTS platform aims at reducing costs and development time and will ensure access to the latest technologies for the collaboration. Due to its flexible approach, it has the potential to be applied in other scientific programmes.

    An Efficient Parallel Computing Method for Processing Large Amounts of Sensor-Collected Data

    In recent years we have witnessed the advent of the Internet of Things and the wide deployment of sensors in many applications for collecting and aggregating data. Efficient techniques are required to analyze these massive data sets in support of intelligent decision making. Partial differential problems involving large data sets are common in engineering and scientific research. For simulations of large-scale three-dimensional partial differential equations, the intensive computation and large memory requirements of modeling are the main research challenges. To address these two challenges, this paper provides an effective parallel method for partial differential equations. The proposed approach combines an overlapping domain decomposition strategy with multi-core cluster technology to achieve parallel simulation of partial differential equations: it uses the finite difference method to discretize the equations and adopts the hybrid MPI/OpenMP programming model to exploit two-level parallelism on a multi-core cluster. A three-dimensional groundwater flow model with the parallel finite-difference overlapping domain decomposition strategy was successfully set up and run with the parallel MPI/OpenMP implementation on a multi-core cluster with two nodes. The experimental results show that the proposed parallel approach can efficiently simulate partial differential problems with large amounts of data.
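
    The hybrid two-level pattern described above (MPI ranks owning overlapping subdomains and exchanging halo layers, while OpenMP threads sweep the stencil inside each rank) can be sketched in a few lines. The code below is a 1-D Jacobi stand-in for the paper's 3-D groundwater model, with made-up sizes and iteration counts.

```cpp
// Hybrid MPI/OpenMP sketch: each MPI rank owns a 1-D subdomain with one ghost
// cell per side; ranks exchange halos, then OpenMP threads apply a Jacobi
// stencil locally. A 1-D stand-in for the paper's 3-D finite-difference model.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1 << 20;  // local cells per rank (made-up size)
    std::vector<double> u(n + 2, 1.0), v(n + 2, 0.0);  // +2 ghost cells

    for (int step = 0; step < 100; ++step) {
        // Level 1: MPI halo exchange with neighbouring ranks (one-cell overlap).
        const int left  = rank > 0          ? rank - 1 : MPI_PROC_NULL;
        const int right = rank < nranks - 1 ? rank + 1 : MPI_PROC_NULL;
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[n + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Level 2: OpenMP threads sweep the local stencil in parallel.
        #pragma omp parallel for
        for (int i = 1; i <= n; ++i)
            v[i] = 0.5 * (u[i - 1] + u[i + 1]);  // toy 1-D Jacobi update
        u.swap(v);
    }
    MPI_Finalize();
}
```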