Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated performance. In this work we take a new look at the well-known sort-merge join, which so far has not been the focus of research in scalable massively parallel multi-core data processing, as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather, they work on the independently created runs in parallel. This way, our MPSM algorithms are NUMA-affine, as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory demonstrates the competitive performance of MPSM on large main-memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four. Comment: VLDB201
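The run-then-probe idea behind MPSM can be sketched in miniature. This is a toy Python analog with invented names, not the paper's algorithm: each worker sorts its own chunk of the inner relation into an independent run (no global merge), and the outer relation is then probed against every run. The real MPSM additionally range-partitions and keeps each run in NUMA-local memory.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

def mpsm_join(r, s, workers=4):
    """Toy MPSM-style equi-join on lists of keys (illustrative only)."""
    # Phase 1: each worker sorts its local chunk of S into an
    # independent sorted run; there is no final global merge step.
    chunk = max(1, -(-len(s) // workers))  # ceiling division
    parts = [s[i:i + chunk] for i in range(0, len(s), chunk)]
    with ThreadPoolExecutor(workers) as pool:
        runs = list(pool.map(sorted, parts))
    # Phase 2: probe each R key against every run via binary search.
    out = []
    for key in sorted(r):
        for run in runs:
            i = bisect.bisect_left(run, key)
            while i < len(run) and run[i] == key:
                out.append((key, run[i]))
                i += 1
    return out
```

Because the runs never need to be merged into one total order, the sort phase parallelizes trivially; only the probe phase touches remote runs.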
State of the Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.
Parallel Arbitrary-precision Integer Arithmetic
Arbitrary-precision integer arithmetic computations are driven by applications in solving systems of polynomial equations and in public-key cryptography. Such computations arise when high precision is required (with large input values that fit into multiple machine words), or to avoid coefficient overflow due to intermediate expression swell. Meanwhile, the growing demand for faster computation alongside recent advances in hardware technology has led to the development of a vast array of many-core and multi-core processors, accelerators, programming models, and language extensions (e.g. CUDA, OpenCL, and OpenACC for GPUs, and OpenMP and Cilk for multi-core CPUs). The massive computational power of parallel processors makes them attractive targets for carrying out arbitrary-precision integer arithmetic. At the same time, developing parallel algorithms, and then implementing and optimizing them as multi-threaded parallel programs, imposes a set of challenges. This work explains the current state of research on parallel arbitrary-precision integer arithmetic on GPUs and CPUs, and proposes a number of solutions for some of the challenging problems related to this subject.
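The basic representation behind such arithmetic is a number stored as a vector of machine words. A minimal sketch of schoolbook multi-word addition (Python standing in for the word-level C/CUDA kernels the work targets) also shows why parallelisation is hard: the carry chain is inherently serial.

```python
def mp_add(a, b, base=2**64):
    """Schoolbook multi-word addition with carry propagation.
    a, b: little-endian lists of machine words (least significant first).
    The serial carry chain is the main obstacle to parallelising this."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))  # pad to equal length
    b = b + [0] * (n - len(b))
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s % base)    # low word of the digit sum
        carry = s // base       # carry into the next word
    if carry:
        out.append(carry)
    return out
```

Parallel variants typically break this dependence with carry-save or speculative carry schemes, computing word sums independently and resolving carries in a second pass.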
Application Partitioning and Mapping Techniques for Heterogeneous Parallel Platforms
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
Parallelism has become one of the most widely adopted paradigms for improving performance. Legacy source code needs to be rewritten so that it can take advantage of multi-core and many-core computing devices, such as GPGPUs, FPGAs, DSPs, or specific accelerators. However, this forces software developers to adapt applications and coding mechanisms in order to exploit the available computing devices. It is a time-consuming and error-prone task that usually results in expensive and sub-optimal parallel software.
In this work, we describe a parallel programming model, a set of annotation techniques, and a static scheduling algorithm for parallel applications. Their purpose is to simplify the task of transforming sequential legacy code into parallel code capable of making full use of several different computing devices, with the objective of increasing performance, lowering energy consumption, and increasing developer productivity.
European Cooperation in Science and Technology (COST). The work presented in this paper has been partially supported by the EU under the COST programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)". The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n. 609666 and by the Spanish Ministry of Economics and Competitiveness under grant TIN2013-41350-P.
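Static scheduling over heterogeneous devices can be illustrated with a simple greedy earliest-finish-time heuristic. This is not the paper's algorithm, only a common baseline sketch; the task/device cost model is invented for illustration.

```python
def static_schedule(tasks, devices):
    """Greedy static scheduler (illustrative baseline, not the paper's
    algorithm): place each task on the device that finishes it earliest.
    tasks:   list of {device_name: estimated_cost} dicts, in issue order.
    devices: list of device names."""
    finish = {d: 0.0 for d in devices}   # accumulated load per device
    placement = []
    for costs in tasks:
        # Pick the device with the smallest projected finish time.
        best = min(devices, key=lambda d: finish[d] + costs[d])
        finish[best] += costs[best]
        placement.append(best)
    return placement, finish
```

Real partitioning tools refine this with communication costs, energy models, and dependence constraints, but the core decision (cost estimate per device, pick the cheapest placement) is the same.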
JUSTGrid: a pure Java HPCC grid architecture for multi-physics solvers using complex geometries
After the Earth Simulator, built by NEC at the Japan Marine Science and Technology Centre (JAMSTEC) on an area of 3,250 m2 (50 m x 65 m), began its work in March 2002 with the outstanding performance of 35,860 Gflops (40 TFlops peak) [TRIOO], numerous scientists opted in favour of such a high-performance computation and communications (HPCC) approach, suggesting a return to building the Cray-type vector supercomputers that dominated scientific computing in the mid-seventies. Today (2009) the extended Earth Simulator has a peak performance of 131 TFlops, but it has been outperformed by several other systems with multi-core architectures. The top system in June 2009 is RoadRunner, built by IBM for the DOE/NNSA/LANL, with a peak performance of 1456 TFlops. Multi-core processors are now built into every consumer PC, not only into HPC systems. It should be remembered that the computer games industry is responsible for the revolution in high-end 3D graphics cards that convert any PC into a powerful graphics workstation. It should be obvious, despite the computational power of the Earth Simulator, that this is definitely not the road HPCC should take for general scientific and engineering computation.
"I hope to concentrate my attention on my research rather than on how to program", says Hitoshi Sakagami, a researcher at Japan's Himeji Institute of Technology and a Gordon Bell Prize finalist for work using the Earth Simulator [TRIOO].
I fully agree with this statement, and this is one of the major reasons I have chosen Java as the high-performance computing language. Programming vector computers is a difficult task, and obtaining acceptable results relative to the announced peak performance has been notoriously cumbersome. On the other hand, multi-core systems with many processors on a single chip need to be programmed in a different, namely multi-threaded, way. Threads are a substantial part of the Java programming language. Java is the only general programming language that does not need external libraries for parallel programming, because everything needed is built into the language. In addition, there are major further advantages of the Java language (object orientation, parallelization, readability, maintainability, programmer productivity, platform independence, code safety and reliability, database connectivity, internet capability, multimedia capability, GUIs (graphical user interfaces), 3D graphics (Java 3D), portability, etc.), which are discussed in this thesis. The objective of this work is to build an easy-to-use software framework for high-performance computing dealing with complex 3D geometries. The framework should also exploit the characteristics of modern multi-core/multi-threaded hardware architectures. In view of the increasing complexity of modern hardware, working on solutions of multi-physics problems demands software that makes the solving process largely independent of the available machinery.
Performance analysis of a scalable hardware FPGA Skein implementation
Hashing functions are a key cryptographic primitive used in many everyday applications, such as authentication, ensuring data integrity, and digital signatures. The current hashing standard is defined by the National Institute of Standards and Technology (NIST) as the Secure Hash Standard (SHS) and includes SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512. SHS's level of security is waning as technology and analysis techniques continue to develop over time. As a result, after the 2005 Cryptographic Hash Workshop, NIST called for the creation of a new cryptographic hash algorithm to replace SHS. The new candidate algorithms were submitted on October 31st, 2008, and fourteen of them have advanced to round two of the competition. The competition is expected to produce a final replacement for the SHS standard by 2012. Multi-core processors and parallel programming are the dominant force in computing, and some of the new hashing algorithms attempt to take advantage of these resources by offering parallel tree-hashing variants. Tree hashing allows multiple parts of the data on the same level of a tree to be operated on simultaneously, potentially reducing the execution time complexity of hashing from O(n) to O(log n). Designs for tree hashing require that the scalability and parallelism of the algorithms be researched on all platforms, including multi-core processors (CPUs), graphics processors (GPUs), as well as custom hardware (ASICs and FPGAs). Skein, the hashing function this work focuses on, offers a tree-hashing mode with different options for the maximum tree height and leaf node size, as well as the node fan-out. This research focuses on creating and analyzing the performance of scalable hardware designs for Skein's tree-hashing mode.
Different ideas and approaches on how to modify sequential hashing cores and create scalable control logic in order to provide high-speed, low-area parallel hashing hardware are presented and analyzed. Equations were created to help understand the expected performance and potential bottlenecks of Skein in FPGAs. The equations are intended to assist the decision-making process during the design phase, as well as potentially provide insight into design considerations for other tree-hashing schemes in FPGAs. The results are also compared to current sequential designs of Skein, providing a complete analysis of the performance of Skein in an FPGA.
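The structure of a binary tree-hashing mode can be sketched as follows. This is a generic illustration, not Skein's actual tree mode: SHA-256 stands in for the Skein compression function (Skein is not in Python's hashlib), `leaf_size` plays the role of the leaf-node size parameter, and the fan-out is fixed at two.

```python
import hashlib

def tree_hash(data, leaf_size=64):
    """Generic binary tree-hash sketch (SHA-256 as a stand-in for Skein).
    All nodes on one level are independent of each other, which is
    exactly what permits hashing them in parallel."""
    # Leaf level: hash each leaf_size-byte block of the input.
    level = [hashlib.sha256(data[i:i + leaf_size]).digest()
             for i in range(0, max(len(data), 1), leaf_size)]
    # Internal levels: hash pairs of child digests until one root remains.
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i] + (level[i + 1] if i + 1 < len(level) else b"")
            nxt.append(hashlib.sha256(pair).digest())
        level = nxt
    return level[0]
```

In hardware, each leaf (and each internal node of a level) can map to its own hashing core, which is the scalability dimension the FPGA designs in this work explore.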
The common data acquisition platform in the Helmholtz Association
Various centres of the German Helmholtz Association (HGF) started in 2012 to develop a modular data acquisition (DAQ) platform, covering the entire range from detector readout to data transfer into parallel computing environments. This platform integrates generic hardware components like the multi-purpose HGF-Advanced Mezzanine Card or a smart scientific camera framework, adding user value with Linux drivers and board support packages. Technically the scope comprises the DAQ chain from FPGA modules to computing servers, notably frontend-electronics interfaces, microcontrollers and GPUs with their software, plus high-performance data transmission links. The core idea is a generic and component-based approach, enabling the implementation of specific experiment requirements with low effort. This so-called DTS platform will support standards like MTCA.4 in hard- and software to ensure compatibility with commercial components. Its capability to deploy on other crate standards or FPGA boards with PCI Express or Ethernet interfaces remains an essential feature. Competences of the participating centres are coordinated in order to provide a solid technological basis for both research topics in the Helmholtz Programme "Matter and Technology": "Detector Technology and Systems" and "Accelerator Research and Development". The DTS platform aims at reducing costs and development time and will ensure access to the latest technologies for the collaboration. Due to its flexible approach, it has the potential to be applied in other scientific programmes.
An Efficient Parallel Computing Method for Processing Large Amounts of Sensor-Collected Data
In recent years we have witnessed the advent of the Internet of Things and the wide deployment of sensors in many applications for collecting and aggregating data. Efficient techniques are required to analyze these massive data sets in order to support intelligent decision making. Partial differential problems that involve large data are common in engineering and scientific research. For simulations of large-scale three-dimensional partial differential equations, the intensive computation and the large amounts of memory required for modeling are the main research problems. To address these two challenges, this paper provides an effective parallel method for partial differential equations. The proposed approach combines an overlapping domain decomposition strategy with multi-core cluster technology to achieve parallel simulations of partial differential equations, uses the finite difference method to discretize the equations, and adopts the hybrid MPI/OpenMP programming model to exploit two-level parallelism on a multi-core cluster. The three-dimensional groundwater flow model with the parallel finite difference overlapping domain decomposition strategy was successfully set up and carried out by the parallel MPI/OpenMP implementation on a multi-core cluster with two nodes. The experimental results show that the proposed parallel approach can efficiently simulate partial differential problems with large amounts of data.
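The overlapping-domain idea can be sketched serially for a 1-D Laplace problem with finite differences. This is a toy stand-in: in the paper's setting each subdomain would be an MPI rank and each inner sweep an OpenMP-parallel loop, and the problem is three-dimensional; here everything is serial and 1-D, and the function name is invented.

```python
def jacobi_overlap(u, halo=1, sweeps=200):
    """Serial sketch of overlapping (Schwarz-style) domain decomposition
    for a 1-D Laplace problem with Dirichlet boundary values u[0], u[-1].
    Two subdomains share `halo` overlapping cells; each is relaxed with
    Jacobi sweeps on the five... three-point finite-difference stencil."""
    n = len(u)
    half = n // 2
    # Two overlapping subdomains; the overlap exchanges boundary data.
    doms = [(0, half + halo), (half - halo, n)]
    for _ in range(sweeps):
        for lo, hi in doms:
            v = u[:]  # Jacobi: update from the previous iterate
            for i in range(max(lo, 1), min(hi, n - 1)):
                v[i] = 0.5 * (u[i - 1] + u[i + 1])
            u = v
    return u
```

With boundary values 0 and 1, the iteration converges to the linear ramp that solves the discrete Laplace equation; the same decomposition pattern scales out when subdomains live on different cluster nodes.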