Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated performance. In this work we take a new look at the well-known sort-merge join, which so far has not been the focus of research in scalable massively parallel multi-core data processing, as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather, they work on the independently created runs in parallel. This way, our MPSM algorithms are NUMA-affine, as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory demonstrates the competitive performance of MPSM on large main-memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four. Comment: VLDB201
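The run-then-probe idea behind MPSM can be sketched in miniature. This is a toy Python analog with invented names, not the paper's algorithm: each worker sorts its own chunk of the inner relation into an independent run (no global merge), and the outer relation is then probed against every run. The real MPSM additionally range-partitions and keeps each run in NUMA-local memory.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

def mpsm_join(r, s, workers=4):
    """Toy MPSM-style equi-join on lists of keys (illustrative only)."""
    # Phase 1: each worker sorts its local chunk of S into an
    # independent sorted run; there is no final global merge step.
    chunk = max(1, -(-len(s) // workers))  # ceiling division
    parts = [s[i:i + chunk] for i in range(0, len(s), chunk)]
    with ThreadPoolExecutor(workers) as pool:
        runs = list(pool.map(sorted, parts))
    # Phase 2: probe each R key against every run via binary search.
    out = []
    for key in sorted(r):
        for run in runs:
            i = bisect.bisect_left(run, key)
            while i < len(run) and run[i] == key:
                out.append((key, run[i]))
                i += 1
    return out
```

Because the runs never need to be merged into one total order, the sort phase parallelizes trivially; only the probe phase touches remote runs.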
State of the Art in Parallel Computing with R
R is a mature open-source programming language for statistical computing and graphics. Many areas of statistical research are experiencing rapid growth in the size of data sets. Methodological advances drive increased use of simulations. A common approach is to use parallel computing. This paper presents an overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing. It reviews sixteen different packages, comparing them on their state of development, the parallel technology used, as well as on usability, acceptance, and performance. Two packages (snow, Rmpi) stand out as particularly suited to general use on computer clusters. Packages for grid computing are still in development, with only one package currently available to the end user. For multi-core systems five different packages exist, but a number of issues pose challenges to early adopters. The paper concludes with ideas for further developments in high performance computing with R. Example code is available in the appendix.
Parallel Arbitrary-precision Integer Arithmetic
Arbitrary-precision integer arithmetic computations are driven by applications in solving systems of polynomial equations and in public-key cryptography. Such computations arise when high precision is required (with large input values that fit into multiple machine words), or to avoid coefficient overflow due to intermediate expression swell. Meanwhile, the growing demand for faster computation alongside recent advances in hardware technology has led to the development of a vast array of many-core and multi-core processors, accelerators, programming models, and language extensions (e.g. CUDA, OpenCL, and OpenACC for GPUs, and OpenMP and Cilk for multi-core CPUs). The massive computational power of parallel processors makes them attractive targets for carrying out arbitrary-precision integer arithmetic. At the same time, developing parallel algorithms, and then implementing and optimizing them as multi-threaded parallel programs, imposes a set of challenges. This work explains the current state of research on parallel arbitrary-precision integer arithmetic on GPUs and CPUs, and proposes a number of solutions for some of the challenging problems related to this subject.
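The basic representation behind such arithmetic is a number stored as a vector of machine words. A minimal sketch of schoolbook multi-word addition (Python standing in for the word-level C/CUDA kernels the work targets) also shows why parallelisation is hard: the carry chain is inherently serial.

```python
def mp_add(a, b, base=2**64):
    """Schoolbook multi-word addition with carry propagation.
    a, b: little-endian lists of machine words (least significant first).
    The serial carry chain is the main obstacle to parallelising this."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))  # pad to equal length
    b = b + [0] * (n - len(b))
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s % base)    # low word of the digit sum
        carry = s // base       # carry into the next word
    if carry:
        out.append(carry)
    return out
```

Parallel variants typically break this dependence with carry-save or speculative carry schemes, computing word sums independently and resolving carries in a second pass.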
Application Partitioning and Mapping Techniques for Heterogeneous Parallel Platforms
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
Parallelism has become one of the most widely adopted paradigms for improving performance. Legacy source code needs to be rewritten so that it can take advantage of multi-core and many-core computing devices, such as GPGPUs, FPGAs, DSPs, or specific accelerators. However, this forces software developers to adapt applications and coding mechanisms in order to exploit the available computing devices. It is a time-consuming and error-prone task that usually results in expensive and sub-optimal parallel software.
In this work, we describe a parallel programming model, a set of annotation techniques, and a static scheduling algorithm for parallel applications. Their purpose is to simplify the task of transforming sequential legacy code into parallel code capable of making full use of several different computing devices, with the objective of increasing performance, lowering energy consumption, and increasing developer productivity.
European Cooperation in Science and Technology (COST). The work presented in this paper has been partially supported by the EU under the COST programme Action IC1305, "Network for Sustainable Ultrascale Computing (NESUS)". The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n. 609666 and by the Spanish Ministry of Economics and Competitiveness under grant TIN2013-41350-P.
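Static scheduling over heterogeneous devices can be illustrated with a simple greedy earliest-finish-time heuristic. This is not the paper's algorithm, only a common baseline sketch; the task/device cost model is invented for illustration.

```python
def static_schedule(tasks, devices):
    """Greedy static scheduler (illustrative baseline, not the paper's
    algorithm): place each task on the device that finishes it earliest.
    tasks:   list of {device_name: estimated_cost} dicts, in issue order.
    devices: list of device names."""
    finish = {d: 0.0 for d in devices}   # accumulated load per device
    placement = []
    for costs in tasks:
        # Pick the device with the smallest projected finish time.
        best = min(devices, key=lambda d: finish[d] + costs[d])
        finish[best] += costs[best]
        placement.append(best)
    return placement, finish
```

Real partitioning tools refine this with communication costs, energy models, and dependence constraints, but the core decision (cost estimate per device, pick the cheapest placement) is the same.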
JUSTGrid: a pure Java HPCC grid architecture for multi-physics solvers using complex geometries
After the Earth Simulator, built by NEC at the Japan Marine Science and Technology Centre (JAMSTEC) on an area of 3,250 m2 (50 m x 65 m), began its work in March 2002 with the outstanding performance of 35,860 Gflops (40 TFlops peak) [TRIOO], numerous scientists opted in favour of such a high-performance computation and communications (HPCC) approach, suggesting a return to building the Cray-type vector supercomputers that dominated scientific computing in the mid-seventies. Today (2009) the extended Earth Simulator has a peak performance of 131 TFlops, but it has been outperformed by several other systems with multi-core architectures. The top system in June 2009 is RoadRunner, built by IBM for the DOE/NNSA/LANL, with a peak performance of 1456 TFlops. Multi-core processors are now built into every consumer PC, not only into HPC systems. It should be remembered that the computer games industry is responsible for the revolution in high-end 3D graphics cards that convert any PC into a powerful graphics workstation. It should be obvious, despite the computational power of the Earth Simulator, that this is definitely not the road HPCC should take for general scientific and engineering computation.
"I hope to concentrate my attention on my research rather than on how to program", says Hitoshi Sakagami, a researcher at Japan's Himeji Institute of Technology and a Gordon Bell Prize finalist for work using the Earth Simulator [TRIOO].
I fully agree with this statement, and this is one of the major reasons I have chosen Java as the high-performance computing language. Programming vector computers is a difficult task, and obtaining acceptable results relative to the announced peak performance has been notoriously cumbersome. On the other hand, multi-core systems with many processors on a single chip need to be programmed in a different, namely multi-threaded, way. Threads are a substantial part of the Java programming language. Java is the only general programming language that does not need external libraries for parallel programming, because everything needed is built into the language. In addition, there are major further advantages of the Java language (object orientation, parallelization, readability, maintainability, programmer productivity, platform independence, code safety and reliability, database connectivity, internet capability, multimedia capability, GUIs (graphical user interfaces), 3D graphics (Java 3D), portability, etc.), which are discussed in this thesis. The objective of this work is to build an easy-to-use software framework for high-performance computing dealing with complex 3D geometries. The framework should also exploit the characteristics of modern multi-core/multi-threaded hardware architectures. In view of the increasing complexity of modern hardware, working on solutions of multi-physics problems demands software that makes the solving process largely independent of the available machinery.
Performance analysis of a scalable hardware FPGA Skein implementation
Hashing functions are a key cryptographic primitive used in many everyday applications, such as authentication, ensuring data integrity, and digital signatures. The current hashing standard is defined by the National Institute of Standards and Technology (NIST) as the Secure Hash Standard (SHS) and includes SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512. SHS's level of security is waning as technology and analysis techniques continue to develop over time. As a result, after the 2005 Cryptographic Hash Workshop, NIST called for the creation of a new cryptographic hash algorithm to replace SHS. The new candidate algorithms were submitted on October 31st, 2008, and fourteen of them have advanced to round two of the competition. The competition is expected to produce a final replacement for the SHS standard by 2012. Multi-core processors and parallel programming are the dominant force in computing, and some of the new hashing algorithms attempt to take advantage of these resources by offering parallel tree-hashing variants. Tree hashing allows multiple parts of the data on the same level of a tree to be operated on simultaneously, potentially reducing the execution time complexity of hashing from O(n) to O(log n). Designs for tree hashing require that the scalability and parallelism of the algorithms be researched on all platforms, including multi-core processors (CPUs), graphics processors (GPUs), as well as custom hardware (ASICs and FPGAs). Skein, the hashing function this work focuses on, offers a tree-hashing mode with different options for the maximum tree height and leaf node size, as well as the node fan-out. This research focuses on creating and analyzing the performance of scalable hardware designs for Skein's tree-hashing mode.
Different ideas and approaches on how to modify sequential hashing cores and create scalable control logic in order to provide high-speed, low-area parallel hashing hardware are presented and analyzed. Equations were created to help understand the expected performance and potential bottlenecks of Skein in FPGAs. The equations are intended to assist the decision-making process during the design phase, as well as potentially provide insight into design considerations for other tree-hashing schemes in FPGAs. The results are also compared to current sequential designs of Skein, providing a complete analysis of the performance of Skein in an FPGA.
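The structure of a binary tree-hashing mode can be sketched as follows. This is a generic illustration, not Skein's actual tree mode: SHA-256 stands in for the Skein compression function (Skein is not in Python's hashlib), `leaf_size` plays the role of the leaf-node size parameter, and the fan-out is fixed at two.

```python
import hashlib

def tree_hash(data, leaf_size=64):
    """Generic binary tree-hash sketch (SHA-256 as a stand-in for Skein).
    All nodes on one level are independent of each other, which is
    exactly what permits hashing them in parallel."""
    # Leaf level: hash each leaf_size-byte block of the input.
    level = [hashlib.sha256(data[i:i + leaf_size]).digest()
             for i in range(0, max(len(data), 1), leaf_size)]
    # Internal levels: hash pairs of child digests until one root remains.
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i] + (level[i + 1] if i + 1 < len(level) else b"")
            nxt.append(hashlib.sha256(pair).digest())
        level = nxt
    return level[0]
```

In hardware, each leaf (and each internal node of a level) can map to its own hashing core, which is the scalability dimension the FPGA designs in this work explore.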
The common data acquisition platform in the Helmholtz Association
Various centres of the German Helmholtz Association (HGF) started in 2012 to develop a modular data acquisition (DAQ) platform, covering the entire range from detector readout to data transfer into parallel computing environments. This platform integrates generic hardware components like the multi-purpose HGF-Advanced Mezzanine Card or a smart scientific camera framework, adding user value with Linux drivers and board support packages. Technically the scope comprises the DAQ chain from FPGA modules to computing servers, notably frontend-electronics interfaces, microcontrollers and GPUs with their software, plus high-performance data transmission links. The core idea is a generic and component-based approach, enabling the implementation of specific experiment requirements with low effort. This so-called DTS platform will support standards like MTCA.4 in hard- and software to ensure compatibility with commercial components. Its capability to deploy on other crate standards or FPGA boards with PCI Express or Ethernet interfaces remains an essential feature. Competences of the participating centres are coordinated in order to provide a solid technological basis for both research topics in the Helmholtz Programme "Matter and Technology": "Detector Technology and Systems" and "Accelerator Research and Development". The DTS platform aims at reducing costs and development time and will ensure access to the latest technologies for the collaboration. Due to its flexible approach, it has the potential to be applied in other scientific programmes.
An Efficient Parallel Computing Method for Processing Large Amounts of Sensor-Collected Data
In recent years we have witnessed the advent of the Internet of Things and the wide deployment of sensors in many applications for collecting and aggregating data. Efficient techniques are required to analyze these massive data sets in order to support intelligent decision making. Partial differential problems that involve large data are common in engineering and scientific research. For simulations of large-scale three-dimensional partial differential equations, the intensive computation and the large amounts of memory required for modeling are the main research problems. To address these two challenges, this paper provides an effective parallel method for partial differential equations. The proposed approach combines an overlapping domain decomposition strategy with multi-core cluster technology to achieve parallel simulations of partial differential equations, uses the finite difference method to discretize the equations, and adopts the hybrid MPI/OpenMP programming model to exploit two-level parallelism on a multi-core cluster. The three-dimensional groundwater flow model with the parallel finite difference overlapping domain decomposition strategy was successfully set up and carried out by the parallel MPI/OpenMP implementation on a multi-core cluster with two nodes. The experimental results show that the proposed parallel approach can efficiently simulate partial differential problems with large amounts of data.
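The overlapping-domain idea can be sketched serially for a 1-D Laplace problem with finite differences. This is a toy stand-in: in the paper's setting each subdomain would be an MPI rank and each inner sweep an OpenMP-parallel loop, and the problem is three-dimensional; here everything is serial and 1-D, and the function name is invented.

```python
def jacobi_overlap(u, halo=1, sweeps=200):
    """Serial sketch of overlapping (Schwarz-style) domain decomposition
    for a 1-D Laplace problem with Dirichlet boundary values u[0], u[-1].
    Two subdomains share `halo` overlapping cells; each is relaxed with
    Jacobi sweeps on the five... three-point finite-difference stencil."""
    n = len(u)
    half = n // 2
    # Two overlapping subdomains; the overlap exchanges boundary data.
    doms = [(0, half + halo), (half - halo, n)]
    for _ in range(sweeps):
        for lo, hi in doms:
            v = u[:]  # Jacobi: update from the previous iterate
            for i in range(max(lo, 1), min(hi, n - 1)):
                v[i] = 0.5 * (u[i - 1] + u[i + 1])
            u = v
    return u
```

With boundary values 0 and 1, the iteration converges to the linear ramp that solves the discrete Laplace equation; the same decomposition pattern scales out when subdomains live on different cluster nodes.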