776 research outputs found
An SMP Soft Classification Algorithm for Remote Sensing
This work introduces a symmetric multiprocessing (SMP) version of the continuous iterative
guided spectral class rejection (CIGSCR) algorithm, a semiautomated classification algorithm for remote
sensing (multispectral) images. The algorithm uses soft data clusters to produce a soft classification
containing inherently more information than a comparable hard classification at an increased computational
cost. Previous work suggests that similar algorithms achieve good parallel scalability, motivating the parallel
algorithm development work here. Experimental results of applying parallel CIGSCR to an image with
approximately 10^8 pixels and six bands demonstrate superlinear speedup. A soft two class classification is
generated in just over four minutes using 32 processors
Garbage collection auto-tuning for Java MapReduce on Multi-Cores
MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests
Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems
Two emerging hardware trends will dominate the database system technology in
the near future: increasing main memory capacities of several TB per server and
massively parallel multi-core processing. Many algorithmic and control
techniques in current database technology were devised for disk-based systems
where I/O dominated the performance. In this work we take a new look at the
well-known sort-merge join which, so far, has not been in the focus of research
in scalable massively parallel multi-core data processing as it was deemed
inferior to hash joins. We devise a suite of new massively parallel sort-merge
(MPSM) join algorithms that are based on partial partition-based sorting.
Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a
hard to parallelize final merge step to create one complete sort order. Rather
they work on the independently created runs in parallel. This way our MPSM
algorithms are NUMA-affine as all the sorting is carried out on local memory
partitions. An extensive experimental evaluation on a modern 32-core machine
with one TB of main memory proves the competitive performance of MPSM on large
main memory databases with billions of objects. It scales (almost) linearly in
the number of employed cores and clearly outperforms competing hash join
proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel
query engine by a factor of four.Comment: VLDB201
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighborsearching, and we discuss the present and future challenges we see for
exascale simulation - in particular a very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin
Performance Portability Through Semi-explicit Placement in Distributed Erlang
We consider the problem of adapting distributed Erlang applications to large or heterogeneous architectures to achieve good performance in a portable way. In many architectures, and especially large architectures, the communication latency between pairs of virtual machines (nodes) is no longer uniform.
We propose two language-level methods that enable programs to automatically adapt to heterogeneity and non-uniform communication latencies, and both provide information enabling a program to identify an appropriate node when spawning a process. We provide a means of recording node attributes describing the hardware and software capabilities of nodes, and mechanisms that allow an application to examine the attributes of remote nodes. We provide an abstraction of communication distances that enables an application to select nodes to facilitate efficient communication.
We have developed open source libraries that implement these ideas. We show that the use of attributes for node selection can lead to significant performance improvements if different components of the application have different processing requirements. We report a detailed empirical investigation of non-uniform communication times in several representative architectures, and show that our abstract model provides a good description of the hierarchy of communication times
Efficient Clustering-based Plagiarism Detection using IPPDC
The volume of source code available on the Internet is astronomical. When seeking to detect cases of plagiarism, one must maintain a large database of known documents. This can lead to unacceptably slow runtimes for systems designed to detect cases of source code plagiarism. We seek to use partitional and density-based clustering as well as intelligent parallelism to improve VOCS, a plagiarism detection system. In addition, we will attempt to increase the system’s usability and usefulness by expanding its programming language support and building an intuitive web interface. Finally, we propose utilizing Program Dependence Graphs to construct a hybrid approach in order to more accurately and precisely detect well-disguised plagiarism
Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures
Many-core architectures of the future are likely to have distributed memory
organizations and need fine grained concurrency management to be used
effectively. The Self-adaptive Virtual Processor (SVP) is an abstract
concurrent programming model which can provide this, but the model and its
current implementations assume a single address space shared memory. We
investigate and extend SVP to handle distributed environments, and discuss a
prototype SVP implementation which transparently supports execution on
heterogeneous distributed memory clusters over TCP/IP connections, while
retaining the original SVP programming model
- …