Search CORE

776 research outputs found

An SMP Soft Classification Algorithm for Remote Sensing

Author: Easterling David R.
Phillips Rhonda D.
Watson Layne T.
Wynne Randolph H.
Publication venue
Publication date: 01/01/2012
Field of study

This work introduces a symmetric multiprocessing (SMP) version of the continuous iterative guided spectral class rejection (CIGSCR) algorithm, a semiautomated classiﬁcation algorithm for remote sensing (multispectral) images. The algorithm uses soft data clusters to produce a soft classiﬁcation containing inherently more information than a comparable hard classiﬁcation at an increased computational cost. Previous work suggests that similar algorithms achieve good parallel scalability, motivating the parallel algorithm development work here. Experimental results of applying parallel CIGSCR to an image with approximately 10^8 pixels and six bands demonstrate superlinear speedup. A soft two class classiﬁcation is generated in just over four minutes using 32 processors

Computer Science Technical Reports @Virginia Tech

Garbage collection auto-tuning for Java MapReduce on Multi-Cores

Author: Allen E.
Chu C.
Dean J.
DeWitt D.
Gavin Brown
George Kovoor
Jeremy Singer
Kovoor G.
M
Mikel Luján
Printezis T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011
Field of study

MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests

Crossref

Enlighten

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

Author: Albutiu Martina-Cezara
Kemper Alfons
Neumann Thomas
Publication venue
Publication date: 01/01/2012
Field of study

Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard to parallelize final merge step to create one complete sort order. Rather they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals - in particular it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Author: A Arnold
A Faradjian
B Hess
C Schütte
G Wilson
JA Anderson
JC Phillips
KJ Bowers
KJ Bowers
L Verlet
M Eleftheriou
M Shirts
MJ Abraham
P Eastman
R Yokota
S Pronk
S Páll
U Essmann
W Humphrey
WM Brown
Y Andoh
Y Sugita
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MPG.PuRe

Performance Portability Through Semi-explicit Placement in Distributed Erlang

Author: Chechina Natalia
MacKenzie Kenneth
Trinder Phil
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/08/2015
Field of study

We consider the problem of adapting distributed Erlang applications to large or heterogeneous architectures to achieve good performance in a portable way. In many architectures, and especially large architectures, the communication latency between pairs of virtual machines (nodes) is no longer uniform. We propose two language-level methods that enable programs to automatically adapt to heterogeneity and non-uniform communication latencies, and both provide information enabling a program to identify an appropriate node when spawning a process. We provide a means of recording node attributes describing the hardware and software capabilities of nodes, and mechanisms that allow an application to examine the attributes of remote nodes. We provide an abstraction of communication distances that enables an application to select nodes to facilitate efficient communication. We have developed open source libraries that implement these ideas. We show that the use of attributes for node selection can lead to significant performance improvements if different components of the application have different processing requirements. We report a detailed empirical investigation of non-uniform communication times in several representative architectures, and show that our abstract model provides a good description of the hierarchy of communication times

Enlighten

Efficient Clustering-based Plagiarism Detection using IPPDC

Author: Ohmann Anthony
Publication venue: DigitalCommons@CSB/SJU
Publication date: 01/01/2013
Field of study

The volume of source code available on the Internet is astronomical. When seeking to detect cases of plagiarism, one must maintain a large database of known documents. This can lead to unacceptably slow runtimes for systems designed to detect cases of source code plagiarism. We seek to use partitional and density-based clustering as well as intelligent parallelism to improve VOCS, a plagiarism detection system. In addition, we will attempt to increase the system’s usability and usefulness by expanding its programming language support and building an intuitive web interface. Finally, we propose utilizing Program Dependence Graphs to construct a hybrid approach in order to more accurately and precisely detect well-disguised plagiarism

College of Saint Benedict and Saint John’s University: DigitalCommons@CSB/SJU

Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures

Author: Koivisto Juha
van Tol Michiel W.
Publication venue
Publication date: 01/01/2011
Field of study

Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current implementations assume a single address space shared memory. We investigate and extend SVP to handle distributed environments, and discuss a prototype SVP implementation which transparently supports execution on heterogeneous distributed memory clusters over TCP/IP connections, while retaining the original SVP programming model

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications