20,365 research outputs found
The Cost of Address Translation
Modern computers are not random access machines (RAMs). They have a memory
hierarchy, multiple cores, and virtual memory. In this paper, we address the
computational cost of address translation in virtual memory. Starting point for
our work is the observation that the analysis of some simple algorithms (random
scan of an array, binary search, heapsort) in either the RAM model or the EM
model (external memory model) does not correctly predict growth rates of actual
running times. We propose the VAT model (virtual address translation) to
account for the cost of address translations and analyze the algorithms
mentioned above and others in the model. The predictions agree with the
measurements. We also analyze the VAT-cost of cache-oblivious algorithms.Comment: A extended abstract of this paper was published in the proceedings of
ALENEX13, New Orleans, US
Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors
In this paper we consider the problem of identifying intersections between
two sets of d-dimensional axis-parallel rectangles. This is a common problem
that arises in many agent-based simulation studies, and is of central
importance in the context of High Level Architecture (HLA), where it is at the
core of the Data Distribution Management (DDM) service. Several realizations of
the DDM service have been proposed; however, many of them are either
inefficient or inherently sequential. These are serious limitations since
multicore processors are now ubiquitous, and DDM algorithms -- being
CPU-intensive -- could benefit from additional computing power. We propose a
parallel version of the Sort-Based Matching algorithm for shared-memory
multiprocessors. Sort-Based Matching is one of the most efficient serial
algorithms for the DDM problem, but is quite difficult to parallelize due to
data dependencies. We describe the algorithm and compute its asymptotic running
time; we complete the analysis by assessing its performance and scalability
through extensive experiments on two commodity multicore systems based on a
dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.Comment: Proceedings of the 21-th ACM/IEEE International Symposium on
Distributed Simulation and Real Time Applications (DS-RT 2017). Best Paper
Award @DS-RT 201
Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems
Message Passing Interface (MPI) is widely used to implement parallel
programs. Although Windowsbased architectures provide the facilities of
parallel execution and multi-threading, little attention has been focused on
using MPI on these platforms. In this paper we use the dual core Window-based
platform to study the effect of parallel processes number and also the number
of cores on the performance of three MPI parallel implementations for some
sorting algorithms
Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures
In this paper, we perform an empirical evaluation of the Parallel External
Memory (PEM) model in the context of geometric problems. In particular, we
implement the parallel distribution sweeping framework of Ajwani, Sitchinava
and Zeh to solve batched 1-dimensional stabbing max problem. While modern
processors consist of sophisticated memory systems (multiple levels of caches,
set associativity, TLB, prefetching), we empirically show that algorithms
designed in simple models, that focus on minimizing the I/O transfers between
shared memory and single level cache, can lead to efficient software on current
multicore architectures. Our implementation exhibits significantly fewer
accesses to slow DRAM and, therefore, outperforms traditional approaches based
on plane sweep and two-way divide and conquer.Comment: Longer version of ESA'13 pape
An Elegant Algorithm for the Construction of Suffix Arrays
The suffix array is a data structure that finds numerous applications in
string processing problems for both linguistic texts and biological data. It
has been introduced as a memory efficient alternative for suffix trees. The
suffix array consists of the sorted suffixes of a string. There are several
linear time suffix array construction algorithms (SACAs) known in the
literature. However, one of the fastest algorithms in practice has a worst case
run time of . The problem of designing practically and theoretically
efficient techniques remains open. In this paper we present an elegant
algorithm for suffix array construction which takes linear time with high
probability; the probability is on the space of all possible inputs. Our
algorithm is one of the simplest of the known SACAs and it opens up a new
dimension of suffix array construction that has not been explored until now.
Our algorithm is easily parallelizable. We offer parallel implementations on
various parallel models of computing. We prove a lemma on the -mers of a
random string which might find independent applications. We also present
another algorithm that utilizes the above algorithm. This algorithm is called
RadixSA and has a worst case run time of . RadixSA introduces an
idea that may find independent applications as a speedup technique for other
SACAs. An empirical comparison of RadixSA with other algorithms on various
datasets reveals that our algorithm is one of the fastest algorithms to date.
The C++ source code is freely available at
http://www.engr.uconn.edu/~man09004/radixSA.zi
DynamO: A free O(N) general event-driven molecular-dynamics simulator
Molecular-dynamics algorithms for systems of particles interacting through
discrete or "hard" potentials are fundamentally different to the methods for
continuous or "soft" potential systems. Although many software packages have
been developed for continuous potential systems, software for discrete
potential systems based on event-driven algorithms are relatively scarce and
specialized. We present DynamO, a general event-driven simulation package which
displays the optimal O(N) asymptotic scaling of the computational cost with the
number of particles N, rather than the O(N log(N)) scaling found in most
standard algorithms. DynamO provides reference implementations of the best
available event-driven algorithms. These techniques allow the rapid simulation
of both complex and large (>10^6 particles) systems for long times. The
performance of the program is benchmarked for elastic hard sphere systems,
homogeneous cooling and sheared inelastic hard spheres, and equilibrium
Lennard-Jones fluids. This software and its documentation are distributed under
the GNU General Public license and can be freely downloaded from
http://marcusbannerman.co.uk/dynamo
FORM version 4.0
We present version 4.0 of the symbolic manipulation system FORM. The most
important new features are manipulation of rational polynomials and the
factorization of expressions. Many other new functions and commands are also
added; some of them are very general, while others are designed for building
specific high level packages, such as one for Groebner bases. New is also the
checkpoint facility, that allows for periodic backups during long calculations.
Lastly, FORM 4.0 has become available as open source under the GNU General
Public License version 3.Comment: 26 pages. Uses axodra
Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures
We investigate fundamental decisions in the design of instruction set
architectures for linear genetic programs that are used as both model systems
in evolutionary biology and underlying solution representations in evolutionary
computation. We subjected digital organisms with each tested architecture to
seven different computational environments designed to present a range of
evolutionary challenges. Our goal was to engineer a general purpose
architecture that would be effective under a broad range of evolutionary
conditions. We evaluated six different types of architectural features for the
virtual CPUs: (1) genetic flexibility: we allowed digital organisms to more
precisely modify the function of genetic instructions, (2) memory: we provided
an increased number of registers in the virtual CPUs, (3) decoupled sensors and
actuators: we separated input and output operations to enable greater control
over data flow. We also tested a variety of methods to regulate expression: (4)
explicit labels that allow programs to dynamically refer to specific genome
positions, (5) position-relative search instructions, and (6) multiple new flow
control instructions, including conditionals and jumps. Each of these features
also adds complication to the instruction set and risks slowing evolution due
to epistatic interactions. Two features (multiple argument specification and
separated I/O) demonstrated substantial improvements int the majority of test
environments. Some of the remaining tested modifications were detrimental,
thought most exhibit no systematic effects on evolutionary potential,
highlighting the robustness of digital evolution. Combined, these observations
enhance our understanding of how instruction architecture impacts evolutionary
potential, enabling the creation of architectures that support more rapid
evolution of complex solutions to a broad range of challenges
- …