Parallel Performance of MPI Sorting Algorithms on Dual-Core Processor Windows-Based Systems
Message Passing Interface (MPI) is widely used to implement parallel
programs. Although Windows-based architectures provide facilities for
parallel execution and multi-threading, little attention has been paid to
using MPI on these platforms. In this paper, we use a dual-core Windows-based
platform to study the effect of the number of parallel processes and the number
of cores on the performance of three parallel MPI implementations of some
sorting algorithms.
High performance deep packet inspection on multi-core platform
Deep packet inspection (DPI) provides the ability to perform quality of service (QoS) and intrusion detection on network packets. With the explosive growth of the Internet, however, performance and scalability issues have arisen because of the gap between network and end-system speeds. This article describes what a desirable DPI system with multi-gigabit throughput and good scalability should look like, exploiting parallelism in the network interface card, the network stack, and user applications. Connection-based parallelism, affinity-based scheduling, and lock-free data structures are the main techniques introduced to alleviate the performance and scalability issues. A common DPI application, L7-Filter, is used as an example to illustrate application-level parallelism.
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find that the advanced semantics for
Exascale computing improve load balancing at runtime and enable automatic
parallelism discovery, which improves efficiency.
Comment: 11 figures
Parallelizing RRT on distributed-memory architectures
This paper addresses the problem of improving the performance of the Rapidly-exploring Random Tree (RRT) algorithm by parallelizing it. For scalability reasons, we do so on a distributed-memory architecture, using the message-passing paradigm. We present three parallel versions of RRT along with the technicalities involved in their implementation. We also evaluate the algorithms and study how they behave on different motion planning problems.
The Potential of the Intel Xeon Phi for Supervised Deep Learning
Supervised learning of Convolutional Neural Networks (CNNs), also known as
supervised Deep Learning, is a computationally demanding process. To find the
most suitable parameters of a network for a given application, numerous
training sessions are required. Therefore, reducing the training time per
session is essential to fully utilize CNNs in practice. While numerous research
groups have addressed the training of CNNs using GPUs, so far not much
attention has been paid to the Intel Xeon Phi coprocessor. In this paper we
investigate empirically and theoretically the potential of the Intel Xeon Phi
for supervised learning of CNNs. We design and implement a parallelization
scheme named CHAOS that exploits both the thread- and SIMD-parallelism of the
coprocessor. Our approach is evaluated on the Intel Xeon Phi 7120P using the
MNIST dataset of handwritten digits for various thread counts and CNN
architectures. Results show a 103.5x speedup when training our large network
for 15 epochs using 244 threads, compared to one thread on the coprocessor.
Moreover, we develop a performance model and use it to assess our
implementation and answer what-if questions.
Comment: The 17th IEEE International Conference on High Performance Computing
and Communications (HPCC 2015), Aug. 24 - 26, 2015, New York, US
Parallel symbolic state-space exploration is difficult, but what is the alternative?
State-space exploration is an essential step in many modeling and analysis
problems. Its goal is to find the states reachable from the initial state of a
discrete-state model. The state space can be used to answer important
questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a
starting point for sophisticated investigations expressed in temporal logic.
Unfortunately, the state space is often so large that ordinary explicit data
structures and sequential algorithms cannot cope, prompting the exploration of
(1) parallel approaches using multiple processors, from simple workstation
networks to shared-memory supercomputers, to satisfy large memory and runtime
requirements and (2) symbolic approaches using decision diagrams to encode the
large structured sets and relations manipulated during state-space generation.
Both approaches have merits and limitations. Parallel explicit state-space
generation is challenging, but almost linear speedup can be achieved; however,
the analysis is ultimately limited by the memory and processors available.
Symbolic methods are a heuristic that can efficiently encode many, but not all,
functions over a structured and exponentially large domain; here the pitfalls
are subtler: their performance varies widely depending on the class of decision
diagram chosen, the state variable order, and obscure algorithmic parameters.
As symbolic approaches are often much more efficient than explicit ones for
many practical models, we argue for the need to parallelize symbolic
state-space generation algorithms, so that we can realize the advantages of both
approaches. This is a challenging endeavor, as the most efficient symbolic
algorithm, Saturation, is inherently sequential. We conclude by discussing
challenges, efforts, and promising directions toward this goal.