Process-Oriented Parallel Programming with an Application to Data-Intensive Computing
We introduce process-oriented programming as a natural extension of
object-oriented programming for parallel computing. It is based on the
observation that every class of an object-oriented language can be instantiated
as a process, accessible via a remote pointer. The introduction of process
pointers requires no syntax extension, identifies processes with programming
objects, and enables processes to exchange information simply by executing
remote methods. Process-oriented programming is a high-level language
alternative to multithreading, MPI and many other languages, environments and
tools currently used for parallel computations. It implements natural
object-based parallelism using only minimal syntax extension of existing
languages, such as C++ and Python, and therefore has the potential to lead to
widespread adoption of parallel programming. We implemented a prototype system
for running processes using C++ with MPI and used it to compute a large
three-dimensional Fourier transform on a computer cluster built of commodity
hardware components. The three-dimensional Fourier transform is a prototypical
data-intensive application with a complex data-access pattern. The
process-oriented code is only a few hundred lines long, and attains very high
data throughput by achieving massive parallelism and maximizing hardware
utilization.
Comment: 20 pages, 1 figure
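
The abstract's central mechanism, calling a method on an object that lives in another process, can be sketched in plain C++ with MPI. This is a minimal illustration under stated assumptions, not the paper's actual system: the Counter class and the rank/tag layout below are invented for the example.

```cpp
// Hypothetical sketch: forwarding one method call to an object hosted
// on another MPI rank. In the paper's model this plumbing would sit
// behind a "process pointer" with no visible sends or receives.
#include <mpi.h>
#include <cstdio>

// The class whose instances can live on remote ranks.
struct Counter {
    int value = 0;
    int add(int x) { value += x; return value; }
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        // Rank 1 hosts the Counter "process": serve one remote add() call.
        Counter c;
        int arg;
        MPI_Recv(&arg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int result = c.add(arg);
        MPI_Send(&result, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    } else if (rank == 0) {
        // Rank 0 acts through the remote object: conceptually c->add(5),
        // but the work happens on rank 1.
        int arg = 5, result;
        MPI_Send(&arg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&result, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("remote add(5) -> %d\n", result);
    }
    MPI_Finalize();
    return 0;
}
```

Run with two ranks (e.g. `mpirun -np 2`). The point of the paper's abstraction is that the send/receive pairing above disappears behind an ordinary-looking method call.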
Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages
Arrays are such a rich and fundamental data type that they tend to be built
into a language, either in the compiler or in a large low-level library.
Defining this functionality at the user level instead provides greater
flexibility for application domains not envisioned by the language designer.
Only a few languages, such as C++ and Haskell, provide the necessary power to
define n-dimensional arrays, but these systems rely on compile-time
abstraction, sacrificing some flexibility. In contrast, dynamic languages make
it straightforward for the user to define any behavior they might want, but at
the possible expense of performance.
As part of the Julia language project, we have developed an approach that
yields a novel trade-off between flexibility and compile-time analysis. The
core abstraction we use is multiple dispatch. We have come to believe that
while multiple dispatch has not been especially popular in most kinds of
programming, technical computing is its killer application. By expressing key
functions such as array indexing using multi-method signatures, a surprising
range of behaviors can be obtained, in a way that is both relatively easy to
write and amenable to compiler analysis. The compact factoring of concerns
provided by these methods makes it easier for user-defined types to behave
consistently with types in the standard library.
Comment: 6 pages, 2 figures, workshop paper for the ARRAY '14 workshop, June 11, 2014, Edinburgh, United Kingdom
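
A rough C++ analogue of the paper's core idea, assuming nothing beyond the abstract: array indexing expressed as a family of methods keyed on both the array type and the index type. C++ overloading resolves the choice at compile time, whereas Julia's multiple dispatch makes the analogous choice on runtime types; the Dense and Range names are illustrative only.

```cpp
// Illustrative sketch (not the paper's Julia code): getindex as a set of
// methods selected by the (array type, index type) pair.
#include <vector>
#include <cstdio>

struct Dense { std::vector<double> data; };
struct Range { int lo, hi; };   // half-open interval [lo, hi)

// Method on (Dense, int): scalar indexing.
double getindex(const Dense& a, int i) { return a.data[i]; }

// Method on (Dense, Range): slice indexing, factored through the scalar method.
std::vector<double> getindex(const Dense& a, Range r) {
    std::vector<double> out;
    for (int i = r.lo; i < r.hi; ++i) out.push_back(getindex(a, i));
    return out;
}

int main() {
    Dense a{{1.0, 2.0, 3.0, 4.0}};
    std::printf("%g\n", getindex(a, 2));        // selects (Dense, int)
    for (double x : getindex(a, Range{1, 3}))   // selects (Dense, Range)
        std::printf("%g ", x);
    std::printf("\n");
    return 0;
}
```

The factoring the abstract describes is visible even here: the slice method is built from the scalar method, so a user-defined array type that supplies the scalar case inherits consistent slicing behavior.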
Handling Parallelism in a Concurrency Model
Programming models for concurrency are optimized for dealing with
nondeterminism, for example to handle asynchronously arriving events. To shield
the developer from data race errors effectively, such models may prevent shared
access to data altogether. However, this restriction also makes them unsuitable
for applications that require data parallelism. We present a library-based
approach for permitting parallel access to arrays while preserving the safety
guarantees of the original model. When applied to SCOOP, an object-oriented
concurrency model, the approach exhibits a negligible performance overhead
compared to ordinary threaded implementations of two parallel benchmark
programs.
Comment: MUSEPAT 201
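
The safety argument can be sketched independently of SCOOP itself. A minimal C++ illustration, assuming only the idea implied by the abstract, that parallel array access stays race-free when each worker is confined to a disjoint slice (the actual library API is not shown there):

```cpp
// Sketch: hand each worker a disjoint half-open slice [begin, end) of the
// array, so no two threads ever touch the same element and no race exists.
#include <thread>
#include <vector>
#include <cstdio>

int main() {
    std::vector<int> data(8, 1);
    const int nworkers = 2;
    std::vector<std::thread> workers;

    for (int w = 0; w < nworkers; ++w) {
        // Slices are disjoint by construction; concurrent writes cannot race.
        size_t begin = w * data.size() / nworkers;
        size_t end   = (w + 1) * data.size() / nworkers;
        workers.emplace_back([&data, begin, end] {
            for (size_t i = begin; i < end; ++i) data[i] *= 2;
        });
    }
    for (auto& t : workers) t.join();

    for (int x : data) std::printf("%d ", x);
    std::printf("\n");
    return 0;
}
```

A library-based approach like the paper's would enforce this slicing discipline in the API rather than leaving it to convention, preserving the concurrency model's guarantees.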
CREOLE: a Universal Language for Creating, Requesting, Updating and Deleting Resources
In the context of Service-Oriented Computing, applications can be developed
following the REST (Representational State Transfer) architectural style. This
style corresponds to a resource-oriented model, where resources are manipulated
via CRUD (Create, Request, Update, Delete) interfaces. The diversity of CRUD
languages due to the absence of a standard leads to composition problems
related to adaptation, integration and coordination of services. To overcome
these problems, we propose a pivot architecture built around a universal
language to manipulate resources, called CREOLE, a CRUD Language for Resource
Edition. In this architecture, scripts written in existing CRUD languages, like
SQL, are compiled into CREOLE and then executed over different CRUD interfaces.
After stating the requirements for a universal language for manipulating
resources, we formally describe the language and informally motivate its
definition with respect to the requirements. We then concretely show how the
architecture solves adaptation, integration and coordination problems in the
case of photo management in Flickr and Picasa, two well-known service-oriented
applications. Finally, we propose a roadmap for future work.
Comment: In Proceedings FOCLASA 2010, arXiv:1007.499
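
A hypothetical sketch of the pivot-architecture shape described above, in C++ rather than CREOLE itself (whose concrete syntax the abstract does not give): one universal CRUD interface, with per-service adapters behind it so a single script can drive different backends.

```cpp
// Sketch of the pivot idea: compiled scripts target one CRUD interface;
// adapters (e.g. one wrapping Flickr's API) implement it per service.
#include <map>
#include <string>
#include <cstdio>

// The universal CRUD interface targeted by compiled scripts.
struct CrudInterface {
    virtual void create(const std::string& id, const std::string& v) = 0;
    virtual std::string request(const std::string& id) = 0;
    virtual void update(const std::string& id, const std::string& v) = 0;
    virtual void remove(const std::string& id) = 0;  // "delete" is reserved in C++
    virtual ~CrudInterface() = default;
};

// An in-memory stand-in for a real service adapter.
struct MemoryStore : CrudInterface {
    std::map<std::string, std::string> items;
    void create(const std::string& id, const std::string& v) override { items[id] = v; }
    std::string request(const std::string& id) override { return items.at(id); }
    void update(const std::string& id, const std::string& v) override { items.at(id) = v; }
    void remove(const std::string& id) override { items.erase(id); }
};

int main() {
    MemoryStore photos;
    photos.create("p1", "sunset.jpg");
    photos.update("p1", "sunset_v2.jpg");
    std::printf("%s\n", photos.request("p1").c_str());
    photos.remove("p1");
    return 0;
}
```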
A service oriented architecture for engineering design
Decision making in engineering design can be effectively addressed by using genetic algorithms to solve multi-objective problems. These multi-objective genetic algorithms (MOGAs) are well suited to implementation in a Service Oriented Architecture. Often the evaluation process of the MOGA is compute-intensive, owing to the use of a complex computer model to represent the real-world system. The emerging paradigm of Grid Computing offers a potential solution to the compute-intensive nature of this objective function evaluation, by allowing access to large amounts of compute resources in a distributed manner. This paper presents a grid-enabled framework for multi-objective optimisation using genetic algorithms (MOGA-G) to aid decision making in engineering design.
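
The compute-intensive step the framework offloads to the Grid can be sketched with ordinary C++ tasks standing in for remote resources; the two toy objectives below are invented for illustration and are not the paper's engineering model.

```cpp
// Sketch of the exploited pattern: MOGA objective evaluations are
// independent per individual, so they can be farmed out in parallel --
// here to local async tasks, on the Grid to remote compute resources.
#include <future>
#include <vector>
#include <cmath>
#include <cstdio>

// Stand-in for an expensive simulation-based evaluation.
std::vector<double> evaluate(double x) {
    return { x * x, std::pow(x - 2.0, 2.0) };  // two competing objectives
}

int main() {
    std::vector<double> population = {0.0, 0.5, 1.0, 1.5, 2.0};
    std::vector<std::future<std::vector<double>>> jobs;

    // Launch one evaluation per individual; each could run anywhere.
    for (double x : population)
        jobs.push_back(std::async(std::launch::async, evaluate, x));

    for (size_t i = 0; i < jobs.size(); ++i) {
        auto f = jobs[i].get();
        std::printf("x=%.1f  f1=%.2f  f2=%.2f\n", population[i], f[0], f[1]);
    }
    return 0;
}
```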
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest surrounds Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks: the many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proved inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also face memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored; they allow the memory hierarchy to be implemented using embedded BlockRAM, which boosts the parallel use of shared memory elements between multiple processing units while avoiding data replication and inconsistency. This makes FPGAs potentially powerful solutions for real-time classification with CNNs. Both Altera and Xilinx have adopted the OpenCL co-design framework from GPUs as a pseudo-automatic development solution for FPGA designs. In this paper, a comprehensive evaluation and comparison of the Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance, and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization, and more compact boards. Altera provides multi-platform tools, a mature design community, and better execution times.
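
To make the memory-access argument concrete, here is the direct convolution underlying each CNN layer, written as a plain C++ sketch with illustrative sizes (the paper's 5-layer network and its OpenCL kernels are not reproduced in the abstract). In the OpenCL designs the paper compares, each output element below would typically map to one work-item executing the inner two loops.

```cpp
// Direct 2D convolution (valid padding). The sliding-window access
// pattern here is what FPGA BlockRAM hierarchies try to keep on-chip.
#include <vector>
#include <cstdio>

int main() {
    const int N = 6, K = 3, M = N - K + 1;  // input, kernel, output sizes
    std::vector<float> in(N * N, 1.0f), w(K * K, 0.5f), out(M * M, 0.0f);

    for (int y = 0; y < M; ++y)
        for (int x = 0; x < M; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < K; ++ky)
                for (int kx = 0; kx < K; ++kx)
                    acc += in[(y + ky) * N + (x + kx)] * w[ky * K + kx];
            out[y * M + x] = acc;
        }

    std::printf("out[0]=%g (expect %g)\n", out[0], 0.5f * K * K);
    return 0;
}
```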
