Dataflow Computing with Polymorphic Registers
Heterogeneous systems are becoming increasingly popular for data processing. They improve the performance of simple kernels applied to large amounts of data. However, sequential data loads may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. Furthermore, PRF customization exposes specific data-path features to the programmer in a very convenient way. PRFs allow additional control over the registers' dimensions and the number of elements that can be accessed simultaneously by the computational units. This paper shows how PRFs can be integrated into dataflow computational platforms. In particular, starting from annotated source code, we present a compiler-based methodology that automatically generates the customized PRFs and the enhanced computational kernels that efficiently exploit them.
An Expressive Language and Efficient Execution System for Software Agents
Software agents can be used to automate many of the tedious, time-consuming
information processing tasks that humans currently have to complete manually.
However, to do so, agent plans must be capable of representing the myriad of
actions and control flows required to perform those tasks. In addition, since
these tasks can require integrating multiple sources of remote information
(typically a slow, I/O-bound process), it is desirable to make execution as
efficient as possible. To address both of these needs, we present a flexible
software agent plan language and a highly parallel execution system that enable
the efficient execution of expressive agent plans. The plan language allows
complex tasks to be more easily expressed by providing a variety of operators
for flexibly processing the data as well as supporting subplans (for
modularity) and recursion (for indeterminate looping). The executor is based on
a streaming dataflow model of execution to maximize the amount of operator and
data parallelism possible at runtime. We have implemented both the language and
executor in a system called THESEUS. Our results from testing THESEUS show that
streaming dataflow execution can yield significant speedups over both
traditional serial (von Neumann) as well as non-streaming dataflow-style
execution that existing software and robot agent execution systems currently
support. In addition, we show how plans written in the language we present can
represent certain types of subtasks that cannot be accomplished using the
languages supported by network query engines. Finally, we demonstrate that the
increased expressivity of our plan language does not hamper performance;
specifically, we show how data can be integrated from multiple remote sources
just as efficiently using our architecture as is possible with a
state-of-the-art streaming-dataflow network query engine.
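The abstract above contrasts streaming dataflow with serial execution. As a minimal illustration of the idea (not THESEUS's actual implementation; all names are invented), operators can be chained as Python generators so that each tuple flows downstream as soon as it is produced, letting later operators start before earlier ones finish:

```python
# Illustrative sketch of streaming dataflow: each operator is a generator
# that consumes input tuples one at a time and emits results immediately,
# so downstream operators can begin work before upstream ones finish.

def source(rows):
    for row in rows:          # in an agent plan this would be a remote fetch
        yield row

def select(stream, predicate):
    for row in stream:
        if predicate(row):
            yield row

def project(stream, key):
    for row in stream:
        yield row[key]

# Wire operators into a pipeline; nothing runs until results are consumed.
rows = [{"name": "a", "price": 3}, {"name": "b", "price": 9}]
pipeline = project(select(source(rows), lambda r: r["price"] > 5), "name")
print(list(pipeline))  # -> ['b']
```

Because each stage pulls only one tuple at a time, a slow I/O-bound source overlaps with downstream processing instead of blocking it.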
Configurable computer systems can support dataflow computing
This work presents a practical implementation of a uni-processor system design. This design, named D2-CPU, implements the pure data-driven paradigm, which is a radical alternative to the conventional von Neumann paradigm and exploits instruction-level parallelism to its full extent. The D2-CPU uses the natural flow of the program, dataflow, minimizing redundant operations like fetch, store, and write-back. This leads to a design with better performance, lower power consumption, and efficient use of on-chip resources. This performance is the result of a simple, pipelined, superscalar architecture with a very wide data bus and completely out-of-order execution of instructions. The result is a program-counter-less, distributed-control system design built around intelligent memories. Upon the availability of data, instructions advance through the memory hierarchy and ultimately to the execution units by themselves, instead of having the CPU fetch the required instructions from memory as in control-flow processors. This application (data) oriented execution process is in contrast to the application-ignorant CPUs of conventional machines. The D2-CPU addresses current architectural challenges and puts into practice a pure data-driven microprocessor. This work employs an FPGA implementation of the D2-CPU to prove the practicability of the data-driven computing paradigm using configurable logic. A comparative analysis at the end confirms its advantages in performance, resource utilization, and ease of programming over conventional CPUs.
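The data-driven firing rule described above (an instruction executes as soon as all its operands arrive, with no program counter) can be sketched in a few lines of software. This is a toy interpreter for illustration only, not the D2-CPU's hardware design; the example graph computing (2 + 3) * 4 is invented:

```python
# Minimal sketch of the dataflow firing rule: an instruction fires as soon
# as all of its operand tokens have arrived; there is no program counter.
import operator

# Hypothetical dataflow graph for (2 + 3) * 4: each node lists its opcode,
# how many operands it needs, and where its result token is sent.
program = {
    "add": {"op": operator.add, "needs": 2, "dest": "mul"},
    "mul": {"op": operator.mul, "needs": 2, "dest": None},
}

def run(program, initial_tokens):
    operands = {name: [] for name in program}
    results = {}
    in_flight = list(initial_tokens)            # (node, value) tokens
    while in_flight:
        node, value = in_flight.pop()
        operands[node].append(value)
        instr = program[node]
        if len(operands[node]) == instr["needs"]:   # firing rule satisfied
            out = instr["op"](*operands[node])
            operands[node] = []
            if instr["dest"]:
                in_flight.append((instr["dest"], out))
            else:
                results[node] = out
    return results

print(run(program, [("add", 2), ("add", 3), ("mul", 4)]))  # -> {'mul': 20}
```

Note that execution order is driven entirely by token availability: the `mul` node receives its constant operand first but cannot fire until the `add` result arrives.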
The Case for Polymorphic Registers in Dataflow Computing
Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve throughput by up to 56.17× and show that the PRF-augmented system outperforms the GPU for 9×9 or larger mask sizes, even in bandwidth-constrained systems.
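The separable 2D convolution used as the case study above exploits the fact that a kernel expressible as an outer product of two vectors can be applied as a horizontal 1D pass followed by a vertical 1D pass, reducing O(k²) work per pixel to O(2k). A plain-Python sketch (valid-mode, no padding; the image and filter values are illustrative):

```python
# Separable 2D convolution: row pass with h, then column pass with v.
# Plain Python lists, "valid" mode (no padding at the borders).

def conv1d(seq, kernel):
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def separable_conv2d(image, v, h):
    rows = [conv1d(row, h) for row in image]            # horizontal pass
    cols = zip(*rows)                                   # transpose
    out_cols = [conv1d(list(col), v) for col in cols]   # vertical pass
    return [list(r) for r in zip(*out_cols)]            # transpose back

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
v, h = [1, 1], [1, -1]   # e.g. a vertical smoothing / horizontal difference pair
print(separable_conv2d(image, v, h))  # -> [[-2, -2], [-2, -2]]
```

For a 9×9 mask this replaces 81 multiply-accumulates per pixel with 18, which is one reason such kernels map well onto bandwidth-constrained parallel hardware.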
Implementation of a DNA compression algorithm using dataflow computing
The number of DNA sequence databases has grown greatly in recent years, and the space required to store the sequences is increasing faster than the space available to store them. This means a higher cost to store DNA sequences, as well as the read sequences, which are fragments of the whole sequence. This situation has led to the use of compression algorithms for storing DNA files. The main objective of the project is to increase the efficiency of DNA sequence compression, because the process is computationally expensive. An FPGA with a dataflow architecture has been used to develop the project, with the aim of exploiting the parallelism available in the chosen algorithm. The compression method processes sequence reads with a fixed number of mutations per read, and tests were performed for 4, 8, 12, and 16 mutations per read using an architecture that allows up to 160 reads to be processed in a single clock tick. Experimental results showed that even with a small number of processing units, performance increases substantially on the DFE architecture; the only disadvantage is the storage/reading time. Keywords: Compression, Dataflow Engine (DFE), FPGA, CPU, DNA, Maxeler
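The abstract's exact compression scheme is not given, but the core idea it names (encoding each read against a reference with a fixed mutation budget) can be illustrated with a small reference-based sketch. All function names and sequences here are hypothetical:

```python
# Hedged sketch of reference-based read compression: store each read as a
# position in the reference plus a bounded list of (offset, base) mutations.
# This is an illustration of the general idea, not the paper's pipeline.

MAX_MUTATIONS = 4  # fixed mutation budget per read, as in the abstract

def compress_read(reference, read, pos):
    """Encode a read aligned at `pos` as (pos, [(offset, base), ...])."""
    mutations = [(i, b) for i, b in enumerate(read)
                 if reference[pos + i] != b]
    if len(mutations) > MAX_MUTATIONS:
        return None                      # read does not fit the fixed budget
    return (pos, mutations)

def decompress_read(reference, length, encoded):
    pos, mutations = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in mutations:
        bases[offset] = base
    return "".join(bases)

reference = "ACGTACGTACGT"
read = "ACCTA"                           # one mutation vs reference[0:5]
enc = compress_read(reference, read, 0)
print(enc)                               # -> (0, [(2, 'C')])
assert decompress_read(reference, len(read), enc) == read
```

Because each read is independent, a dataflow engine can check and encode many reads in parallel per clock tick, which is the parallelism the FPGA implementation exploits.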
A SURVEY ON TRADITIONAL PLATFORMS AND NEW TRENDS IN PARALLEL COMPUTATION
Information processing is in continuous progress, and High Performance Computing is now a major trend. Parallel Computing is often used as a synonym for High Performance Computing, and it is the main field of information processing in our age. Both hardware systems and software platforms are developing very fast to support simple, rational, and easy parallel data processing and programming. This paper gives an overview of issues and improvements in parallel processing. It also discusses the qualities and advantages of different platforms for parallelism, covering various architectures, innovations for parallelism, and comparisons of parallel processing results. The main goal is to introduce the dataflow model and some concrete examples of dataflow systems. The paper contrasts traditional control-flow parallel platforms with the dataflow approach.
Acceleration of Image Segmentation Algorithm for (Breast) Mammogram Images Using High-Performance Reconfigurable Dataflow Computers
Image segmentation is one of the most common procedures in medical imaging applications. It is also a very important task in breast cancer detection. A breast cancer detection procedure based on mammography can be divided into several stages. The first stage is the extraction of the region of interest from a breast image, followed by the identification of suspicious mass regions, their classification, and comparison with the existing image database. Existing image databases often contain large sets of data whose processing requires a lot of time, so accelerating each of the processing stages in breast cancer detection is a very important issue. In this paper, an implementation of an existing algorithm for region-of-interest-based segmentation of mammogram images on High-Performance Reconfigurable Dataflow Computers (HPRDCs) is proposed. As the dataflow engine (DFE) of such an HPRDC, Maxeler's acceleration card is used. Experiments examining the acceleration of that algorithm on Reconfigurable Dataflow Computers (RDCs) were performed with two types of mammogram images at different resolutions. Several DFE configurations were tested, each giving a different acceleration of algorithm execution. Those acceleration values are presented, and the experimental results showed good acceleration.