19,010 research outputs found
Exploring Task Mappings on Heterogeneous MPSoCs using a Bias-Elitist Genetic Algorithm
Exploration of task mappings plays a crucial role in achieving high
performance in heterogeneous multi-processor system-on-chip (MPSoC) platforms.
The problem of optimally mapping a set of tasks onto a set of given
heterogeneous processors for maximal throughput has been known, in general, to
be NP-complete. The problem is further exacerbated when multiple applications
(i.e., bigger task sets) and the communication between tasks are also
considered. Previous research has shown that Genetic Algorithms (GA) typically
are a good choice to solve this problem when the solution space is relatively
small. However, when the size of the problem space increases, classic genetic
algorithms still suffer from the problem of long evolution times. To address
this problem, this paper proposes a novel bias-elitist genetic algorithm that
is guided by domain-specific heuristics to speed up the evolution process.
Experimental results reveal that our proposed algorithm is able to handle large
scale task mapping problems and produces high-quality mapping solutions in only
a short time period.Comment: 9 pages, 11 figures, uses algorithm2e.st
An efficient hardware architecture for a neural network activation function generator
This paper proposes an efficient hardware architecture for a function generator suitable for an artificial neural network (ANN). A spline-based approximation function is designed that provides a good trade-off between accuracy and silicon area, whilst also being inherently scalable and adaptable for numerous activation functions. This has been achieved by using a minimax polynomial and through optimal placement of the approximating polynomials based on the results of a genetic algorithm. The approximation error of the proposed method compares favourably to all related research in this field. Efficient hardware multiplication circuitry is used in the implementation, which reduces the area overhead and increases the throughput
High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP
This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBMâs Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs
A Survey of Parallel Data Mining
With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been a considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons
learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms
Optimisation and parallelism in synchronous digital circuit simulators
Digital circuit simulation often requires a large amount of computation, resulting in long run times. We consider several techniques for optimising a brute force synchronous
circuit simulator: an algorithm using an event queue that avoids recalculating quiescent parts of the circuit, a marking algorithm that is similar to the event queue but that avoids a central data structure, and a lazy algorithm that avoids calculating signals whose values are not needed. Two target architectures for the simulator are used: a sequential CPU, and a parallel GPGPU. The interactions between the different optimisations are discussed, and the performance is measured while the algorithms are simulating a simple but realistic scalable circuit
- âŠ