4 research outputs found

    Efficient hypercube communications

    Hypercube algorithms may be developed for a variety of communication-intensive tasks such as sending a message from one node to another, broadcasting a message from one node to all others, broadcasting a message from each node to all others, all-to-all personalized communication, one-to-all personalized communication, and exchanging messages between nodes via fixed permutations. All these communication patterns are special cases of many-to-many personalized communication, which is the problem investigated here. Two routing algorithms for many-to-many personalized communication are presented. The proposed algorithms yield very high performance with respect to the number of time steps and packet transmissions. The first algorithm achieves this by attempting to equibalance the number of messages at intermediate nodes. This technique tries to avoid creating a bottleneck at any node and thus reduces the total communication time. The second algorithm achieves it through one-step time-lookahead equibalancing: from the candidate intermediate nodes, it chooses the one that will probably have the minimum number of messages in the next cycle.
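The abstract does not give the details of either routing algorithm, so the following is only a minimal sketch of the general idea of equibalancing, under stated assumptions: each message moves along a shortest path (one differing address bit is corrected per hop), and among the candidate neighbours the one with the fewest queued messages is chosen. The names `candidates`, `next_hop`, `route`, and the `queue_len` dictionary are hypothetical and do not come from the paper.

```python
def candidates(cur, dst, dim):
    """Neighbours of `cur` that lie on a shortest path to `dst`.

    In a hypercube, two nodes are adjacent when their binary addresses
    differ in exactly one bit, so each candidate fixes one differing bit.
    """
    diff = cur ^ dst
    return [cur ^ (1 << b) for b in range(dim) if (diff >> b) & 1]


def next_hop(cur, dst, dim, queue_len):
    """Pick the candidate with the fewest queued messages (assumed rule).

    Ties are broken by the lower node address so the result is deterministic.
    """
    options = candidates(cur, dst, dim)
    return min(options, key=lambda n: (queue_len.get(n, 0), n))


def route(src, dst, dim, queue_len):
    """Greedy load-aware route from `src` to `dst`; returns the node path.

    `queue_len` maps node -> number of messages currently queued there and
    is updated for each intermediate node the message passes through.
    """
    path = [src]
    cur = src
    while cur != dst:
        cur = next_hop(cur, dst, dim, queue_len)
        if cur != dst:
            queue_len[cur] = queue_len.get(cur, 0) + 1
        path.append(cur)
    return path


if __name__ == "__main__":
    # 3-dimensional hypercube (8 nodes); node 1 is heavily loaded.
    load = {1: 5, 2: 0, 4: 3}
    print(route(0, 7, 3, load))
```

A one-step lookahead variant, as the abstract describes for the second algorithm, would rank candidates by an estimate of their queue length in the next cycle rather than the current one; how that estimate is formed is not stated in the abstract and is left out here.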

    Performance analysis of pyramid mapping algorithms for the hypercube

    A comparative performance analysis of algorithms that map pyramids and multilevel structures onto the hypercube is presented. The pyramid structure is appropriate for low-level and intermediate-level computer vision algorithms. It not only supports both local and global operations efficiently but can also support the implementation of multilevel solvers. Nevertheless, pyramids cannot efficiently implement the majority of scientific algorithms, and their cost may become unacceptably high. Hypercube machines, by contrast, have been widely used in parallel computing because of their small diameter, high degree of fault tolerance, and rich interconnection, which permits fast communication at a reasonable cost. As a result, hypercube machines can efficiently emulate pyramids, and the characteristics that make them useful scientific processors also make them efficient image processors. Two algorithms that have been developed for the efficient mapping of the pyramid onto the hypercube are discussed in this thesis. The algorithm proposed by Stout [4] requires a hypercube whose number of processing elements (PEs) equals the number of nodes in the base of the pyramid. This algorithm can activate only one level of the pyramid at a time. In contrast, the algorithm proposed by Patel and Ziavras [7] requires the same number of PEs as Stout's algorithm but allows the concurrent simulation of multiple levels, as long as the base level is not among the pyramid levels that need to be simulated at the same time. This low-cost algorithm yields higher performance through high utilization of PEs. However, it performs slightly worse than Stout's algorithm when only one level is active at a time. Patel and Ziavras' algorithm performs much better than Stout's algorithm when all levels, excluding the leaf level, are active concurrently. The comparative analysis of these two algorithms is based on simulation results for several image processing algorithms: perimeter counting, image convolution, and segmentation.

    Fast dynamically reconfigurable architectures for 1-D and 2-D recursive digital filters

    In this paper, we consider the array-processor implementation of infinite impulse response (IIR) 1-D and 2-D digital filters, which require recursive computations. The aim is to develop systolic architectures for these filters that perform as well as possible. We use the state-space representation to obtain, in a straightforward manner, an efficient implementation of the 1-D direct realisation on dynamically switchable systolic arrays (cylindrical type). However, this direct systolic realisation incurs a latency proportional to the filter order, which reduces the computation speed and the throughput rate. To improve the throughput of recursive filtering arrays in general, the solution proposed in this paper is based on the CTP decomposition technique of Porter, which transforms the matrix-column product into a triple matrix product. We show that this technique allows IIR filters to be realised on dynamically reconfigurable cylindrical architectures that are much faster. However, this throughput improvement is obtained at the cost of greater hardware complexity. An improved version of the CTP decomposition technique is then applied to 1-D IIR filters represented in state space by sparse matrices of the tridiagonal type, which significantly reduces the hardware complexity of recursive filter arrays.
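To make the state-space description concrete, here is a minimal software sketch of a recursive (IIR) filter written in the standard state-space form x[k+1] = A x[k] + B u[k], y[k] = C x[k] + D u[k]. It illustrates only the recursive matrix-column product that the abstract refers to; it is not the authors' systolic architecture and does not implement the CTP decomposition, whose details the abstract does not give. The function name `state_space_filter` and the example coefficients are assumptions chosen for illustration.

```python
def state_space_filter(A, B, C, D, u):
    """Run a single-input, single-output state-space filter over samples `u`.

    A is an n x n list of lists, B and C are length-n lists, D is a scalar.
    The state starts at zero. Each step computes the output from the
    current state, then updates the state with the product A @ x.
    """
    n = len(A)
    x = [0.0] * n
    y = []
    for uk in u:
        # Output equation: y[k] = C x[k] + D u[k]
        y.append(sum(C[i] * x[i] for i in range(n)) + D * uk)
        # State update: x[k+1] = A x[k] + B u[k]  (the recursive part)
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * uk
             for i in range(n)]
    return y


if __name__ == "__main__":
    # First-order example: y[k] = 0.5 * y[k-1] + u[k], with state x[k] = y[k-1].
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(state_space_filter([[0.5]], [1.0], [0.5], 1.0, impulse))
```

Because each new state depends on the previous one, the update cannot simply be pipelined across time steps, which is consistent with the throughput limitation that the abstract attributes to the direct realisation.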